Blocking poor quality content areas with robots.txt
-
I found an interesting discussion on Search Engine Roundtable where Barry Schwartz and others were discussing using robots.txt to block low-quality content areas affected by Panda.
http://www.seroundtable.com/google-farmer-advice-13090.html
The article is a bit dated. I was wondering what current opinions are on this.
We have some dynamically generated content pages that we tried to improve after Panda. Resources have been limited and, alas, they are still there. Until we can officially remove them, I thought it might be a good idea to just block the entire directory. I would also remove the pages from my sitemaps and resubmit. There are links coming in, but I could redirect the important ones (I was going to do that anyway). Thoughts?
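For reference, the block I have in mind would be just a couple of lines in robots.txt (the directory name below is only a placeholder for our dynamic pages):

    User-agent: *
    Disallow: /dynamic-content/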
-
If the pages no longer exist and you remove the robots.txt rule for that directory, it shouldn't make much difference. Google may start reporting them as 404s, since it knows the files used to exist and there's no longer a rule telling it to ignore the directory. I don't see any harm in leaving the rule in place, but I also don't see many issues arising from removing it.
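As a side note, if you want to send a stronger signal than a plain 404 once the pages are gone, you could serve a 410 Gone for the retired directory. A minimal sketch in Apache .htaccess, reusing the placeholder directory name from the question:

    # Tell crawlers these URLs are permanently gone (410), not just missing (404)
    RedirectMatch gone ^/dynamic-content/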
-
Hey Mark - Thank you, this is really helpful.
This is really great advice for deindexing the pages while they still actually exist.
One more question, though. Once we actually remove them, once the directory no longer exists, there's no point in keeping the robots.txt disallow, right? At that point, if they're still in the index, only the URL removal tool will be useful.
I read this: https://support.google.com/webmasters/answer/59819?hl=en
While the webmaster guidelines say you need to use robots.txt, I don't see how that's a requirement for pages that don't exist anymore; Google will simply get a 404 when it tries to crawl them. Also, if the directory is blocked in robots.txt but there are a few redirects within it, the redirects would not work, since Google can't crawl the old URLs to discover them. I also don't think adding a line to robots.txt every time we remove something is a good practice. Thoughts?
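For example, each of those redirects would be a one-liner along these lines (Apache syntax, with placeholder paths):

    # 301 an important old URL to its replacement; the old URL must stay
    # crawlable (not disallowed in robots.txt) for Google to see the redirect
    Redirect 301 /dynamic-content/popular-page /new-page/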
-
When you block a page or folder in robots.txt, it doesn't remove the page from the search engine's index; it just prevents the engines from recrawling it. For pages/folders/sites that were never crawled, robots.txt can prevent them from being crawled and read in the first place. But blocking already-crawled pages via robots.txt will not be enough on its own to remove them from the index.
To remove this low-quality content, you can do one of two things:
- Add a meta robots noindex tag to the pages you want removed - this tells the engines to drop the page from their index; in effect, it's dead to them. Note that for the tag to be seen, the pages must remain crawlable, i.e. not blocked in robots.txt.
- Block the folder via robots.txt, then go into Webmaster Tools and use the URL removal tool on the folder or domain.
I usually recommend option 1, because it works for multiple engines, doesn't require a separate webmaster tools account for each engine, and makes it easier to manage and customize exactly which pages you want removed.
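The tag itself goes in the head of each page you want dropped:

    <meta name="robots" content="noindex">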
But you are on the right track with the sitemaps - don't include links to the noindexed pages in the sitemap.
Good luck,
Mark
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of less than 200 product categories, my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change and have no ratings or product reviews, so there is little reason for a search engine to revisit a product page. The sales team is afraid that blocking a previously indexed product page will result in it being removed from the Google index, and would prefer to submit the categories by hand, 10 per day, via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | AspenFasteners
-
Sitemap and content question
This is our primary sitemap: https://www.samhillbands.com/sitemaps/sitemap.xml. We have about 750 location-based URLs that aren't currently linked anywhere on the site: https://www.samhillbands.com/sitemaps/locations.xml. Google is indexing most of the URLs because we submitted the locations sitemap directly for indexing. Thoughts on that? Should we just create a page that contains all of the location links and make it live on the site? Should we remove the locations sitemap from separate indexing because of duplicate content? Current sitemap stats:
1. /sitemaps/locations.xml (Sitemap) - processed May 10, 2016 - 771 submitted, 648 indexed
2. /sitemaps/sitemap.xml (Sitemap index) - processed May 8, 2016 - 862 submitted, 730 indexed
Intermediate & Advanced SEO | brianvest
-
Would you consider this thin content?
Just wondering what the community thinks about the following URLs and whether they are essentially thin content that should be handled through a canonical, noindex, or a parameter-filtering system:
https://www.adversetdisplay.co.uk/products/3x1-popup-exhibition-stand
https://www.adversetdisplay.co.uk/products/3x2-popup-exhibition-stand
https://www.adversetdisplay.co.uk/products/3x3-popup-exhibition-stand
https://www.adversetdisplay.co.uk/products/3x4-popup-exhibition-stand
https://www.adversetdisplay.co.uk/products/3x5-popup-exhibition-stand
Intermediate & Advanced SEO | ColinDocherty
-
Do low quality subdomains affect the ranking performance/quality of a root domain?
Hi, Late last year the company I work for launched two new websites that, at the time, we believed were completely separate from our main website. The two new websites were set up externally and were not well planned from an SEO perspective (LOTS of duplicate content); hence, they have struggled to rank on Google. Since the launch of the new websites we have also noticed that our main website (which previously ranked very well) has suffered a decline in visits and search engine rank. We initially attributed this to a number of factors, including the state of the market, and ramped up our SEO efforts (seeing minor improvement). We have since realised that these two new websites were set up as subdomains of our main website, with Moz displaying the same domain authority and root-domain backlink profile. My question is: do poor-quality subdomains affect the ranking performance of a root domain? I have not yet managed to find a definitive answer. Please let me know if more information is required - I am quite new to the whole SEO concept. Thanks! Amy
Intermediate & Advanced SEO | paulissai
-
Blocking some countries and redirecting that traffic
Hi there, I have a video site, which is on a CDN and is really expensive to run, so I want to block most countries and only keep the high-quality ones. I wonder if there's a difference between just blocking them and showing a blank page, showing them a page with text and, say, a link to a different site, or simply redirecting them to some other site. Do you think I can still get good rankings on Google in the countries that I don't block?
Intermediate & Advanced SEO | melbog
-
WordPress Duplicate Content
We have recently moved our company's blog to WordPress on a subdomain (we utilize the Yoast SEO plugin). We are now experiencing an ever-growing volume of crawl errors (nearly 300 4xx now) for pages that did not exist to begin with. I believe it may have something to do with having the blog on a subdomain and/or the Yoast SEO plugin's indexation settings for archives (author, category, etc.) - we currently have subpages of archives and taxonomies, and category archives in use. I'm not as familiar with WordPress and the Yoast SEO plugin as I am with other CMSs, so any help in this matter would be greatly appreciated. I can PM further info if necessary. Thank you for the help in advance.
Intermediate & Advanced SEO | BethA
-
Removing Duplicate Page Content
Since joining SEOmoz four weeks ago I've been busy tweaking our site, a Magento eCommerce store, and have successfully removed a significant portion of the errors. Now I need to remove/hide duplicate pages from the search engines, and I'm wondering what the best way to attack this is. Can I solve this in one central location, or do I need to do something in the Google and Bing webmaster tools? Here is a list of the duplicate content:
http://www.unitedbmwonline.com/?dir=asc&mode=grid&order=name
http://www.unitedbmwonline.com/?dir=asc&mode=list&order=name
http://www.unitedbmwonline.com/?dir=asc&order=name
http://www.unitedbmwonline.com/?dir=desc&mode=grid&order=name
http://www.unitedbmwonline.com/?dir=desc&mode=list&order=name
http://www.unitedbmwonline.com/?dir=desc&order=name
http://www.unitedbmwonline.com/?mode=grid
http://www.unitedbmwonline.com/?mode=list
Thanks in advance, Steve
Intermediate & Advanced SEO | SteveMaguire
-
Duplicate Content from Article Directories
I have a small client with a website: PR 2, 268 links from 21 root domains, MozTrust 5.5, MozRank 4.5. However, whenever I check the number of links in Google with the link: operator, Google always reports none. My client has a blog with many articles on it. However, they have submitted their blog articles to article directories every time as well, plain and simple creating duplicate content. Is this the reason why their link: count is coming up as none? Is there something to correct the situation?
Intermediate & Advanced SEO | danielkamen