Blocking poor quality content areas with robots.txt
-
I found an interesting discussion on Search Engine Roundtable where Barry Schwartz and others were discussing using robots.txt to block low-quality content areas affected by Panda.
http://www.seroundtable.com/google-farmer-advice-13090.html
The article is a bit dated. I was wondering what current opinions are on this.
We have some dynamically generated content pages which we tried to improve after Panda. Resources have been limited and, alas, the pages are still there. Until we can officially remove them, I thought it might be a good idea to just block the entire directory in robots.txt. I would also remove them from my sitemaps and resubmit. There are links coming in, but I could redirect the important ones (I was going to do that anyway). Thoughts?
-
If the page no longer exists and you remove the robots.txt rule for that directory, it shouldn't make much difference. Google could start reporting the URLs as 404s, since it knows the files used to exist and there is no longer a robots.txt rule telling it to ignore the directory. I don't see any harm in leaving the rule in place, but I also don't see many issues arising from removing it.
-
Hey Mark - Thank you, this is really helpful.
This is really great advice for deindexing the pages when they still actually do exist.
One more question though. Once we actually remove them and the directory no longer exists, there's no point in keeping the robots.txt disallow, right? At that point, if the pages are still in the index, only the URL removal tool will be useful.
I read these: https://support.google.com/webmasters/answer/59819?hl=en
While the webmaster guidelines say you need to use robots.txt, I don't see how that's a requirement for pages which don't actually exist anymore. Google shouldn't be able to crawl the pages once they no longer exist. Also, if the directory is in robots.txt but there are a few redirects within it, the redirects would not work, because crawlers would never request the blocked URLs and so would never see them. I also don't think adding a line to robots.txt every time we remove something is a good practice. Thoughts?
-
When you block a page or folder in robots.txt, it doesn't remove the page from the search engine's index; it just prevents the engine from recrawling the page. For pages, folders, or sites that were never crawled by the search engines, robots.txt can prevent them from being crawled and read. But for pages that have already been crawled, blocking them with robots.txt will not be enough on its own to remove them from the index.
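To make the crawl-versus-index distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/low-quality/` directory and the `example.com` URLs are made up for illustration. It only shows which URLs a compliant crawler would be allowed to fetch; it says nothing about what is already in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one low-quality directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /low-quality/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def can_crawl(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given user agent is allowed to fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)

# A page inside the blocked directory: crawlers may not fetch it, so they
# also cannot see a redirect or a noindex tag served at that URL.
blocked = can_crawl(parser, "https://www.example.com/low-quality/page-1.html")

# A page outside the blocked directory is unaffected.
allowed = can_crawl(parser, "https://www.example.com/articles/guide.html")

print(blocked, allowed)
```

A URL that fails this check can still appear in search results if it was indexed before the rule was added, which is why the removal steps below are needed.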
To remove this low quality content, you can do one of two things:
- Add a meta robots noindex tag to the content you want to remove - this tells the engine to drop the page from its index; in effect, the page is dead to them
- After blocking the folder via robots.txt, go into Webmaster Tools and use the URL removal tool on the folder or domain.
I usually recommend option number 1, because it works across multiple engines, doesn't require a separate webmaster tools account for each engine, and is easier to manage, with much finer control over exactly which pages you want removed.
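For option 1, a quick way to confirm the tag is actually being served is to parse the page and look for it. The sketch below uses Python's standard `html.parser`; the sample page and the helper names (`RobotsMetaParser`, `has_noindex`) are made up for illustration. Keep in mind that a crawler has to be able to fetch the page to see the tag, so the page should not also be disallowed in robots.txt while you wait for it to drop out of the index.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a meta robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical thin page that should be dropped from the index.
PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
<title>Thin page</title>
</head><body>...</body></html>"""

print(has_noindex(PAGE))
```

Using `noindex, follow` here is one common choice: it asks engines to drop the page while still following the links on it.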
But you are on the right track with the sitemaps - don't include links to the noindex pages in the sitemap.
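As a rough sketch of that sitemap cleanup, the snippet below reads a sitemap and keeps only the URLs outside the directory being retired. The sitemap contents, the `/low-quality/` prefix, and the `keep_indexable` helper are assumptions for illustration; a real sitemap would be loaded from your own file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard XML sitemaps.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sitemap with one good URL and two low-quality ones.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/articles/guide.html</loc></url>
  <url><loc>https://www.example.com/low-quality/page-1.html</loc></url>
  <url><loc>https://www.example.com/low-quality/page-2.html</loc></url>
</urlset>"""


def keep_indexable(sitemap_xml: str, blocked_prefix: str) -> list[str]:
    """Return sitemap URLs whose path does not start with blocked_prefix."""
    root = ET.fromstring(sitemap_xml)
    kept = []
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        url = (loc.text or "").strip()
        if not urlparse(url).path.startswith(blocked_prefix):
            kept.append(url)
    return kept


remaining = keep_indexable(SITEMAP_XML, "/low-quality/")
print(remaining)
```

The remaining URLs are the ones to write back into the sitemap before resubmitting it.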
Good luck,
Mark