Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Should comments and feeds be disallowed in robots.txt?
-
Hi
My robots file is currently set up as listed below.
From an SEO point of view, is it a good idea to disallow feeds, RSS, and comments?
I feel allowing comments would be a good thing, because they are new content that may rank in the search engines: the comments left on my blog often mention questions or companies that people are searching for more information on, and new comments are added regularly.
What's your take? I'm also concerned about /page/ being blocked. I'm not sure how that benefits my blog from an SEO point of view either. I look forward to your feedback.
Thanks.
Eddy
User-agent: Googlebot
Crawl-delay: 10
Allow: /*
User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
-
If I were going to disallow something, I would use noindex tags instead. The robots file is perfect with just those two lines.
There are also plugins, such as SEO by Yoast, that will help you avoid SEO issues. Personally, I like to set tag, category, and archive pages to noindex,follow, and that's it. But again, do that with a robots meta tag on the page, not with robots.txt. SEO by Yoast makes that about as easy as it can be, with just a few configuration steps.
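The distinction matters because robots.txt only controls crawling, while noindex,follow is a robots meta tag inside the page's own head, e.g. `<meta name="robots" content="noindex,follow">`, which a crawler can only see if it is allowed to fetch the page. As a rough sketch (the helper names are hypothetical, not part of Yoast or any plugin), this reads the robots meta directives out of a page's HTML, which is handy for spot-checking tag, category, and archive pages after configuring the plugin:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values keep their case.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # Directives are comma-separated, e.g. "noindex, follow".
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

For an archive page set up as described, you'd expect `{"noindex", "follow"}`; an empty set means there is no robots meta tag, so the page falls back to the default of index,follow.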
Give it a try; you can always disable plugins.

Wish you the best!
-
WordPress is a funny platform: you would think there isn't much to disallow, but there probably is quite a bit. I agree with Federico: you should allow comments, feeds, and RSS.
I'm not going to make blind assumptions here, so check your log files to see what's being crawled constantly. This post may help: http://moz.com/blog/server-log-essentials-for-seo.
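If you want a head start on that log review, here is a rough Python sketch that counts which paths Googlebot requests most often. It assumes the Apache/Nginx combined log format (common for raw access logs, but check yours), and it trusts the user-agent string, which anyone can spoof; confirming genuine Googlebot traffic needs a reverse DNS check on the IP.

```python
import re
from collections import Counter

# Combined log format (assumed):
# IP ident user [time] "METHOD PATH PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, top=10):
    """Count requests per path whose user-agent claims to be Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that don't fit the assumed format are skipped silently.
        if match and "googlebot" in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts.most_common(top)
```

Pass it an open log file, e.g. `googlebot_hits(open("access.log"))` (the file name is just an example), to get the most-crawled paths with their hit counts.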
FYI, this is a big job. Shout if you need help.
P.S. HostGator's cPanel lets you archive raw server logs. Make sure you enable that option from now on, or the logs will be overwritten!
-
Thanks for the info!
I contacted HostGator to fix the robots file because it had been blocking Googlebot for some time. That's the robots file they uploaded.
Yes, I use WordPress, and apparently some stupid plugin had originally blocked Google before HostGator fixed the robots file yesterday.
So, to confirm: you don't think anything should be disallowed except the /wp-admin/ directory? And with the feeds, comments, etc., there are no SEO concerns, such as duplicate content, that could work against me and would justify blocking them?
Is this safe to assume?
Thanks again!
Eddy
-
Who wrote that robots.txt?
You shouldn't disallow the comments, the feed, or almost anything else.
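To see what the current file actually blocks, you can feed it to Python's standard-library robots.txt parser. This is a rough sketch, not a faithful Googlebot simulation: the sample paths and the Bingbot comparison are assumptions for illustration, and `urllib.robotparser` does plain prefix matching where the first matching rule wins, so it ignores Google's `*` wildcards and longest-match rule (for these sample paths, that shortcut doesn't change the verdicts).

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, one directive per line.
CURRENT_ROBOTS = """\
User-agent: Googlebot
Crawl-delay: 10
Allow: /*
User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
"""

# Hypothetical URLs on a typical WordPress blog (not taken from the thread).
SAMPLE_PATHS = [
    "/2014/05/sample-post/",
    "/feed/",
    "/comments/",
    "/page/2/",
    "/wp-admin/",
    "/wp-content/uploads/logo.png",
]


def blocked_paths(robots_txt, user_agent, paths):
    """Return the paths this user agent may not crawl under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    # Googlebot matches its own group, so the "*" Disallow lines never apply to it.
    print("Googlebot:", blocked_paths(CURRENT_ROBOTS, "Googlebot", SAMPLE_PATHS))
    # Any other crawler falls through to the "*" group.
    print("Bingbot:", blocked_paths(CURRENT_ROBOTS, "Bingbot", SAMPLE_PATHS))
```

Googlebot comes back with an empty list because it matches its own group, whose only rule allows everything. Crawlers that fall through to `*` are kept off the feed, comments, and /page/ archives, and because `/wp-` is a prefix, it also catches `/wp-content/` and `/wp-includes/`, not just `/wp-admin/`.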
I notice you are using WordPress, so if you just want to keep the admin area from being indexed (which isn't going to happen anyway, as Google has no access to it), your robots.txt should look like this:
User-Agent:*
Disallow: /wp-admin/
That's it.
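The same standard-library parser can sanity-check the two-line file before it goes live. Another sketch with hypothetical sample paths; `is_allowed` is an illustrative helper, not a WordPress or Google API.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt suggested above.
RECOMMENDED_ROBOTS = """\
User-Agent:*
Disallow: /wp-admin/
"""


def is_allowed(robots_txt, user_agent, path):
    """True if user_agent may crawl path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths: only the admin area should come back blocked.
    for path in ("/wp-admin/options.php", "/feed/", "/comments/", "/page/2/"):
        verdict = "allowed" if is_allowed(RECOMMENDED_ROBOTS, "Googlebot", path) else "blocked"
        print(path, verdict)
```

Because there is only a `*` group here, every crawler, Googlebot included, gets the same rules: only the admin area is off limits.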