Should comments and feeds be disallowed in robots.txt?
-
Hi
My robots file is currently set up as listed below.
From an SEO point of view is it good to disallow feeds, rss and comments?
I feel allowing comments would be a good thing because it's new content that may rank in the search engines as the comments left on my blog often refer to questions or companies folks are searching for more information on. And the comments are added regularly.
What's your take? I'm also concerned about the /page being blocked. Not sure how that benefits my blog from an SEO point of view as well. Look forward to your feedback.
Thanks.
Eddy
User-agent: Googlebot Crawl-delay: 10 Allow: /* User-agent: * Crawl-delay: 10 Disallow: /wp- Disallow: /feed/ Disallow: /trackback/ Disallow: /rss/ Disallow: /comments/feed/ Disallow: /page/ Disallow: /date/ Disallow: /comments/ # Allow Everything Allow: /*
-
If I were going to disallow something I would go with noindex tags. The robots file is perfect with just those 2 lines.
Then, there are some plugins that will help you avoid any SEO issue like SEO by Yoast. Personally I like to noindex,follow tags, categories, and archive pages, that's it. But again, noindex, follow with a robots tag on the page, not using the robots.txt. SEO by Yoast will make that as easy as it can ever be with just a small configuration steps.
Give it a try, you can always disable plugins
Wish you the best!
-
Wordpress is a funny platform, you would think that there isn't much to disallow but there probably is quite a bit. I agree with Federico - you should allow comments, feed, and rss.
I'm not going to make blind assumptions here, so you should check your log files to see what's being constantly crawled, feel free to read this http://moz.com/blog/server-log-essentials-for-seo.
FYI - This is a big job. Shout if you need help.
P.S - Hostgator's Cpanel will allow you to archive raw server logs, make sure you check that option from now on or they'll be overwritten!
-
Thanks for the info!
I contacted Hostgator to fix the robots file because it had been blocking Google's bot for some time now. So that's the robot file they uploaded.
Yes I use wordpress, and apparently some stupid plugin had originally blocked google before hostgator fixed the robots file yesterday.
So to confirm you don't think anything else should be disallowed except for the /wp-admin directory. With the feeds, comments, etc, there isn't any SEO concerns like duplicate content or anything else that may work against me that should be blocked.
Is this safe to assume?
Thanks again!
Eddy
-
Who wrote that robots.txt?
You shouldn't disallow the comments, or feed or almost anything.
I notice you are using wordpress, so if you just want to avoid the admin being indexed (which will isn't going to be as Google does not have access anyway), your robots.txt should look like this:
User-Agent:*
Disallow: /wp-admin/
That's it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt Help
I need help to create robots.txt file. Please let me know what to add in the file. any real example or working example.?
Intermediate & Advanced SEO | | Michael.Leonard0 -
Default Robots.txt in WordPress - Should i change it??
I have a WordPress site as using theme Genesis i am using default robots.txt. that has a line Allow: /wp-admin/admin-ajax.php, is it okay or any problem. Should i change it?
Intermediate & Advanced SEO | | rootwaysinc0 -
Dealing with thin comment
Hi again! I've got a site where around 30% of URLs have less than 250 words of copy. It's big though, so that is roughly 5,000 pages. It's an ecommerce site and not feasible to bulk up each one. I'm wondering if noindexing them is a good idea, and then measuring if this has an effect on organic search?
Intermediate & Advanced SEO | | Blink-SEO1 -
Should I use meta noindex and robots.txt disallow?
Hi, we have an alternate "list view" version of every one of our search results pages The list view has its own URL, indicated by a URL parameter I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"... Thanks 🙂
Intermediate & Advanced SEO | | ntcma0 -
Is an RSS feed considered duplicate content?
I have a large client with satellite sites. The large site produces many news articles and they want to put an RSS feed on the satellite sites that will display the articles from the large site. My question is, will the rss feeds on the satellite sites be considered duplicate content? If yes, do you have a suggestion to utilize the data from the large site without being penalized? If no, do you have suggestions on what tags should be used on the satellite pages? EX: wrapped in tags? THANKS for the help. Darlene
Intermediate & Advanced SEO | | gXeSEO0 -
Will an RSS feed help new product get indexed? How to create one for product?
Hi I've read that creating an RSS feed for one of our ecommerce sites will help the products get indexed faster. Currently it takes google 4-5 days to index our new products, we want to speed that up. Will an RSS feed of the new products we have help? How do you create an RSS feed for this? Our blog gets indexed within minutes, but our main website, 4 days. Help!
Intermediate & Advanced SEO | | xoffie0 -
Should I robots block this directory?
There's about 43k pages indexed in this directory, and while helpful to end users, I don't see it being a great source of unique content for search engines. Would you robots block or meta noindex nofollow these pages in the /blissindex/ directory? ie. http://www.careerbliss.com/blissindex/petsmart-index-980481/ http://www.careerbliss.com/blissindex/att-index-1043730/ http://www.careerbliss.com/blissindex/facebook-index-996632/
Intermediate & Advanced SEO | | CareerBliss0 -
Meta Description In Blog Feed
The SEOmoz crawl tool is giving me a lot of crawl errors because my blog feed and my blog tags do not have meta descriptions. Can you even give this type of content meta descriptions? If so how can you do it, as this content is created dynamically by Wordpress?
Intermediate & Advanced SEO | | MyNet0