Should comments and feeds be disallowed in robots.txt?
-
Hi
My robots file is currently set up as listed below.
From an SEO point of view is it good to disallow feeds, rss and comments?
I feel allowing comments would be a good thing because it's new content that may rank in the search engines as the comments left on my blog often refer to questions or companies folks are searching for more information on. And the comments are added regularly.
What's your take? I'm also concerned about the /page being blocked. Not sure how that benefits my blog from an SEO point of view as well. Look forward to your feedback.
Thanks.
Eddy
User-agent: Googlebot
Crawl-delay: 10
Allow: /*

User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
-
If I were going to disallow something, I would go with noindex tags. The robots file is perfect with just those 2 lines.
There are also plugins, such as SEO by Yoast, that will help you avoid SEO issues. Personally I like to noindex,follow tags, categories, and archive pages, that's it. But again, do that with a noindex, follow robots tag on the page, not with robots.txt. SEO by Yoast makes that as easy as it can be, with just a few small configuration steps.
Give it a try, you can always disable plugins.
Wish you the best!
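To confirm that a tag, category, or archive page really carries the noindex, follow robots tag described above, a small check along these lines may help. It is only a sketch: the function name `robots_directives` and the sample HTML are made up for illustration, and it reads the page source from a string rather than fetching a live URL.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, follow".
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives present in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical archive page source, for illustration only.
sample_page = """
<html><head>
<title>Archive</title>
<meta name="robots" content="noindex, follow">
</head><body></body></html>
"""

found = robots_directives(sample_page)
print("noindex" in found, "follow" in found)
```

Paste the source of one of your own archive pages into `sample_page` to see which directives the plugin actually wrote.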
-
WordPress is a funny platform. You would think that there isn't much to disallow, but there probably is quite a bit. I agree with Federico - you should allow comments, feed, and rss.
I'm not going to make blind assumptions here, so you should check your log files to see what's being constantly crawled. Feel free to read this: http://moz.com/blog/server-log-essentials-for-seo.
FYI - This is a big job. Shout if you need help.
P.S. - Hostgator's cPanel will allow you to archive raw server logs. Make sure you check that option from now on or they'll be overwritten!
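As a rough starting point for the log check mentioned above, the sketch below counts Googlebot requests per top-level section of the site. It assumes the raw logs are in the common Apache "combined" format and that matching "googlebot" in the user-agent string is good enough for a first look (user agents can be spoofed). The sample lines and function names are invented for illustration.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer, and user-agent fields of a
# combined-format access log line (assumed format, adjust if yours differs).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_level_section(path):
    """Reduce a path such as /feed/atom/ to its first segment, /feed/."""
    path = path.split("?", 1)[0]
    parts = path.split("/")
    return "/" + parts[1] + "/" if len(parts) > 2 else path


def googlebot_hits(lines):
    """Count requests per top-level section whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[top_level_section(match.group("path"))] += 1
    return counts


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
FIREFOX = "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0"

# Made-up sample lines; in practice, iterate over the opened raw log file instead.
sample_lines = [
    f'66.249.66.1 - - [10/Oct/2013:13:55:36 -0700] "GET /feed/ HTTP/1.1" 200 2326 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2013:13:56:01 -0700] "GET /feed/atom/ HTTP/1.1" 200 1980 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Oct/2013:13:57:12 -0700] "GET /wp-admin/admin-ajax.php HTTP/1.1" 200 51 "-" "{GOOGLEBOT}"',
    f'203.0.113.9 - - [10/Oct/2013:13:58:40 -0700] "GET /feed/ HTTP/1.1" 200 2326 "-" "{FIREFOX}"',
]

hits = googlebot_hits(sample_lines)
for section, count in hits.most_common():
    print(section, count)
```

Sections that soak up a large share of the crawl without deserving it are the candidates worth a closer look.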
-
Thanks for the info!
I contacted Hostgator to fix the robots file because it had been blocking Google's bot for some time now. So that's the robot file they uploaded.
Yes, I use WordPress, and apparently some stupid plugin had originally blocked Google before Hostgator fixed the robots file yesterday.
So, to confirm: you don't think anything else should be disallowed except for the /wp-admin directory? With the feeds, comments, etc., there aren't any SEO concerns, like duplicate content or anything else that may work against me, that would call for blocking them?
Is this safe to assume?
Thanks again!
Eddy
-
Who wrote that robots.txt?
You shouldn't disallow the comments, or feed or almost anything.
I notice you are using WordPress, so if you just want to avoid the admin being indexed (which it isn't going to be anyway, as Google does not have access), your robots.txt should look like this:
User-Agent:*
Disallow: /wp-admin/
That's it.
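If you want to sanity-check a file like this before uploading it, Python's standard `urllib.robotparser` can evaluate the rules locally. This is only a sketch: `example.com` and the test paths are placeholders, and the standard-library parser does simple prefix matching, so it may not mirror every detail of how Googlebot interprets a file.

```python
from urllib.robotparser import RobotFileParser

# The two-line file suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder paths standing in for the sections discussed in this thread.
for path in ("/feed/", "/comments/feed/", "/page/2/", "/wp-admin/"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

With this file, the feed, comment, and paginated URLs come back as allowed and only /wp-admin/ is blocked.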