Google Webmaster Tools is saying "Sitemap contains urls which are blocked by robots.txt" after HTTPS move...

vetofunk

Hi Everyone,

I really don't see anything wrong with our robots.txt file after the HTTPS move we just made, but Google says all URLs are blocked. The only change I know we need to make is updating the sitemap URL to https. Do you all see anything wrong with this robots.txt file?

# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Website Sitemap
Sitemap: http://www.bestpricenutrition.com/sitemap.xml

# Crawlers Setup
User-agent: *

# Allowable Index

Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/

# Directories

Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)

Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /aitmanufacturers/index/view/
Disallow: /blog/tag/
Disallow: /advancedreviews/abuse/reportajax/
Disallow: /advancedreviews/ajaxproduct/
Disallow: /advancedreviews/proscons/checkbyproscons/
Disallow: /catalog/product/gallery/
Disallow: /productquestions/index/ajaxform/

# Files

Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)

Disallow: /.php$
Disallow: /?SID=
disallow: /?cat=
disallow: /?price=
disallow: /?flavor=
disallow: /?dir=
disallow: /?mode=
disallow: /?list=
disallow: /?limit=5
disallow: /?limit=10
disallow: /?limit=15
disallow: /?limit=20
disallow: /*?limit=25

vetofunk

Thanks again for the response. Looks like Google just needed a little more time to resolve the issue. No more errors. I didn't do anything but resubmit the sitemap and robots.txt.

Thanks for the tips as well. I am going to post one more question in another thread.

Gaston Riera

Jeff,

I was able to find only ONE URL in the sitemap that is blocked by the robots.txt you posted in this question.
Check the image attached.
The URL is: https://www.bestpricenutrition.com/catalog/product/view/id/15650.html

What did I do? A manual search of all the disallowed terms in the sitemap.
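
The manual search described above can also be scripted. What follows is a minimal Python sketch, not the method actually used in this thread. It assumes Google-style matching (`*` as a wildcard, a trailing `$` as an end anchor, the longest matching rule wins, and Allow wins a tie). The function names `rule_to_regex` and `blocking_rule` are hypothetical, the rule lists are only a subset of the robots.txt in the question, and the second URL is a made-up example.

```python
import re
from typing import List, Optional
from urllib.parse import urlsplit

# Subset of the rules from the robots.txt posted in the question.
ALLOWS = ["/*?p=", "/index.php/blog/", "/catalog/seo_sitemap/category/"]
DISALLOWS = ["/catalog/product/view/", "/checkout/", "/.php$", "/*?limit=25"]


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' into '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def blocking_rule(url: str, allows: List[str], disallows: List[str]) -> Optional[str]:
    """Return the Disallow pattern that blocks `url`, or None if it is allowed."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    best_allow = max((p for p in allows if rule_to_regex(p).match(target)),
                     key=len, default="")
    best_disallow = max((p for p in disallows if rule_to_regex(p).match(target)),
                        key=len, default="")
    # Longest matching pattern wins; an Allow of equal length wins the tie.
    if best_disallow and len(best_disallow) > len(best_allow):
        return best_disallow
    return None


if __name__ == "__main__":
    sitemap_urls = [
        "https://www.bestpricenutrition.com/catalog/product/view/id/15650.html",
        "https://www.bestpricenutrition.com/example-product.html",  # hypothetical URL
    ]
    for url in sitemap_urls:
        print(url, "->", blocking_rule(url, ALLOWS, DISALLOWS))
```

In practice the URL list would come from the `<loc>` entries of the sitemap, and the rule lists from the full robots.txt.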

Also, you might want to read this article about robots.txt carefully. It helped me find that mistake.
The complete guide to Robots.txt - Portent.com

Best of luck.
GR.


vetofunk

Thanks for the quick response.

Yes...Google Webmaster Tools is giving examples...and they are basically all the product pages.
Yes...we did the Add Site under Google Webmaster Tools...this is from that new 'account'.
Yes...we are fixing that.

Do you see anything in the robots.txt above that would indicate we are blocking https product pages?

Gaston Riera

Hello Jeff,

Just some routine questions to establish a base line:

Have you checked that the sitemap doesn't include any of the disallowed URLs?
You said that there was a move to HTTPS. Have you created a new account for the new domain?
I'm seeing that the robots.txt has the old URL for the sitemap, without the HTTPS correction.
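
The stale sitemap reference mentioned above is easy to check for mechanically. Here is a small Python sketch, assuming the robots.txt content is already available as a string. `insecure_sitemap_lines` is a hypothetical helper name, not part of any library.

```python
from typing import List


def insecure_sitemap_lines(robots_txt: str) -> List[str]:
    """Return the Sitemap URLs in `robots_txt` that still use http://."""
    stale = []
    for line in robots_txt.splitlines():
        # Split "Sitemap: <url>" at the first colon only, so the URL stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url.lower().startswith("http://"):
                stale.append(url)
    return stale


if __name__ == "__main__":
    sample = (
        "Sitemap: http://www.bestpricenutrition.com/sitemap.xml\n"
        "User-agent: *\n"
        "Disallow: /checkout/\n"
    )
    print(insecure_sitemap_lines(sample))
```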

Let me know.
