Google Webmaster Tools is saying "Sitemap contains urls which are blocked by robots.txt" after HTTPS move...
-
Hi Everyone,
I really don't see anything wrong with our robots.txt file after our HTTPS move that just happened, but Google says all URLs are blocked. The only change I know we need to make is changing the sitemap URL to HTTPS. Does anyone see anything wrong with this robots.txt file?
# robots.txt
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
# Website Sitemap
Sitemap: http://www.bestpricenutrition.com/sitemap.xml
# Crawlers Setup
User-agent: *
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /aitmanufacturers/index/view/
Disallow: /blog/tag/
Disallow: /advancedreviews/abuse/reportajax/
Disallow: /advancedreviews/ajaxproduct/
Disallow: /advancedreviews/proscons/checkbyproscons/
Disallow: /catalog/product/gallery/
Disallow: /productquestions/index/ajaxform/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
Disallow: /.php$
Disallow: /?SID=
disallow: /?cat=
disallow: /?price=
disallow: /?flavor=
disallow: /?dir=
disallow: /?mode=
disallow: /?list=
disallow: /?limit=5
disallow: /?limit=10
disallow: /?limit=15
disallow: /?limit=20
disallow: /*?limit=25
-
Thanks again for the response. Looks like it just took a little more time for Google to resolve the issue. No more errors. Didn't do anything but resubmit Sitemap and Robots.txt.
Thanks for the tips as well. I am going to post one more question in another thread.
-
Jeff,
I was able to find only ONE URL in the sitemap that is blocked by the robots.txt you posted in this question.
Check the image attached.
The URL is: https://www.bestpricenutrition.com/catalog/product/view/id/15650.html
What did I do? A manual search of all the disallowed terms in the sitemap.
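For anyone who would rather not search by hand, the same check can be scripted. What follows is only a rough sketch, not an official tool: it assumes Google-style matching as I understand it (`*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie), and it treats every Allow/Disallow line as part of a single `User-agent: *` group. The function names (`rule_to_regex`, `parse_rules`, `blocking_rule`, `path_of`) are made up for this example, and the second sample URL is hypothetical.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def parse_rules(robots_txt):
    """Collect (kind, pattern) pairs from a robots.txt body.

    Simplification: user-agent groups are ignored, so every Allow and
    Disallow line is treated as applying to all crawlers.
    """
    rules = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if sep and field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def blocking_rule(path, rules):
    """Return the Disallow pattern that blocks `path`, or None if allowed."""
    best = None  # (pattern length, is_allow, pattern)
    for kind, pattern in rules:
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow", pattern)
            # Longer pattern wins; on equal length, Allow (True) beats Disallow.
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None or best[1]:
        return None
    return best[2]


def path_of(url):
    """Return the path plus query string of a URL, as robots.txt rules see it."""
    parts = urlsplit(url)
    return (parts.path or "/") + ("?" + parts.query if parts.query else "")


if __name__ == "__main__":
    sample_robots = """
    User-agent: *
    Allow: /*?p=
    Disallow: /catalog/product/view/
    Disallow: /checkout/
    """
    rules = parse_rules(sample_robots)
    urls = [
        "https://www.bestpricenutrition.com/catalog/product/view/id/15650.html",
        "https://www.bestpricenutrition.com/some-product.html",  # hypothetical
    ]
    for url in urls:
        print(url, "->", blocking_rule(path_of(url), rules))
```

To run it against a real site, you would feed `parse_rules` the full robots.txt text and loop over every `<loc>` entry from the sitemap instead of the two sample URLs.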
Also, you might want to read this article about robots.txt thoroughly. It helped me find that mistake.
The complete guide to Robots.txt - Portent.com
Best of luck.
GR.
-
Thanks for the quick response.
-
Yes...Google Webmaster Tools is giving examples...and they are basically all the product pages.
-
Yes, we did the Add Site under Google Webmaster Tools...this is from that new 'account'.
-
Yes...we are fixing that.
Do you see anything in that robots.txt above that would indicate we are blocking https product pages?
-
-
Hello Jeff,
Just some routine questions to establish a baseline:
- Have you checked that the sitemap doesn't include any of the disallowed URLs?
- You said that there was a move to HTTPS. Have you created a new account for the new domain?
- I'm seeing that the robots.txt has the old URL for the sitemap, without the HTTPS correction.
Let me know.
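The last point is easy to check with a few lines of code. This is just a minimal sketch, assuming you already have the robots.txt contents as a string; the helper names `insecure_sitemap_lines` and `to_https` are invented for this example.

```python
from urllib.parse import urlsplit


def insecure_sitemap_lines(robots_txt):
    """Return the Sitemap URLs in a robots.txt body that do not use https."""
    flagged = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            url = value.strip()
            if urlsplit(url).scheme.lower() != "https":
                flagged.append(url)
    return flagged


def to_https(url):
    """Return the same URL with its scheme switched to https."""
    return urlsplit(url)._replace(scheme="https").geturl()


if __name__ == "__main__":
    sample = (
        "Sitemap: http://www.bestpricenutrition.com/sitemap.xml\n"
        "User-agent: *\n"
        "Disallow: /checkout/\n"
    )
    for url in insecure_sitemap_lines(sample):
        print("update", url, "to", to_https(url))
```

It only looks at the scheme of each Sitemap line; it does not confirm that the https sitemap actually exists or that the URLs inside it were updated too.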