Google Webmaster Tools is saying "Sitemap contains urls which are blocked by robots.txt" after HTTPS move...
-
Hi Everyone,
I really don't see anything wrong with our robots.txt file after our https move that just happened, but Google says all URLs are blocked. The only change I know we need to make is changing the sitemap url to https. Anything you all see wrong with this robots.txt file?
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Website Sitemap
Sitemap: http://www.bestpricenutrition.com/sitemap.xml

# Crawlers Setup
User-agent: *

# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /aitmanufacturers/index/view/
Disallow: /blog/tag/
Disallow: /advancedreviews/abuse/reportajax/
Disallow: /advancedreviews/ajaxproduct/
Disallow: /advancedreviews/proscons/checkbyproscons/
Disallow: /catalog/product/gallery/
Disallow: /productquestions/index/ajaxform/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /.php$
Disallow: /?SID=
disallow: /?cat=
disallow: /?price=
disallow: /?flavor=
disallow: /?dir=
disallow: /?mode=
disallow: /?list=
disallow: /?limit=5
disallow: /?limit=10
disallow: /?limit=15
disallow: /?limit=20
disallow: /*?limit=25
-
Thanks again for the response. Looks like it just took a little more time for Google to resolve the issue. No more errors. Didn't do anything but resubmit Sitemap and Robots.txt.
Thanks for the tips as well. I am going to post one more question in another thread.
-
Jeff,
I was able to find only ONE URL in the sitemap that is blocked by the robots.txt you posted in this question.
Check the image attached.
The URL is: https://www.bestpricenutrition.com/catalog/product/view/id/15650.html
What did I do? A manual search of all the disallowed terms in the sitemap.
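For anyone who would rather not do that search by hand, here is a minimal Python sketch of the same check. It is only an approximation under a few assumptions: it ignores `User-agent` groups and treats every Allow/Disallow line as applying to all crawlers, it follows Google's documented matching (`*` wildcard, trailing `$`, longest rule wins, Allow wins a tie), and the function names (`parse_rules`, `is_blocked`, `blocked_urls`) are made up for this example. Google's own robots.txt tester remains the authority.

```python
import re
from html import unescape
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Turn a robots.txt path rule into a regex: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def parse_rules(robots_txt):
    """Collect (field, path) pairs from Allow/Disallow lines.

    Simplification: User-agent groups are ignored, so every rule is
    treated as if it applied to all crawlers.
    """
    rules = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if sep and field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def is_blocked(url, rules):
    """Return True if the longest matching rule for this URL is a Disallow."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    best = None  # (rule length, is_allow); Allow wins when lengths tie
    for field, rule in rules:
        if rule_to_regex(rule).match(path):
            candidate = (len(rule), field == "allow")
            if best is None or candidate > best:
                best = candidate
    return best is not None and not best[1]


def blocked_urls(sitemap_xml, robots_txt):
    """List every <loc> URL in the sitemap that the rules would block."""
    rules = parse_rules(robots_txt)
    locs = re.findall(r"<loc>\s*(.*?)\s*</loc>", sitemap_xml, flags=re.S)
    return [u for u in map(unescape, locs) if is_blocked(u, rules)]
```

To use it, paste the contents of the sitemap and the robots.txt into two strings and call `blocked_urls(sitemap_xml, robots_txt)`. With the rules posted above, the product URL I mentioned is flagged because it starts with /catalog/product/view/.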
Also, you might want to read this article about robots.txt thoroughly. It helped me find that mistake.
The complete guide to Robots.txt - Portent.com
Best of luck.
GR.
-
Thanks for the quick response.
-
Yes...Google Webmaster Tools is giving examples...and they are basically all the product pages.
-
Yes, we did the Add Site step under Google Webmaster Tools...this is from that new 'account'.
-
Yes...we are fixing that.
Do you see anything in that robots.txt above that would indicate we are blocking https product pages?
-
Hello Jeff,
Just some routine questions to establish a baseline:
- Have you checked that the sitemap doesn't include any of the disallowed URLs?
- You said there was a move to HTTPS. Have you created a new account for the new domain?
- I'm seeing that the robots.txt still has the old URL for the sitemap, without the HTTPS correction.
Let me know.
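
On the last point, a quick way to spot a leftover http:// sitemap reference is a small script like the one below. It is just a sketch: the function name `insecure_sitemap_lines` is made up for this example, and it only looks at `Sitemap:` lines in the robots.txt text you pass in.

```python
def insecure_sitemap_lines(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs in a robots.txt that still use plain http://."""
    found = []
    for line in robots_txt.splitlines():
        # Split "Sitemap: http://..." at the first colon only.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            url = value.strip()
            if url.lower().startswith("http://"):
                found.append(url)
    return found
```

Run against the file posted in this thread, it would return the http://www.bestpricenutrition.com/sitemap.xml entry, which is the line to update after the move.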