Google Webmaster Tools is saying "Sitemap contains urls which are blocked by robots.txt" after Https move...
-
Hi Everyone,
I really don't see anything wrong with our robots.txt file after our https move that just happened, but Google says all URLs are blocked. The only change I know we need to make is changing the sitemap url to https. Anything you all see wrong with this robots.txt file?
robots.txt
This file is to prevent the crawling and indexing of certain parts
of your site by web crawlers and spiders run by sites like Yahoo!
and Google. By telling these "robots" where not to go on your site,
you save bandwidth and server resources.
This file will be ignored unless it is at the root of your host:
Used: http://example.com/robots.txt
Ignored: http://example.com/site/robots.txt
For more information about the robots.txt standard, see:
http://www.robotstxt.org/wc/robots.html
For syntax checking, see:
http://www.sxw.org.uk/computing/robots/check.html
Website Sitemap
Sitemap: http://www.bestpricenutrition.com/sitemap.xml
Crawlers Setup
User-agent: *
Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /aitmanufacturers/index/view/
Disallow: /blog/tag/
Disallow: /advancedreviews/abuse/reportajax/
Disallow: /advancedreviews/ajaxproduct/
Disallow: /advancedreviews/proscons/checkbyproscons/
Disallow: /catalog/product/gallery/
Disallow: /productquestions/index/ajaxform/Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txtPaths (no clean URLs)
Disallow: /.php$
Disallow: /?SID=
disallow: /?cat=
disallow: /?price=
disallow: /?flavor=
disallow: /?dir=
disallow: /?mode=
disallow: /?list=
disallow: /?limit=5
disallow: /?limit=10
disallow: /?limit=15
disallow: /?limit=20
disallow: /*?limit=25 -
Thanks again for the response. Looks like it just took a little more time for Google to resolve the issue. No more errors. Didn't do anything but resubmit Sitemap and Robots.txt.
Thanks for the tips as well. I am going to post one more question in another thread.
-
Jeff,
I was only able to find only ONE URL in the sitemap that is blocked by the robots.txt that you've posted in this question.
Check the image attached.
The URL is: https://www.bestpricenutrition.com/catalog/product/view/id/15650.htmlWhat did I do? A manual search of all the disallowed terms in the sitemap.
Also, you might want to take a comprehensive read at this article about robots.txt. It helped me to find that mistake.
The complete guide to Robots.txt - Portent.comBest Luck.
GR. -
Thanks for the quick response.
-
Yes...Google Webmaster Tools is giving examples...and they are basically all the product pages.
-
Did the Add Site under Google Webmaster Tools yes...this is from that new 'account'.
-
Yes...we are fixing that.
You see anything in that robots.text above that would indicate we are blocking https product pages?
-
-
Hello Jeff,
Just some routine questions to establish a base line:
- Have you checked that the sitemap doesnt include any of the disallowed URLs?
- You said that there was a movement to HTTPS, have you created a new account for the new domain?
- Im seing that the robots.txt has the old URL for the sitemap, without the HTTPS correction.
Let me know.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can I have an http AND a https site on Google Webmaster tools
My website is https but the default property that was configured on Google WMT was http and wasn't showing me any information because of that. I added an https property for that, but my question is: do I need to delete the original HTTP or can I leave both websites?
Technical SEO | | Onboard.com0 -
"INDEX,FOLLOW" then later in the code "NOINDEX,NOFOLLOW" which does google follow?
background info: we have an established closed E-commerce system which the company has been using for years. I have only just started and reviewing the system, I don't have direct access to the code, but can request changes, but it could take months before the changes are in effect (or done at all), and we won't can't change to a new E-commerce system for the short to mid term. While reviewing the site (with help of seomoz crawl diagnostics) I noticed that some of the existing "landing pages" have in the code: <meta name="<a class="attribute-value">robots</a>" content="<a class="attribute-value">INDEX,FOLLOW</a>" /> then a few lines later <meta name="<a class="attribute-value">robots</a>" content="<a class="attribute-value">NOINDEX,NOFOLLOW</a>" /> Which the crawl diagnostics flagged up, but in the webmaster tools says
Technical SEO | | PaddyDisplays
"We didn't detect any issues with non-indexable content on your site." so the question is which instructions does google follow? the first or 2nd? note: clearly this is need fixed, but I have a big list of changes for the system so I need to know how important this is tthanks0 -
Rogue url foung in webmaster toos
Buon Giorno from 2 degrees C thick fog wetherby UK 🙂 On this site www.davidclick.com I ran a crawl test and came across a url that doesnt exist in my site, the findings are illustrated here: http://i216.photobucket.com/albums/cc53/zymurgy_bucket/rogue-urlcopy_zps6c58ee46.jpg The plot thickens... the source of the referring traffic to a page that doesnt exist can be seen here: http://i216.photobucket.com/albums/cc53/zymurgy_bucket/rogue-link-source_zpsc70a34fc.jpg My intial thoughts rae to disavow via this tool:
Technical SEO | | Nightwing
https://www.google.com/webmasters/tools/disavow-links-main So my question is please: Is this sinister or should I just sit back drink a cup of horlicks and return to a Zen like status of inner peace? Any insights welcome 😉 Grazie tanto, David0 -
"Standout" tag and "Original content" tags - what's the latest?
In November 2010 Google introduced the "standout tag" http://support.google.com/news/publisher/bin/answer.py?hl=en&answer=191283 I can't find any articles/blog posts/etc in google after that date, but its use was suggested in a google forum today to help with original content issues. Has anyone used them? Does anyone know what's the latest with them? Are they worth trying for SEO? Is there a possible SEO penalty for using them? Thanks, Jean
Technical SEO | | JeanYates0 -
Trying to get on Google page one for keyword "criminal defense attorney san diego". What can I do?
I'm trying to help a friend who is an attorney get on page one for the keyword "criminal defense attorney san diego." So far I've changed his title and description tags since they weren't optimized before. (SERP shows old title tag, however I submitted a XML sitemap through Webmaster tools to get the new title tags updated.) He also had a few duplicate pages, but I took care of that with some 301 redirects. I also added a h1 tag, alt image tag, and more content. I also spent a few hours building links for him. He currently has a page authority of 52 and domain authority of 44 with a decent amount of links pointing to his site. I'm wondering why he's stuck on page 4, when his competitors that have less impressive numbers seem to show up on page 1. I did look at his link profile using OSE and I'm worried that his old SEO guy got him spam links. His website is www.nasserilegal.com, however the page I was focusing on was www.nasserilegal.com/criminal.html Any advice would be great.
Technical SEO | | micasalucasa0 -
Should I change these "Overly dynamic URLs" ?
Hello, My client have pages that look like this: www.domain.com/blog/index.aspx?blogmonth=1&blogday=10&blogyear=2012&blogid=256 Question 1: SEOMoz say they are overly dynamic. Is it really in this case as the numbers indicate the year, month and day and do not change? Question 2: Should we change the URLs to proper SEO friendly URLs such as www.domain.com/keywords1-keyword2? The pages are already ranking well and we worry that changing the URL may damage the ranking? Do we risk the page to go down in ranking by creating SEO friendly URLs? (and using a 301 to redirect from the old URL)
Technical SEO | | DavidSpivac0 -
Suggested crawl rate in google webmaster tools?
hey moz peeps, got a general question: what is the suggested custom crawl rate in google webmaster tools? or is it better to "Let Google determine my crawl rate (recommended)" If you guys have any good suggestions on this and site why that would be very helpful, thanks guys!
Technical SEO | | david3050 -
Google Webmaster Creation
When creating Google Webmaster Account is it advised to create 2 accounts for the 1 domain. one for non www and one for www?
Technical SEO | | daracreative0