Google Webmaster Tools says access denied for 77 URLs
-
Hi, I am looking in Google Webmaster Tools and I have found a major problem that I hope people can help me sort out.
The problem is that I am being told that 77 URLs are being denied access. When I look for more information, the message says:
Googlebot couldn't crawl your URL because your server either requires login to access the page, or is blocking Googlebot from accessing your site.
The response code is 403.
Here are a couple of examples:
http://www.in2town.co.uk/Entertainment-Magazine
http://www.in2town.co.uk/Weight-Loss-Hypnotherapy-helped-woman-lose-3-stone
I think the problem could be that I have sent them to another URL in my .htaccess file using a 403 redirect, but why would that cause Google to report that Googlebot could not crawl them?
Any help would be great.
-
Yup, deleted.
-
I have now deleted the old version. Can you check on this and make sure you can no longer see it?
-
Hi, thanks for that. I will delete the old one now.
-
In Webmaster Tools, you can use "Fetch as Google": you enter one of those 77 URLs and see what Googlebot sees when it requests that URL.
You can also use:
http://www.dnsqueries.com/en/googlebot_simulator.php
For the URL: http://www.in2town.co.uk/Entertainment-Magazine
the Google Bot Simulator says:
HTTP CODE = HTTP/1.1 301 Moved Permanently
Location = http://www.in2town.co.uk/Showbiz-Gossip
and for: http://www.in2town.co.uk/Weight-Loss-Hypnotherapy-helped-woman-lose-3-stone
HTTP CODE = HTTP/1.1 301 Moved Permanently
Location = http://www.in2town.co.uk/Weight-Loss-Hypnotherapy
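The same single-request check the simulator performs can be sketched in Python with the standard library's http.client, which never follows redirects on its own, so a 301 is reported with its Location header instead of being silently followed. The User-Agent string is Googlebot's published desktop token; the helper names describe and fetch_as_googlebot are my own. Because the sketch only changes the User-Agent header, it will not reproduce blocks keyed on Googlebot's IP addresses or reverse DNS, which a fetch from Google's own servers would hit.

```python
import http.client
from http import HTTPStatus
from typing import Optional
from urllib.parse import urlsplit

# Googlebot's desktop user-agent token. Spoofing it does NOT reproduce
# IP- or reverse-DNS-based blocking on the server.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def describe(status: int, location: Optional[str]) -> str:
    """Summarise one response: status code, reason phrase, redirect target."""
    try:
        phrase = HTTPStatus(status).phrase  # e.g. 403 -> "Forbidden"
    except ValueError:
        phrase = "Unknown"
    text = f"{status} {phrase}"
    if 300 <= status < 400 and location:
        text += f" -> {location}"
    return text


def fetch_as_googlebot(url: str, timeout: float = 10.0) -> str:
    """Request url once with the Googlebot UA, without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        # http.client does not follow redirects, so 301/302 come back as-is.
        conn.request("GET", target, headers={"User-Agent": GOOGLEBOT_UA})
        resp = conn.getresponse()
        return describe(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

Calling print(fetch_as_googlebot("http://www.in2town.co.uk/Entertainment-Magazine")) prints one line with the status, its reason phrase and, for a redirect, the target. A 403 shows up as "403 Forbidden", which makes it easy to tell a refusal apart from a real 301 redirect.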
Interestingly, both of the NEW URLs work fine, although http://www.in2town.co.uk/Weight-Loss-Hypnotherapy doesn't look too good (at least in my web browser), but that's another issue.
You have a fairly complex .htaccess file (hint: I looked up your OLD .htaccess file; you should delete old .htaccess files, or otherwise make sure people can't access them via a web browser), so I'm guessing the problem lies within your .htaccess file.
If possible, upload a plain, simple .htaccess file, test it with Google Webmaster Tools and see whether the error persists.
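Before swapping files, it can help to scan the existing .htaccess for directives that make Apache answer 403 Forbidden. The Python sketch below flags a few common ones: mod_access "Deny from", Apache 2.4 "Require all denied", mod_alias Redirect/RedirectMatch with status 403, and mod_rewrite RewriteRule lines carrying the [F] flag or [R=403]. When R is given a code outside 300-399, Apache drops the substitution and returns that status, so a "403 redirect" produces a 403 response rather than a redirect. The function name find_403_sources and the pattern list are my own; the scan is line-by-line and ignores context such as <Files> blocks, so every hit (including the usual rule protecting .htaccess itself) is a lead to review, not proof.

```python
import re
from typing import List, Tuple

# Directives that can make Apache return 403. A starting point only:
# not exhaustive, and <Files>/<Directory> context is ignored.
PATTERNS = [
    (re.compile(r"^\s*deny\s+from\b", re.IGNORECASE), "mod_access 'Deny from' rule"),
    (re.compile(r"^\s*require\s+all\s+denied\b", re.IGNORECASE), "Apache 2.4 'Require all denied'"),
    (re.compile(r"^\s*redirect(match)?\s+403\b", re.IGNORECASE), "Redirect/RedirectMatch with status 403"),
    (re.compile(r"^\s*rewriterule\b.*\[[^\]]*\b(F|forbidden|R=403)\b[^\]]*\]", re.IGNORECASE),
     "RewriteRule with [F] or [R=403] flag"),
]


def find_403_sources(htaccess_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, reason, line) for each suspicious directive."""
    hits = []
    for lineno, line in enumerate(htaccess_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comments
        for regex, reason in PATTERNS:
            if regex.search(line):
                hits.append((lineno, reason, line.strip()))
                break  # one reason per line is enough
    return hits
```

Running it over the live file and checking the reported line numbers against the 77 URLs narrows down which rule is answering Googlebot with 403.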
Adam
Related Questions
-
Website URL, Robots.txt and Google Search Console (www. vs non www.)
Hi Moz Community,
I would like to ask for your kind assistance with domain URLs: www vs. non-www. My team recently moved to a new website, and a 301 redirect has been set up.
Original URL: https://www.example.com.my/ (with www.)
New URL: https://example.com.my/ (without www.)
Sitemap in our current robots.txt: https://www.example.com.my/sitemap.xml (with www.)
Our Google Search Console property: https://www.example.com.my/ (with www.)
Questions:
1. How should I standardize these so that Google's crawler can crawl my website effectively?
2. Do I have to change my website URLs back to the www version, or do I just need to update my robots.txt?
3. How can I update my Google Search Console property to the non-www version? I cannot see the option in the dashboard.
4. Is there anything else to do, such as canonicalization, or should I wait for Google to detect the change automatically, especially for the GSC property?
I really appreciate your kind assistance. Thank you,
Badiuzz
Technical SEO | | Badiuzz
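One concrete part of question 1 is making sure robots.txt no longer advertises a sitemap on the old www host after the 301 move. A minimal Python sketch, assuming the canonical host is the non-www example.com.my from the question; mismatched_sitemaps is a hypothetical helper name. It only reads Sitemap: lines and compares hostnames, so it says nothing about the Search Console property itself.

```python
from typing import List
from urllib.parse import urlsplit


def mismatched_sitemaps(robots_txt: str, canonical_host: str) -> List[str]:
    """Return Sitemap URLs in robots.txt whose host is not canonical_host."""
    wanted = canonical_host.lower()  # urlsplit().hostname is lower-cased
    bad = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if urlsplit(url).hostname != wanted:
                bad.append(url)
    return bad
```

With the robots.txt described above and canonical_host="example.com.my", the www sitemap URL is returned, signalling that the Sitemap line should be updated to the non-www address.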
Google Webmaster Tools is saying "Sitemap contains urls which are blocked by robots.txt" after Https move...
Hi everyone, I really don't see anything wrong with our robots.txt file after our recent HTTPS move, but Google says all URLs are blocked. The only change I know we need to make is changing the sitemap URL to HTTPS. Does anyone see anything wrong with this robots.txt file?
# robots.txt
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# This file will be ignored unless it is at the root of your host:
# Used:    http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
# Website Sitemap
Sitemap: http://www.bestpricenutrition.com/sitemap.xml
# Crawlers Setup
User-agent: *
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /aitmanufacturers/index/view/
Disallow: /blog/tag/
Disallow: /advancedreviews/abuse/reportajax/
Disallow: /advancedreviews/ajaxproduct/
Disallow: /advancedreviews/proscons/checkbyproscons/
Disallow: /catalog/product/gallery/
Disallow: /productquestions/index/ajaxform/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
Disallow: /.php$
Disallow: /?SID=
disallow: /?cat=
disallow: /?price=
disallow: /?flavor=
disallow: /?dir=
disallow: /?mode=
disallow: /?list=
disallow: /?limit=5
disallow: /?limit=10
disallow: /?limit=15
disallow: /?limit=20
disallow: /*?limit=250
Technical SEO | | vetofunk
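Whether rules like these really block ordinary pages can be checked offline by evaluating sample paths the way Google documents robots.txt matching: * matches any run of characters, a trailing $ anchors the end of the URL, and when several rules match, the longest pattern wins, with Allow winning a tie. Python's standard urllib.robotparser uses plain prefix matching, so it would not evaluate a line like Allow: /*?p= this way. The sketch below is a simplified matcher under stated assumptions: it ignores user-agent groups (the file above only has User-agent: *), and is_allowed is a hypothetical helper name. If typical product and category paths come back allowed, the rules themselves are probably not the cause of the warning.

```python
import re

# "Allow:" / "Disallow:" lines, case-insensitive (the file above mixes
# "Disallow" and "disallow").
DIRECTIVE = re.compile(r"^(allow|disallow)\s*:\s*(.*)$", re.IGNORECASE)


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """'*' = any run of characters, trailing '$' = end of URL, rest literal."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(robots_txt: str, path: str) -> bool:
    """True if path (path plus query string) may be crawled.

    Simplification: every Allow/Disallow line is treated as part of the
    single 'User-agent: *' group.
    """
    best_len, allowed = -1, True  # no matching rule means "allowed"
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        m = DIRECTIVE.match(line)
        if not m:
            continue  # User-agent, Sitemap, blank lines, etc.
        kind, pattern = m.group(1).lower(), m.group(2).strip()
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if _pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            # Longest pattern wins; on a tie, Allow beats Disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed
```

For example, is_allowed(open("robots.txt").read(), "/whey-protein") evaluates one path against the saved file; trying a handful of real product and category URLs is usually enough to see whether the file is the problem.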
If a URL canonically points to another link, is that URL indexed?
Hi, I have two URLs that both cover the keyword phrase 'counting aggregated cells'. The first URL has a canonical link pointing to the second URL, but if someone searches for 'counting aggregated cells', both URLs are shown in the results. The first URL is a PDF, and I need only the second URL (the landing page) to be shown in the search results. The canonical links should tell Google which URL to index, so I don't understand why both URLs are present in the search results. Is 'noindex' for the first URL the only solution? I am using Yoast SEO for my website. Thank you for your answers.
Technical SEO | | Chemometec
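Because a PDF has no HTML <head>, a canonical for it can only be sent as an HTTP Link header of the form Link: <url>; rel="canonical", and Google treats rel="canonical" as a strong hint rather than a directive, which is one reason both URLs can still appear. A quick way to confirm the header is actually being sent is to parse it from the PDF's response. The Python sketch below is a simplified parser, assuming one rel value per link and no commas inside URLs; canonical_from_link_header and the example.com URLs are hypothetical.

```python
import re
from typing import Optional

# "<url>; ... rel="canonical"" within one comma-separated link-value.
# Simplified: no commas inside URLs, a single rel value per link.
LINK_CANONICAL = re.compile(
    r'<([^>]+)>\s*;[^,]*?\brel\s*=\s*"?canonical\b', re.IGNORECASE
)


def canonical_from_link_header(value: Optional[str]) -> Optional[str]:
    """Return the canonical URL from a Link header value, or None."""
    if not value:
        return None
    m = LINK_CANONICAL.search(value)
    return m.group(1) if m else None
```

If the PDF's response carries no such header, the canonical the page author intended never reaches Google, and an X-Robots-Tag: noindex header on the PDF is the header-based alternative to the 'noindex' asked about above.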
Google Search Results Display URL
Our URLs show as www.domain.com/getproduct.aspx?productid=48376 (URL #1) in Google search results. When you click on the link and go to the site, the URL is www.domain.com/product-name.aspx (URL #2). I checked in Google Webmaster Tools (Fetch as Google) and there is a 302 redirect from URL #1 to URL #2. It also shows a Set-Cookie value (ASP.NET_SessionID=). If we make it a 301 redirect instead, will the URL displayed in Google search results be URL #2? We need to get rid of the Set-Cookie for crawlers, correct?
Technical SEO | | Guy_Huyett
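The two things visible in the Fetch as Google output (the redirect status and the Set-Cookie header) can be summarised with a small Python helper fed the status and headers copied from that output. This is a hypothetical sketch (audit_redirect and the sample cookie value are made up); it only surfaces the facts and leaves open whether the cookie needs to go, which is the poster's question.

```python
from typing import List, Mapping


def audit_redirect(status: int, headers: Mapping[str, str]) -> List[str]:
    """Return human-readable notes about one redirect response."""
    lower = {name.lower(): value for name, value in headers.items()}
    notes: List[str] = []
    if status == 302:
        notes.append("302 (temporary): search engines may keep showing the original URL")
    elif status == 301:
        notes.append("301 (permanent): search engines usually index the target URL")
    if "set-cookie" in lower:
        # Keep only name=value; attributes such as path= are dropped.
        notes.append("sets cookie " + lower["set-cookie"].split(";", 1)[0])
    return notes
```

Header names are compared case-insensitively because HTTP treats them that way, so "Set-Cookie" and "set-cookie" are reported identically.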
Second URL
Hi, we have a .com and a .co.uk. The main website is the .co.uk; we also have a landing page on the .com. If we redirect the .com to the .co.uk, will it create duplicate content? It may seem like a silly question, but we want to be sure that visitors can't access our website at both URLs, as that would be duplicate content. Thanks in advance, John
Technical SEO | | Johnny4B
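A sitewide 301 that keeps the path means every .com URL answers with a redirect rather than a second copy of the page, so visitors and crawlers only ever receive content from the .co.uk host. The URL mapping can be sketched in Python; example.com, www.example.co.uk, the https scheme and redirect_target are placeholders, since the thread gives no real domain names. In practice the same mapping would usually live in the web server's redirect configuration rather than in application code.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Placeholder host names: substitute the real domains.
SOURCE_HOSTS = {"example.com", "www.example.com"}
TARGET_HOST = "www.example.co.uk"


def redirect_target(url: str) -> Optional[str]:
    """Map a .com URL to its .co.uk equivalent for a 301, else None."""
    parts = urlsplit(url)
    if parts.hostname not in SOURCE_HOSTS:
        return None  # already on the main site (or unrelated): no redirect
    # Keep path and query so each page maps one-to-one; drop the fragment,
    # which browsers never send to the server anyway.
    return urlunsplit(("https", TARGET_HOST, parts.path or "/", parts.query, ""))
```

Mapping page to page (rather than sending everything to the home page) keeps the landing page's visitors on the equivalent .co.uk page.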
Google Analytics
I usually have Google Analytics "Real-Time" running on one of my monitors, and occasionally I glance at the screen to see what the Top Keywords people are using are. Lately there have been a lot of long-tail keywords. If I copy and paste the queries into Google, I can never seem to find us organically for the long-tail searches. Is the Real-Time feature accurate? Thank you!
Technical SEO | | TP_Marketing
Recent Webmaster Tools Glitch Impacting Site Quality?
The ramifications of this would not be specific to me but to anyone with this type of content on their pages... Maybe someone can chime in here, but I'm not sure how much, if at all, site errors (for example 404 errors) reported by Google Webmaster Tools are seen as a factor in site quality, which would impact SEO rankings. Any insight on that alone would be appreciated. I've noticed some fairly new, weird stuff going on in the WMT 404 error reports. It seems as though their engine is finding strings within the source code of the page that are NOT links but look like URLs, then trying to crawl them and reporting them as broken. I've seen a couple of different cases in my environment that seem to trigger this issue. The easiest one to explain is Google Analytics virtual pageview JavaScript calls, where, for example, you might send a virtual pageview back to GA for clicks on outbound links. So in the source code of your page you would have something like: onclick="_gaq.push(['_trackPageview', '/outboundclick/www.othersite.com']);" Although this is obviously not a crawlable link, sure enough, Webmaster Tools now reports the following broken page with a 404: www.mysite.com/outboundclick/www.othersite.com I've seen other cases of things that look like URLs, but are not actual links, being pulled out of the page source and reported as broken links. Has anyone else noticed this? Do 404 instances (in this case false ones) reported by Webmaster Tools impact site quality rankings and SEO? Interesting issue here; I'm looking forward to hearing some people's thoughts on this. Chris
Technical SEO | | cbubinas
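To illustrate how a URL-like string inside JavaScript could turn into a reported 404, here is a deliberately naive extractor in Python that treats any quoted string beginning with "/" as a relative link and resolves it against the page URL. This is an assumption-laden illustration of the behaviour Chris describes, not Google's actual parser; url_like_strings and the www.mysite.com page URL are hypothetical.

```python
import re
from typing import List
from urllib.parse import urljoin

# Naive heuristic: any single- or double-quoted string that starts with "/"
# is treated as a relative link, wherever it appears in the page source.
QUOTED_PATH = re.compile(r"""['"](/[^'"\s]*)['"]""")


def url_like_strings(html: str, page_url: str) -> List[str]:
    """Resolve every quoted '/...' string in html against page_url."""
    return [urljoin(page_url, m.group(1)) for m in QUOTED_PATH.finditer(html)]
```

Applied to the onclick handler from the post on a page at http://www.mysite.com/blog/post, it yields exactly http://www.mysite.com/outboundclick/www.othersite.com, the "broken page" that Webmaster Tools reported, even though no href points there.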
Google Plus
With a single Google search, you can see regular search results, along with all sorts of results that are tailored to you -- pages shared with you by your friends, Google+ posts from people you know. Pages shared by friends: does this mean pages shared by friends on Google Plus?
Technical SEO | | seoug_2005