GWT False Reporting or GoogleBot has weird crawling ability?
-
Hi I hope someone can help me.
I have launched a new website and trying hard to make everything perfect. I have been using Google Webmaster Tools (GWT) to ensure everything is as it should be but the crawl errors being reported do not match my site. I mark them as fixed and then check again the next day and it reports the same or similar errors again the next day.
Example:
http://www.mydomain.com/category/article/ (this would be a correct structure for the site).
GWT reports:
http://www.mydomain.com/category/article/category/article/ 404 (It does not exist, never has and never will) I have been to the pages listed to be linking to this page and it does not have the links in this manner. I have checked the page source code and all links from the given pages are correct structure and it is impossible to replicate this type of crawl.
This happens accross most of the site, I have a few hundred pages all ending in a trailing slash and most pages of the site are reported in this manner making it look like I have close to 1000, 404 errors when I am not able to replicate this crawl using many different methods.
The site is using a htacess file with redirects and a rewrite condition.
Rewrite Condition:
Need to redirect when no trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !.(html|shtml)$
RewriteCond %{REQUEST_URI} !(.)/$
RewriteRule ^(.)$ /$1/ [L,R=301]The above condition forces the trailing slash on folders.
Then we are using redirects in this manner:
Redirect 301 /article.html http://www.domain.com/article/
In addition to the above we had a development site whilst I was building the new site which was http://dev.slimandsave.co.uk now this had been spidered without my knowledge until it was too late. So when I put the site live I left the development domain in place (http://dev.domain.com) and redirected it like so:
<ifmodule mod_rewrite.c="">RewriteEngine on
RewriteRule ^ - [E=protossl]
RewriteCond %{HTTPS} on
RewriteRule ^ - [E=protossl:s]RewriteRule ^ http%{ENV:protossl}://www.domain.com%{REQUEST_URI} [L,R=301]</ifmodule>
Is there anything that I have done that would cause this type of redirect 'loop' ?
Any help greatly appreciated.\
-
Yeah - do this!
-
Anyone any thoughts on this?
-
Sorry I also should add that the url structure that google generates is like this:
http://www.domain.com/category/article/
http://www.domain.com/category/article/same-category/differentarticle/
http://www.domain.com/category/article/same-category/another-different-article/
http://www.domain.com/category/article/another-different-category/differentarticle/
etc, it is like it gets to a category article and then moves sideways and somehow adds the move onto the current url without keeping hold of the suffix of the URL
-
Doesn't sound like GWT is false reporting. May want to check your trailing slash URL rewrite. It seems like there is an issue there as what you are describing sounds like the URLs are being written incorrectly and causing the incorrect URLs to be generated and show up in GWT.
Your 301 looks ok and if the dev site was spidered and indexed, you should just add the site to GWT and then use the URL removal tool to remove the site from the index, then remove the site and redirect.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Submitted URL has crawl issue - Submitted URL seems to be a Soft 404 - but all looks fine
Google Search Console is showing some pages up as "Submitted URL has crawl issue" but they look fine to me. I have set them as fixed but after a month they were finally re-crawled and google states the issue persists. Examples are: https://www.rscpp.co.uk/counselling/175809/psychology-alcester-lanes-end.html
Technical SEO | | TommyNewmanCEO
https://www.rscpp.co.uk/browse/location-index/889/index-of-therapy-in-hanger-lane.html
https://www.rscpp.co.uk/counselling/274646/psychology-waltham-forest-sexual-problems.html There's also some "Submitted URL seems to be a Soft 404": https://www.rscpp.co.uk/counselling/112585/counselling-moseley-depression.html I also have more which are "pending", but again I couldn't see a problem with them in the first place. I'm at a bit of a loss as to what to do next. Any advice? Thanks in advance.0 -
What is the best way to correct GWT telling me I have mobile usability errors in Image directories
In GWT, I wish to remove / resolve the following errors Mobile Usability > Viewport not configured Mobile Usability > Small font size Mobile Usability > Touch elements too close The domain www.sandpiperbeacon.com is responsive, and passes the mobile usability test. A new issue I noticed, is that GWT is reporting 200+ errors just for image index pages such as http://www.sandpiperbeacon.com/images/special-events/ for example. Website users cannot access these pages (without editing the URL manually) so I don't consider these usability issues. BUT, I hate to see 200+ errors, especially when Google itself says "Websites with mobile usability issues will be demoted in mobile search results." I could set the image directories to dissalow in Robots.txt, but I do not want the images to stop appearing in image search, so this seems like a flawed solution. I cannot be the only person experiencing this, but I have been unable to find any suggestions online.
Technical SEO | | RobertoGusto0 -
My website pages are not crawled, what to do?
Hi all. I have made some changes on the website so i like to crawled them by the search engines Google especially. I have made these changes around 2 weeks ago. I have submitted my website on good bookmarking websites. Also i used a tool available in Google webmasters "Fetch as Google", Resubmitted a sitemap.xml. Still my pages are not crawled your opinion please. Thanks
Technical SEO | | lucidsoftech0 -
My 404 page shows in the report as an error.
How can i make my actual 404 page not show up as a 404 error in the report?
Technical SEO | | LindseyNewman0 -
SEOMoz Crawl Diagnostic indicates duplicate page content for home page?
My first SEOMoz Crawl Diagnostic report for my website indicates duplicate page content for my home page. It lists the home page URL Page Title and URL twice. How do I go about diagnosing this? Is the problem related to the following code that is in my .htaccess file? (The purpose of the code was to redirect any non "www" backlink referrals to the "www" version of the domain.) RewriteCond %{HTTP_HOST} ^whatever.com [NC]
Technical SEO | | Linesides
RewriteRule ^(.*)$ http://www.whatever.com/$1 [L,R=301] Should I get rid of the "http" reference in the second line? Related to this is a notice in the "Crawl Notices Found" -- "301 Permanent redirect" which shows my home page title as "http://whatever.com" and shows the redirect address as http://http://www.whatever.com/ I'm guessing this problem is again related to the redirect code I'm using. Also... The report indicates duplicate content for those links that have different parameters added to the URL i.e. http://www.whatever.com?marker=Blah Blah&markerzoom=13 If I set up a canonical reference for the page, will this fix this? Thank you.0 -
GWT indexing wrong pages
Hi SEOMoz I have a listings site. In a part of the page, I have 3 comboboxes, for state, county and city. On the change event, the javascript redirects the user to the page of the selected location. Parameters are passed via GET, and my URL is rewrited via htaccess. Example: http:///www.site.com/state/county/city.html The problem is, there is A LOT(more than 10k) of 404 errors. It is happenning because the crawler is trying to index the pages, sometimes WITHOUT a parameter, like http:///www.site.com/state//city.html I don't know how to stop it, and I don't wanna remove it, once it's very clicked by the users. What should I do?
Technical SEO | | elias990 -
Tracking a Crawl error
Hi All, If you find a crawl error on your page. How do you find it? The error only says the URL that is wrong but this is not the location. Can i drill down and find out more information? Thank you!
Technical SEO | | wedmonds0 -
Weird Indexing Question
Google has indexed mysite.com/ and mysitem.com/\/ (no idea why). If you click on the /%5C? URL it takes you to mysite.com//. I have a rel=canonical tag on it that goes to mysite.com/ but I was wondering if there was another way to correct the issue.
Technical SEO | | BryanPhelps-BigLeapWeb0