404s in GWT - Not sure how they are being found
-
We have been getting multiple 404 errors in GWT that look like this: http://www.example.com/UpdateCart.
The problem is that this is not a URL that is part of our structure, it is only a piece. The actual URL has a query string on the end, so if you take the query string off, the page does not work.
I can't figure out how Google is finding these pages. Could it be removing the query string?
Thanks.
-
Kelli - the first thing I thought was what garfield_disliker asks: have you set up Google Webmaster Tools to ignore these parameters that are important for the cart page to load?
That said, Google Webmaster Tools is run by a team that's separate from the primary search team, so it's possible that GWT is flagging an issue that isn't an actual issue for Google. Run a search in Google for "site:yourdomain.com/UpdateCart" and see what URLs Google has indexed. If they have that 404ing URL, that's not good. If they have correct URLs, it's possible that this is a Google Webmaster Tools thing.
-
Hi,
Are you using the /updateCart url in goal tracking or pushing events to analytics using this url? I have seen GWT pick up 404's from us pushing virtual (non existing) page views to analytics for goal tracking etc. Just a thought.
-
First, you can never be sure there are no external links. Open Site Explorer's index (and any other link analysis tool) is not a full picture, and Google doesn't always provide all the inbound links to your site. The junkier the scraper, the less likely you will see the link.
Secondly, could you provide a concrete example of this?
Where is the page (with parameters) linked from/to on your site? How is your site appending those parameters to the URL? Does it send users through a redirect to get to that URL? It might be useful to run your own crawl (w/ Screaming Frog or any other crawling software) of the site and take a look at all the internal links and the response codes.
Also have you set up Google WMT to ignore any parameters?
It's certainly possible that Google's crawlers are stripping parameters on their own.
-
We do not dynamically inject canonicals into the page. They are also not old URLs because they have never been valid URLs.
They are all linked from internal pages, but when I look at those pages, the URL with the query string is the only URL that is being pointed to, not the partial URL. There are no external links.
Thanks,
Kelli -
In WMT click on the URL that is 404'd and then select "linked to from". It will show you where Google is picking up the 404 error.
Are these 404 pages being linked to from an external site? Sometimes the 404s that appear in WMT are from links pointing to your domain from an external site, often one that has scraped your site.
-
Does your website dynamically inject canonical links into the page? Some content management systems will automatically generate canonicals that strip parameters from the URL. If that's the case then that might be why you wouldn't see it in your ordinary site structure.
It's also possible that it's an old URL that Google indexed which is no longer on your site or something that is linked externally somewhere, so the crawlers are finding it somewhere off site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is there a way to set up 301 auto redirects from 404s
some of our pages under a specific website section gets deleted from another data source and we want to resolve the problem of 404s can we set up automated 301 redirects to the main page as soon as one of these pages are deleted
Technical SEO | | lina_digital2 -
Fetching & Rendering a non ranking page in GWT to look for issues
Hi I have a clients nicely optimised webpage not ranking for its target keyword so just did a fetch & render in GWT to look for probs and could only do a partial fetch with the below robots.text related messages: Googlebot couldn't get all resources for this page Some boiler plate js plugins not found & some js comments reply blocked by robots (file below): User-agent: *
Technical SEO | | Dan-Lawrence
Disallow: /wp-admin/
Disallow: /wp-includes/ As far as i understand it the above is how it should be but just posting here to ask if anyone can confirm whether this could be causing any prrobs or not so i can rule it out or not. Pages targeting other more competitive keywords are ranking well and are almost identically optimised so cant think why this one is not ranking. Does fetch and render get Google to re-crawl the page ? so if i do this then press submit to index should know within a few days if still problem or not ? All Best Dan0 -
Why is there a difference in the number of indexed pages shown by GWT and site: search?
Hi Moz Fans, I have noticed that there is a huge difference between the number of indexed pages of my site shown via site: search and the one that shows Webmaster Tools. While searching for my site directly in the browser (site:), there are about 435,000 results coming up. According to GWT there are over 2.000.000 My question is: Why is there such a huge difference and which source is correct? We have launched the site about 3 months ago, there are over 5 million urls within the site and we get lots of organic traffic from the very beginning. Hope you can help! Thanks! Aleksandra
Technical SEO | | aleker0 -
404s effecting crawl rate?
We made a change to our site where we all of a sudden we are creating a large number of 404 pages. Is this effecting the crawl/indexing rate? Currently we've submitted 3.4 million pages, have over 834K indexed but have over and 330K pages not found. Since the large increase in 404s we've noticed a decrease in pages crawled per day. I found this Q & A in Webmasters (http://googlewebmastercentral.blogspot.com/2011/05/do-404s-hurt-my-site.html) but it seems like the 404s should not have an effect. Is this article out of date? What do you think fellow Moz-ers? Is this a problem?
Technical SEO | | JoshKimber0 -
GWT Change of Address Keeps Failing
Followed Google's instructions for using the Change of Address Tool in GWT to move rethinkisrael.org to www.fromthegrapevine.com. I'm getting this message, "We tried to reconfirm ownership of your old site (rethinkisrael.org) and failed. Make sure your verification token is present, and try again."Even though the site is verified, we undid the DNS change, and checked the meta verification tag. The tag is correct. And, since the site is ALREADY verified there was NO way to 'veryify' in GWT again. The message in GWT says "verification successful."We redid the DNS change, tried again to do the address change and get the same error message. Any ideas?
Technical SEO | | Aggie0 -
When choosing GWT preferred domain its asking for re-verification?
Trying to set a preferred domain in GWT, and the site is verified via Google Analytics and meta tag in the code, but still asks: Part of the process of setting a preferred domain is to verify that you own http://site.org/. Please verify http://site.org/. Tried looking for answer to no avail, am I missing anything?
Technical SEO | | vmialik0 -
Removing a staging area/dev area thats been indexed via GWT (since wasnt hidden) from the index
Hi, If you set up a brand new GWT account for a subdomain, where the dev area is located (separate from the main GWT account for the main live site) and remove all pages via the remove tool (by leaving the page field blank) will this definately not risk hurting/removing the main site (since the new subdomain specific gwt account doesn't apply to the main site in any way) ?? I have a new client who's dev area has been indexed, dev team has now prevented crawling of this subdomain but the 'the stable door was shut after the horse had already bolted' and the subdomains pages are on G's index so we need to remove the entire subdomain development area asap. So we are going to do this via the remove tool in a subdomain specific new gwt account, but I just want to triple check this wont accidentally get main site removed too ?? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Not sure which URL to use for 301 redirect
A client has new website design completed by another developer, was launched in April of this year. No 301 redirect was set up so duplicate content is an issue. Client has had a website with same domain name for about 10 years, but has not had any SEO work completed before or since his new site design. For non-www there are 6 referring links - 1 considered to have authority, for www there are also 6 but 3 considered to have authority. More links seem to coming from www than non-www. But for one of the clients keywords they are ranked #1 for their area and that links to their non-www address. And even though no redirects set up by developer, non-www has had far more visits according to Google Analytics. So many basics that still need to be done for site: no meta-descriptions on any page, H1 and page titles could use keywords, call to action moved above fold, etc. Considering this is a new site, and new SEO work and many more inbound links needed, does it matter which address I redirect to? _Cindy Barnard
Technical SEO | | CeCeBar0