404s in GWT - Not sure how they are being found
-
We have been getting multiple 404 errors in GWT that look like this: http://www.example.com/UpdateCart.
The problem is that this is not a complete URL in our site structure; it is only a piece of one. The actual URL has a query string on the end, and without the query string the page does not work.
I can't figure out how Google is finding these URLs. Could it be removing the query string?
Thanks.
-
Kelli - the first thing I thought was what garfield_disliker asks: have you set up Google Webmaster Tools to ignore these parameters that are important for the cart page to load?
That said, Google Webmaster Tools is run by a team that's separate from the primary search team, so it's possible that GWT is flagging an issue that isn't an actual issue for Google. Run a search in Google for "site:yourdomain.com/UpdateCart" and see what URLs Google has indexed. If they have that 404ing URL, that's not good. If they have correct URLs, it's possible that this is a Google Webmaster Tools thing.
-
Hi,
Are you using the /UpdateCart URL in goal tracking, or pushing events to analytics using this URL? I have seen GWT pick up 404s from us pushing virtual (nonexistent) page views to analytics for goal tracking etc. Just a thought.
-
First, you can never be sure there are no external links. Open Site Explorer's index (like that of any other link analysis tool) is not a full picture, and Google doesn't always report all the inbound links to your site. The junkier the scraper, the less likely you are to see the link.
Second, could you provide a concrete example of this?
Where is the page (with parameters) linked from/to on your site? How is your site appending those parameters to the URL? Does it send users through a redirect to get to that URL? It might be useful to run your own crawl (w/ Screaming Frog or any other crawling software) of the site and take a look at all the internal links and the response codes.
Also, have you set up Google WMT to ignore any parameters?
It's certainly possible that Google's crawlers are stripping parameters on their own.
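The internal-link check suggested above can be roughed out in a few lines of Python. This is only a minimal sketch using the standard library: the sample markup, the `item` parameter, and the helper names `LinkCollector` and `audit_links` are all hypothetical, and only the `/UpdateCart` path comes from the thread. It separates links that carry a query string from links that point at the bare path.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def audit_links(page_url, html, target_path):
    """Return (with_query, without_query) lists of absolute links to target_path."""
    collector = LinkCollector()
    collector.feed(html)
    with_query, without_query = [], []
    for href in collector.hrefs:
        # Resolve relative hrefs against the page they were found on.
        absolute = urljoin(page_url, href)
        parts = urlsplit(absolute)
        if parts.path != target_path:
            continue
        (with_query if parts.query else without_query).append(absolute)
    return with_query, without_query


# Hypothetical page markup; only /UpdateCart comes from the thread.
sample_html = """
<a href="/UpdateCart?item=123">Add to cart</a>
<a href="/UpdateCart">Broken link</a>
<a href="/about">About</a>
"""
with_query, without_query = audit_links(
    "http://www.example.com/category", sample_html, "/UpdateCart"
)
print("with query:", with_query)
print("bare path:", without_query)
```

If the bare-path list stays empty across every page of your own crawl, the parameterless URL is probably not coming from your internal links, which points toward the crawler or an off-site source.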
-
We do not dynamically inject canonicals into the page. They are also not old URLs because they have never been valid URLs.
They are all linked from internal pages, but when I look at those pages, the URL with the query string is the only URL that is being pointed to, not the partial URL. There are no external links.
Thanks,
Kelli
-
In WMT click on the URL that is 404'd and then select "linked to from". It will show you where Google is picking up the 404 error.
Are these 404 pages being linked to from an external site? Sometimes the 404s that appear in WMT are from links pointing to your domain from an external site, often one that has scraped your site.
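If the list of linking pages is long, a quick way to answer the internal-versus-external question is to bucket the URLs by hostname. A minimal sketch, assuming you have copied the linking URLs into a list by hand; the helper names and the `scraper.example.net` address are hypothetical:

```python
from urllib.parse import urlsplit


def bare_host(host):
    """Lower-case a hostname and drop a leading 'www.' so both forms match."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def split_by_origin(site_host, linking_urls):
    """Split linking URLs into (internal, external) lists by hostname."""
    site = bare_host(site_host)
    internal, external = [], []
    for url in linking_urls:
        host = urlsplit(url).hostname
        # A URL with no hostname is relative, so treat it as internal.
        if host is None or bare_host(host) == site:
            internal.append(url)
        else:
            external.append(url)
    return internal, external


# Hypothetical linking pages; replace with the URLs listed for the 404.
linking_urls = [
    "http://www.example.com/category",
    "http://scraper.example.net/copied-page",
]
internal, external = split_by_origin("www.example.com", linking_urls)
print("internal:", internal)
print("external:", external)
```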
-
Does your website dynamically inject canonical links into the page? Some content management systems will automatically generate canonicals that strip parameters from the URL. If that's the case then that might be why you wouldn't see it in your ordinary site structure.
It's also possible that it's an old URL that Google indexed but which is no longer on your site, or that it is linked from somewhere external, so the crawlers are finding it off site.
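To test the canonical theory, you can compare a page's canonical link against its own URL with the query string removed. A minimal sketch with the Python standard library; the sample markup, the `item` parameter, and the helper names are hypothetical, and in practice you would feed in the HTML your server actually returns for the cart URL:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def canonical_strips_query(page_url, html):
    """True if the page has a query string and its canonical is the same URL without it."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return False
    canonical = urljoin(page_url, finder.canonical)
    page = urlsplit(page_url)
    bare = urlunsplit((page.scheme, page.netloc, page.path, "", ""))
    return bool(page.query) and canonical == bare


# Hypothetical markup for a cart page whose CMS strips parameters.
sample_html = '<head><link rel="canonical" href="http://www.example.com/UpdateCart"></head>'
stripped = canonical_strips_query(
    "http://www.example.com/UpdateCart?item=123", sample_html
)
print("canonical strips query:", stripped)
```

A `True` result would mean the page itself is advertising the parameterless URL, which could explain why it never shows up as an ordinary link.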