404s in GWT - Not sure how they are being found
-
We have been getting multiple 404 errors in GWT that look like this: http://www.example.com/UpdateCart.
The problem is that this URL is not part of our site structure; it is only a fragment of one. The actual URL has a query string on the end, and without the query string the page does not work.
I can't figure out how Google is finding these pages. Could it be removing the query string?
Thanks.
-
Kelli - the first thing I thought was what garfield_disliker asks: have you set up Google Webmaster Tools to ignore these parameters that are important for the cart page to load?
That said, Google Webmaster Tools is run by a team that's separate from the primary search team, so it's possible that GWT is flagging an issue that isn't an actual issue for Google. Run a search in Google for "site:yourdomain.com/UpdateCart" and see what URLs Google has indexed. If they have that 404ing URL, that's not good. If they have correct URLs, it's possible that this is a Google Webmaster Tools thing.
-
Hi,
Are you using the /updateCart URL in goal tracking, or pushing events to analytics with this URL? I have seen GWT pick up 404s when we pushed virtual (non-existent) page views to analytics for goal tracking, etc. Just a thought.
-
First, you can never be sure there are no external links. Open Site Explorer's index (and any other link analysis tool) is not a full picture, and Google doesn't always provide all the inbound links to your site. The junkier the scraper, the less likely you will see the link.
Secondly, could you provide a concrete example of this?
Where is the page (with parameters) linked from/to on your site? How is your site appending those parameters to the URL? Does it send users through a redirect to get to that URL? It might be useful to run your own crawl (w/ Screaming Frog or any other crawling software) of the site and take a look at all the internal links and the response codes.
Also, have you set up Google WMT to ignore any parameters?
It's certainly possible that Google's crawlers are stripping parameters on their own.
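If a full crawler is not handy, the internal-link check can be roughed out by hand. The sketch below is only an illustration, not a replacement for Screaming Frog: it assumes you already have a page's HTML as a string, and the names `LinkCollector`, `find_bare_links`, and the sample markup are made up for this example. It lists any link that points at /UpdateCart without a query string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_bare_links(html, page_url, path="/UpdateCart"):
    """Return absolute links to `path` that carry no query string."""
    collector = LinkCollector()
    collector.feed(html)
    bare = []
    for href in collector.hrefs:
        # Resolve relative hrefs against the page they were found on.
        absolute = urljoin(page_url, href)
        parts = urlsplit(absolute)
        if parts.path == path and not parts.query:
            bare.append(absolute)
    return bare


# Hypothetical markup: one correct cart link, one missing its parameters.
sample = """
<a href="/UpdateCart?item=42&amp;qty=1">Add to cart</a>
<a href="/UpdateCart">Broken link</a>
<a href="/about">About</a>
"""
print(find_bare_links(sample, "http://www.example.com/"))
```

If this turns up nothing on the pages GWT lists as sources, the bare URL is probably not coming from a plain anchor tag, which points back toward redirects, canonicals, or parameter handling.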
-
We do not dynamically inject canonicals into the page. They are also not old URLs because they have never been valid URLs.
They are all linked from internal pages, but when I look at those pages, the URL with the query string is the only URL that is being pointed to, not the partial URL. There are no external links.
Thanks,
Kelli
-
In WMT click on the URL that is 404'd and then select "linked to from". It will show you where Google is picking up the 404 error.
Are these 404 pages being linked to from an external site? Sometimes the 404s that appear in WMT are from links pointing to your domain from an external site, often one that has scraped your site.
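If the list of referring pages is long, a few lines of Python can sort it into internal and external sources. This is a rough sketch that assumes you have copied the referring URLs out of WMT by hand; `split_referrers` and the example URLs are hypothetical, and the comparison is an exact hostname match, so subdomains count as external.

```python
from urllib.parse import urlsplit


def split_referrers(referrers, site_host="www.example.com"):
    """Split referring URLs into (internal, external) lists by hostname."""
    internal, external = [], []
    for url in referrers:
        host = (urlsplit(url).hostname or "").lower()
        if host == site_host:
            internal.append(url)
        else:
            external.append(url)
    return internal, external


# Hypothetical referrers, as they might be copied from WMT.
referrers = [
    "http://www.example.com/products/widget",
    "http://scraper.example.net/copied-page",
]
internal, external = split_referrers(referrers)
print("internal:", internal)
print("external:", external)
```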
-
Does your website dynamically inject canonical links into the page? Some content management systems will automatically generate canonicals that strip parameters from the URL. If that's the case then that might be why you wouldn't see it in your ordinary site structure.
It's also possible that it's an old URL that Google indexed which is no longer on your site or something that is linked externally somewhere, so the crawlers are finding it somewhere off site.
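One way to test the canonical theory is to view the source of a working cart URL and compare its canonical link with the address you requested. The sketch below assumes the page HTML is already in a string; `CanonicalFinder`, `canonical_strips_query`, and the sample markup are illustrative names, not part of any CMS.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"]


def canonical_strips_query(html, page_url):
    """True if the page has a query string but its canonical drops it."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return False
    page = urlsplit(page_url)
    canon = urlsplit(urljoin(page_url, finder.canonical))
    return bool(page.query) and not canon.query and canon.path == page.path


# Hypothetical page source for a cart URL that includes parameters.
html = '<head><link rel="canonical" href="http://www.example.com/UpdateCart"></head>'
print(canonical_strips_query(html, "http://www.example.com/UpdateCart?item=42"))
```

A result of True would mean the page itself is telling crawlers that the parameter-free URL is the preferred one, which could explain how Google ends up requesting it.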