Google Webmaster Tools Sitemap errors for phantom urls?
-
Two weeks ago we changed our urls so the correct addresses are all lowercase. Everything else 301 redirects to those. We have submitted and made sure that Google has downloaded our updated sitemap several times since.
Even so, Webmaster Tools is reporting 33,000+ errors in our sitemap for URLs that are no longer in our sitemap and haven't been for weeks. It claims to have found the errors within the last couple of days, but the sitemap was updated a couple of weeks ago and has been downloaded by Google at least three times since.
Here is our sitemap: http://www.aquinasandmore.com/urllist.xml
Here are a couple of urls that Webmaster Tools says are in the sitemap:
http://www.aquinasandmore.com/catholic-gifts/Caroline-Gerhardinger-Large-Sterling-Silver-Medal/sku/78664
Redirect error, unavailable
Oct 7, 2011
http://www.aquinasandmore.com/catholic-gifts/Catherine-of-Bologna-Small-Gold-Filled-Medal/sku/78706
Redirect error, unavailable
Oct 7, 2011
-
"How long does the actual data usually take to catch up with what WMT says is current?"
I have not experienced any delay before. There should only be one sitemap record for your site at any time. That record could be composed of multiple files, but it is one collection of records.
When Google identifies crawl errors, those errors should be generated from the sitemap on file at the time of the error. There is a view sitemap option in Google WMT you can use to see the sitemap they have on file. Checking that would be the next step. If you can confirm the bad URL does not appear in the sitemap, I would then wait to see if the issue reappears after today, October 11th.
I know this is frustrating, but the system is very straightforward. I cannot explain why a URL not included in your sitemap would appear on your sitemap crawl errors tab. The only two possibilities I can come up with are that either you made an error when sharing some of the information, or there is an unusual glitch on Google's end.
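If it helps with the check described above, here is a minimal Python sketch for testing whether the reported URLs really are listed in a sitemap file. It assumes you have saved the sitemap that WMT shows as a standard XML sitemap with `<loc>` elements. The function names and the `SAMPLE_SITEMAP` content are made up for illustration and are not taken from the actual site.

```python
import xml.etree.ElementTree as ET


def sitemap_urls(xml_text):
    """Return the set of URLs listed in <loc> elements of a sitemap."""
    root = ET.fromstring(xml_text)
    urls = set()
    for element in root.iter():
        # Strip any XML namespace so this works whichever schema the file declares.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.add(element.text.strip())
    return urls


def check_reported_urls(xml_text, reported_urls):
    """Map each reported URL to True if the sitemap lists it exactly."""
    listed = sitemap_urls(xml_text)
    return {url: url in listed for url in reported_urls}


# Hypothetical stand-in for the downloaded sitemap file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/catholic-gifts/sample-medal/sku/1</loc></url>
</urlset>"""

if __name__ == "__main__":
    reported = [
        "http://www.example.com/catholic-gifts/sample-medal/sku/1",
        "http://www.example.com/catholic-gifts/Sample-Medal/sku/1",
    ]
    for url, listed in check_reported_urls(SAMPLE_SITEMAP, reported).items():
        print("listed" if listed else "not listed", url)
```

The comparison is exact and case-sensitive on purpose, since the old mixed-case addresses and the new lowercase ones count as different URLs here.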
With all the above noted, working with sitemaps is not a good investment of your time. If your site navigation is properly designed, your sitemap offers no benefit whatsoever.
-
"then these links should not appear going forward." - They are showing up now even though Google says they have our latest sitemap and that the errors were found yesterday. How long does the actual data usually take to catch up with what WMT says is current?
The image URLs are built from the actual title on the fly and don't 301, so those aren't a problem. The other one you mentioned does need to be cleaned up in the sitemap. Thanks for catching that.
These errors are showing up when I go to the crawl errors section and click the sitemap tab. Yes, the sitemap I shared is the same one in WMT.
-
I was unable to locate the URLs you listed in your sitemap. If your Google WMT settings are correct and the sitemap you have shared is the same one listed in your Google WMT account, then these links should not appear going forward.
You would need to examine your Google WMT account closely to determine the exact source of these errors.
Where exactly within your Google WMT account are you seeing these errors? How are you determining that these URLs come from your sitemap?
"Two weeks ago we changed our urls so the correct addresses are all lowercase."
There are many URLs in your sitemap which are not lowercase. An example:
http://www.aquinasandmore.com/title/Brian-Kolodiejchuk/FuseAction/store.AuthorSearch/Author/2337/
You also list a lot of image URLs which are not lowercase either.
I would not necessarily advise cleaning up the entire site, but at least establish the best practice going forward.
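For anyone wanting to find the remaining mixed-case entries, a rough Python sketch like the following could list every sitemap URL whose path contains uppercase characters. It assumes a standard XML sitemap with `<loc>` elements. The function name `mixed_case_urls` and the sample data are illustrative only, and the lowercase entry in the sample is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def mixed_case_urls(xml_text):
    """Return sitemap URLs whose path contains uppercase characters."""
    root = ET.fromstring(xml_text)
    flagged = []
    for element in root.iter():
        # Ignore the XML namespace prefix when matching the tag name.
        if element.tag.rsplit("}", 1)[-1] != "loc" or not element.text:
            continue
        url = element.text.strip()
        path = urlsplit(url).path
        # Only the path is checked; host names are case-insensitive anyway.
        if path != path.lower():
            flagged.append(url)
    return flagged


# Sample data: the first URL is the example from this thread, the second is hypothetical.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.aquinasandmore.com/title/Brian-Kolodiejchuk/FuseAction/store.AuthorSearch/Author/2337/</loc></url>
  <url><loc>http://www.example.com/catholic-gifts/sample-medal/sku/1</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in mixed_case_urls(SAMPLE_SITEMAP):
        print(url)
```

Anything it prints is a candidate for either lowercasing in the sitemap or leaving alone if, like the image URLs, it does not redirect.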