Sitemaps for Google
-
In Google Webmaster Central, if a URL is reported in your site map as 404 (Not found), I'm assuming Google will automatically clean it up and that the next time we generate a sitemap, it won't include the 404 URL.
Is this true?
Do we need to comb through our sitemap files and remove the 404 pages Google finds, our will it "automagically" be cleaned up by Google's next crawl of our site?
-
Nice - thanks Kane. Cool Chrome tool too, thanks for the suggestion.
I'm in GWT every morning to check things out since our site is fairly large - about 220,000 pages. The sitemap checker is a really cool new feature in GWT too!
-
You should be monitoring them periodically. A quick scan monthly or quarterly depending on how large your site is should be sufficient. I've read reports within the last year that say Bing tends to be picky about sites with 404s in their sitemaps and will begin to distrust them if they get too messy. I am assuming that Google doesn't like it much, either.
It's very easy to check a large number of links easily, luckily: download the Check My Links Chrome Add-on here:
https://chrome.google.com/webstore/detail/ojkcdipcgfaekbeaelaapakgnjflfglf
Click that while looking at your sitemap, and it will test all of the URLs and make any errors stick out in red. Then fix them.
On the other hand, GWT will report them within a few days of finding them. If you're already in your GWT on a frequent basis, make that one more task to check on.
You should be fixing all 404s with 301 directs. Read "How to Fix Crawl Errors in GWT" for any tricky ones that you can't figure out.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
XML sitemap and rel alternate hreflang requirements for Google Shopping
Our company implemented Google Shopping for our site for multiple countries, currencies and languages. Every combination of language and country is accessible via a url path and for all site pages, not just the pages with products for sale. I was not part of the project. We support 18 languages and 14 shop countries. When the project was finished we had a total of 240 language/country combinations listed in our rel alternate hreflang tags for every page and 240 language/country combinations in our XML sitemap for each page and canonicals are unique for every one of these page. My concern is with duplicate content. Also I can see odd language/country url combinations (like a country with a language spoken by a very low percentage of people in that country) are being crawled, indexed, and appearing in serps. This uses up my crawl budget for pages I don't care about. I don't this it is wise to disallow urls in robots.txt for that we are simultaneously listing in the XML sitemap. Is it true that these are requirements for Google Shopping to have XML sitemap and rel alternate hreflang for every language/country combination?
Technical SEO | | awilliams_kingston0 -
URL Indexed But Not Submitted to Sitemap
Hi guys, In Google's webmaster tool it says that the URL has been indexed but not submitted to the sitemap. Is it necessary that the URL be submitted to the sitemap if it has already been indexed? Appreciate your help with this. Mark
Technical SEO | | marktheshark100 -
Google webmaster errors
**If you know what these google webmasters errors mean, and you can explain it to me in simple english and tell me how I can locate the problem, I would really appreciate it!. <colgroup><col width=""><col width=""><col width=""><col width=""><col width="*"><col width="124"><col width="54"></colgroup>
Technical SEO | | Joseph-Green-SEO
| | | | | Server error | | | | Soft 404 | | | | Access denied | | Not found | | | Not followed | | | |** I have many of these errors, is it harming SEO?Yoseph0 -
Does Google index has expiration?
Hi, I have this in mind and I think you can help me. Suppose that I have a pagin something like this: www.mysite.com/politics where I have a list of the current month news. Great, everytime the bot check this url, index the links that are there. What happens next month, all that link are not visible anymore by the user unless he search in a search box or google. Does google keep those links? The current month google check that those links are there, but next month are not, but they are alive. So, my question is, Does google keep this links for ever if they are alive but nowhere in the site (the bot not find them anymore but they work)? Thanks
Technical SEO | | informatica8100 -
How is Google finding our preview subdomains?
I've noticed that Google is able to find, crawl and index preview subdomains we set up for new client sites (e.g. clientpreview.example.com). I know now to use "meta name="robots" and robots.txt) to block the search engines from crawling these subdomains. My question though, is how is Google finding these subdomains? We don't link to these preview domains from anywhere else, so I can't figure out how Google is even getting there. Does anybody have any insight on this?
Technical SEO | | ZeeCreative0 -
Google.com
Hi We are managing a .com site for a client working on getting the site ranking. The site is hosted in the US. The content is rich, deep and unique. The site is in a competitive market but had begun ranking top 50 for a selection of keywords and we could see many more in the top 100. The site is now going backwards and only has a few keywords ranking top 50 and all the others have disappeared from the rankings all together. Any thought as to what could cause this. The site is managed from the Uk but as mentioned is hosted in the US. No penguin issues as all content unique, rich, relevant and fresh. SEO is also managed from the UK. Thoughts
Technical SEO | | SEOwins0 -
Is google all over the place tonight?
Is it me or is google all over the place tonight? Whilst checking my rankings I came across a site with a page authority of 29 and 23 links from 5 domains ranking at number 6 for a competitive keyword! This site came from nowhere and I'm getting different results every time I search! Weird....
Technical SEO | | SamCUK0 -
Google Fresh Update
Here's a question, if you put the date in your url (often a wordpress format) would that be seen by Google now as 'fresh' content? - part of their new update.
Technical SEO | | pauledwards0