Omitting URLs from XML Sitemap - Bad??
-
Hi all,
We are working on an extremely large retail site with some major duplicate content issues that we are in the process of remedying. The site also does not currently have an XML sitemap.
Would it be advisable to create a small XML sitemap with only the main category pages for the time being, and then upload the complete sitemap after our duplicate content issues are resolved? Or should we wait to upload anything until all work is complete down to the product page level and canonicals are in place? Will uploading an incomplete sitemap look fraudulent or misleading in the eyes of the search engines and prompt a penalty, or would having at least the main pages mapped while we continue work be okay?
Please let me know if more info is needed to answer! Thanks in advance!
-
Some good answers here, so I'll just throw in my own 2 cents.
The purpose of a sitemap is to help search engines find pages they might not otherwise find during a regular crawl. Sometimes sitemaps can help pages get indexed faster. Other sitemaps serve special purposes, such as News or Video sitemaps, which can add extra information and help rank particular types of content.
In reality, many, many sitemaps are incomplete, missing, or flat out wrong. To my knowledge, no search engine will penalize you for this, as they would be penalizing half the web.
The danger of an inaccurate sitemap is that the search engines may choose to ignore it completely. Duane Forrester of Bing has stated that if they find a 1% error rate in your sitemap file, they will disregard the file. However, no such action is known to exist for incomplete sitemaps.
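To make that 1% figure concrete, here is a rough Python sketch of how you might measure a sitemap's error rate from crawl results. The `crawl` data and example.com URLs are made up, and counting every non-200 response (redirects included) as an error is my own assumption; the statement above doesn't spell out how Bing counts errors.

```python
def sitemap_error_rate(statuses):
    """Return the fraction of sitemap URLs whose HTTP status is not 200.

    Assumption: any non-200 response (404s, redirects, server errors)
    counts as an error. How a search engine actually counts is not stated.
    """
    if not statuses:
        return 0.0
    errors = sum(1 for code in statuses.values() if code != 200)
    return errors / len(statuses)


# Hypothetical crawl results: sitemap URL -> HTTP status code.
crawl = {
    "https://www.example.com/": 200,
    "https://www.example.com/shoes/": 200,
    "https://www.example.com/old-page/": 404,
    "https://www.example.com/moved/": 301,
}

rate = sitemap_error_rate(crawl)
print(f"{rate:.1%} of sitemap URLs returned something other than 200")
```

In practice you would fill `crawl` from whatever crawler or log data you already have, then compare the result against the threshold you care about.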
So I'd say there is little harm in submitting a sitemap of your truly important pages. Unfortunately, this won't stop Google from discovering or crawling your duplicate content.
The faster you get these fixed, the better.
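For what it's worth, a partial sitemap of just the main category pages can be tiny. Below is a minimal Python sketch of building one with the standard library. The function name `build_sitemap` and the example.com URLs are placeholders of mine, not anything from the original question; only the `urlset`/`url`/`loc` structure and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical main category pages; swap in the canonical URLs you trust.
main_categories = [
    "https://www.example.com/",
    "https://www.example.com/shoes/",
    "https://www.example.com/jackets/",
]

xml_text = build_sitemap(main_categories)
print(xml_text)
```

The idea is to list only URLs you are confident are canonical and return a 200, and to add product URLs later once the duplicates are cleaned up.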
-
Hi Thomas,
Definitely comforting to hear that you ran your site with an incomplete sitemap without seeing any negative results. Like I said in my response above, I think we will proceed with the partial sitemap, just to have one on there, and then upload a complete one once we can clean up the site a little more. Thanks for your insights - they were very helpful!
-
Hi Saijo,
Thanks for your recommendations! We do want to place a little more emphasis on our top level and main navigation pages, so I think we will probably proceed with a preliminary sitemap with just those pages for now. Once we get to that point, we'll definitely be needing to use multiple sitemaps and an index - thanks for pointing this out!
-
I ran my site with an incomplete sitemap for years and it didn't seem to have a negative effect. I feel that any sitemap is better than no sitemap. Sitemaps are such a small part of the SEO equation. What they are most useful for is telling Google what to crawl. Beyond that, I don't believe they have much relevance in passing authority.
-
My theory on this (I have no tests to prove it):
If you upload a verified sitemap, that is essentially telling Google these are the important pages on my site. Would you want to risk the importance of the other pages by telling Google you consider only the few main categories to be the important ones? They won't drop the other pages completely, but they MIGHT see them as less important.
I really don't see Google penalizing a site for incomplete sitemaps.
If you have a really large site, you might also want to look into multiple sitemaps: http://googlewebmastercentral.blogspot.com.au/2006/10/multiple-sitemaps-in-same-directory.html
You might also want to look into the best situations to use rel canonical vs. a 301 redirect.
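On the multiple sitemaps point, here is a rough Python sketch of splitting a large URL list into child sitemaps and building a sitemap index that points at them. The 50,000 URL limit per file comes from the sitemaps.org protocol; the helper names, the example.com URLs, and the `sitemap-products-N.xml` file naming are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (as a string) pointing at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(index, encoding="unicode")


# Hypothetical product URLs for a large retail site.
product_urls = [f"https://www.example.com/product/{n}" for n in range(120_000)]

parts = chunk(product_urls)
# Hypothetical naming scheme for the child sitemap files.
child_files = [
    f"https://www.example.com/sitemap-products-{i}.xml"
    for i in range(1, len(parts) + 1)
]
index_xml = build_sitemap_index(child_files)
print(len(parts), "child sitemaps referenced by the index")
```

Each entry in `parts` would be written out as its own sitemap file, and only the index needs to be submitted to the search engines.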
Related Questions
-
Getting 'Indexed, not submitted in sitemap' for around a third of my site. But these pages ARE in the sitemap we submitted.
As in the title, we have a site with around 40k pages, but around a third of them are showing as "Indexed, not submitted in sitemap" in Google Search Console. We've double-checked the sitemaps we have submitted and the URLs are definitely in the sitemap. Any idea why this might be happening? Example URL with the error: https://www.teacherstoyourhome.co.uk/german-tutor/Egham Sitemap it is located on: https://www.teacherstoyourhome.co.uk/sitemap-subject-locations-surrey.xml
Technical SEO | | TTYH0 -
Trailing Slashes on URLs
Hi everyone, I have a question on trailing slashes in URLs. The crux of it is this: is having both example.com/subdirectory/ and example.com/subdirectory on all of your subdirectories considered duplicate content by Google, or in some other way really bad? We have done a heck of a lot of research into this, and it would seem no one knows for sure (it is easy to get lost in a sea of Webmaster tool forums from 2012). Google itself has both URLs for its subdirectories (try https://www.google.co.uk/maps and https://www.google.co.uk/maps/), as does Moz; and yet there are some rumblings on the internet from people who think you must put a 'redirect' (although not really a redirect, as it isn't a 301) in your htaccess file to one or the other (so, for example, example.com/subdirectory/ would 'forward' to example.com/subdirectory); this is what bbc.co.uk do. We tried putting this htaccess 'forward' in as an experiment, but I noticed our site then stopped being fully crawled by Googlebot, so we reversed it. Can anyone shed any light?
Technical SEO | | NickOrbital0 -
Why is robots.txt blocking URL's in sitemap?
Hi Folks, Any ideas why Google Webmaster Tools is indicating that my robots.txt is blocking URLs linked in my sitemap.xml, when in fact it isn't? I have checked the current robots.txt declarations and they are fine, and I've also tested it in the 'robots.txt Tester' tool, which indicates that the URLs reported as blocked in the sitemap in fact work fine. Is this a temporary issue that will be resolved over a few days, or should I be concerned? I have recently removed the declaration from the robots.txt that would have been blocking them and then uploaded a new updated sitemap.xml. I'm assuming this issue is due to some sort of crossover. Thanks Gaz
Technical SEO | | PurpleGriffon0 -
Page URL Change
We're planning on rolling out a redesign of an existing page, and at the same time, we're looking to possibly change the URL of the page. Currently, the URL is www.blah.com/phraseword1-phraseword2-phraseword3-phraseword4 and we're ranking top 3 in Google SERP for that 4-word phrase. The keyword phrase is something we have in our Page Title, Site Copy and the URL. Now, we are planning on simplifying the URL to www.blah.com/phraseword1-phraseword2. The plan is to 301 redirect the original URL to this new URL and work the exact phrase into the copy a few more times. My understanding is that the URL doesn't carry as much weight as it did in the past, but it's still important. So my question is: how important is the URL in this case, where we will continue to have the phrase in our page title and will also be working more copy onto the page with the appropriate keyword? Will a 301 redirect from the old URL address the issue of passing SEO value for that keyword phrase? Thanks, Joe
Technical SEO | | JoeLin0 -
How do I use only one URL
My site can be reached at both www.site.com and site.com. How do I make it only use www?
Technical SEO | | Weblion0 -
Compare URLs with 302 redirects
Hello, I have a store which was developed in Magento. I have about 8300 errors like this: URL: http://www.theprinterdepo.com/catalog/product_compare/add/product/100/uenc/aHR0cDovL3d3dy50aGVwcmludGVyZGVwby5jb20vcHJpbnRlci1wYXJ0cy5odG1sP3A9NA,,/ 1 Warning 302 (Temporary Redirect) Found 3 days ago. Redirects to: http://goo.gl/XMaZg Description: Using a 302 redirect will cause search engine crawlers to treat the redirect as temporary and not pass any link juice (ranking power). We highly recommend that you replace 302 redirects with 301 redirects. These URLs are generated by Magento's COMPARE feature. In my store we bought an extension called SEO Enterprise Suite and I asked the developers (www.mageworx) about this error. Their answer is: Sorry for the late reply. Our extension adds NOINDEX,FOLLOW tag to compare and cookies pages so that they won't be indexed. I do not think that these redirects can hurt your SEO because these pages won't be indexed at all. The question is: What should I do? Is there any way to make SEOMOZ ignore these URLs? What should I do next? I just don't like having that HIGH a number of errors and warnings. Thank you
Technical SEO | | levalencia10 -
Trailing Slashes In Url use Canonical Url or 301 Redirect?
I was thinking of using 301 redirects from trailing slashes to no trailing slashes for my URLs. E.g.: www.url.com/page1/ 301 redirects to www.url.com/page1. I've already got a redirect for non-www to www. Just wondering, in my case would it be best to continue using htaccess for the trailing slash redirect, or just go with canonical URLs?
Technical SEO | | upick-1623910 -
Handling '?' in URLs.
Adios! (or something), I've noticed in my SEOMoz campaign that I am getting duplicate content warnings for URLs with extensions. For example: /login.php?action=lostpassword /login.php?action=register etc. What is the best way to deal with these types of URLs to avoid duplicate content penalties in search engines? Thanks 🙂
Technical SEO | | craigycraig0