Submitting XML Sitemap for large website: how big?
-
Hi there,
I'm currently researching how to generate an XML sitemap for a large website we run. We think Google is having problems indexing the URLs, based on some of the messages we have been receiving in Webmaster Tools, which also shows a large drop in the total number of indexed pages.
Content on this site can be accessed in two ways. On the home page, the content appears as a list of posts. Users can search for previous posts and can search all the way back to the first posts that were submitted.
Posts are also categorised using tags, and these tags can also currently be crawled by search engines. Users can then click on tags to see articles covering similar subjects. A post could have multiple tags (e.g. SEO, inbound marketing, Technical SEO) and so can be reached in multiple ways by users, creating a large number of URLs to index.
Finally, my questions are:
- How big should a sitemap be? What proportion of the URLs of a website should it cover?
- What are the best tools for creating the sitemaps of large websites?
- How often should a sitemap be updated?
Thanks
-
Thanks Matt, that's really useful
-
Yeah, it's better to have one than not - but I have always aimed to make it as complete as I can. Why? I'm not sure - mostly because I figure Google is GREAT at crawling my main structure - it's those far-reaching pages that I'm hoping they find in the sitemap.
-
Thanks for both your replies - I will check out the tools and recommendations you suggested.
I'm sure I remember reading somewhere a recommendation that it was only necessary to submit the basic site structure in a sitemap. It sounds like this is not the case and that a sitemap should, if possible, be comprehensive.
Would it be better to have a basic sitemap giving the main navigational URLs than having nothing at all?
-
I've created sitemaps with the paid version of Screaming Frog that were almost 80,000 pages. That's what I'd use. No point asking what % unless you can't get it all. If you're crawling Microsoft, break it up. Otherwise, organize it if you can (category sitemap, month by month, something) or just make one big finger-to-Google type sitemap. lol
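For anyone who wants to script the "break it up" approach, here is a minimal sketch in Python. It is not how Screaming Frog does it; it only assumes you already have a flat list of URLs. The sitemaps.org protocol caps each sitemap file at 50,000 URLs, so the list is split into numbered files tied together by a sitemap index. The file names (`sitemap-1.xml`, ...) and the `https://www.example.com/sitemaps/` base are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Return one <urlset> document (a single sitemap file) as a string."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_index(sitemap_urls):
    """Return a <sitemapindex> document that points at each sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_sitemaps(urls, base="https://www.example.com/sitemaps/",
                   size=MAX_URLS_PER_FILE):
    """Split `urls` into numbered sitemap files plus one index.

    `base` is a hypothetical public location for the generated files.
    Returns (index_xml, {file_name: sitemap_xml}).
    """
    files = {}
    for number, part in enumerate(chunk(urls, size), start=1):
        files[f"sitemap-{number}.xml"] = build_urlset(part)
    index = build_index([base + name for name in files])
    return index, files
```

You would write each entry of `files` to disk, publish them under `base`, and submit only the index. Grouping by category or by month works the same way: chunk each group separately and give its files a more descriptive name.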
-
Hi!
First off, since your content can be accessed in multiple ways, I'd make sure you are telling search engines which pages are duplicates. Easy access to great content is fantastic, but you can devalue your own pages a lot if you're not careful. If you're not using it yet, I recommend implementing the rel="canonical" tag on your website.
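If you want to check which of your crawled pages are duplicates before building the sitemap, something like the following rough Python sketch could help. It assumes you already have each page's URL and HTML from your own crawl; the helper names (`canonical_of`, `is_self_canonical`) are made up for this example. It reads the rel="canonical" tag and tells you whether a page points at itself, so only those pages go into the sitemap.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values; compare case-insensitively
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def canonical_of(html):
    """Return the canonical URL declared in `html`, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def is_self_canonical(url, html):
    """True if the page declares no canonical URL or declares its own URL."""
    target = canonical_of(html)
    return target is None or target == url
```

A post reached through a tag URL would then be left out of the sitemap as long as its canonical tag points at the main post URL. The comparison here is an exact string match, so trailing slashes or http/https differences would need normalising first.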
To answer your questions:
- It should cover all URLs that you want indexed. Ideally, that would be every URL.
- I'm not sure what 'the best' tools would be, but I used http://www.xml-sitemaps.com a lot a few years back. Their sitemaps are free up to 500 URLs. There are payment plans for bigger ones.
- If you only add a new page about once a month, I wouldn't update the XML sitemap for every new page. Instead, let the search engines find their own way in that case. Should your entire site structure change, an XML sitemap can be a great way to help search engines understand your new site setup better.
I hope this helps!