Submitting XML Sitemap for large website: how big?
-
Hi there,
I’m currently researching how to generate an XML sitemap for a large website we run. We think Google is having problems indexing our URLs, based on some of the messages we’ve been receiving in Webmaster Tools, which also shows a large drop in the total number of indexed pages.
Content on this site can be accessed in two ways. On the home page, the content appears as a list of posts, and users can page back through previous posts all the way to the first posts that were submitted.
Posts are also categorised using tags, and these tags can also currently be crawled by search engines. Users can then click on tags to see articles covering similar subjects. A post could have multiple tags (e.g. SEO, inbound marketing, Technical SEO) and so can be reached in multiple ways by users, creating a large number of URLs to index.
Finally, my questions are:
- How big should a sitemap be? What proportion of the URLs of a website should it cover?
- What are the best tools for creating the sitemaps of large websites?
- How often should a sitemap be updated?
Thanks
-
Thanks Matt, that's really useful
-
Yeah, it's better to have one than not - but I have always aimed to make it as complete as I can. Why? I'm not sure - mostly because I figure Google is GREAT at crawling my main structure - it's those far-reaching pages that I'm hoping they find in the sitemap.
-
Thanks for both your replies - I will check out the tools and recommendations you suggested.
I'm sure I remember reading somewhere a recommendation that it was only necessary to submit the basic site structure in a sitemap. It sounds like this is not the case, and that a sitemap should, if possible, be comprehensive.
Would it be better to have a basic sitemap covering the main navigational URLs than to have nothing at all?
-
I've created sitemaps with the paid version of Screaming Frog that were almost 80,000 pages. That's what I'd use. No point asking what % unless you can't get it all. If you're crawling Microsoft, break it up. Otherwise, organize it if you can (category sitemap, month by month, something.) or just make one big finger to Google type sitemap. lol
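For scale context: the sitemaps.org protocol caps each sitemap file at 50,000 URLs (and 50 MB uncompressed), so "breaking it up" for a very large site usually means several sitemap files tied together by a sitemap index. A minimal sketch of that split in Python (the file names and example.com base URL are made up for illustration):

```python
import os
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50_000  # per-file limit from the sitemaps.org protocol

def write_sitemaps(urls, out_dir, base_url):
    """Split a flat URL list into sitemap files plus a sitemap index."""
    os.makedirs(out_dir, exist_ok=True)
    chunks = [urls[i:i + MAX_URLS_PER_FILE]
              for i in range(0, len(urls), MAX_URLS_PER_FILE)]
    filenames = []
    for n, chunk in enumerate(chunks, start=1):
        name = f"sitemap-{n}.xml"
        filenames.append(name)
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for url in chunk:
                # escape() handles &, < and > so the XML stays valid
                f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            f.write("</urlset>\n")
    # The index file is the single URL you submit to the search engines.
    with open(os.path.join(out_dir, "sitemap_index.xml"), "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for name in filenames:
            f.write(f"  <sitemap><loc>{base_url}/{name}</loc></sitemap>\n")
        f.write("</sitemapindex>\n")
    return filenames
```

You'd then submit only sitemap_index.xml in Webmaster Tools; the individual files are discovered through it.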
-
Hi!
First off, since your content can be accessed in multiple ways, I'd make sure you're signalling duplicate pages to search engines. Easy access to great content is fantastic, but you can devalue your own pages considerably if you're not careful. If you're not using it yet, I recommend implementing the rel="canonical" tag on your website: each duplicate URL carries a `<link rel="canonical" href="...">` element in its `<head>` pointing at the preferred version of the page.
To answer your questions:
- It should cover all URLs that you want indexed. Ideally, that would be every URL.
- I'm not sure what 'the best' tools would be, but I used http://www.xml-sitemaps.com a lot a few years back. Their sitemaps are free up to 500 URLs, with paid plans for larger sites.
- I wouldn't regenerate the XML sitemap for every new page if you only publish, say, once a month; let the search engines find those pages on their own. Should your entire site structure change, though, an XML sitemap can be a great way to help search engines understand your new setup.
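On the update question, one low-maintenance option is to reference the sitemap from robots.txt: crawlers re-fetch robots.txt regularly, so a regenerated sitemap gets re-discovered without manual resubmission. A hypothetical entry (host and path are placeholders):

```
# robots.txt
Sitemap: https://www.example.com/sitemap_index.xml
```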
I hope this helps!