Submitting XML Sitemap for large website: how big?
-
Hi there,
I'm currently researching how to generate an XML sitemap for a large website we run. Based on some of the messages we have been receiving in Google Webmaster Tools, which also shows a large drop in the total number of indexed pages, we think Google is having trouble indexing our URLs.
Content on this site can be accessed in two ways. On the home page, the content appears as a list of posts, and users can search back through previous posts all the way to the first posts that were submitted.
Posts are also categorised using tags, and these tags can also currently be crawled by search engines. Users can then click on tags to see articles covering similar subjects. A post could have multiple tags (e.g. SEO, inbound marketing, Technical SEO) and so can be reached in multiple ways by users, creating a large number of URLs to index.
Finally, my questions are:
- How big should a sitemap be? What proportion of the URLs of a website should it cover?
- What are the best tools for creating the sitemaps of large websites?
- How often should a sitemap be updated?
Thanks
-
Thanks Matt, that's really useful
-
Yeah, it's better to have one than not - but I have always aimed to make it as complete as I can. Why? I'm not sure - mostly because I figure Google is GREAT at crawling my main structure - it's those far-reaching pages that I'm hoping they find in the sitemap.
-
Thanks for both your replies - I will check out the tools and recommendations you suggested.
I'm sure I remember reading a recommendation somewhere that it was only necessary to submit the basic site structure in a sitemap. It sounds like this is not the case, and that a sitemap should, if possible, be comprehensive.
Would it be better to have a basic sitemap giving the main navigational URLs than having nothing at all?
-
I've created sitemaps with the paid version of Screaming Frog that were almost 80,000 pages. That's what I'd use. No point asking what % unless you can't get it all. If you're crawling Microsoft, break it up. Otherwise, organize it if you can (category sitemap, month by month, something) or just make one big finger-to-Google type sitemap. lol
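For reference, the sitemaps.org protocol caps a single sitemap file at 50,000 URLs, which is why large sites split theirs into several files tied together by a sitemap index. Here's a minimal sketch of that split in Python; the domain and file names are hypothetical placeholders:

    from datetime import date
    from xml.sax.saxutils import escape

    MAX_URLS = 50000  # per-file cap in the sitemaps.org protocol

    def write_sitemaps(urls, base="https://www.example.com"):
        # Split the full URL list into files of at most 50,000 entries.
        chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
        for n, chunk in enumerate(chunks, start=1):
            with open(f"sitemap-{n}.xml", "w", encoding="utf-8") as f:
                f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
                for url in chunk:
                    f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
                f.write('</urlset>\n')
        # The index file is the one you actually submit in Webmaster Tools.
        with open("sitemap-index.xml", "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            today = date.today().isoformat()
            for n in range(1, len(chunks) + 1):
                f.write(f"  <sitemap><loc>{base}/sitemap-{n}.xml</loc>"
                        f"<lastmod>{today}</lastmod></sitemap>\n")
            f.write('</sitemapindex>\n')

A category or month-by-month split works the same way; you just group the URLs before writing the files.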
-
Hi!
First off, since your content can be accessed in multiple ways, I'd make sure you're signalling to search engines which pages are duplicates. Easy access to great content is fantastic, but you can devalue your own pages a lot if you're not careful. If you're not using it yet, I recommend implementing the rel="canonical" tag on your website.
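If you want to spot-check that the canonical tags are in place, here's a quick audit sketch in Python (it assumes the requests and beautifulsoup4 packages; the URLs are hypothetical placeholders):

    import requests
    from bs4 import BeautifulSoup

    def get_canonical(url):
        # Return the canonical URL a page declares, or None if it has none.
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        link = soup.find("link", attrs={"rel": "canonical"})
        return link.get("href") if link else None

    # Every tag-page copy of a post should point at one preferred URL.
    for url in ["https://example.com/seo/my-post",
                "https://example.com/technical-seo/my-post"]:
        print(url, "->", get_canonical(url))

If both copies report the same canonical URL, search engines should consolidate them rather than treating them as duplicates.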
To answer your questions:
- It should cover all the URLs that you want indexed. Ideally, that would be every URL.
- I'm not sure what 'the best' tools would be, but I used http://www.xml-sitemaps.com a lot a few years back. Their sitemaps are free up to 500 URLs, and there are paid plans for bigger sites.
- I wouldn't update the XML sitemap for every new page if you only publish one every month or so; in that case, let the search engines find their own way to it. Should your entire site structure change, though, an XML sitemap can be a great way to help search engines understand your new setup better (there's a rough automation sketch below).
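On that last point, regenerating the file is cheap to automate, so many sites simply rebuild it on a schedule. A rough sketch, where get_all_posts is a hypothetical stand-in for however your CMS lists its URLs and last-modified dates:

    import xml.etree.ElementTree as ET

    NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

    def build_sitemap(posts, path="sitemap.xml"):
        # posts is an iterable of (url, last_modified) pairs,
        # e.g. ("https://example.com/my-post", "2013-06-01").
        ET.register_namespace("", NS)
        urlset = ET.Element(f"{{{NS}}}urlset")
        for loc, lastmod in posts:
            url = ET.SubElement(urlset, f"{{{NS}}}url")
            ET.SubElement(url, f"{{{NS}}}loc").text = loc
            ET.SubElement(url, f"{{{NS}}}lastmod").text = lastmod
        ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

    # build_sitemap(get_all_posts())  # run from cron, a CMS hook, etc.

The <lastmod> values give crawlers a hint about which URLs have actually changed since their last visit.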
I hope this helps!