Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best server-side sitemap generators
-
I've been looking into sitemap generators recently and have a good idea of what creating a sitemap for a small website (fewer than 500 URLs) involves. I have successfully generated a sitemap for a very small site, but I’m trying to work out the best way of crawling a large site with millions of URLs.
I’ve decided that the best way to handle such a large number of URLs is to use a server-side sitemap generator, but this is an area that doesn’t seem to be covered in detail on SEO blogs and forums. Could anyone recommend a good server-side sitemap generator? What do you think of the automated offerings from Google and Bing? I’ve found a list of server-side sitemap generators from Google, but I can’t see any way to choose between them. I realise that a lot will depend on the technologies we use server-side, but I'm afraid I don't know what they are at this time.
-
Unless they have fixed it in recent months, xml-sitemaps does not generate correct video sitemaps.
-
Yeah, they offer free and paid hosted versions too. But I found the server-side version much simpler to set up and control.
-
-
Excellent advice Federico. My first reaction was, "but that's not a server-side sitemap generator". I just looked at their website though and it turns out that it is! Looks like I need to read things more carefully!
I'll look into that as an option, but if anyone else has any server-side sitemap generators that they'd recommend then I'd be really interested to hear about them.

-
I have been using xml-sitemaps (paid version) for all my sites for over 5 years and it works like a charm, scraping and indexing what needs to be scraped and indexed, plus it uses very few resources. 100% recommended (they also have nice plugins for extra sitemaps: video, news, images, etc.).
Hope that helps!
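For anyone wondering what a server-side generator has to do for a site with millions of URLs: the sitemaps.org protocol caps each sitemap file at 50,000 URLs, so the URL list has to be split across many files that are then listed in a sitemap index. Below is a minimal Python sketch of that splitting step only. It is not how xml-sitemaps or any other product works internally; the function names, the `sitemap-N.xml` and `sitemap_index.xml` file names, and the `base_url` parameter are all made up for illustration, and it assumes you already have the URLs (from a database or a crawl).

```python
from itertools import islice
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunked(urls, size):
    """Yield lists of at most `size` URLs without loading everything at once."""
    it = iter(urls)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def render_urlset(urls):
    """Render one sitemap file; escape() handles &, < and > in URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{NS}">\n{entries}</urlset>\n'
    )


def render_index(sitemap_locs):
    """Render the sitemap index that points at each sitemap file."""
    entries = "".join(
        f"  <sitemap><loc>{escape(loc)}</loc></sitemap>\n" for loc in sitemap_locs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{NS}">\n{entries}</sitemapindex>\n'
    )


def build_sitemaps(urls, base_url, size=MAX_URLS_PER_FILE):
    """Return {filename: xml_text} for every sitemap file plus the index.

    File names and base_url are hypothetical; adapt them to your own site.
    """
    files = {}
    for n, batch in enumerate(chunked(urls, size), start=1):
        files[f"sitemap-{n}.xml"] = render_urlset(batch)
    index_locs = [f"{base_url}/{name}" for name in files]
    files["sitemap_index.xml"] = render_index(index_locs)
    return files
```

You would write each entry of the returned dictionary to disk and submit only the index file to the search engines. The sketch does not handle the file-size limit, `lastmod` values, or the case where the index itself would need more than 50,000 entries, so treat it as a starting point rather than a finished tool.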