Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How Do I Generate a Sitemap for a Large Wordpress Site?
-
Hello Everyone!
I am working with a Wordpress site that is in Google news (i.e. everyday we have about 30 new URLs to add to our sitemap) The site has years of articles, resulting in about 200,000 pages on the site. Our strategy so far has been use a sitemap plugin that only generates the last few months of posts, however we want to improve our SEO and submit all the URLs in our site to search engines.
The issue is the plugins we've looked at generate the sitemap on-the-fly. i.e. when you request the sitemap, the plugin then dynamically generates the sitemap. Our site is so large that even a single request for our sitemap.xml ties up tons of server resources and takes an extremely long time to generate the sitemap (if the page doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitempas works extremely good. I fully understand that a DB solution would avoid the crawl need, but the features that I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not getting any problems plus I set limitations on the generator memory consumption and activated the feature that saves temp files just in case the generation fails.
-
My concern with recommending xml-sitemaps was that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool is grabbing pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid solution of xml-sitemaps.
You can set all the resources that you want it to have available, and it will store in temp files to avoid excessive consumption.
It also offers settings to create large sitemaps using a sitemap_index and you could get plugins that create the news sitemap automatically looking for changes since the last sitemap generation.
I have it running in my site with 5K pages (excluding tag pages) and it takes 10 minutes to crawl.
Then you also have plugins that create the sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol only allows 50,000 URLs max. per sitemap and Google News sitemaps can't be over 1000 URLs, this was going to be a necessity anyway, so may as well use these limitations to your advantage.
There's a functionality available for sitemaps called a sitemap index. It basically lists all the sitemap.xmls you've created, so the search engines can find and index them. You put it at the root of the site and then link to it in robots.txt just like a regular sitemap. (Can also submit it in GWT). In fact, Yoast's SEO plugin sitemaps and others use just this functionality already for their News add-on.
In your case, you could build the News sitemap dynamically to meet its special requirements (up to 1000 URLs and will crawl only last 2 days of posts) and to ensure it's up-to-the-minute accurate, as is critical for news sites.
Then separately you would build additional, segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, since they wouldn't need to update once created. By having them static, there's be no server load to serve them each time - only the load to generate the current news sitemap. (I'd actually recommend you keep each static sitemap to around 25,000 pages each to ensure search engines can crawl them easily)
This approach would involve a bit of fiddling to initially set up, as you'd need to generate the "archive" sitemaps then convert them to static versions, but once set up, the News sitemap would take care of itself and once a month (or whatever you decide) you'd need to add the "expiring" pages from the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you ever tried using a web-based sitemap generator? Not sure how it would respond to your site but at least it would be running on someone else's server, right?
Not sure what else to say honestly.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Adult Toys Sites
Does anyone know of any changes SEOwise when running an adult toy site versus a normal eCommerce site? Is there any tips or suggestions that are worth knowing to achieve rankings faster? Thanks,
Intermediate & Advanced SEO | | the-gate-films0 -
Moving to a new site while keeping old site live
For reasons I won't get into here, I need to move most of my site to a new domain (DOMAIN B) while keeping every single current detail on the old domain (DOMAIN A) as it is. Meaning, there will be 2 live websites that have mostly the same content, but I want the content to appear to search engines as though it now belongs to DOMAIN B. Weird situation. I know. I've run around in circles trying to figure out the best course of action. What do you think is the best way of going about this? Do I simply point DOMAIN A's canonical tags to the copied content on DOMAIN B and call it good? Should I ask sites that link to DOMAIN A to change their links to DOMAIN B, or start fresh and cut my losses? Should I still file a change of address with GWT, even though I'm not going to 301 redirect anything?
Intermediate & Advanced SEO | | kdaniels0 -
Hreflang in vs. sitemap?
Hi all, I decided to identify alternate language pages of my site via sitemap to save our development team some time. I also like the idea of having leaner markup. However, my site has many alternate language and country page variations, so after creating a sitemap that includes mostly tier 1 and tier 2 level URLs, i now have a sitemap file that's 17mb. I did a couple google searches to see is sitemap file size can ever be an issue and found a discussion or two that suggested keeping the size small and a really old article that recommended keeping it < 10mb. Does the sitemap file size matter? GWT has verified the sitemap and appears to be indexing the URLs fine. Are there any particular benefits to specifying alternate versions of a URL in vs. sitemap? Thanks, -Eugene
Intermediate & Advanced SEO | | eugene_bgb0 -
Urls missing from product_cat sitemap
I'm using Yoast SEO plugin to generate XML sitemaps on my e-commerce site (woocommerce). I recently changed the category structure and now only 25 of about 75 product categories are included. Is there a way to manually include urls or what is the best way to have them all indexed in the sitemap?
Intermediate & Advanced SEO | | kisen0 -
XML Sitemap for classifieds
I have seeon some trends for sites which do not even use XML sitemp and robots e.g. see this site. How do you see if sitemap is not used. Also for classified websites, should ad pages be included in sitemap because after certain duration those ads will be deleted and google might not be able to crawl. What do you suggest about XML sitemap for classified website.
Intermediate & Advanced SEO | | MozAddict0 -
Why does a site have no domain authority?
A website was built and launched eight months ago, and their domain authority is 1. When a site has been live for a while and has such a low DA, what's causing it?
Intermediate & Advanced SEO | | optimalwebinc0 -
How important are sitemap errors?
If there aren't any crawling / indexing issues with your site, how important do thing sitemap errors are? Do you work to always fix all errors? I know here: http://www.seomoz.org/blog/bings-duane-forrester-on-webmaster-tools-metrics-and-sitemap-quality-thresholds Duane Forrester mentions that sites with many 302's 301's will be punished--does any one know Googe's take on this?
Intermediate & Advanced SEO | | nicole.healthline0 -
Franchise sites on subdomains
I've been asked by a client to optimise a a webpage for a location i.e. London. Turns out that the location is actually a franchise of the main company. When the company launch a new franchise, so far they have simply added a new page to the main site, for example: mysite.co.uk/sub-folder/london They have so far done this for 10 or so franchises and task someone with optimising that page for their main keyword + location. I think I know the answer to this, but would like to get a back up / additional info on it in terms of ranking / seo benefits. I am going to suggest the idea of using a subdomain for each location, example: london.mysite.co.uk Would this be the correct approach. If you think yes, why? Many thanks,
Intermediate & Advanced SEO | | Webrevolve0