How Do I Generate a Sitemap for a Large WordPress Site?
-
Hello Everyone!
I am working with a WordPress site that is in Google News (i.e., every day we have about 30 new URLs to add to our sitemap). The site has years of articles, resulting in about 200,000 pages. Our strategy so far has been to use a sitemap plugin that only generates the last few months of posts; however, we want to improve our SEO and submit all the URLs on our site to search engines.
The issue is that the plugins we've looked at generate the sitemap on the fly: when you request the sitemap, the plugin dynamically generates it. Our site is so large that even a single request for our sitemap.xml ties up tons of server resources and takes an extremely long time to generate (if the page doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitemaps works extremely well. I fully understand that a DB solution would avoid the need to crawl, but the features that I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not having any problems. I also set limits on the generator's memory consumption and activated the feature that saves temp files in case the generation fails.
-
My concern with recommending xml-sitemaps was that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool grabs pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid solution of xml-sitemaps.
You can set all the resources that you want it to have available, and it will store data in temp files to avoid excessive consumption.
It also offers settings to create large sitemaps using a sitemap index, and you can get plugins that create the news sitemap automatically by looking for changes since the last sitemap generation.
I have it running on my site with 5K pages (excluding tag pages) and it takes 10 minutes to crawl.
Then you also have plugins that create the sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol allows a maximum of 50,000 URLs per sitemap and Google News sitemaps can't be over 1,000 URLs, this was going to be a necessity anyway, so you may as well use these limitations to your advantage.
There's a feature available for sitemaps called a sitemap index. It basically lists all the sitemap files you've created, so the search engines can find and index them. You put it at the root of the site and then link to it in robots.txt just like a regular sitemap. (You can also submit it in Google Webmaster Tools.) In fact, Yoast's SEO plugin sitemaps and others already use just this functionality for their News add-on.
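To make the sitemap index idea concrete, here is a minimal Python sketch that builds one from a list of child sitemap URLs. The function name `build_sitemap_index`, the example.com domain, and the file names are all placeholders for illustration, not anything a particular plugin produces.

```python
from datetime import date
from xml.sax.saxutils import escape


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return sitemap index XML that lists each child sitemap URL.

    `sitemap_urls` is a list of absolute URLs to existing sitemap files.
    `lastmod` is an ISO date string; it defaults to today's date.
    """
    lastmod = lastmod or date.today().isoformat()
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap>")
        # escape() handles &, < and > so query strings stay valid XML
        lines.append(f"    <loc>{escape(url)}</loc>")
        lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names: one dynamic News sitemap plus static archives.
    print(build_sitemap_index([
        "https://example.com/sitemap-news.xml",
        "https://example.com/sitemap-archive-1.xml",
        "https://example.com/sitemap-archive-2.xml",
    ]))
```

The robots.txt entry would then be a single line such as `Sitemap: https://example.com/sitemap_index.xml`, again with a placeholder URL.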
In your case, you could build the News sitemap dynamically to meet its special requirements (up to 1,000 URLs, and only posts from the last 2 days are crawled) and to ensure it's up-to-the-minute accurate, which is critical for news sites.
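As a rough illustration of the selection step for the News sitemap, the Python sketch below keeps only posts from the last 2 days and caps the result at 1,000 URLs, using the limits quoted above. The `posts` input (a list of URL and publish-time pairs) is a hypothetical stand-in for whatever query your CMS would run, and `select_news_posts` is an invented name.

```python
from datetime import datetime, timedelta, timezone

NEWS_MAX_URLS = 1000               # cap quoted in this thread
NEWS_WINDOW = timedelta(days=2)    # crawl window quoted in this thread


def select_news_posts(posts, now=None):
    """Pick the posts that belong in the News sitemap.

    `posts` is an iterable of (url, published_at) tuples, where
    published_at is a timezone-aware datetime. Returns the newest
    posts first, limited to NEWS_MAX_URLS entries.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - NEWS_WINDOW
    recent = [post for post in posts if post[1] >= cutoff]
    recent.sort(key=lambda post: post[1], reverse=True)
    return recent[:NEWS_MAX_URLS]
```

At roughly 30 new URLs a day, the 2-day window yields about 60 entries, so the 1,000-URL cap is only a safety net here and the dynamic sitemap stays cheap to generate.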
Then, separately, you would build additional, segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, as they wouldn't need to update once created. With static files, there'd be no server load to generate them each time they're requested, only the load to generate the current News sitemap. (I'd actually recommend you keep each static sitemap to around 25,000 pages to ensure search engines can crawl them easily.)
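A minimal Python sketch of that one-time archive build might look like the following. It assumes you have already pulled the full list of URLs out of the database; the function name, the `sitemap-archive-N.xml` naming scheme, and the output directory are made up for the example. With 200,000 URLs and 25,000 per file, it would write 8 static files.

```python
from pathlib import Path
from xml.sax.saxutils import escape

CHUNK_SIZE = 25_000  # per-file size suggested in this thread


def write_archive_sitemaps(urls, out_dir, chunk_size=CHUNK_SIZE):
    """Write `urls` to static sitemap files of at most `chunk_size` URLs.

    Returns the list of file paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(urls), chunk_size):
        chunk = urls[start:start + chunk_size]
        # Hypothetical naming scheme: sitemap-archive-1.xml, -2.xml, ...
        path = out_dir / f"sitemap-archive-{start // chunk_size + 1}.xml"
        with path.open("w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for url in chunk:
                f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            f.write("</urlset>\n")
        written.append(path)
    return written
```

Because the files are written once and then served as plain static files, the web server never has to touch WordPress or the database to deliver them.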
This approach would involve a bit of fiddling to initially set up, as you'd need to generate the "archive" sitemaps then convert them to static versions, but once set up, the News sitemap would take care of itself and once a month (or whatever you decide) you'd need to add the "expiring" pages from the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
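The monthly roll-over could be automated along these lines. This Python sketch only shows the bookkeeping, under the assumption that each archive segment is held as a list of URLs: expired News URLs are appended to the newest segment, and a new segment is started once the current one reaches the size limit. The function name and the in-memory representation are illustrative; a real script would rewrite the affected static files and the sitemap index afterwards.

```python
def append_to_archive(segments, expired_urls, chunk_size=25_000):
    """Append expired News URLs to the newest archive segment.

    `segments` is a list of URL lists, oldest segment first. It is
    modified in place and also returned. A new segment is opened
    whenever the newest one already holds `chunk_size` URLs.
    """
    if not segments:
        segments.append([])
    for url in expired_urls:
        if len(segments[-1]) >= chunk_size:
            segments.append([])
        segments[-1].append(url)
    return segments
```

Only the newest segment (and any newly created one) changes in a given run, so the older static files can be left untouched.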
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you ever tried using a web-based sitemap generator? Not sure how it would respond to your site but at least it would be running on someone else's server, right?
Not sure what else to say honestly.