How Do I Generate a Sitemap for a Large WordPress Site?
-
Hello Everyone!
I am working with a WordPress site that is in Google News (i.e., every day we have about 30 new URLs to add to our sitemap). The site has years of articles, resulting in about 200,000 pages. Our strategy so far has been to use a sitemap plugin that only includes the last few months of posts; however, we want to improve our SEO and submit all the URLs on our site to search engines.
The issue is that the plugins we've looked at generate the sitemap on the fly, i.e., when you request the sitemap, the plugin generates it dynamically. Our site is so large that even a single request for our sitemap.xml ties up a lot of server resources and takes an extremely long time to generate the sitemap (if the page doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitemaps works extremely well. I fully understand that a database-based solution would avoid the need to crawl, but the features I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not having any problems. I also set limits on the generator's memory consumption and enabled the feature that saves temp files in case the generation fails.
-
My concern with recommending xml-sitemaps was that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool grabs pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid solution of xml-sitemaps.
You can set all the resources you want it to have available, and it stores temp files to avoid excessive consumption.
It also offers settings to create large sitemaps using a sitemap_index, and you can get plugins that create the news sitemap automatically by looking for changes since the last sitemap generation.
I have it running on my site with 5K pages (excluding tag pages), and it takes 10 minutes to crawl.
There are also plugins that create the sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol allows a maximum of 50,000 URLs per sitemap and Google News sitemaps can't contain more than 1,000 URLs, this was going to be a necessity anyway, so you may as well use these limitations to your advantage.
There's a feature available for sitemaps called a sitemap index. It simply lists all the sitemap.xml files you've created, so search engines can find and crawl them. You put it at the root of the site and then reference it in robots.txt just like a regular sitemap (you can also submit it in GWT, Google Webmaster Tools). In fact, Yoast's SEO plugin and others already use exactly this functionality for their News add-on sitemaps.
In your case, you could build the News sitemap dynamically to meet its special requirements (up to 1,000 URLs, covering only posts from the last two days) and to ensure it's up-to-the-minute accurate, which is critical for news sites.
Then, separately, you would build additional segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, because they wouldn't need to be updated once created. With static files there'd be no server load each time they're requested, only the load to generate the current News sitemap. (I'd actually recommend keeping each static sitemap to around 25,000 URLs to ensure search engines can crawl them easily.)
This approach would take a bit of fiddling to set up initially, as you'd need to generate the "archive" sitemaps and then convert them to static versions. Once set up, though, the News sitemap would take care of itself, and once a month (or however often you decide) you'd add the pages "expiring" from the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
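To put those limits in numbers for a site of roughly 200,000 pages, here is the arithmetic as a minimal Python sketch. The function name `files_needed` is hypothetical; the 50,000-URL cap is the sitemap protocol limit mentioned above.

```python
import math

# Per-file URL cap from the sitemap protocol (the Google News cap of 1,000 is separate).
SITEMAP_MAX_URLS = 50_000

def files_needed(total_urls: int, per_file: int = SITEMAP_MAX_URLS) -> int:
    """Return how many sitemap files are needed at per_file URLs each."""
    if not 0 < per_file <= SITEMAP_MAX_URLS:
        raise ValueError("per_file must be between 1 and 50,000")
    return math.ceil(total_urls / per_file)

print(files_needed(200_000))          # 4 files at the protocol maximum
print(files_needed(200_000, 25_000))  # 8 files at a more conservative 25,000 each
```

Either way the file count stays small, so a sitemap index listing them remains tiny.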
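A sitemap index is just a small XML file that points at the other sitemaps. The sketch below builds one as a string in Python; the `example.com` domain, the child file names and the `build_sitemap_index` helper are hypothetical, and the `lastmod` value is optional.

```python
from xml.sax.saxutils import escape

def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return sitemap index XML listing each child sitemap URL (lastmod is optional)."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(url)}</loc>")  # escape &, <, > in URLs
        if lastmod:
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)

index_xml = build_sitemap_index([
    "https://www.example.com/sitemap-news.xml",
    "https://www.example.com/sitemap-archive-1.xml",
])
print(index_xml)
```

Saved as, say, `/sitemap_index.xml`, it would then be referenced in robots.txt with a line such as `Sitemap: https://www.example.com/sitemap_index.xml`.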
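The dynamic News sitemap only has to cover a tiny slice of the site, which is why regenerating it on each request is cheap. Below is a minimal Python sketch, assuming posts arrive as `(url, title, published)` tuples with timezone-aware datetimes (in WordPress these would come from a database query, which is not shown). The publication name, URLs and helper name are hypothetical; the `news:` namespace is Google's News sitemap extension.

```python
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import escape

NEWS_MAX_URLS = 1000               # Google News sitemap cap
NEWS_WINDOW = timedelta(days=2)    # only articles from the last two days

def build_news_sitemap(posts, publication, language="en", now=None):
    """posts: iterable of (url, title, published) with timezone-aware datetimes."""
    if now is None:
        now = datetime.now(timezone.utc)
    recent = [p for p in posts if now - p[2] <= NEWS_WINDOW]
    recent.sort(key=lambda p: p[2], reverse=True)  # newest first
    recent = recent[:NEWS_MAX_URLS]
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
        '        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">',
    ]
    for url, title, published in recent:
        out += [
            "  <url>",
            f"    <loc>{escape(url)}</loc>",
            "    <news:news>",
            "      <news:publication>",
            f"        <news:name>{escape(publication)}</news:name>",
            f"        <news:language>{language}</news:language>",
            "      </news:publication>",
            f"      <news:publication_date>{published.isoformat()}</news:publication_date>",
            f"      <news:title>{escape(title)}</news:title>",
            "    </news:news>",
            "  </url>",
        ]
    out.append("</urlset>")
    return "\n".join(out)

now = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)
posts = [
    ("https://www.example.com/2024/05/03/story", "Today's story", now - timedelta(hours=3)),
    ("https://www.example.com/2024/04/20/old", "Old story", now - timedelta(days=13)),
]
news_xml = build_news_sitemap(posts, "Example News", now=now)
print(news_xml)  # only the three-hour-old story is listed
```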
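Generating the static "archive" files is a one-off batch job rather than a per-request one. Here is a minimal Python sketch, assuming the archive permalinks have already been exported from the database into a list. The file naming scheme, output directory, `example.com` domain and `write_archive_sitemaps` helper are all hypothetical.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape

CHUNK_SIZE = 25_000  # suggested segment size; the protocol maximum is 50,000

def write_archive_sitemaps(urls, out_dir, base_url, chunk_size=CHUNK_SIZE):
    """Write urls into numbered static sitemap files and return their public URLs."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    public_urls = []
    for n, start in enumerate(range(0, len(urls), chunk_size), start=1):
        chunk = urls[start:start + chunk_size]
        name = f"sitemap-archive-{n}.xml"
        body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in chunk)
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n</urlset>\n"
        )
        (out_dir / name).write_text(xml, encoding="utf-8")
        public_urls.append(f"{base_url.rstrip('/')}/{name}")
    return public_urls

# Hypothetical run: 60,000 URLs become files of 25,000 + 25,000 + 10,000.
urls = [f"https://www.example.com/?p={i}" for i in range(1, 60_001)]
with tempfile.TemporaryDirectory() as tmp:
    files = write_archive_sitemaps(urls, tmp, "https://www.example.com/")
print(files)
```

The returned URLs are what would go into the sitemap index; the web server can then serve the files directly without touching WordPress.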
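The monthly "roll expiring News URLs into the latest archive segment" step is the kind of thing that could be automated. A minimal Python sketch of just the bookkeeping, assuming segments are held as lists of URLs (oldest first); the `roll_into_archive` helper is hypothetical. It reports which segments changed so only those static files need rewriting.

```python
def roll_into_archive(segments, expiring, chunk_size=25_000):
    """Append expiring URLs to the newest segment, opening a new one when it fills.

    segments is a list of URL lists, oldest first, and is modified in place.
    Returns the sorted indices of the segments that changed.
    """
    touched = set()
    for url in expiring:
        if not segments or len(segments[-1]) >= chunk_size:
            segments.append([])  # newest segment is full (or none exist yet)
        segments[-1].append(url)
        touched.add(len(segments) - 1)
    return sorted(touched)

# Hypothetical state: one full segment, one with room for 10 more URLs.
segments = [[f"u{i}" for i in range(25_000)], [f"v{i}" for i in range(24_990)]]
expiring = [f"n{i}" for i in range(30)]
touched = roll_into_archive(segments, expiring)
print(touched)  # [1, 2]: segment 1 filled up and segment 2 was opened
```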
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you tried using a web-based sitemap generator? I'm not sure how it would handle your site, but at least it would be running on someone else's server, right?
Not sure what else to say, honestly.