Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How Do I Generate a Sitemap for a Large Wordpress Site?
-
Hello Everyone!
I am working with a Wordpress site that is in Google news (i.e. everyday we have about 30 new URLs to add to our sitemap) The site has years of articles, resulting in about 200,000 pages on the site. Our strategy so far has been use a sitemap plugin that only generates the last few months of posts, however we want to improve our SEO and submit all the URLs in our site to search engines.
The issue is the plugins we've looked at generate the sitemap on-the-fly. i.e. when you request the sitemap, the plugin then dynamically generates the sitemap. Our site is so large that even a single request for our sitemap.xml ties up tons of server resources and takes an extremely long time to generate the sitemap (if the page doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitempas works extremely good. I fully understand that a DB solution would avoid the crawl need, but the features that I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not getting any problems plus I set limitations on the generator memory consumption and activated the feature that saves temp files just in case the generation fails.
-
My concern with recommending xml-sitemaps was that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool is grabbing pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid solution of xml-sitemaps.
You can set all the resources that you want it to have available, and it will store in temp files to avoid excessive consumption.
It also offers settings to create large sitemaps using a sitemap_index and you could get plugins that create the news sitemap automatically looking for changes since the last sitemap generation.
I have it running in my site with 5K pages (excluding tag pages) and it takes 10 minutes to crawl.
Then you also have plugins that create the sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol only allows 50,000 URLs max. per sitemap and Google News sitemaps can't be over 1000 URLs, this was going to be a necessity anyway, so may as well use these limitations to your advantage.
There's a functionality available for sitemaps called a sitemap index. It basically lists all the sitemap.xmls you've created, so the search engines can find and index them. You put it at the root of the site and then link to it in robots.txt just like a regular sitemap. (Can also submit it in GWT). In fact, Yoast's SEO plugin sitemaps and others use just this functionality already for their News add-on.
In your case, you could build the News sitemap dynamically to meet its special requirements (up to 1000 URLs and will crawl only last 2 days of posts) and to ensure it's up-to-the-minute accurate, as is critical for news sites.
Then separately you would build additional, segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, since they wouldn't need to update once created. By having them static, there's be no server load to serve them each time - only the load to generate the current news sitemap. (I'd actually recommend you keep each static sitemap to around 25,000 pages each to ensure search engines can crawl them easily)
This approach would involve a bit of fiddling to initially set up, as you'd need to generate the "archive" sitemaps then convert them to static versions, but once set up, the News sitemap would take care of itself and once a month (or whatever you decide) you'd need to add the "expiring" pages from the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you ever tried using a web-based sitemap generator? Not sure how it would respond to your site but at least it would be running on someone else's server, right?
Not sure what else to say honestly.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can't generate a sitemap with all my pages
I am trying to generate a site map for my site nationalcurrencyvalues.com but all the tools I have tried don't get all my 70000 html pages... I have found that the one at check-domains.com crawls all my pages but when it writes the xml file most of them are gone... seemingly randomly. I have used this same site before and it worked without a problem. Can anyone help me understand why this is or point me to a utility that will map all of the pages? Kindly, Greg
Intermediate & Advanced SEO | | Banknotes0 -
Splitting One Site Into Two Sites Best Practices Needed
Okay, working with a large site that, for business reasons beyond organic search, wants to split an existing site in two. So, the old domain name stays and a new one is born with some of the content from the old site, along with some new content of its own. The general idea, for more than just search reasons, is that it makes both the old site and new sites more purely about their respective subject matter. The existing content on the old site that is becoming part of the new site will be 301'd to the new site's domain. So, the old site will have a lot of 301s and links to the new site. No links coming back from the new site to the old site anticipated at this time. Would like any and all insights into any potential pitfalls and best practices for this to come off as well as it can under the circumstances. For instance, should all those links from the old site to the new site be nofollowed, kind of like a non-editorial link to an affiliate or advertiser? Is there weirdness for Google in 301ing to a new domain from some, but not all, content of the old site. Would you individually submit requests to remove from index for the hundreds and hundreds of old site pages moving to the new site or just figure that the 301 will eventually take care of that? Is there substantial organic search risk of any kind to the old site, beyond the obvious of just not having those pages to produce any more? Anything else? Any ideas about how long the new site can expect to wander the wilderness of no organic search traffic? The old site has a 45 domain authority. Thanks!
Intermediate & Advanced SEO | | 945010 -
Sitemap generator which only includes canonical urls
Does anyone know of a 3rd party sitemap generator that will only include the canonical url's? Creating a sitemap with geo and sorting based parameters isn't the most ideal way to generate sitemaps. Please let me know if anyone has any ideas. Mind you we have hundreds of thousands of indexed url's and this can't be done with a simple text editor.
Intermediate & Advanced SEO | | recbrands0 -
Multilingual Sitemaps
Hey there, I have a site with many languages. So here are my questions concerning the sitemaps. The correct way of creating a sitemap for a multilingual site is as followed ( by the official blog of Google ) <urlset xmlns="</span>http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"> http://www.example.com/loc> <xhtml:link rel="alternate" hreflang="en" href="</span>http://www.example.com/"/> <xhtml:link rel="alternate" hreflang="de" href="</span>http://www.example.com/de"/> <xhtml:link rel="alternate" hreflang="fr" href="</span>http://www.example.com/fr"/><a href=" http:="" www.example.com="" fr"="" target="_blank"></xhtml:link><a href=" http:="" www.example.com="" de"="" target="_blank"></xhtml:link><a href=" http:="" www.example.com="" "="" target="_blank"></xhtml:link><a href=" http:="" www.sitemaps.org="" schemas="" sitemap="" 0.9"="" rel="nofollow" target="_blank"></urlset> **So here is my first question. My site has over 200.000 pages that all of them support around 5-6 languages. Am I suppose to do this example 200.000 times?****My second question is. My root domain is www.example.com but this one redirects with 301 to www.example.com/en should the sitemap be at ****www.example.com/sitemap.xmlorwww.example.com/en/sitemap.xml ???****My third question is as followed. On WMT do I submit my sitemap in all versions of my site? I have all my languages there.**Thanks in advance for taking the time to respond to this thread and by creating it I hope many people will solve their own questions.
Intermediate & Advanced SEO | | Angelos_Savvaidis0 -
Sitemap on a Subdomain
Hi, For various reasons I placed my sitemaps on a subdomain where I keep images and other large files (static.example.com). I then submitted this to Google as a separate site in Webmaster tools. Is this a problem? All of the URLs are for the actual site (www.example.com), the only issue on my end is not being able to look at it all at the same time. But I'm wondering if this would cause any problems on Google's end.
Intermediate & Advanced SEO | | enotes0 -
XML Sitemap for classifieds
I have seeon some trends for sites which do not even use XML sitemp and robots e.g. see this site. How do you see if sitemap is not used. Also for classified websites, should ad pages be included in sitemap because after certain duration those ads will be deleted and google might not be able to crawl. What do you suggest about XML sitemap for classified website.
Intermediate & Advanced SEO | | MozAddict0 -
What Wordpress Update Services Should You Be Using on Your Wordpress Blog?
I have been told that pingomatic.com is all that you need however yesterday I went to a conference and others were recommending to have a good list of pinging services to cover all your bases Here are 4 that have been recommended: pingomatic technorati blogsearch.google.com feedburner Any others that should be included on this list? My goal is not to spam these ping lists however want to make sure my content is getting indexed quickly
Intermediate & Advanced SEO | | webestate0 -
So What On My Site Is Breaking The Google Guidelines?
I have a site that I'm trying to rank for the Keyword "Jigsaw Puzzles" I was originally ranked around #60 or something around there and then all of a sudden my site stopped ranking for that keyword. (My other keyword rankings stayed) Contacted Google via the site reconsideration and got the general response... So I went through and deleted as many links as I could find that I thought Google may not have liked... heck, I even removed links that I don't think I should have JUST so I could have this fixed. I responded with a list of all links I removed and also any links that I've tried to remove, but couldn't for whatever reasons. They are STILL saying my website is breaking the Google guidelines... mainly around links. Can anyone take a peek at my site and see if there's anything on the site that may be breaking the guidelines? (because I can't) Website in question: http://www.yourjigsawpuzzles.co.uk UPDATE: Just to let everyone know that after multiple reconsideration requests, this penalty has been removed. They stated it was a manual penalty. I tried removing numerous different types of links but they kept saying no, it's still breaking rules. It wasn't until I removed some website directory links that they removed this manual penalty. Thought it would be interesting for some of you guys.
Intermediate & Advanced SEO | | RichardTaylor0