How Do I Generate a Sitemap for a Large WordPress Site?
-
Hello Everyone!
I am working with a WordPress site that is in Google News (i.e., every day we have about 30 new URLs to add to our sitemap). The site has years of articles, resulting in about 200,000 pages. Our strategy so far has been to use a sitemap plugin that only generates the last few months of posts; however, we want to improve our SEO and submit all the URLs on our site to search engines.
The issue is that the plugins we've looked at generate the sitemap on the fly: when you request the sitemap, the plugin dynamically generates it. Our site is so large that even a single request for our sitemap.xml ties up tons of server resources and takes an extremely long time to complete (if the page doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitemaps works extremely well. I fully understand that a DB solution would avoid the need to crawl, but the features that I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not having any problems. I also set limits on the generator's memory consumption and activated the feature that saves temp files in case the generation fails.
-
My concern with recommending xml-sitemaps was that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool, by contrast, grabs pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid solution of xml-sitemaps.
You can set how much of your server's resources it is allowed to use, and it will store its progress in temp files to avoid excessive consumption.
It also offers settings to create large sitemaps using a sitemap_index, and you could get plugins that create the news sitemap automatically by looking for changes since the last sitemap generation.
I have it running on my site with 5K pages (excluding tag pages) and it takes 10 minutes to crawl.
Then you also have plugins that create the sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol only allows 50,000 URLs max. per sitemap and Google News sitemaps can't be over 1000 URLs, this was going to be a necessity anyway, so may as well use these limitations to your advantage.
There's a feature available for sitemaps called a sitemap index. It basically lists all the sitemap files you've created, so the search engines can find and index them. You put it at the root of the site and then link to it in robots.txt just like a regular sitemap. (You can also submit it in GWT.) In fact, Yoast's SEO plugin sitemaps and others already use just this functionality for their News add-ons.
In your case, you could build the News sitemap dynamically to meet its special requirements (up to 1,000 URLs, covering only the last 2 days of posts) and to ensure it's up-to-the-minute accurate, as is critical for news sites.
Then separately you would build additional, segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, since they wouldn't need to update once created. By having them static, there'd be no server load to generate them on each request, only the load to generate the current News sitemap. (I'd actually recommend you keep each static sitemap to around 25,000 pages to ensure search engines can crawl them easily.)
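To make the sitemap index idea concrete, here is a minimal Python sketch of what generating one might look like. The function name and the example.com URLs are made up for illustration; only the `sitemapindex`, `sitemap`, `loc` and `lastmod` elements and the namespace come from the sitemap protocol.

```python
from xml.sax.saxutils import escape


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return a sitemap index document that lists the given sitemap URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        if lastmod:
            # Optional: the date the child sitemap was last changed (YYYY-MM-DD).
            lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


# Hypothetical file names, for illustration only.
index_xml = build_sitemap_index([
    "https://www.example.com/sitemap-news.xml",
    "https://www.example.com/sitemap-archive-1.xml",
])
print(index_xml)
```

The robots.txt reference is then a single `Sitemap:` line pointing at the full URL of the index file.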
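As a rough sketch of the selection step for the News sitemap, assuming you can already pull `(url, published)` pairs out of your database (the function name, constants and sample URLs below are hypothetical), the two limits can be applied like this:

```python
from datetime import datetime, timedelta, timezone

MAX_NEWS_URLS = 1000              # News sitemap URL limit mentioned above
NEWS_WINDOW = timedelta(days=2)   # only the last 2 days of posts


def select_news_posts(posts, now=None):
    """Return (url, published) pairs from the last 2 days, newest first, capped at 1,000."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - NEWS_WINDOW
    recent = [post for post in posts if post[1] >= cutoff]
    recent.sort(key=lambda post: post[1], reverse=True)
    return recent[:MAX_NEWS_URLS]


# Fixed "now" and made-up posts so the example is deterministic.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
posts = [
    ("https://www.example.com/yesterday", now - timedelta(days=1)),
    ("https://www.example.com/last-week", now - timedelta(days=7)),
    ("https://www.example.com/today", now - timedelta(hours=3)),
]
selected = select_news_posts(posts, now=now)
print([url for url, _ in selected])
```

With about 30 new URLs a day the cap should never be hit, but it is cheap to enforce.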
This approach would involve a bit of fiddling to initially set up, as you'd need to generate the "archive" sitemaps then convert them to static versions, but once set up, the News sitemap would take care of itself and once a month (or whatever you decide) you'd need to add the "expiring" pages from the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
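The one-time "archive" generation could be scripted along these lines. This is only a sketch: it assumes you already have the full list of archive URLs (for example, exported from the database), and the file naming scheme and function names are invented for illustration.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape

SEGMENT_SIZE = 25_000  # suggested size per static sitemap, well under the 50,000 limit


def render_urlset(urls):
    """Render one static sitemap file body for the given URLs."""
    body = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{body}</urlset>\n"
    )


def write_archive_sitemaps(urls, out_dir, segment_size=SEGMENT_SIZE):
    """Split urls into segments and write each one as a static XML file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(urls), segment_size):
        # Hypothetical naming scheme: sitemap-archive-1.xml, sitemap-archive-2.xml, ...
        name = f"sitemap-archive-{start // segment_size + 1}.xml"
        (out_dir / name).write_text(
            render_urlset(urls[start:start + segment_size]), encoding="utf-8"
        )
        written.append(name)
    return written


# Demo with 200,000 made-up URLs written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    urls = [f"https://www.example.com/?p={n}" for n in range(200_000)]
    files = write_archive_sitemaps(urls, tmp)
print(len(files), files[0], files[-1])
```

For the monthly update, the same function could be re-run on just the URLs belonging to the newest segment, so the older files are never touched.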
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you ever tried using a web-based sitemap generator? Not sure how it would respond to your site but at least it would be running on someone else's server, right?
Not sure what else to say honestly.