How Do I Generate a Sitemap for a Large WordPress Site?
-
Hello Everyone!
I am working with a WordPress site that is in Google News (i.e., every day we have about 30 new URLs to add to our sitemap). The site has years of articles, resulting in about 200,000 pages. Our strategy so far has been to use a sitemap plugin that only generates the last few months of posts; however, we want to improve our SEO and submit all the URLs on our site to search engines.
The issue is that the plugins we've looked at generate the sitemap on the fly: when you request the sitemap, the plugin generates it dynamically. Our site is so large that even a single request for our sitemap.xml ties up tons of server resources and takes an extremely long time to generate (if the page doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitemaps works extremely well. I fully understand that a DB solution would avoid the need to crawl, but the features that I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not having any problems. I also set limits on the generator's memory consumption and activated the feature that saves temp files in case the generation fails.
-
My concern with recommending xml-sitemaps was that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool, by contrast, grabs pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid solution of xml-sitemaps.
You can set how much of your server's resources it is allowed to use, and it will store its data in temp files to avoid excessive consumption.
It also offers settings to create large sitemaps using a sitemap_index, and you can get plugins that create the news sitemap automatically by looking for changes since the last sitemap generation.
I have it running on my site with 5K pages (excluding tag pages), and it takes 10 minutes to crawl.
Then you also have plugins that create the sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol only allows a maximum of 50,000 URLs per sitemap and Google News sitemaps can't be over 1,000 URLs, this was going to be a necessity anyway, so you may as well use these limitations to your advantage.
There's a feature available for sitemaps called a sitemap index. It basically lists all the sitemap.xml files you've created so the search engines can find and index them. You put it at the root of the site and then link to it in robots.txt just like a regular sitemap. (You can also submit it in GWT.) In fact, Yoast's SEO plugin sitemaps and others already use just this feature for their News add-on.
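To make the sitemap index idea concrete, here is a minimal sketch in Python of what such a file contains. The example.com URLs and the file names are hypothetical placeholders, and the function name is just for illustration; a plugin or your own script would supply the real list.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical child sitemaps: one dynamic News sitemap plus static archives.
index_xml = build_sitemap_index([
    "https://www.example.com/sitemap-news.xml",
    "https://www.example.com/sitemap-archive-1.xml",
    "https://www.example.com/sitemap-archive-2.xml",
])
print(index_xml)
```

The robots.txt reference is then a single line such as `Sitemap: https://www.example.com/sitemap_index.xml` (again, a placeholder URL).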
In your case, you could build the News sitemap dynamically to meet its special requirements (up to 1,000 URLs, and only the last 2 days of posts will be crawled) and to ensure it's up-to-the-minute accurate, as is critical for news sites.
Then, separately, you would build additional, segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, as they wouldn't need to update once created. By having them static, there'd be no server load to generate them on each request - only the load to generate the current News sitemap. (I'd actually recommend you keep each static sitemap to around 25,000 pages to ensure search engines can crawl them easily.)
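As a rough sketch of the selection logic for the News sitemap, assuming posts are available as (URL, publish time) pairs: the function name and the sample data are made up for illustration, and a real plugin would pull this from the WordPress database.

```python
from datetime import datetime, timedelta, timezone

NEWS_MAX_URLS = 1000             # cap on URLs in the News sitemap
NEWS_WINDOW = timedelta(days=2)  # only the last 2 days of posts


def select_news_posts(posts, now):
    """posts: iterable of (url, published_datetime) pairs.

    Returns the posts published within the window, newest first,
    truncated to the News sitemap cap.
    """
    cutoff = now - NEWS_WINDOW
    recent = [post for post in posts if post[1] >= cutoff]
    recent.sort(key=lambda post: post[1], reverse=True)
    return recent[:NEWS_MAX_URLS]


# Hypothetical data: one fresh post and one that has aged out of the window.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
posts = [
    ("https://www.example.com/fresh-story/", now - timedelta(hours=5)),
    ("https://www.example.com/old-story/", now - timedelta(days=3)),
]
news_posts = select_news_posts(posts, now)
```

With roughly 30 new URLs a day, a 2-day window is only around 60 entries, so generating this one dynamically should be cheap.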
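The segmenting step could look something like the sketch below, which splits a list of URLs into files of 25,000 each and writes them to disk once. The file naming scheme and function names are assumptions for illustration only; at 25,000 per file, 200,000 pages works out to 8 static files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
URLS_PER_FILE = 25_000  # the segment size suggested above


def chunk(urls, size):
    """Yield consecutive slices of `urls`, each at most `size` long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def write_archive_sitemaps(urls, out_dir, size=URLS_PER_FILE):
    """Write one static sitemap file per chunk and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, batch in enumerate(chunk(urls, size), start=1):
        root = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in batch:
            ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
        # Hypothetical naming scheme: sitemap-archive-1.xml, -2.xml, ...
        path = out_dir / f"sitemap-archive-{number}.xml"
        ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

Calling `write_archive_sitemaps(all_urls, "sitemaps")` once would produce the static files; the web server then serves them like any other file, with no PHP or database work per request.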
This approach would involve a bit of fiddling to initially set up, as you'd need to generate the "archive" sitemaps then convert them to static versions, but once set up, the News sitemap would take care of itself and once a month (or whatever you decide) you'd need to add the "expiring" pages from the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
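For the automation part, the core logic is small. Here is one possible sketch, assuming the archive segments are held as lists of URLs (oldest segment first); the function name is hypothetical and reading or writing the actual files is left out.

```python
SEGMENT_SIZE = 25_000  # same segment size as the static archive sitemaps


def append_to_archive(segments, expired_urls, size=SEGMENT_SIZE):
    """Add URLs that have aged out of the News sitemap to the newest segment.

    segments: list of lists of URLs, oldest segment first (modified in place).
    A new segment is started whenever the newest one is full.
    """
    for url in expired_urls:
        if not segments or len(segments[-1]) >= size:
            segments.append([])  # newest segment is full: start another
        segments[-1].append(url)
    return segments


# Tiny demonstration with a segment size of 2 instead of 25,000.
demo = append_to_archive([["a", "b"]], ["c", "d", "e"], size=2)
```

Only the segments that changed (usually just the newest one) would need to be rewritten as static files, plus the sitemap index if a new segment was started.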
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you ever tried using a web-based sitemap generator? Not sure how it would respond to your site but at least it would be running on someone else's server, right?
Not sure what else to say honestly.