How Do I Generate a Sitemap for a Large WordPress Site?
-
Hello Everyone!
I am working with a WordPress site that is in Google News (i.e., every day we have about 30 new URLs to add to our sitemap). The site has years of articles, resulting in about 200,000 pages. Our strategy so far has been to use a sitemap plugin that only generates the last few months of posts, but we want to improve our SEO and submit all the URLs on our site to search engines.
The issue is that the plugins we've looked at generate the sitemap on the fly, i.e., when you request the sitemap, the plugin dynamically generates it at that moment. Our site is so large that even a single request for our sitemap.xml ties up significant server resources and takes an extremely long time (if the request doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitemaps works extremely well. I fully understand that a DB solution would avoid the need to crawl, but the features I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not running into problems. I also set limits on the generator's memory consumption and activated the feature that saves temp files in case the generation fails.
-
My concern with recommending xml-sitemaps is that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool grabs pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid version of xml-sitemaps.
You can set limits on the resources you want it to use, and it will store intermediate data in temp files to avoid excessive memory consumption.
It also offers settings to split large sitemaps using a sitemap index, and there are plugins that create the news sitemap automatically by looking for changes since the last sitemap generation.
I have it running on my site with 5K pages (excluding tag pages), and it takes 10 minutes to crawl.
There are also plugins that create sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol allows a maximum of 50,000 URLs per sitemap and a Google News sitemap can't exceed 1,000 URLs, this was going to be a necessity anyway, so you may as well use these limitations to your advantage.
There's a piece of sitemap functionality called a sitemap index. It simply lists all the sitemap.xml files you've created, so the search engines can find and index them. You put it at the root of the site and reference it in robots.txt just like a regular sitemap. (You can also submit it in GWT.) In fact, Yoast's SEO plugin and others already use exactly this functionality for their News add-ons.
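For reference, a sitemap index is just a small XML file in its own right. Here's a minimal sketch; the file names and the example.com domain are placeholders for illustration, not anything from your actual site:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-news.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-archive-1.xml</loc>
  </sitemap>
</sitemapindex>
```

The matching robots.txt reference would then be a single line like `Sitemap: https://www.example.com/sitemap_index.xml` (again, placeholder domain).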
In your case, you could build the News sitemap dynamically to meet its special requirements (a maximum of 1,000 URLs, and Google News only picks up posts from the last 2 days) and to ensure it's up-to-the-minute accurate, which is critical for news sites.
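For what it's worth, a News sitemap entry is just a regular sitemap entry plus an extra news: namespace block. A sketch of a single entry is below; the URL, publication name, and dates are invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.example.com/2013/04/some-article/</loc>
    <news:news>
      <news:publication>
        <news:name>Example Daily News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2013-04-01T12:00:00Z</news:publication_date>
      <news:title>Some Article Headline</news:title>
    </news:news>
  </url>
</urlset>
```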
Then, separately, you would build additional, segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, as they wouldn't need to change once created. With static files, there'd be no server load to generate them on each request - the only ongoing load would be generating the current News sitemap. (I'd actually recommend keeping each static sitemap to around 25,000 URLs to ensure search engines can crawl them easily.)
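To make the static archive idea concrete, here's a minimal sketch of a one-off generator script. It assumes you can export your permalinks to a plain-text file (one URL per line); the file names, chunk size, and example.com domain are all placeholder choices, not part of any particular plugin:

```python
# Sketch: split an exported URL list into static "archive" sitemap files
# of 25,000 URLs each, then write a sitemap index referencing them all.
import math
from xml.sax.saxutils import escape  # escape &, <, > in URLs

CHUNK_SIZE = 25000
DOMAIN = "https://www.example.com"  # placeholder domain

def write_sitemap(filename, urls):
    """Write one static sitemap file containing the given URLs."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        f.write("</urlset>\n")

def main():
    # permalinks.txt: one permalink per line, exported from the CMS database
    with open("permalinks.txt", encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]

    # Write each 25,000-URL chunk to its own static sitemap file.
    n_chunks = math.ceil(len(urls) / CHUNK_SIZE)
    sitemap_files = []
    for i in range(n_chunks):
        name = f"sitemap-archive-{i + 1}.xml"
        write_sitemap(name, urls[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE])
        sitemap_files.append(name)

    # Sitemap index pointing at each static archive file plus the
    # dynamically generated news sitemap; upload everything to the site root.
    with open("sitemap_index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for name in sitemap_files + ["sitemap-news.xml"]:
            f.write(f"  <sitemap><loc>{DOMAIN}/{name}</loc></sitemap>\n")
        f.write("</sitemapindex>\n")

if __name__ == "__main__":
    main()
```

Run it once to seed the archive, then re-run it on whatever schedule you use to roll expiring News URLs into the newest archive segment.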
This approach would involve a bit of fiddling to set up initially, as you'd need to generate the "archive" sitemaps and then convert them to static versions. But once it's set up, the News sitemap would take care of itself, and once a month (or on whatever schedule you decide) you'd add the pages "expiring" out of the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you ever tried using a web-based sitemap generator? I'm not sure how it would handle a site of your size, but at least it would be running on someone else's server, right?
Not sure what else to say, honestly.