How Do I Generate a Sitemap for a Large WordPress Site?
-
Hello Everyone!
I am working with a WordPress site that is in Google News (i.e. every day we have about 30 new URLs to add to our sitemap). The site has years of articles, resulting in about 200,000 pages. Our strategy so far has been to use a sitemap plugin that only generates the last few months of posts; however, we want to improve our SEO and submit all of the site's URLs to search engines.
The issue is that the plugins we've looked at generate the sitemap on the fly, i.e. when you request the sitemap, the plugin dynamically generates it. Our site is so large that even a single request for our sitemap.xml ties up tons of server resources and takes an extremely long time (if the page doesn't time out in the process).
Does anyone have a solution?
Thanks,
Aaron
-
In my case, xml-sitemaps works extremely well. I fully understand that a DB solution would avoid the need to crawl, but the features that I get from xml-sitemaps are worth it.
I am running my website on a powerful dedicated server with SSDs, so perhaps that's why I'm not having any problems. I also set limits on the generator's memory consumption and activated the feature that saves temp files in case the generation fails.
-
My concern with recommending xml-sitemaps was that I've always had problems getting good, complete maps of extremely large sites. An internal CMS-based tool grabs pages straight from the database instead of having to crawl for them.
You've found that it gets you a pretty complete crawl of your 5K-page site, Federico?
-
I would go with the paid solution of xml-sitemaps.
You can set all the resources that you want it to have available, and it will store data in temp files to avoid excessive consumption.
It also offers settings to create large sitemaps using a sitemap_index, and you could get plugins that create the news sitemap automatically by looking for changes since the last sitemap generation.
I have it running on my site with 5K pages (excluding tag pages) and it takes 10 minutes to crawl.
Then you also have plugins that create the sitemaps dynamically, like SEO by Yoast, Google XML Sitemaps, etc.
-
I think the solution to your server resource issue is to create multiple sitemaps, Aaron. Given that the sitemap protocol only allows 50,000 URLs max. per sitemap and Google News sitemaps can't be over 1,000 URLs, this was going to be a necessity anyway, so you may as well use these limitations to your advantage.
Sitemaps support a feature called a sitemap index. It basically lists all the sitemap.xml files you've created, so the search engines can find and index them. You put it at the root of the site and then link to it in robots.txt just like a regular sitemap. (You can also submit it in GWT.) In fact, Yoast's SEO plugin sitemaps and others use just this functionality already for their News add-on.
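To make the sitemap index idea concrete, here is a minimal Python sketch. It is not how Yoast or any other plugin does it; the function name `build_sitemap_index` and the example.com URLs are made up for illustration. It only shows the shape of the file defined by the sitemap protocol.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each child sitemap.

    `sitemap_urls` is a hypothetical list of absolute URLs of the individual
    sitemap files (the News sitemap plus the static archive segments).
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs; substitute the real sitemap locations.
index_xml = build_sitemap_index([
    "https://www.example.com/sitemap-news.xml",
    "https://www.example.com/sitemap-archive-1.xml",
])
```

The resulting file would be saved at the site root (for example as sitemap_index.xml, a name assumed here) and referenced from robots.txt with a line such as `Sitemap: https://www.example.com/sitemap_index.xml`.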
In your case, you could build the News sitemap dynamically to meet its special requirements (up to 1,000 URLs, and only the last 2 days of posts will be crawled) and to ensure it's up-to-the-minute accurate, as is critical for news sites.
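A rough sketch of what that dynamic News sitemap could look like, in Python rather than as a WordPress plugin. The post records (dicts with `url`, `title`, and a timezone-aware `published` datetime), the function names, and the publication name are all assumptions for illustration; a real implementation would pull the posts from the WordPress database.

```python
from datetime import datetime, timedelta, timezone
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
MAX_NEWS_URLS = 1000               # News sitemap size limit mentioned above
NEWS_WINDOW = timedelta(days=2)    # only the last 2 days of posts


def select_news_posts(posts, now):
    """Keep posts published within the window, newest first, capped at the limit."""
    cutoff = now - NEWS_WINDOW
    recent = [p for p in posts if p["published"] >= cutoff]
    recent.sort(key=lambda p: p["published"], reverse=True)
    return recent[:MAX_NEWS_URLS]


def build_news_sitemap(posts, publication_name, language="en"):
    """Render the selected posts as a Google News sitemap string."""
    # The prefixed tag names are written literally; the xmlns:news attribute
    # on the root declares the prefix.
    root = ET.Element("urlset", {"xmlns": SITEMAP_NS, "xmlns:news": NEWS_NS})
    for post in posts:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = post["url"]
        news = ET.SubElement(url, "news:news")
        publication = ET.SubElement(news, "news:publication")
        ET.SubElement(publication, "news:name").text = publication_name
        ET.SubElement(publication, "news:language").text = language
        ET.SubElement(news, "news:publication_date").text = post["published"].isoformat()
        ET.SubElement(news, "news:title").text = post["title"]
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Because the selection never returns more than 1,000 posts from a two-day window, generating this file on each request stays cheap no matter how large the archive grows.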
Then separately you would build additional, segmented sitemaps for the existing 200,000 pages. Since these are historical pages, you could easily serve them from static files, since they wouldn't need to update once created. By having them static, there'd be no server load to generate them each time, only the load to generate the current news sitemap. (I'd actually recommend you keep each static sitemap to around 25,000 pages to ensure search engines can crawl them easily.)
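The segmenting step could be sketched like this in Python. The file naming scheme (sitemap-archive-N.xml) and the function names are invented for the example, and the list of URLs is assumed to come from a one-off database export rather than a crawl.

```python
from pathlib import Path
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
URLS_PER_FILE = 25_000  # suggested segment size, well under the 50,000 protocol limit


def render_urlset(urls):
    """Render one sitemap file (as a string) for a batch of URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def render_archive_sitemaps(urls, per_file=URLS_PER_FILE):
    """Split the URL list into segments and return {file name: XML string}."""
    files = {}
    for start in range(0, len(urls), per_file):
        name = f"sitemap-archive-{start // per_file + 1}.xml"  # hypothetical naming
        files[name] = render_urlset(urls[start:start + per_file])
    return files


def write_static_files(files, out_dir):
    """Write each rendered sitemap to disk so the web server can serve it as-is."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, xml in files.items():
        (out / name).write_text(xml, encoding="utf-8")
```

With 200,000 URLs and 25,000 per file, this produces 8 static files, each of which would then be listed in the sitemap index.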
This approach would involve a bit of fiddling to set up initially, as you'd need to generate the "archive" sitemaps and then convert them to static versions. Once set up, though, the News sitemap would take care of itself, and once a month (or whatever you decide) you'd need to add the "expiring" pages from the News sitemap to the most recent "archive" segment. A smart programmer might even be able to automate that process.
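One way the monthly rollover might be automated, again as a hedged Python sketch: archive segments are modelled as plain lists of URLs, and the function name `roll_into_archive` and the 25,000 cap are assumptions taken from the suggestion above, not from any existing tool.

```python
def roll_into_archive(segments, expiring_urls, per_file=25_000):
    """Append expiring News URLs to the most recent archive segment.

    `segments` is a list of URL lists, oldest segment first. A new segment is
    started whenever the latest one reaches `per_file` URLs. Returns the updated
    segments plus the indices of the segments that need to be re-rendered.
    """
    updated = [list(segment) for segment in segments] or [[]]
    seen = {url for segment in updated for url in segment}
    first_touched = len(updated) - 1  # only the last segment and any new ones can change
    added = False
    for url in expiring_urls:
        if url in seen:
            continue  # already archived, avoid duplicates
        if len(updated[-1]) >= per_file:
            updated.append([])
        updated[-1].append(url)
        seen.add(url)
        added = True
    changed = list(range(first_touched, len(updated))) if added else []
    return updated, changed


# Tiny illustration with a cap of 2 URLs per segment.
updated, changed = roll_into_archive([["a", "b"], ["c"]], ["d", "e", "c"], per_file=2)
```

Only the segments listed in `changed` would need their static files rewritten; the older segments stay untouched, which keeps the job cheap to run from a scheduled task.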
Does this approach sound like it might solve your problem?
Paul
P.S. Since you'd already have the sitemap index capability, you could also add video and image sitemaps to your site if appropriate.
-
Have you ever tried using a web-based sitemap generator? Not sure how it would respond to your site but at least it would be running on someone else's server, right?
Not sure what else to say honestly.