Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to create site map for large site (ecommerce type) that has 1000's if not 100,000 of pages.
-
I know this is kind of a newbie question but I am having an amazing amount of trouble creating a sitemap for our site Bestride.com. We just did a complete redesign (look and feel, functionality, the works) and now I am trying to create a site map. Most of the generators I have used "break" after reaching some number of pages. I am at a loss as to how to create the sitemap. Any help would be greatly appreciated!
Thanks
-
I agree with Chris. With such large websites it would be advisable having a sitemap index and then splitting the index into various individual indexes such as Pages, Products, Categories, images, media, tags etc.
-
The easiest thing i can think of is to write a script that works with your dispatcher to create a site map. The format I would use is add the page and all of the "product images" on the page to the map and move to the next. At the same time I would use an auto increment variable to keep track of how many lines you have written. When you get around 50k, write out the name of the next site map file that the program will create and have them chained together this way.
-
That's a great help Chris, thank you! And thanks to all for your help!
-
Typically, a sitemap is going to include every page on the site. As Francesca said, each sitemap can be up to 50K urls and if you need multiple sitemaps then you create a sitemap index that points to the rest of the sitemaps.
-
Thanks for the feedback!
I will look into screamingfrog for sure.
@Lesley - we are using a custom platform (in house) so we don't have that functionality. The issue is that we have a lot of inventory (millions) of cars. We have built (and are releasing new functionality today) to provide internal links so that Google can crawl all the inventory easily (users can too :). My question about sitemaps has boiled down to this: Do we need to build the sitemap to include every single page (all the inventory) or do we provide a "map" so that google can find the top pages and then crawl the inventory from there. Again the site is bestride.com. If anyone wants to take a look at the site, that would be fantastic!
Thanks
-
Are you using a custom platform or an off the shelf e-commerce package? Most off the shelf packages actually have a module that can create a site map and a lot have it where you can cron it too.
-
Of course, you can also use the moz's crawl test report at http://pro.moz.com/tools/crawl-test
-
Hi Kristin,
Each sitemap.xml can support maximum 50.000 URLs. So, If you have a site with more than 100K, It'd be better to create 2 or 3 o 4 etc sitemaps.xml in order to contain all URLs. Hope it is useful.
Kind regards!
Francesca
-
You can use screamingfrog to create your sitemap. You just need to license it for crawl more than 500 URI.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'
A page on our WordPress powered website has had an error message thrown up in GSC to say it is included in the sitemap but set to 'noindex'. The page has also been removed from Google's search results. Page is https://www.onlinemortgageadvisor.co.uk/bad-credit-mortgages/how-to-get-a-mortgage-with-bad-credit/ Looking at the page code, plus using Screaming Frog and Ahrefs crawlers, the page is very clearly still set to 'index'. The SEO plugin we use has not been changed to 'noindex' the page. I have asked for it to be reindexed via GSC but I'm concerned why Google thinks this page was asked to be noindexed. Can anyone help with this one? Has anyone seen this before, been hit with this recently, got any advice...?
Technical SEO | | d.bird0 -
I want to load my ecommerce site xml via CDN
Hello Experts. My ecommerce site - abcd.com
Technical SEO | | micey123
My ecommrece site sitemap abcd.com/sitemap.xml
My subdomain - xyz.abcd.com ( this is blank page but status is 200 which runs from cdn) My ecommerce site sitemap abcd.com/sitemap.xml contains only 1 link of subdomain sitemap- xyz.abcd.com/sitemap.xml
And this sitemap- xyz.abcd.com/sitemap.xml contains all category and product links of abcd.com So my query is :- Above configuration is okay? In search console I will add new property - xyz.abcd.com. and add sitemap xyz.abcd.com/sitemap.xml So Google will able to give errors for my website abcd.com Purpose - I want to run my xml sitemap from cdn that's why i have created subdomain like xyz.abcd.com Hope you understood my query. Thanks!0 -
Canonical for duplicate pages in ecommerce site and the product out of stock
I’m an SEO for an ecommerce site that sells shoes I have duplicate pages for different colors of the same product (unique URL for each color), Conventionally I have added canonical tags for each page, which direct to a specific product URL My question is what happens when a product which the googlbot is direct to, is out of stock but is still listed in the canonical tag ?
Technical SEO | | shoesonline0 -
Soft 404's on a 301 Redirect...Why?
So we launched a site about a month ago. Our old site had an extensive library of health content that went away with the relaunch. We redirected this entire section of the site to the new education materials, but we've yet to see this reflected in the index or in GWT. In fact, we're getting close to 500 soft 404's in GWT. Our development team confirmed for me that the 301 redirect is configured correctly. Is it just a waiting game at this point or is there something I might be missing? Any help is appreciated. Thanks!
Technical SEO | | MJTrevens0 -
Should I disavow links from pages that don't exist any more
Hi. Im doing a backlinks audit to two sites, one with 48k and the other with 2M backlinks. Both are very old sites and both have tons of backlinks from old pages and websites that don't exist any more, but these backlinks still exist in the Majestic Historic index. I cleaned up the obvious useless links and passed the rest through Screaming Frog to check if those old pages/sites even exist. There are tons of link sending pages that return a 0, 301, 302, 307, 404 etc errors. Should I consider all of these pages as being bad backlinks and add them to the disavow file? Just a clarification, Im not talking about l301-ing a backlink to a new target page. Im talking about the origin page generating an error at ping eg: originpage.com/page-gone sends me a link to mysite.com/product1. Screamingfrog pings originpage.com/page-gone, and returns a Status error. Do I add the originpage.com/page-gone in the disavow file or not? Hope Im making sense 🙂
Technical SEO | | IgorMateski0 -
Blank pages in Google's webcache
Hello all, Is anybody experiencing blanck page's in Google's 'Cached' view? I'm seeing just the page background and none of the content for a couple of my pages but when I click 'View Text Only' all of teh content is there. Strange! I'd love to hear if anyone else is experiencing the same. Perhaps this is something to do with the roll out of Google's updates last week?! Thanks,
Technical SEO | | A_Q
Elias0 -
Should we use Google's crawl delay setting?
We’ve been noticing a huge uptick in Google’s spidering lately, and along with it a notable worsening of render times. Yesterday, for example, Google spidered our site at a rate of 30:1 (google spider vs. organic traffic.) So in other words, for every organic page request, Google hits the site 30 times. Our render times have lengthened to an avg. of 2 seconds (and up to 2.5 seconds). Before this renewed interest Google has taken in us we were seeing closer to one second average render times, and often half of that. A year ago, the ratio of Spider to Organic was between 6:1 and 10:1. Is requesting a crawl-delay from Googlebot a viable option? Our goal would be only to reduce Googlebot traffic, and hopefully improve render times and organic traffic. Thanks, Trisha
Technical SEO | | lzhao0 -
Nofollow and ecommerce cart/checkout pages
Hi!! Another noob question: Should I be nofollowing my site's cart and checkout pages? Or as SEs can't get to the checkout pages without either logging in or completing the form is it something I shouldn't worry about? Have read things saying both. Not sure which is correct. Thank you! Appreciate the help. Lynn
Technical SEO | | hiphound0