Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to create site map for large site (ecommerce type) that has 1000's if not 100,000 of pages.
-
I know this is kind of a newbie question but I am having an amazing amount of trouble creating a sitemap for our site Bestride.com. We just did a complete redesign (look and feel, functionality, the works) and now I am trying to create a site map. Most of the generators I have used "break" after reaching some number of pages. I am at a loss as to how to create the sitemap. Any help would be greatly appreciated!
Thanks
-
I agree with Chris. With such large websites it would be advisable having a sitemap index and then splitting the index into various individual indexes such as Pages, Products, Categories, images, media, tags etc.
-
The easiest thing i can think of is to write a script that works with your dispatcher to create a site map. The format I would use is add the page and all of the "product images" on the page to the map and move to the next. At the same time I would use an auto increment variable to keep track of how many lines you have written. When you get around 50k, write out the name of the next site map file that the program will create and have them chained together this way.
-
That's a great help Chris, thank you! And thanks to all for your help!
-
Typically, a sitemap is going to include every page on the site. As Francesca said, each sitemap can be up to 50K urls and if you need multiple sitemaps then you create a sitemap index that points to the rest of the sitemaps.
-
Thanks for the feedback!
I will look into screamingfrog for sure.
@Lesley - we are using a custom platform (in house) so we don't have that functionality. The issue is that we have a lot of inventory (millions) of cars. We have built (and are releasing new functionality today) to provide internal links so that Google can crawl all the inventory easily (users can too :). My question about sitemaps has boiled down to this: Do we need to build the sitemap to include every single page (all the inventory) or do we provide a "map" so that google can find the top pages and then crawl the inventory from there. Again the site is bestride.com. If anyone wants to take a look at the site, that would be fantastic!
Thanks
-
Are you using a custom platform or an off the shelf e-commerce package? Most off the shelf packages actually have a module that can create a site map and a lot have it where you can cron it too.
-
Of course, you can also use the moz's crawl test report at http://pro.moz.com/tools/crawl-test
-
Hi Kristin,
Each sitemap.xml can support maximum 50.000 URLs. So, If you have a site with more than 100K, It'd be better to create 2 or 3 o 4 etc sitemaps.xml in order to contain all URLs. Hope it is useful.
Kind regards!
Francesca
-
You can use screamingfrog to create your sitemap. You just need to license it for crawl more than 500 URI.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Canonical for duplicate pages in ecommerce site and the product out of stock
I’m an SEO for an ecommerce site that sells shoes I have duplicate pages for different colors of the same product (unique URL for each color), Conventionally I have added canonical tags for each page, which direct to a specific product URL My question is what happens when a product which the googlbot is direct to, is out of stock but is still listed in the canonical tag ?
Technical SEO | | shoesonline0 -
Blog Page Titles - Page 1, Page 2 etc.
Hi All, I have a couple of crawl errors coming up in MOZ that I am trying to fix. They are duplicate page title issues with my blog area. For example we have a URL of www.ourwebsite.com/blog/page/1 and as we have quite a few blog posts they get put onto another page, example www.ourwebsite.com/blog/page/2 both of these urls have the same heading, title, meta description etc. I was just wondering if this was an actual SEO problem or not and if there is a way to fix it. I am using Wordpress for reference but I can't see anywhere to access the settings of these pages. Thanks
Technical SEO | | O2C0 -
Why is Google Webmaster Tools showing 404 Page Not Found Errors for web pages that don't have anything to do with my site?
I am currently working on a small site with approx 50 web pages. In the crawl error section in WMT Google has highlighted over 10,000 page not found errors for pages that have nothing to do with my site. Anyone come across this before?
Technical SEO | | Pete40 -
How Does Google's "index" find the location of pages in the "page directory" to return?
This is my understanding of how Google's search works, and I am unsure about one thing in specific: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better.
Technical SEO | | reidsteven750 -
Duplicate page titles on Ecommerce
Hi, My question is in reference to an E-commerce site- Our SEO MOZ scan is showing many errors for Duplicates- such as Duplicate titles - The majority of these are on the products map- and the page titles are Products Map :: Company Name How do we get correct this or does Google not penalize for it? Thanks.
Technical SEO | | frankrizzo0 -
Blank pages in Google's webcache
Hello all, Is anybody experiencing blanck page's in Google's 'Cached' view? I'm seeing just the page background and none of the content for a couple of my pages but when I click 'View Text Only' all of teh content is there. Strange! I'd love to hear if anyone else is experiencing the same. Perhaps this is something to do with the roll out of Google's updates last week?! Thanks,
Technical SEO | | A_Q
Elias0 -
Blocking URL's with specific parameters from Googlebot
Hi, I've discovered that Googlebot's are voting on products listed on our website and as a result are creating negative ratings by placing votes from 1 to 5 for every product. The voting function is handled using Javascript, as shown below, and the script prevents multiple votes so most products end up with a vote of 1, which translates to "poor". How do I go about using robots.txt to block a URL with specific parameters only? I'm worried that I might end up blocking the whole product listing, which would result in de-listing from Google and the loss of many highly ranked pages. DON'T want to block: http://www.mysite.com/product.php?productid=1234 WANT to block: http://www.mysite.com/product.php?mode=vote&productid=1234&vote=2 Javacript button code: onclick="javascript: document.voteform.submit();" Thanks in advance for any advice given. Regards,
Technical SEO | | aethereal
Asim0 -
A question about RSS feeds and nofollow's
With the nofollow tag used very widely on the internet these days I was just wondering about how an RSS feed might help me find a way around it. Basically my question is this : I post a comment on a blog, it's approved and my comment together with my link(nofollow tag applied) is there. Now when the blogs RSS feed updates, does this nofollow tag get applied to the feed? As far as I can tell it does not - but I'm not too clue'd up on how the feed is generated. Anyone want to help me understand how it works and if what I'm suggesting would be 'a way around the nofollow tag' ? Thanks 🙂
Technical SEO | | DanHill0