Sitemap use for very large forum-based community site
-
I work on a very large site with two main types of content: static landing pages for products, and user-created forums & blogs under each product. The site has somewhere between 500k and 1 million pages. We do not have a sitemap at this time.
Currently our SEO discoverability is good in general; Google indexes new forum threads within roughly 1-5 days. However, some of the "static" landing pages for our smaller, less-visited products do not perform as well.
Question is: could our SEO be improved by creating a sitemap, and if so, how should it be implemented? I see a few ways to go about it:
- Option 1: the sitemap includes only the "static" product category landing pages, i.e. the product home pages, the forum landing pages, and the blog list pages. This would probably end up being 100-200 URLs.
- Option 2: the sitemap contains the above but is also dynamically updated with new threads & blog posts.
With Option 2, it seems the sitemap would become unmanageably long (hundreds of thousands of forum URLs). Would a crawler even parse something that size? And with Option 1, could listing only those pages cause our organically ranked pages to change ranking because Google re-prioritizes the pages within the sitemap?
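For reference, Option 1 would amount to a single small, hand-maintained (or templated) urlset file along these lines (the example.com URLs are placeholders, not our real paths):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per static landing page, ~100-200 total -->
  <url>
    <loc>https://example.com/products/widget/</loc>
  </url>
  <url>
    <loc>https://example.com/products/widget/forum/</loc>
  </url>
  <url>
    <loc>https://example.com/products/widget/blog/</loc>
  </url>
</urlset>
```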
There's not a lot of information out there on this topic, so I'd appreciate any input. Thanks in advance.
-
Agreed, you'll likely want to go with option #2. Dynamic sitemaps are a must when you're dealing with large sites like this; we recommend them to all of our clients with larger sites. If your forum content is important for search, then those URLs are definitely worth including, as the content likely changes often and may sit naturally deeper in the site architecture.
In general, I'd think of sitemaps from a discoverability perspective rather than a ranking one. The primary goal is to give Googlebot an avenue to crawl your site's content regardless of its internal linking structure.
-
Hi
Go with option 2; there is no scaling issue here. I have worked with and for sites that submit many times that number of sitemaps and pages, in some cases up to 100M pages. In every case, Google was perfectly able to crawl and process the data. As long as you follow the guidelines (a maximum of 50,000 URLs per sitemap), you're fine: you're just providing additional files that usually don't exceed about 50MB each (depending on whether you also add images to the sitemap). If you have an engineering team build the right infrastructure, you can easily manage thousands of these files and regenerate them automatically every day or week.
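To make that concrete, here's a rough Python sketch of the chunking-plus-index approach: split the full URL list into files of at most 50,000 URLs and publish one sitemap index that points at them all. The example.com URLs, file names, and URL counts are placeholders, not anything specific to your site:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemaps.org protocol

def build_sitemap(urls):
    """Serialize one <urlset> document for up to MAX_URLS URLs."""
    urlset = Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        loc = SubElement(SubElement(urlset, "url"), "loc")
        loc.text = url
    return tostring(urlset, encoding="unicode")

def build_index(sitemap_urls):
    """Serialize a <sitemapindex> pointing at each child sitemap file."""
    index = Element("sitemapindex", xmlns=SITEMAP_NS)
    for sm_url in sitemap_urls:
        loc = SubElement(SubElement(index, "sitemap"), "loc")
        loc.text = sm_url
    return tostring(index, encoding="unicode")

def chunk(urls, size=MAX_URLS):
    """Split the full URL list into sitemap-sized chunks."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

# Hypothetical run: 120,000 thread URLs -> 3 child sitemaps + 1 index file.
all_urls = [f"https://example.com/forum/thread-{i}" for i in range(120_000)]
chunks = chunk(all_urls)
sitemap_xmls = [build_sitemap(c) for c in chunks]
index_xml = build_index(
    f"https://example.com/sitemaps/threads-{n}.xml" for n in range(len(chunks))
)
```

At 500k-1M pages that's only 10-20 child files, so a nightly cron job that regenerates them from the database is usually all the infrastructure you need.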
My main focus on big sites is also to streamline their sitemaps: keep one sitemap with just the last 50,000 pages added, and another with the last 50,000 pages updated. This way you can also monitor the indexation level of these pages. If you can combine this with data from log file analysis, for example, you can say: we added 50K pages, and in the last few days Google crawled X percent of them.
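A minimal sketch of that log-file cross-check: take the "recently added" sitemap, take your access logs, and compute what share of the new URLs Googlebot has actually requested. The sitemap snippet, log line, and host name below are made-up placeholders, and real log parsing should also verify Googlebot by reverse DNS rather than trusting the user-agent string:

```python
import re

def sitemap_urls(sitemap_xml):
    """Extract <loc> values from a sitemap document."""
    return set(re.findall(r"<loc>(.*?)</loc>", sitemap_xml))

def googlebot_paths(log_lines):
    """Paths requested by Googlebot, from combined-format access logs."""
    hits = set()
    for line in log_lines:
        if "Googlebot" in line:
            m = re.search(r'"GET (\S+) HTTP', line)
            if m:
                hits.add(m.group(1))
    return hits

def crawl_coverage(sitemap_xml, log_lines, host="https://example.com"):
    """Fraction of sitemap URLs that Googlebot has requested."""
    new_pages = sitemap_urls(sitemap_xml)
    crawled = {host + path for path in googlebot_paths(log_lines)}
    return len(new_pages & crawled) / len(new_pages)

sitemap = """<urlset>
<url><loc>https://example.com/forum/t1</loc></url>
<url><loc>https://example.com/forum/t2</loc></url>
</urlset>"""
logs = [
    '66.249.66.1 - - [01/Jan/2024] "GET /forum/t1 HTTP/1.1" 200 1234 "-" "Googlebot/2.1"',
]
print(crawl_coverage(sitemap, logs))  # 0.5
```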
Hope this gives you some extra insights.
Martijn.