Sitemap use for very large forum-based community site
-
I work on a very large site with two main types of content, static landing pages for products, and a forum & blogs (user created) under each product. Site has maybe 500k - 1 million pages. We do not have a sitemap at this time.
Currently our SEO discoverability in general is good, Google is indexing new forum threads within 1-5 days roughly. Some of the "static" landing pages for our smaller, less visited products however do not have great SEO.
Question is, could our SEO be improved by creating a sitemap, and if so, how could it be implemented? I see a few ways to go about it:- Sitemap includes "static" product category landing pages only - i.e., the product home pages, the forum landing pages, and blog list pages. This would probably end up being 100-200 URLs.
- Sitemap contains the above but is also dynamically updated with new threads & blog posts.
Option 2 seems like it would mean the sitemap is unmanageably long (hundreds of thousands of forum URLs). Would a crawler even parse something that size? Or with Option 1, could it cause our organically ranked pages to change ranking due to Google re-prioritizing the pages within the sitemap?
Not a lot of information out there on this topic, appreciate any input. Thanks in advance. -
Agreed, you'll likely want to go with option #2. Dynamic sitemaps are a must when you're dealing with large sites like this. We advise them on all of our clients with larger sites. If your forum content is important for search then these are definitely important to include as the content likely changes often and might be naturally deeper in the architecture.
In general, I'd think of sitemaps from a discoverability perspective instead of a ranking one. The primary goal is to give Googlebot an avenue to crawl your sites content regardless of internal linking structure.
-
Hi
Go with option 2, there is no scaling issue here. I have worked with and for sites that have a high multiplier on the number of sitemaps and pages that they're submitting, in some cases up to 100M pages. In all cases, Google was totally fine in crawling and processing the data that was there. As long as you follow the guidelines (max 50K URLs in a sitemap) you're fine as you're just providing another file that usually doesn't exceed about 50MB (depending on if you also add images to the sitemap). If you have an engineering team build the right infrastructure you can easily deal with thousands of these files and run them automated every day/week.
My main focus on big sites is also to streamline their sitemaps to have sitemaps with just the last 50.000 pages and the same for the last 50.000 pages that were updated. This way you're able to also monitor the indexation level of these pages. If you are able to, for example, combine the data from log file analysis you can say: we added 50K pages and Google in the last days were able to crawl X percentage of that.
Hope this gives you some extra insights.
Martijn.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Hi Mozers, is the AMP project is supposed to be an SEO factor on mobile platforms? Also, can it be used on ecommerce sites such as Magento or Shopify as well? Thanks!
It stands to reason that Google will favor early adopters of Accelerated Mobile Pages, but it seems heavily geared toward news publishers so far. What about regular Wordpress sites, or ecommerce sites like Shopify, should AMP be pursued on that type of CMS?
Technical SEO | | CalamityJane771 -
Should I Use Two Domains for Multi Language Sites?
I have an immigration attorney that wants a website in English and another in Spanish. We're going to have some of the website content from the English site translated via a translator to make it true, conversation spanish (automatic translators are good, but not perfect, and we want perfect). So my question is do you think we should use two different domains (englishsite.com, spanishsite.com), a subdomain (spanishsite.englishsite.com) or maybe just a separate section of the regular site (englishsite.com/spanishcontent)? My thought would be either a subdomain or a separate section so that we're not splitting PR.
Technical SEO | | atstickel120 -
XML Sitemap Issue or not?
Hi Everyone, I submitted a sitemap within the google webmaster tools and I had a warning message of 38 issues. Issue: Url blocked by robots.txt. Description: Sitemap contains urls which are blocked by robots.txt. Example: the ones that were given were urls that we don't want them to be indexed: Sitemap: www.example.org/author.xml Value: http://www.example.org/author/admin/ My issue here is that the number of URL indexed is pretty low and I know for a fact that Robot.txt aren't good especially if they block URL that needs to be indexed. Apparently the URLs that are blocked seem to be URLs that we don't to be indexed but it doesn't display all URLs that are blocked. Do you think i m having a major problem or everything is fine?What should I do? How can I fix it? FYI: Wordpress is what we use for our website Thanks
Technical SEO | | Tay19860 -
Should I be using use rel=author in this case?
We have a large blog, which it appears one of our regional blogs (managed separately) is simply scraping content off of our blog and adding it to theirs. Would adding rel=author (for all of our guest bloggers) help eliminate google seeing the regional blog content as scraped or duplicate? Is rel=author the best solution here?
Technical SEO | | VistageSEO0 -
Will training videos available on the "members only" section of a site contribute to the sites ranking?
Hello, I got asked a question recently as to whether training videos on the deeper pages of a website (that you can only access if you are a member and log in) will help with the sites ranking. On the SEOMoz software these deeper pages have been crawled as far as I can tell with errors reported on pages from the "members only" section of the site, leading me to believe the members only pages and their content will contribute to the sites overall ranking profile. I have suggested uploading the informational videos on the main pages of the site for now, making them accessible to all visitors and putting them in a more obvious place to encourage more sharing and views, however I've also said I would check it out with some experts so any information will be greatly appreciated! Many thanks 🙂 Charlotte
Technical SEO | | CharlotteWaller0 -
Recently revamped site structure - now not even ranking for brand name, but lots of content - what happened? (Yup, the site has been crawled a few times since) Any ideas? Did I make a classic mistake? Any advise appreciated :)
I've completely disappeared off Google - what happened? Even my brand name keyword does not bring up my website - I feel lost, confused and baffled on what my next steps should be. ANY advice would be welcome, since there's no going back to the way the site was set up.
Technical SEO | | JeanieWalker0 -
Using the Canonical Tag
Hi, I have an issue that can be solve with a canonical tag, but I am not sure yet, we are developing a page full of statistics, like this: www.url.com/stats/ But filled with hundreds of stats, so users can come and select only the stats they want to see and share with their friends, so it becomes like a new page with their slected stats: www.url.com/stats/?id=mystats The problems I see on this is: All pages will be have a part of the content from the main page 1) and many of them will be exactly the same, so: duplicate content. My idea was to add the canonical tag of "www.url.com/stats/" to all pages, similar as how Rand does it here: http://www.seomoz.org/blog/canonical-url-tag-the-most-important-advancement-in-seo-practices-since-sitemaps But I am not sure of this solution because the content is not exactly the same, page 2) will only have a part of the content that page 1) has, and in some cases just a very small part. Is the canonical tag useful in this case? Thank you!
Technical SEO | | andresgmontero0