Large site with content silos - best practices for deep indexing of silo content
Thanks in advance for any advice/links/discussion. This honestly might be a scenario where we need to do some A/B testing.
We have a massive content silo (5 million pages) that is the basis for our long-tail search strategy. Organic search traffic lands on our individual "product" pages, and we've divided the silo by a parent category and then, secondarily, by a field (so we can cross-link to other content silos using the same parent/field categorizations).
We don't expect top-level category pages to receive organic traffic; most people search for an individual, specific product (the long tail). We're not trying to rank or get traffic for searches covering all products in "category X", where others are already competing and spending heavily (the head).
The purpose of the site structure/taxonomy is to make it easier for bots/crawlers to get deeper into our content silos. We've built the pages for humans, but included the link structure/taxonomy to assist crawlers.
So here's my question on best practices: how should we handle categories with 1,000+ pages of pagination? In our most popular product categories there can be hundreds of thousands of products in a single category. The top-level hub page for a category looks like www.mysite/categoryA, and the page shows 50 products with pagination from 1 to 1,000+.
Currently we're using rel="next" for pagination, and a page like www.mysite/categoryA?page=6 carries a self-referencing canonical (not a canonical to the first/top page, www.mysite/categoryA).
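For clarity, here's a rough sketch of what the head of a mid-sequence page looks like under that setup (URLs are illustrative; rel="prev" typically accompanies rel="next" in this pattern, though I only mentioned rel=next above):

<!-- Current markup on a deep paginated page, e.g. page 6 of categoryA -->
<link rel="canonical" href="http://www.mysite/categoryA?page=6">
<link rel="prev" href="http://www.mysite/categoryA?page=5">
<link rel="next" href="http://www.mysite/categoryA?page=7">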
Our goal is deep crawling/indexation of the silo. I use ScreamingFrog and a SEOMoz campaign crawl to sample (a full crawl of the site takes a week or more), and in both tools it looks like crawlers get a bit bogged down in large categories with heavy pagination. For example, rather than crawling across multiple categories or fields to reach more product pages, some bots will walk all 1,000 (rel=next) pages of a single category. I don't want to waste crawl budget going through 1,000 pages of one category instead of discovering and crawling more categories.
I can't find a consensus on how to approach this. I can't have a single page that lists everything; there's just too much content, so pagination is unavoidable. I'm not worried about category pagination pages cannibalizing traffic, since I don't expect them to get any. Should I make pages 2 through 1,000+ noindex and have them canonically reference the main/first page of the category? And should I worry about crawlers going deep into the pagination of one category at the expense of reaching more top-level categories?
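If I went that route, pages 2+ would carry something like the following (a sketch, with illustrative URLs). One caveat I've read: combining noindex with a canonical pointing at a different URL can send conflicting signals, and links on pages that stay noindexed long-term may eventually be crawled less, which could undercut the deep product discovery we're after.

<!-- Possible alternative markup on pages 2-1,000+ of categoryA -->
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="http://www.mysite/categoryA">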
Thanks!