New SEO manager needs help! Currently only about 15% of our live sitemap (~4 million url e-commerce site) is actually indexed in Google. What are best practices sitemaps for big sites with a lot of changing content?
-
In Google Search console
4,218,017 URLs submitted
402,035 URLs indexed
what is the best way to troubleshoot?
What is best guidance for sitemap indexation of large sites with a lot of changing content?
-
Hi Hamish
I'm not sure how many products you have listed on your website but I am only guessing that it is not 4m of even 400,000. I think the question you should be asking yourself is 'do I really need so many URLs?'
If you have 50,000 products in your site then frankly you only need maybe 51000 pages in total (including support pages, brands (maybe), categories and sub-categories. I am only guessing but I would suggest that the other pages are being created by tags or other attributes and that these elements are creating acres of duplicate and very skinny content.
My usual question is - 'so you have 400,000 (never mind 4m) pages in Google? - did you write or generate 400,000 pages of useful, interesting, non-duplicate and shareable content? The answer of course is usually no.
Try switching off sets of tags and canonicalizing very similar content and you'll be amazed how it helps rankings!
Just a thought
Regards Nigel
Carousel Projects.
-
This post from Search Engine Journal (https://www.searchenginejournal.com/definitive-list-reasons-google-isnt-indexing-site/118245/) is helpful for troubleshooting.
This Moz post (https://moz.com/blog/8-reasons-why-your-site-might-not-get-indexed) has some additional considerations. The 6th point the post author raises is one you should pay attention to given you're asking about a large e-commerce site. Point 6 says you might not have enough Pagerank, that "the number of pages Google crawls is roughly proportional to your pagerank".
As you probably know, Google has said they're not maintaining Pagerank anymore, but the essence of the issue raised is a solid one. Google does set a crawl budget for every website and large e-commerce sites often run into situations where they run out before the entire site is indexed. You should look at your site structure, robots tagging, and as Jason McMahon says, internal linking to make sure you are directing Google to the most important pages on your site first, and that all redundant content is canonicalized or noindexed.
I'd start with that.
-
Hi Hamish_TM,
It is hard to say without knowing the exact URL but here are some things to consider:
- Indexing Lag - How long ago did you submit the sitemaps? We usually find there can be at least a few weeks lag between when the sitemaps are submitted and when all the URL's are indexed.
- Internal Linking - What does your sites internal linking structure look like? Good internal linking like having breadcrumbs, in-text links, sidebar links and siloed URL structuring can help the indexation process.
- **Sitemap Errors - **Are there currently any sitemap errors listed in Google Search Console? Either on the dashboard or in the sitemaps section? Any issues here could be adding to your problem.
Hopefully, this is of some help and let me know how you go.
Regards,
Jason.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What is the best tool for getting a SiteMap url for a website with over 4k pages?
I have just migrated my website from HUGO to Wordpress and I want to submit the sitemap to Google Search Console (because I haven't done so in a couple years). It looks like there are many tools for getting a sitemap file built. But I think they probably vary in quality. Especially considering the size of my site.
Technical SEO | | DanKellyCockroach2 -
Youtube SEO Best Practices
Does anyone know where to find a list of SEO best practices for Youtube? Specifically...does anyone have thoughts on the SEO benefits of an @domain.com login vs @gmail.com login? Or is adding my url to the "Associated website" channel setting sufficient for SEO purposes?
Technical SEO | | brianvest0 -
My video sitemap is not being index by Google
Dear friends, I have a videos portal. I created a video sitemap.xml and submit in to GWT but after 20 days it has not been indexed. I have verified in bing webmaster as well. All videos are dynamically being fetched from server. My all static pages have been indexed but not videos. Please help me where am I doing the mistake. There are no separate pages for single videos. All the content is dynamically coming from server. Please help me. your answers will be more appreciated................. Thanks
Technical SEO | | docbeans0 -
Site removed from Google Index
Hi mozers, Two months ago we published http://aquacion.com We registered it in the Google Webmaster tools and after a few day the website was in the index no problem. But now the webmaster tools tell us the URLs were manually removed. I've look everywhere in the webmaster tools in search for more clues but haven't found anything that would help me. I sent the acces to the client, who might have been stupid enough to remove his own site from the Google index, but now, even though I delete and add the sitemap again, the website won't show in Google SERPs. What's weird is that Google Webmaster Tools tells us all the page are indexed. I'm totally clueless here... Ps. : Added screenshots from Google Webmaster Tools. Update Turns out it was my mistake after all. When my client developped his website a few months ago, he published it, and I removed the website from the Google Index. When the website was finished I submited the sitemap, thinking it would void the removal request, but it don't. How to solve In webmaster tools, in the [Google Index => Remove URLs] page, you can reinclude pages there. tGib0
Technical SEO | | RichardPicard0 -
I need help with a PHP canonical URL tags
I found a little difficult for me to do a canonical tag in my PHP. On-Page Report Card We check to make sure that IF you use canonical URL tags, it points to the right page. If the canonical tag points to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. If you've not made this page the rel=canonical target, change the reference to this URL. NOTE: For pages not employing canonical URL tags, this factor does not apply. I don't know how to tidy my PHP Any suggestion.
Technical SEO | | lnietob0 -
Indexed pages and current pages - Big difference?
Our website shows ~22k pages in the sitemap but ~56k are showing indexed on Google through the "site:" command. Firstly, how much attention should we paying to the discrepancy? If we should be worried what's the best way to find the cause of the difference? The domain canonical is set so can't really figure out if we've got a problem or not?
Technical SEO | | Nathan.Smith0 -
Best way to redirect 3 sites to 1 new one.
Hi All We currently have 3 old sites that have tones of content. Due to brand/business consolidation we have merge all 3 to produce 1 website. The new site contains all the old content from the old 3. So, I know I need to 301 redirect all the old content from the previous sites to the equivelent content on the new sites but am confused how you do this with 3 domains? One of the domains is being replaced with the new site. So I have: www.domain1.co.uk www.domain2.co.uk www.domain3.co.uk All the content for all the sites have been imported into a new site and any duplicate content issues havce been resolved. Can anyone point me in the right direction? Thanks
Technical SEO | | EclipseLegal0 -
Why google index my IP URL
hi guys, a question please. if site:112.65.247.14 , you can see google index our website IP address, this could duplicate with our darwinmarketing.com content pages. i am not quite sure why google index my IP pages while index domain pages, i understand this could because of backlink, internal link and etc, but i don't see obvious issues there, also i have submit request to google team to remove ip address index, but seems no luck. Please do you have any other suggestion on this? i was trying to do change of address setting in Google Webmaster Tools, but didn't allow as it said "Restricted to root level domains only", any ideas? Thank you! boson
Technical SEO | | DarwinChinaSEO0