New SEO manager needs help! Currently only about 15% of our live sitemap (a ~4 million URL e-commerce site) is actually indexed in Google. What are the best practices for sitemaps on big sites with a lot of changing content?
-
In Google Search Console:
4,218,017 URLs submitted
402,035 URLs indexed
What is the best way to troubleshoot?
What is the best guidance on sitemap indexation for large sites with a lot of changing content?
-
Hi Hamish
I'm not sure how many products you have listed on your website, but I'm guessing that it is not 4m or even 400,000. I think the question you should be asking yourself is 'do I really need so many URLs?'
If you have 50,000 products on your site then frankly you only need maybe 51,000 pages in total (including support pages, brands (maybe), categories and sub-categories). I am only guessing, but I would suggest that the other pages are being created by tags or other attributes, and that these elements are creating acres of duplicate and very skinny content.
My usual question is: 'So you have 400,000 (never mind 4m) pages in Google - did you write or generate 400,000 pages of useful, interesting, non-duplicate and shareable content?' The answer, of course, is usually no.
Try switching off sets of tags and canonicalizing very similar content and you'll be amazed at how much it helps rankings!
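One quick way to check how those tag and attribute pages are handled is to sample a few parameterised URLs and see where their rel=canonical points. Below is a minimal sketch in Python, assuming the requests and lxml libraries are installed; the example domain and the ?tag= parameter are made-up placeholders, so substitute your own URL patterns.

```python
# Minimal canonical-tag audit: fetch a few parameterised (tag/filter) URLs and
# report whether each one declares a rel=canonical pointing at its clean URL.
# The sample URLs below are hypothetical placeholders.
import requests
from lxml import html

SAMPLE_URLS = [
    "https://www.example-shop.com/widgets?tag=blue&sort=price_asc",
    "https://www.example-shop.com/widgets?tag=blue",
]

def get_canonical(url):
    """Return the href of the page's rel=canonical link, or None if absent."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    tree = html.fromstring(response.content)
    hrefs = tree.xpath('//link[@rel="canonical"]/@href')
    return hrefs[0] if hrefs else None

for url in SAMPLE_URLS:
    clean_url = url.split("?", 1)[0]
    canonical = get_canonical(url)
    if canonical is None:
        print(f"MISSING canonical: {url}")
    elif canonical.rstrip("/") == clean_url.rstrip("/"):
        print(f"OK: {url} -> {canonical}")
    else:
        print(f"CHECK: {url} canonicalizes to {canonical}")
```

If most of the sample comes back MISSING or CHECK, that is usually where the acres of duplicate, skinny pages are coming from.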
Just a thought
Regards Nigel
Carousel Projects.
-
This post from Search Engine Journal (https://www.searchenginejournal.com/definitive-list-reasons-google-isnt-indexing-site/118245/) is helpful for troubleshooting.
This Moz post (https://moz.com/blog/8-reasons-why-your-site-might-not-get-indexed) has some additional considerations. The sixth point the author raises is one you should pay attention to, given that you're asking about a large e-commerce site: you might not have enough PageRank, because "the number of pages Google crawls is roughly proportional to your pagerank".
As you probably know, Google has said they're no longer maintaining PageRank, but the essence of the issue is a solid one. Google sets a crawl budget for every website, and large e-commerce sites often run out of it before the entire site is crawled and indexed. You should look at your site structure, robots tagging and, as Jason McMahon says, internal linking, to make sure you are directing Google to the most important pages on your site first and that all redundant content is canonicalized or noindexed.
I'd start with that.
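To put a rough number on how much of the submitted sitemap is actually index-worthy, one option is to sample URLs from a sitemap file and flag anything that returns a non-200 status, carries a noindex directive, or canonicalizes to a different URL - those entries consume crawl budget with no chance of being indexed. A rough sketch along those lines (the sitemap URL is a hypothetical placeholder, and it assumes the requests and lxml libraries are available):

```python
# Sample URLs from a sitemap file and flag ones that waste crawl budget:
# non-200 responses, noindex directives, or canonicals pointing elsewhere.
# The sitemap URL below is a hypothetical placeholder.
import random
import requests
from lxml import etree, html

SITEMAP_URL = "https://www.example-shop.com/sitemap-products-1.xml"
SAMPLE_SIZE = 25
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url):
    """Return all <loc> values from a single (non-index) sitemap file."""
    resp = requests.get(sitemap_url, timeout=30)
    resp.raise_for_status()
    root = etree.fromstring(resp.content)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

def classify(url):
    """Label a URL with the most likely reason it would not be indexed."""
    resp = requests.get(url, timeout=15, allow_redirects=False)
    if resp.status_code != 200:
        return f"non-200 ({resp.status_code})"
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return "noindex (header)"
    tree = html.fromstring(resp.content)
    robots = " ".join(tree.xpath('//meta[@name="robots"]/@content')).lower()
    if "noindex" in robots:
        return "noindex (meta)"
    canonicals = tree.xpath('//link[@rel="canonical"]/@href')
    if canonicals and canonicals[0].rstrip("/") != url.rstrip("/"):
        return f"canonical elsewhere ({canonicals[0]})"
    return "looks indexable"

urls = sitemap_urls(SITEMAP_URL)
for url in random.sample(urls, min(SAMPLE_SIZE, len(urls))):
    print(f"{classify(url):<45} {url}")
```

If a big share of a random sample comes back as anything other than "looks indexable", the sitemap is sending Google a lot of URLs it will never keep, which matches the submitted-versus-indexed gap you're seeing.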
-
Hi Hamish_TM,
It is hard to say without knowing the exact URL, but here are some things to consider:
- Indexing Lag - How long ago did you submit the sitemaps? We usually find there can be at least a few weeks' lag between when the sitemaps are submitted and when all the URLs are indexed.
- Internal Linking - What does your site's internal linking structure look like? Good internal linking, such as breadcrumbs, in-text links, sidebar links and siloed URL structuring, can help the indexation process.
- Sitemap Errors - Are there currently any sitemap errors listed in Google Search Console, either on the dashboard or in the Sitemaps section? Any issues here could be adding to your problem (see the sketch below for a quick way to sanity-check the sitemap files themselves).
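On the sitemap errors point, a basic check that doesn't rely on Search Console alone is to walk the sitemap index and confirm that every child sitemap fetches cleanly and stays under the protocol limit of 50,000 URLs per file. A rough sketch, using a hypothetical sitemap index URL:

```python
# Walk a sitemap index and report each child sitemap's HTTP status, URL count,
# and whether it exceeds the 50,000-URL protocol limit.
# The sitemap index URL below is a hypothetical placeholder.
import gzip
import requests
from lxml import etree

SITEMAP_INDEX_URL = "https://www.example-shop.com/sitemap_index.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
URL_LIMIT = 50_000  # per-file limit in the sitemaps.org protocol

def parse_sitemap(content):
    """Parse sitemap bytes, decompressing a raw gzip payload (.xml.gz) if needed."""
    if content[:2] == b"\x1f\x8b":  # gzip magic bytes
        content = gzip.decompress(content)
    return etree.fromstring(content)

index_resp = requests.get(SITEMAP_INDEX_URL, timeout=30)
index_resp.raise_for_status()
index = parse_sitemap(index_resp.content)
children = [loc.text.strip() for loc in index.findall(".//sm:sitemap/sm:loc", NS)]
print(f"{len(children)} child sitemaps listed in the index")

total_urls = 0
for child_url in children:
    resp = requests.get(child_url, timeout=30)
    if resp.status_code != 200:
        print(f"ERROR {resp.status_code}: {child_url}")
        continue
    count = len(parse_sitemap(resp.content).findall(".//sm:url/sm:loc", NS))
    total_urls += count
    flag = "OVER 50k LIMIT" if count > URL_LIMIT else "ok"
    print(f"{count:>7} URLs [{flag}] {child_url}")

print(f"Total URLs across all child sitemaps: {total_urls:,}")
```

With roughly 4.2 million submitted URLs you would expect at least 85 child sitemaps; far fewer than that, or any non-200 responses, usually points to a sitemap generation problem rather than an indexing one.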
Hopefully this is of some help. Let me know how you go.
Regards,
Jason.
-
Related Questions
-
URLs dropping from index (Crawled, currently not indexed)
I've noticed that some of our URLs have recently dropped completely out of Google's index. When carrying out a URL inspection in GSC, it comes up with 'Crawled, currently not indexed'. Strangely, I've also noticed that under referring page it says 'None detected', which is definitely not the case. I wonder if it could be something to do with the following? https://www.seroundtable.com/google-ranking-index-drop-30192.html - It seems to be a bug affecting quite a few people. Here are a few examples of the URLs that have gone missing: https://www.ihasco.co.uk/courses/detail/sexual-harassment-awareness-training https://www.ihasco.co.uk/courses/detail/conflict-resolution-training https://www.ihasco.co.uk/courses/detail/prevent-duty-training Any help here would be massively appreciated!
-
Redirecting old html site to new wordpress site
Hi, I'm currently updating an old (8 years old) html site to wordpress, and about a month ago I redirected some URLs to the new site (which is in a directory) like this:
Redirect 301 /article1.htm http://mysite.net/wordpress/article1/
Redirect 301 /article2.htm http://mysite.net/wordpress/article2/
Redirect 301 /article3.htm http://mysite.net/wordpress/article3/
Google has indexed these new URLs and they are showing in search results. I'm almost finished with the new version of the site, which is currently in a directory /wordpress. I intend to move all the files from the directory to the root, so the new URL when this is done will be http://mysite.net/article1/ etc. My question is: what do I do about the redirects which are in place - do I delete them and replace them with something like this?
Redirect 301 /wordpress/article1/ http://mysite.net/article1/
Redirect 301 /wordpress/article2/ http://mysite.net/article2/
Redirect 301 /wordpress/article3/ http://mysite.net/article3/
Appreciate any help with this.
-
How Does Dynamic Content for a Specific URL Impact SEO?
Example URL: http://www.sja.ca/English/Community-Services/Pages/Therapy Dog Services/default.aspx The above page is generated dynamically depending on which province the visitor is visiting from. For example, a visitor from BC would see something quite different from a visitor from Nova Scotia; the intent is that the information shown should be relevant to the user of that province. How does this affect SEO? How (or from what location) does Googlebot decide to crawl the page? I have considered a subdirectory for each province, though that comes with its challenges as well. One such challenge is duplicate content when different provinces may have the same information for some pages. Any suggestions for this?
-
Feedback needed on possible solutions to resolve indexing on ecommerce site
I've included the scenario and two proposed fixes I'm considering. I'd appreciate any feedback on which fix people feel is better and why, and/or any potential issues that could be caused by these fixes. Thank you!
Scenario of Problem: I'm working on an ecommerce website (built on Magento) that is having a problem getting product pages indexed by Google (and other search engines). Certain pages, like the one I've included below, aren't being indexed. I believe this is because of the way the site is configured in terms of internal linking. The site structure forces certain pages to be linked very deeply, so the only way for Googlebot to get to these pages is through a pagination page (such as www.acme.com/page?p=3). In addition, the link on the pagination page is really deep; generally there are more than 125 links on the page ahead of this link. One of the pages that Google isn't indexing: http://www.getpaper.com/find-paper/engineering-paper/bond-20-lb/430-20-lb-laser-bond-22-x-650-1-roll.html This page is linked from http://www.getpaper.com/find-paper/engineering-paper/bond-20-lb?p=5, and it is the 147th link in the source code.
Potential Fixes:
Fix One: Add navigation tags to the template so that search engines will spend less time crawling them and will get to the deeper pages, such as the one mentioned above. Note: the navigation tags are HTML5 elements; however, the Magento site this is built on does not use HTML5.
Fix Two: Revise the templates and CSS so that the main navigation and the sidebar navigation sit at the bottom of the page rather than the top. This would put the links to the product pages ahead of the navigation links in the source code.
-
Best way to manage SEO for a massive events listing website.
I run a website that tracks entertainment for the entire state of South Dakota. While I've made some fantastic strides in gaining traffic, I feel lost on how to manage all those entries in an SEO-friendly manner. I have a TON of errors showing in my crawl diagnostics and I just don't know what to do. The nature of the website is such that there are going to be duplications all over the place. I know that I can help some of this by getting my canonical links set up properly (that's coming in the next version of the site's theme), but what else should I do to make those event listings friendly for the search engines? http://www.entertainsd.com
-
Will Google index a site with white text? Will it give it bad ratings?
Will Google not rank a site because pretty much all the copy is white (and the background is all white)? Here's the site in question: https://www.dropbox.com/s/6w24f6h5p0zaxhg/Garrison_PLAY.vs2-static.pdf https://www.dropbox.com/sh/fwudppvwy2khpau/t43NozpG3E/Garrison_PLAY.vs3.jpg Thanks - if you need me to clarify more, let me know. TM Humphries, LocalSearched.com
-
Best way to redirect 3 sites to 1 new one.
Hi all, We currently have 3 old sites that have tons of content. Due to brand/business consolidation we have merged all 3 to produce 1 website. The new site contains all the old content from the old 3. So, I know I need to 301 redirect all the old content from the previous sites to the equivalent content on the new site, but am confused about how you do this with 3 domains. One of the domains is being replaced with the new site. So I have:
www.domain1.co.uk
www.domain2.co.uk
www.domain3.co.uk
All the content for all the sites has been imported into a new site and any duplicate content issues have been resolved. Can anyone point me in the right direction? Thanks
-
Mobile site: robots.txt best practices
If there are canonical tags pointing to the web version of each mobile page, what should a robots.txt file for a mobile site have?