New SEO manager needs help! Currently, under 10% of our live sitemap (~4 million URL e-commerce site) is actually indexed in Google. What are best practices for sitemaps on big sites with a lot of changing content?
-
In Google Search Console:
4,218,017 URLs submitted
402,035 URLs indexed
What is the best way to troubleshoot?
What is the best guidance for sitemap indexation of large sites with a lot of changing content?
-
Hi Hamish
I'm not sure how many products you have listed on your website, but I'd guess it is not 4m or even 400,000. I think the question you should be asking yourself is: 'do I really need so many URLs?'
If you have 50,000 products on your site, then frankly you only need maybe 51,000 pages in total (including support pages, brands, categories, and sub-categories). I am only guessing, but I would suggest that the other pages are being created by tags or other attributes, and that these elements are creating acres of duplicate and very thin content.
My usual question is: 'So you have 400,000 (never mind 4m) pages in Google? Did you write or generate 400,000 pages of useful, interesting, non-duplicate, shareable content?' The answer, of course, is usually no.
Try switching off sets of tags and canonicalizing very similar content, and you'll be amazed how much it helps rankings!
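One quick way to test Nigel's point is to take a crawl or sitemap export and count how many distinct base pages remain once tag and filter parameters are stripped; a large gap between URL count and base-page count usually means parameter bloat. A minimal sketch, assuming a plain list of URLs (the URLs and parameter names below are made up for illustration):

```python
from urllib.parse import urlparse, urlunparse
from collections import Counter

def base_page(url):
    """Drop the query string and fragment, leaving the canonical-candidate path."""
    p = urlparse(url)
    return urlunparse((p.scheme, p.netloc, p.path, "", "", ""))

# Hypothetical crawl export: many parameterised variants of a few real pages.
urls = [
    "https://example.com/shoes?color=red",
    "https://example.com/shoes?color=blue&sort=price",
    "https://example.com/shoes",
    "https://example.com/hats?tag=summer",
    "https://example.com/hats?tag=winter",
]

counts = Counter(base_page(u) for u in urls)
for page, n in counts.items():
    if n > 1:
        print(f"{page}: {n} variants -> candidates for canonicalization")
```

If a handful of base pages account for most of your 4M URLs, that is where canonical tags (or switching the tag sets off) will have the biggest effect.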
Just a thought
Regards Nigel
Carousel Projects.
-
This post from Search Engine Journal (https://www.searchenginejournal.com/definitive-list-reasons-google-isnt-indexing-site/118245/) is helpful for troubleshooting.
This Moz post (https://moz.com/blog/8-reasons-why-your-site-might-not-get-indexed) has some additional considerations. The sixth point the author raises is one you should pay attention to, given you're asking about a large e-commerce site. Point 6 says you might not have enough PageRank: that "the number of pages Google crawls is roughly proportional to your pagerank".
As you probably know, Google has said they're no longer maintaining public PageRank, but the essence of the issue is a solid one. Google sets a crawl budget for every website, and large e-commerce sites often run out of it before the entire site is crawled. Look at your site structure, robots tagging, and, as Jason McMahon says, internal linking, to make sure you are directing Google to the most important pages on your site first, and that all redundant content is canonicalized or noindexed.
I'd start with that.
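On the crawl-budget point above: the sitemap protocol caps each file at 50,000 URLs, so a 4M-URL site needs a sitemap index regardless, and splitting the files by page type or priority (products first, then categories, then tag pages) lets you read per-file indexation rates in Search Console instead of one opaque total. A rough sketch of the chunking step, with placeholder filenames:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit in the sitemap protocol

def write_sitemap_chunks(urls, prefix="sitemap"):
    """Split a URL list into protocol-sized sitemap files; returns the filenames."""
    files = []
    for i in range(0, len(urls), MAX_URLS):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for u in urls[i:i + MAX_URLS]:
            loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
            loc.text = u
        name = f"{prefix}-{i // MAX_URLS + 1}.xml"
        ET.ElementTree(urlset).write(name, encoding="utf-8", xml_declaration=True)
        files.append(name)
    return files
```

You would then reference each chunk from a sitemap index file and submit that; comparing submitted vs. indexed counts per chunk in Search Console shows which page types Google is actually skipping.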
-
Hi Hamish_TM,
It is hard to say without knowing the exact URL but here are some things to consider:
- **Indexing Lag** - How long ago did you submit the sitemaps? We usually find there can be a lag of at least a few weeks between when the sitemaps are submitted and when all the URLs are indexed.
- **Internal Linking** - What does your site's internal linking structure look like? Good internal linking (breadcrumbs, in-text links, sidebar links, and a siloed URL structure) can help the indexation process.
- **Sitemap Errors** - Are there currently any sitemap errors listed in Google Search Console, either on the dashboard or in the sitemaps section? Any issues here could be adding to your problem.
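Part of that sitemap-error check can be automated before Search Console ever sees the file: validate that every `<loc>` entry is an absolute http(s) URL on the expected host, with no stray whitespace. A minimal sketch, assuming you have the sitemap XML as a string (the host and sample URLs are placeholders):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def lint_sitemap(xml_text, expected_host):
    """Return (url, problem) pairs for entries Search Console would likely flag."""
    problems = []
    root = ET.fromstring(xml_text)
    for loc in root.findall(".//sm:loc", NS):
        url = loc.text or ""
        if url != url.strip():
            problems.append((url, "surrounding whitespace"))
        p = urlparse(url.strip())
        if p.scheme not in ("http", "https"):
            problems.append((url, "not an absolute http(s) URL"))
        elif p.netloc != expected_host:
            problems.append((url, f"host is {p.netloc}, expected {expected_host}"))
    return problems
```

Running this over each submitted sitemap file catches the mechanical errors, leaving Search Console's report to surface the crawl-side ones.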
Hopefully this is of some help; let me know how you go.
Regards,
Jason.