New SEO manager needs help! Currently only about 15% of our live sitemap (~4 million url e-commerce site) is actually indexed in Google. What are best practices sitemaps for big sites with a lot of changing content?
-
In Google Search console
4,218,017 URLs submitted
402,035 URLs indexed
what is the best way to troubleshoot?
What is best guidance for sitemap indexation of large sites with a lot of changing content?
-
Hi Hamish
I'm not sure how many products you have listed on your website but I am only guessing that it is not 4m of even 400,000. I think the question you should be asking yourself is 'do I really need so many URLs?'
If you have 50,000 products in your site then frankly you only need maybe 51000 pages in total (including support pages, brands (maybe), categories and sub-categories. I am only guessing but I would suggest that the other pages are being created by tags or other attributes and that these elements are creating acres of duplicate and very skinny content.
My usual question is - 'so you have 400,000 (never mind 4m) pages in Google? - did you write or generate 400,000 pages of useful, interesting, non-duplicate and shareable content? The answer of course is usually no.
Try switching off sets of tags and canonicalizing very similar content and you'll be amazed how it helps rankings!
Just a thought
Regards Nigel
Carousel Projects.
-
This post from Search Engine Journal (https://www.searchenginejournal.com/definitive-list-reasons-google-isnt-indexing-site/118245/) is helpful for troubleshooting.
This Moz post (https://moz.com/blog/8-reasons-why-your-site-might-not-get-indexed) has some additional considerations. The 6th point the post author raises is one you should pay attention to given you're asking about a large e-commerce site. Point 6 says you might not have enough Pagerank, that "the number of pages Google crawls is roughly proportional to your pagerank".
As you probably know, Google has said they're not maintaining Pagerank anymore, but the essence of the issue raised is a solid one. Google does set a crawl budget for every website and large e-commerce sites often run into situations where they run out before the entire site is indexed. You should look at your site structure, robots tagging, and as Jason McMahon says, internal linking to make sure you are directing Google to the most important pages on your site first, and that all redundant content is canonicalized or noindexed.
I'd start with that.
-
Hi Hamish_TM,
It is hard to say without knowing the exact URL but here are some things to consider:
- Indexing Lag - How long ago did you submit the sitemaps? We usually find there can be at least a few weeks lag between when the sitemaps are submitted and when all the URL's are indexed.
- Internal Linking - What does your sites internal linking structure look like? Good internal linking like having breadcrumbs, in-text links, sidebar links and siloed URL structuring can help the indexation process.
- **Sitemap Errors - **Are there currently any sitemap errors listed in Google Search Console? Either on the dashboard or in the sitemaps section? Any issues here could be adding to your problem.
Hopefully, this is of some help and let me know how you go.
Regards,
Jason.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Migration + Change of Address Tool used - previous site de-indexed!!
OMG disaster! Recently migrated my site womencycles.com to moonrise.health. Painstakingly went through each URL manually to map out redirects, notified Google via change of address tool. Bam. My old website has disappeared from Google and my new site has thus lost all it's organic (i.e. redirected) traffic. I don't get it. I think I have done everything by the book, but it seems my old site has disappeared and no authority or link juice has been passed to my new site by the 301s, as the new site isn't ranking either. Some examples: https://www.google.com/search?q=women+cycles&oq=women+cycles&aqs=chrome..69i57j69i65j69i61l2j69i60.1834j0j1&sourceid=chrome&ie=UTF-8 'women cycles' previous position 1
Technical SEO | | tikitaka
https://www.google.com/search?q=chaffed+vagina&oq=chaffed+vagina&aqs=chrome..69i57.2370j0j1&sourceid=chrome&ie=UTF-8 - chaffed vagina, previous position 1 https://www.google.com/search?q=how+long+does+it+take+turmeric+to+shrink+fibroids&oq=how+long+does+it+take+turmeric+to+shrink+fibroids&aqs=chrome..69i57.1355j0j1&sourceid=chrome&ie=UTF-8 - how long does it take turmeric to shrink fibroids, previous position 1. Biggest traffic source pages were: https://womencycles.com/blog/top-10-home-remedies-that-claim-to-tighten-vagina-do-they-work/
https://womencycles.com/blog/sore-breasts-after-period-has-finished/
https://womencycles.com/blog/what-is-vaginal-gas-queefing/
https://womencycles.com/blog/tired-during-ovulation/
https://womencycles.com/blog/how-to-get-rid-of-saggy-vag-without-surgery/
https://womencycles.com/blog/vagina-chafing-causes-treatments-to-prevent-it-from-coming-back/
https://womencycles.com/blog/vaginal-dryness-during-pregnancy/ New blog articles on new site, with 301 redirect in place, but not ranking Screenshot shows my search traffic for my new site. Site migrated 13 June. Any ideas anyone??!Screenshot 2022-06-28 at 13.27.41.png0 -
Redirecting old html site to new wordpress site
Hi I'm currently updating an old (8 years old) html site to wordpress and about a month ago I redirected some url's to the new site (which is in a directory) like this... Redirect 301 /article1.htm http://mysite.net/wordpress/article1/
Technical SEO | | briandee
Redirect 301 /article2.htm http://mysite.net/wordpress/article2/
Redirect 301 /article3.htm http://mysite.net/wordpress/article3/ Google has indexed these new url's and they are showing in search results. I'm almost finished the new version of site and it is currently in a directory /wordpress I intend to move all the files from the directory to the root so new url when this is done will be http://mysite.net/article1/ etc My question is - what to I do about the redirects which are in place - do I delete them and replace with something like this? Redirect 301 /wordpress/article1/ http://mysite.net/article1/
Redirect 301 /wordpress/article2/ http://mysite.net/article2/
Redirect 301 /wordpress/article3/ http://mysite.net/article3/ Appreciate any help with this0 -
Sitemap url's not being indexed
There is an issue on one of our sites regarding many of the sitemap url's not being indexed. (at least 70% is not being indexed) The url's in the sitemap are normal url's without any strange characters attached to them, but after looking into it, it seems a lot of the url's get a #. + a number sequence attached to them once you actually go to that url. We are not sure if the "addthis" bookmark could cause this, or if it's another script doing it. For example Url in the sitemap: http://example.com/example-category/0246 Url once you actually go to that link: http://example.com/example-category/0246#.VR5a Just for further information, the XML file does not have any style information associated with it and is in it's most basic form. Has anyone had similar issues with their sitemap not being indexed properly ?...Could this be the cause of many of these url's not being indexed ? Thanks all for your help.
Technical SEO | | GreenStone0 -
How to Let Google Know I am a new Site Owner and to Remove or De-value all backlinks?
I am looking to buy a new domain for a brand. Problem is the domain has was registered in 1996 and has around 6k backlinks (according to ahrefs) that I need removed as the old content will have no relevance to my new site. Should I just disavow all of them? Is there anything "special" I can do to let Google know that it will be new site owner/content and to remove/discount the current links?
Technical SEO | | RichardSEO0 -
Want to Target Mobile site for Google Mobile Version and Desktop Site for Google Desktop Version
I have ecommerce site with both mobile version and desktop version. Mobile version starts with m.example.com and full version starts with www.example.com I am using same content through out both site and using 301 redirection by detecting user agent vice-versa. My both sites are accessible to crawl by any google spider. I have submitted both sites's sitemap to GWT and mobile site having mobile sitemap xml, so google can easily recognize my mobile site. Is it going to help to rank my both sites as per my expectation? I need to rank for mobile site in Google mobile and ranking for desktop site in Google desktop version. Some of pages of my mobile site are started to appearing in Google desktop version. So how I can stop them to appear in Google desktop? Your comments are highly welcome.
Technical SEO | | Hexpress0 -
Google indexing less url's then containded in my sitemap.xml
My sitemap.xml contains 3821 urls but Google (webmaster tools) indexes only 1544 urls. What may be the cause? There is no technical problem. Why does Google index less URLs then contained in my sitemap.xml?
Technical SEO | | Juist0 -
Merging several sites into one - best practice
I had 2 sites on the web (www.physicseditor.de, www.texutrepacker.com) and decided to move them all under one single domain (www.codeandweb.com) Both sites were ranking very good for several keywords. I not redirected the most important pages from the old domains with a 301 redirect to the new subpages (www.texturepacker.com => www.codeandweb.com/texturepacker) Google still delivers the old domains but the redirect take people directly to the new content. I've already submitted the new site map to google webmaster tools. Pages are already in the index but do not really show up in the search results. How long does it take until google accepts the new domain and delivers the new content in the search results? Was it ok what I did? Or is there some room for improvement? SeoMoz will of course not find any information about the new page since it is not yet directly linked in google. But I can't get ranking information for the "old" pages since SeoMoz tells me that it can't crawl the old domains....
Technical SEO | | gossi740 -
Wordpress site, combine Blog without hurting SEO - Need Expert Advice
Hi, I come from the old html days of Frontpage and then moved to Dreamweaver. I first worked with Wordpress at version 2.7 and was not all that impressed, but then recently I worked in the new version and was extremely impressed. So my knowledge of Wordpress is VERY limited and plan to build future sites with it. I need to know the best way to solve an issue for a customer. The client is http://www.nextgenrestoration.com/ Site was built years ago with Frontpage. The popularity of Blogs was hot so someone told them that if they add new content it would be better to use a blog, so they added a blog. So you have the following: www.nextgenrestoration.com (main site) then they installed wordpress in a folder (blog) www.nextgenrestoration.com/blog Original person that built the site quit. New person took over and said the main site needed to changed to Wordpress because they did not have Frontpage and all they knew was Wordpress. Main site was converted to Wordpress. They wanted to keep the original design so they did not use a stock template, they just built it with their design. I guess from looking at the Editor, they manually went in and put the design in to match. Now.. this last month, the person that had changed
Technical SEO | | Force7
the site to Wordpress quit. So I got involved because the new person they hired could not add content to the main website. If you add a page, it does not show up, you have to manually go in the php and add the link to the category. The new person knows how to use Wordpress but she knows nothing about PHP so is lost when it comes to manually adding content to the site. Here was my Thoughts. The main site needs to be rebuilt in a stock template so it automatically creates new pages, blog posts. I have to make sure that if we change the
main website that we could keep all the same links and page names. The girl
that built the site, if you hover over the links that she put it under ‘florida’,
that must be a category. But we would need to keep the same page names. I know
we could do a 301 redirect but this guy cannot lose traffic. He is already down
in hits after the last Panda update. My thought was, rebuild the main site in a stock template so
someone can actually add content easily to the site. Also build a new blog
section so it all matches. (personally the existing design looks old and dated and needs updating) If you look at the site now. The blog looks totally
different and it is not helping if a customer comes to the blog but cannot see
the navigation for the whole site. My thought was to just leave the old blog, it has a LOT of backlinks. But just add a new blog to the main site and all new content goes there. The old blog would stay just make sure we did build in some call to action so it sends them to the main site. Also, we found we cannot create a Blog on the
wordpress we have installed in the main directory. I am guessing because it
wants to name it /blog? I want to be sure we give this client the best advice on what to do without
hurting his existing seo and traffic. As you can tell, I am not qualified to really give the best advice since I am so new to Wordpress. This is a small company that really needs some help. Thanks in advance for your time! Force70