Big discrepancies between pages in Google's index and pages in sitemap
-
Hi,
I'm noticing a huge difference in the number of pages in Googles index (using 'site:' search) versus the number of pages indexed by Google in Webmaster tools. (ie 20,600 in 'site:' search vs 5,100 submitted via the dynamic sitemap.)
Anyone know possible causes for this and how i can fix?
It's an ecommerce site but i can't see any issues with duplicate content - they employ a very good canonical tag strategy. Could it be that Google has decided to ignore the canonical tag?
Any help appreciated,
Karen
-
Take a look at the pages that are indexed. Chances are that since it is a cart or CMS-based site, you just need to use robots.txt to block out some areas you don't want indexed. You also need to look at your indexed pages, to see if any of them are duplicates, meaning you have 2 or more url's that display the same content.
"It's an ecommerce site but i can't see any issues with duplicate content - they employ a very good canonical tag strategy. Could it be that Google has decided to ignore the canonical tag? "
Could be that your cms or cart is not forwarding all the pages to the canonical version. Again, check to see if you can access multiple versions of the same page. Ecom and CMS sites always have these types of errors if you dont keep a close eye on the URL's since they are database driven, vs static HTML. Look for www or non-www versions of pages, url's with and without index.php, etc.
Once you target what the offending url's are, use redirects to forward them to the proper and search engine friendly version.
-
Hi,
Thanks so much for that, it's really interesting.
I've resolved the issue now, it was a case of some (a lot!) of missing canonical tags. Phew!
Thanks for your help!
-
Sorry wrong interpretation of your question. Have you excluded the site search pages using robots.txt.? If not this might be the reason why you've that many pages indexed.
Anyway this discussion might give you more answers:
-
Just 1 at the moment.
-
Are you using just one sitemap or multiple?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Not Indexing App Content
Hello Mozzers I recently noticed that there has been an increase in crawl errors reported in Google Search console & Google has stopped indexing our app content. Could this be due to the fact that there is a mismatch between the host path name mentioned within the android deeplink (within the alternate tag) and the actual URL of the page. For instance on the following desktop page http://www.example.com.au/page-1 the android deeplink points to http://www.example.com.au/android-app://com.example/http/www.example.com.au/4652374 Please note that the content on both pages (desktop & android) is same.Is this is a correct setup or am I doing something wrong here? Any help would be much appreciated. Thank you so much in advance.
Intermediate & Advanced SEO | | InMarketingWeTrust0 -
Google Webmaster Tools -> Sitemap suddent "indexed" drop
Hello MOZ, We had an massive SEO drop in June due to unknown reasons and we have been trying to recover since then. I've just noticed this yesterday and I'm worried. See: http://imgur.com/xv2QgCQ Could anyone help by explaining what would cause this sudden drop and what does this drop translates to exactly? What is strange is that our index status is still strong at 310 pages, no drop there: http://imgur.com/a1sRAKo And when I do search on google site:globecar.com everything seems normal see: http://imgur.com/O7vPkqu Thanks,
Intermediate & Advanced SEO | | GlobeCar0 -
Google is not indexing an updated website
We just relaunched a website that has 5 years old, we maintain all the old URLs and articles but for some reason google is not picking up the new website https://www.navisyachts.com. In Google Webmaster Tools we can see the sitemap with over 1000 pages submitted but shows nothing as indexed. The site is loosing traffic rapidly and positions, from the SEO side all looks fine for me. What can be wrong? I’ll appreciate any help. The new website is built over Joomla 3.4, we have it here at MOZ and other than some minor details it doesn't show that something can be wrong with the website. Thank you.
Intermediate & Advanced SEO | | FWC_SEO0 -
Google indexing wrong pages
We have a variety of issues at the moment, and need some advice. First off, we have a HUGE indexing issue across our entire website. Website in question: http://www.localsearch.com.au/ Firstly
Intermediate & Advanced SEO | | localdirectories
In Google.com.au, if you search for 'plumbers gosford' (https://www.google.com.au/#q=plumbers+gosford), the wrong page appears - in this instance, the page ranking should be http://www.localsearch.com.au/Gosford,NSW/Plumbers I can see this across the board, across multiple locations. Secondly
Recently I've seen Google reporting in 'Crawl Errors' in webmaster tools URLs such as:
http://www.localsearch.com.au/Saunders-Beach,QLD/Electronic-Equipment-Sales-Repairs&Sa=U&Ei=xs-XVJzAA9T_YQSMgIHQCw&Ved=0CIMBEBYwEg&Usg=AFQjCNHXPrZZg0JU3O4yTGjWbijon1Q8OA This is an invalid URL, and more specifically, those query strings seem to be referrer queries from Google themselves: &Sa=U&Ei=xs-XVJzAA9T_YQSMgIHQCw&Ved=0CIMBEBYwEg&Usg=AFQjCNHXPrZZg0JU3O4yTGjWbijon1Q8OA Here's the above example indexed in Google: https://www.google.com.au/#q="AFQjCNHXPrZZg0JU3O4yTGjWbijon1Q8OA" Does anyone have any advice on those 2 errors?0 -
Google's Exact Match Algorithm Reduced Our Traffic!
Google's first Panda de-valued our Web store, www.audiobooksonline.com, and our traffic went from 2500 - 3000 (mostly organic referrals) per month to 800 - 1000. Google's under-valuing of our Web store continued to reduce our traffic to 400-500 for the past few months. From 4/5/2013 to 4/6/2013 our traffic dropped 50% more, because (I believe) of Google's "exact domain match" algorithm implementation. We were, even after Panda and up to 4/5/2013 getting a significant amount of organic traffic for search terms such as "audiobooks online," "audio books online," and "online audiobooks." We no longer get traffic for these generic keywords. What I don't understand is why a UK company, www.audiobooksonline.co.uk/, with a very similar domain name, ranks #5 for "audio books online" and #4 for "audiobooks online" while we've almost disappeared from Google rankings. By any measurement I am aware of, our site should rank higher than audiobooksonline.co.uk. Market Samurai reports for "audio books online" and "audiobooks online" shows that our Web store is significantly "stronger" than audiobooksonline.co.uk but they show up on Google's first page and we are down several pages. I also checked a few titles on audiobooksonline.co.uk and confirmed they are using the same publisher descriptions we and many other online book / audiobook merchants do = duplicate content. We have never received notice that our Web store was being penalized. Why would audiobooksonline.co.uk rank so much higher than audiobooksonline.com? Does Google treat non-USA sites different than USA sites?
Intermediate & Advanced SEO | | lbohen0 -
Why is my XML sitemap ranking on the first page of google for 100s of key words versus the actual relevant page?
I still need this question answerd and I know it's something I must have changed. But google is ranking my sitemap for 100s of key terms versus the actual page. It's great to be on the first page but not my site map...... Geeeez.....
Intermediate & Advanced SEO | | ursalesguru0 -
Why is a page with a noindex code being indexed?
I was looking through the pages indexed by Google (with site:www.mywebsite.com) and one of the results was a page with "noindex, follow" in the code that seems to be a page generated by blog searches. Any ideas why it seems to be indexed or how to de-index it?
Intermediate & Advanced SEO | | theLotter0 -
Questions regarding Google's "improved url handling parameters"
Google recently posted about improving url handling parameters http://googlewebmastercentral.blogspot.com/2011/07/improved-handling-of-urls-with.html I have a couple questions: Is it better to canonicalize urls or use parameter handling? Will Google inform us if it finds a parameter issue? Or, should we have a prepare a list of parameters that should be addressed?
Intermediate & Advanced SEO | | nicole.healthline0