Big discrepancies between pages in Google's index and pages in sitemap
-
Hi,
I'm noticing a huge difference in the number of pages in Googles index (using 'site:' search) versus the number of pages indexed by Google in Webmaster tools. (ie 20,600 in 'site:' search vs 5,100 submitted via the dynamic sitemap.)
Anyone know possible causes for this and how i can fix?
It's an ecommerce site but i can't see any issues with duplicate content - they employ a very good canonical tag strategy. Could it be that Google has decided to ignore the canonical tag?
Any help appreciated,
Karen
-
Take a look at the pages that are indexed. Chances are that since it is a cart or CMS-based site, you just need to use robots.txt to block out some areas you don't want indexed. You also need to look at your indexed pages, to see if any of them are duplicates, meaning you have 2 or more url's that display the same content.
"It's an ecommerce site but i can't see any issues with duplicate content - they employ a very good canonical tag strategy. Could it be that Google has decided to ignore the canonical tag? "
Could be that your cms or cart is not forwarding all the pages to the canonical version. Again, check to see if you can access multiple versions of the same page. Ecom and CMS sites always have these types of errors if you dont keep a close eye on the URL's since they are database driven, vs static HTML. Look for www or non-www versions of pages, url's with and without index.php, etc.
Once you target what the offending url's are, use redirects to forward them to the proper and search engine friendly version.
-
Hi,
Thanks so much for that, it's really interesting.
I've resolved the issue now, it was a case of some (a lot!) of missing canonical tags. Phew!
Thanks for your help!
-
Sorry wrong interpretation of your question. Have you excluded the site search pages using robots.txt.? If not this might be the reason why you've that many pages indexed.
Anyway this discussion might give you more answers:
-
Just 1 at the moment.
-
Are you using just one sitemap or multiple?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What's wrong with the algorithm?
Is it possible that Google is penalising a specific page and in the same time it shows unrelated page in the search results? "rent luxury car florence" shows https://lurento.com/city/munich/on the 2nd page (that's Munich, Germany) and in the same time completely ignores the related page https://lurento.com/city/florence/ How I can figure out if the specific page has been trashed and why? Thanks,
Intermediate & Advanced SEO | | lurento.com
Mike0 -
What’s the best way to handle multiple website languages in terms of metatags that should be used and pages sent on our sitemap?
Hey everyone, Has anyone here worked with SEO + website translations? When should we use canonical or alternate tag if we want the user to find our page on the language he used on Google? Should we send all pages on all the different locales on the sitemap? Looking forward to hearing from you! Thanks!
Intermediate & Advanced SEO | | allanformigoni0 -
Google index
Hello, I removed my site from google index From GWT Temporarily remove URLs that you own from search results, Status Removed. site not ranking well in google from last 2 month, Now i have question that what will happen if i reinclude site url after 1 or 2 weeks. Is there any chance to rank well when google re index the site?
Intermediate & Advanced SEO | | Getmp3songspk0 -
Website with only a portion being 'mobile friendly' -- what to tell Google?
I have a website for desktop that does a lot of things, and have converted part of it do show pages in a mobile friendly format based on the users device. Not responsive design, but actual diff code with different formatting by mobile vs desktop--but each still share the same page url name. Google allows this approach. The mobile-friendly part of the site is not as extensive as desktop, so there are pages that apply to the desktop but not for mobile. So the functionality is limited some for mobile devices, and therefore some pages should only be indexed for desktop users. How should that page be handled for Google crawlers? If it is given a 404 not found for their mobile bot will Google properly still crawl it for the desktop, or will Google see that the url was flagged as 'not found' and not crawl it for the desktop? I asked a similar question yest, but it was not stated clearly. Thanks,Ted
Intermediate & Advanced SEO | | friendoffood0 -
Does hiding responsive design elements on smaller media types impact Google's mobile crawler?
I have a responsive site and we hide elements on smaller media types. For example, we have an extensive sitemap in the footer on desktop, but when you shrink the viewport to mobile we don't show the footer. Does this practice make Google's mobile bot crawler much less efficient and therefore impact our mobile search rankings?
Intermediate & Advanced SEO | | jcgoodrich1 -
Killing 404 errors on our site in Google's index
Having moved a site across to Magento, obviously re-directs were a large part of that, ensuring all the old products and categories linked up correctly with the new site structure. However, we came up against an issue where we needed to add, delete, then re-add products. This, coupled with a misunderstanding of the csv upload processing, meant that although the old urls redirected, some of the new Magento urls changed and then didn't redirect: For Example: mysite/product would get deleted re-added and become: mysite/product-1324 We now know what we did wrong to ensure it doesn't continue to happen if we weret o delete and re-add a product, but Google contains all these old URLs in its index which has caused people to search for products on Google, click through, then land on the 404 page - far from ideal. We kind of assumed, with continual updating of sitemaps and time, that Google would realise and update the URL accordingly. But this hasn't happened - we are still getting plenty of 404 errors on certain product searches (These aren't appearing in SEOmoz, there are no links to the old URL on the site, only Google, as the index contains the old URL). Aside from going through and finding the products affected (no easy task), and setting up redirects for each one, is there any way we can tell Google 'These URLs are no longer a thing, forget them and move on, let's make a fresh start and Happy New Year'?
Intermediate & Advanced SEO | | seanmccauley0 -
Why will google not index my pages?
About 6 weeks ago we moved a subcategory out to becomne a main category using all the same content. We also removed 100's of old products and replaced these with new variation listings to remove duplicate content issues. The problem is google will not index 12 critcal pages and our ranking have slumped for the keywords in the categories. What can i do to entice google to index these pages?
Intermediate & Advanced SEO | | Towelsrus0 -
Removing pages from index
Hello, I run an e-commerce website. I just realized that Google has "pagination" pages in the index which should not be there. In fact, I have no idea how they got there. For example, www.mydomain.com/category-name.asp?page=3434532
Intermediate & Advanced SEO | | AlexGop
There are hundreds of these pages in the index. There are no links to these pages on the website, so I am assuming someone is trying to ruin my rankings by linking to the pages that do not exist. The page content displays category information with no products. I realize that its a flaw in design, and I am working on fixing it (301 none existent pages). Meanwhile, I am not sure if I should request removal of these pages. If so, what is the best way to request bulk removal. Also, should I 301, 404 or 410 these pages? Any help would be appreciated. Thanks, Alex0