Big discrepancies between pages in Google's index and pages in sitemap
-
Hi,
I'm noticing a huge difference in the number of pages in Googles index (using 'site:' search) versus the number of pages indexed by Google in Webmaster tools. (ie 20,600 in 'site:' search vs 5,100 submitted via the dynamic sitemap.)
Anyone know possible causes for this and how i can fix?
It's an ecommerce site but i can't see any issues with duplicate content - they employ a very good canonical tag strategy. Could it be that Google has decided to ignore the canonical tag?
Any help appreciated,
Karen
-
Take a look at the pages that are indexed. Chances are that since it is a cart or CMS-based site, you just need to use robots.txt to block out some areas you don't want indexed. You also need to look at your indexed pages, to see if any of them are duplicates, meaning you have 2 or more url's that display the same content.
"It's an ecommerce site but i can't see any issues with duplicate content - they employ a very good canonical tag strategy. Could it be that Google has decided to ignore the canonical tag? "
Could be that your cms or cart is not forwarding all the pages to the canonical version. Again, check to see if you can access multiple versions of the same page. Ecom and CMS sites always have these types of errors if you dont keep a close eye on the URL's since they are database driven, vs static HTML. Look for www or non-www versions of pages, url's with and without index.php, etc.
Once you target what the offending url's are, use redirects to forward them to the proper and search engine friendly version.
-
Hi,
Thanks so much for that, it's really interesting.
I've resolved the issue now, it was a case of some (a lot!) of missing canonical tags. Phew!
Thanks for your help!
-
Sorry wrong interpretation of your question. Have you excluded the site search pages using robots.txt.? If not this might be the reason why you've that many pages indexed.
Anyway this discussion might give you more answers:
-
Just 1 at the moment.
-
Are you using just one sitemap or multiple?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Not Indexing App Content
Hello Mozzers I recently noticed that there has been an increase in crawl errors reported in Google Search console & Google has stopped indexing our app content. Could this be due to the fact that there is a mismatch between the host path name mentioned within the android deeplink (within the alternate tag) and the actual URL of the page. For instance on the following desktop page http://www.example.com.au/page-1 the android deeplink points to http://www.example.com.au/android-app://com.example/http/www.example.com.au/4652374 Please note that the content on both pages (desktop & android) is same.Is this is a correct setup or am I doing something wrong here? Any help would be much appreciated. Thank you so much in advance.
Intermediate & Advanced SEO | | InMarketingWeTrust0 -
Trouble Indexing one of our sitemaps
Hi everyone thanks for your help. Any feedback is appreciated. We have three separate sitemaps: blog/sitemap.xml events.xml sitemap.xml Unfortunately we keep trying to get our events sitemap to pickup and it just isn't happening for us. Any input on what could be going on?
Intermediate & Advanced SEO | | TicketCity0 -
Google is indexing the wrong pages
I have been having problems with Google indexing my website since mid May. I haven't made any changes to my website which is wordpress. I have a page with the title 'Peterborough Cathedral wedding', I search Google for 'wedding Peteborough Cathedral', this is not a competitive search phrase and I'd expect to find my blog post on page one. Instead, half way down page 4 I find Google has indexed www.weddingphotojournalist.co.uk/blog with the title 'wedding photojournalist | Portfolio', what google has indexed is a link to the blog post and not the blog post itself. I repeated this for several other blog posts and keywords and found similar results, most of which don't make any sense at all - A search for 'Menorca wedding photography' used to bring up one of my posts at the top of page one. Now it brings up a post titled 'La Mare wedding photography Jersey" which happens to have a link to the Menorca post at the bottom of the page. A search for 'Broadoaks country house weddng photography' brings up 'weddingphotojournalist | portfolio' which has a link to the Broadoaks post. a search for 'Blake Hall wedding photography' does exactly the same. In this case Google is linking to www.weddingphotojournalist.blog again, this is a page of recent blog posts. Could this be a problem with my sitemap? Or the Yoast SEO plugin? or a problem with my wordpress theme? Or is Google just a bit confused?
Intermediate & Advanced SEO | | weddingphotojournalist0 -
Google's Stance on "Hidden" Content
Hi, I'm aware Google doesn't care if you have helpful content you can hide/unhide by user interaction. I am also aware that Google frowns upon hiding content from the user for SEO purposes. We're not considering anything similar to this. The issue is, we will be displaying only a part of our content to the user at a time. We'll load 3 results on each page initially. These first 3 results are static, meaning on each initial page load/refresh, the same 3 results will display. However, we'll have a "Show Next 3" button which replaces the initial results with the next 3 results. This content will be preloaded in the source code so Google will know about it. I feel like Google shouldn't have an issue with this since we're allowing the user action to cycle through all results. But I'm curious, is it an issue that the user action does NOT allow them to see all results on the page at once? I am leaning towards no, this doesn't matter, but would like some input if possible. Thanks a lot!
Intermediate & Advanced SEO | | kirmeliux0 -
Ecommerce URL's
I'm a bit divided about the URL structure for ecommerce sites. I'm using Magento and I have Canonical URLs plugin installed. My question is about the URL structure and length. 1st Way: If I set up Product to have categories in the URL it will appear like this mysite.com/category/subcategory/product/ - and while the product can be in multiple places , the Canonical URL can be either short or long. The advantage of having this URL is that it shows all the categories in the breadcrumbs ( and a whole lot more links over the site ) . The disadvantage is the URL Length 2nd Way: Setting up the product to have no category in the URL URL will be mysite.com/product/ Advantage: short URL. disadvantage - doesn't show the categories in the breadcrumbs if you link direct. Thoughts?
Intermediate & Advanced SEO | | s_EOgi_Bear1 -
Two Pages with the Same Name Different URL's
I was hoping someone could give me some insight into a perplexing issue that I am having with my website. I run an 20K product ecommerce website and I am finding it necessary to have two pages for my content: 1 for content category pages about wigets one for shop pages for wigets 1st page would be .com/shop/wiget/ 2nd page would be .com/content/wiget/ The 1st page would be a catalogue of all the products with filters for the customer to narrow down wigets. So ultimately the URL for the shop page could look like this when the customer filters down... .com/shop/wiget/color/shape/ The second page would be content all about the Wigets. This would be types of wigets colors of wigets, how wigets are used, links to articles about wigets etc. Here are my questions. 1. Is it bad to have two pages about wigets on the site, one for shopping and one for information. The issue here is when I combine my content wiget with my shop wiget page, no one buys anything. But I want to be able to provide Google the best experience for rankings. What is the best approach for Google and the customer? 2. Should I rel canonical all of my .com/shop/wiget/ + .com/wiget/color/ etc. pages to the .com/content/wiget/ page? Or, Should I be canonicalizing all of my .com/shop/wiget/color/etc pages to .com/shop/wiget/ page? 3. Ranking issues. As it is right now, I rank #1 for wiget color. This page on my site would be .com/shop/wiget/color/ . If I rel canonicalize all of my pages to .com/content/wiget/ I am going to loose my rankings because all of my shop/wiget/xxx/xxx/ pages will then point to .com/content/wiget/ page. I am just finding with these massive ecommerce sites that there is WAY to much potential for duplicate content, not enough room to allow Google the ability to rank long tail phrases all the while making it completely complicated to offer people pages that promote buying. As I said before, when I combine my content + shop pages together into one page, my sales hit the floor (like 0 - 15 dollars a day), when i just make a shop page my sales are like (1k+ a day). But I have noticed that ever since Penguin and Panda my rankings have fallen from #1 across the board to #15 and lower for a lot of my phrase with the exception of the one mentioned above. This is why I want to make an information page about wigets and a shop page for people to buy wigets. Please advise if you would. Thanks so much for any insight you can give me!
Intermediate & Advanced SEO | | SKP0 -
Soft 404's from pages blocked by robots.txt -- cause for concern?
We're seeing soft 404 errors appear in our google webmaster tools section on pages that are blocked by robots.txt (our search result pages). Should we be concerned? Is there anything we can do about this?
Intermediate & Advanced SEO | | nicole.healthline4 -
Pagination Question: Google's 'rel=prev & rel=next' vs Javascript Re-fresh
We currently have all content on one URL and use # and Javascript refresh to paginate pages, and we are wondering if we transition to the Google's recommended pagination if we will see an improvement in traffic. Has anyone gone though a similar transition? What was the result? Did you see an improvement in traffic?
Intermediate & Advanced SEO | | nicole.healthline0