Huge Google index on E-commerce site
-
Hi guys,
I have a question about something I can't explain.
I'm working on an e-commerce site that recently got a CMS update, including URL changes.
We set up a lot of 301s for the old URLs (around 3,000 to 4,000, I guess) and submitted a new sitemap (around 12,000 URLs, of which 10,500 are indexed). The strange thing is that when I check the index status in Webmaster Tools, Google tells me there are over 98,000 URLs indexed.
A site:domainx.com search tells me there are 111,000 URLs indexed. Another strange thing, which another forum member describes here:
And on top of that, old URLs (which have had a 301 for about a month now) keep showing up in the index.
Does anyone know what I could do to solve the problem?
-
All right guys, thanks a lot for the answers.
I'm going to try some things out this coming Monday.
Canonical URLs and pagination markup (rel=prev) will work, I guess.
The hard part is that I'm working on this site with a development company that tells me they can redirect all the 404s to the homepage, whereas they really need to be redirected to other products or category pages.
So the only solution is that I do it by hand, one by one, via a tool they built. But it's a hell of a job!
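One way to cut down the manual work is to generate a first draft of the redirect map automatically and only review it by hand. Below is a minimal sketch, assuming you can export a list of old URLs and a list of new product/category URLs. The function names, the example URLs, and the similarity cutoff are all hypothetical, and matching on the last path segment is just one possible heuristic, not something the CMS or Google prescribes.

```python
from difflib import get_close_matches
from urllib.parse import urlparse


def slug(url: str) -> str:
    """Return the last path segment, lowercased and without a file extension."""
    path = urlparse(url).path.rstrip("/")
    last = path.rsplit("/", 1)[-1].lower()
    return last.rsplit(".", 1)[0] if "." in last else last


def suggest_redirects(old_urls, new_urls, fallback, cutoff=0.6):
    """Map each old URL to the new URL whose slug is most similar.

    When no new slug is similar enough, use `fallback` (for example a
    category page) so the entry can be reviewed by hand later.
    """
    by_slug = {slug(u): u for u in new_urls}
    mapping = {}
    for old in old_urls:
        match = get_close_matches(slug(old), list(by_slug), n=1, cutoff=cutoff)
        mapping[old] = by_slug[match[0]] if match else fallback
    return mapping


if __name__ == "__main__":
    # Hypothetical example URLs, for illustration only.
    old = ["http://domainx.com/shop/red-shoes-123.html"]
    new = ["http://domainx.com/products/red-shoes"]
    for src, dst in suggest_redirects(old, new, "http://domainx.com/products/").items():
        print(src, "->", dst)
```

The output is only a suggestion list. Every pair should still be checked before it goes into the redirect tool, but reviewing a draft is usually faster than typing each target from scratch.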
@Andy, I checked it and it actually says:
Total indexed: 98,000
Ever crawled: 929,762
And when I check the question mark next to "Total indexed" it says:
Total number of URLs added to Google's index.
Thanks again for your answers.
-
Something to check in WMT: if you go to the advanced section of the Index Status chart, you should see both the number currently in the index and the number ever indexed. It sounds like you are just seeing the ever-indexed number, which could be huge for almost any website.
-
We had a similar issue with too many indexed pages (about 100,000) for a site with about 3,500 pages.
By setting a canonical URL on each page and also preventing Google from indexing and crawling some of the URLs (robots.txt and meta noindex), we are now down to 3,500 URLs. The benefit, besides less duplicate content, is much faster indexing of new pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
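To illustrate the canonical part, here is a rough sketch of how a canonical URL could be derived by stripping parameters that only filter or sort the same content. The parameter names in NON_CANONICAL_PARAMS are assumptions, not taken from any particular CMS, and whether a parameter such as page belongs in the canonical URL depends on how your pagination is set up.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; replace them with the ones your CMS generates.
NON_CANONICAL_PARAMS = {"sort", "search", "manufacturer", "sessionid"}


def canonical_url(url: str, drop=NON_CANONICAL_PARAMS) -> str:
    """Return the URL without the listed query parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in drop
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'
```

For example, canonical_link_tag("http://domainx.com/shoes?sort=price") would point at http://domainx.com/shoes, so every sort order of that listing names the same page as its canonical version.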
-
Hi,
A couple of things could be, and probably are, at work in this situation.
1. For the 301 redirects: if the site is big (12,000 URLs), then depending on how often and how much Google crawls the site, it could easily take more than a month for it to find all the new URLs and 301 redirects and then update its index. So in this case it is a matter of patience. If the 301s are implemented correctly, Google will eventually pick them up.
2. You have done 3,000 or 4,000 301s. What are you showing for the rest of the old 12,000 URLs, a 404? It is a big undertaking to redirect that many pages, but it is worth thinking about the technical side of what is happening. Part of your 98,000 indexed URLs could be a mix of old and new ones if the old URLs do not respond with a status that clearly says the page has either moved (301) or is no longer available (404).
3. A common problem with e-shops is duplicate content caused by things like product filters and search string variables that lead to pages which are indexable and do not have rel canonical tags. A good way to see if this is the case is to look for likely URL parts in your CMS that could cause this issue (maybe you have filters that result in URLs like xxx?search=123 or xxx?manufacturer=23) and then do a Google search along the lines of site:xxx.com inurl:manufacturer, which should give a good idea of whether and where you have this problem. This duplicate content could be even more pronounced if it occurs on both your old CMS URLs and your new CMS URLs, and a combination of these makes up your 98,000 total.
Hope that helps!
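For point 3, if you have a list of URLs from a crawl or from your server logs, a quick count of the query parameters that appear in them shows which ones to try with site:/inurl: searches first. This is only a sketch: where the URL list comes from is an assumption, and the example URLs are the placeholders from above.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_query_params(urls):
    """Count how many URLs carry each query parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # Use a set so a parameter repeated within one URL is counted once.
        names = {key.lower() for key, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    sample = [
        "http://xxx.com/shoes?manufacturer=23",
        "http://xxx.com/shoes?manufacturer=24&search=123",
        "http://xxx.com/shoes",
    ]
    for name, total in count_query_params(sample).most_common():
        print(name, total)
```

Parameters that show up on thousands of URLs but only filter or sort the same products are the likely source of the extra indexed pages.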