Huge Google index on E-commerce site
-
Hi Guys,
Refering back to my original post I would first like to thank you guys for all the advice.
We implemented canonical url's all over the site and noindexed some url's with robots.txt and the site already went from 100.000+ url's indexed to 87.000 urls indexed in GWT.
My question: Is there way to speed this up?
I do know about the way to remove url's from index (with noindex of robots.txt condition) but this is a very intensive way to do so.I was hoping you guys maybe have a solution for this..
-
Hi,
A few weeks later now and index is now on 63.000 url's so that's a good thing.
Another weird thing is the following.
There's a (old) url still in the index. When i visit it redirects me to the new url, which is good. Cache date is 2 weeks ago but Google still shows the old url.
How is this possible? The 301 redirect is already in place since April 2013.
-
Hi allen Jarosz!
Thanks for your reply
I've actually done all the things you said in the last few weeks. Site is totally indexed but the main problem is that are over 85.000 url's indexed but the site only exists of 13.000 urls.
So the main question is wether i can speed things up in one way or another to get those 70.000 url's deindexed.Are any options besides noindex, robots.txt and removing some url's ? Because now it's just waiting.
It looks like we are going the right way when you check the image.
-
SSiebn,
I have had some success in speeding things up, but only to a point.
Google webmaster tools is a GREAT tool that fortunately for us Google allows us to use, and its free!
I'm sure you probably already use the service, but I have found a few ways to use the tools to improve their scan rate. First block the spiders from crawling any pages you don't want indexed, for instance your backend files, this allows more time to be spent on the pages you want indexed. Second ensure you pages link to each other in the site, this allows pages to be linked by flowing through to each other, (no dead ends). Third use "Fetch as Google" from WMT, you are allowed up to 10 fetches. These fetches can be configured to follow linking pages, once crawled, you may submit the results to the Google index, with up to 500 fetches. It may be beneficial to submit for "Fetch as Google" your main categories. Lastly check your "Crawl Rate" to ensure that you have chosen "<label for="recommendedType">Let Google optimize for my site (recommended)</label>"
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Trying to get Google to stop indexing an old site!
Howdy, I have a small dilemma. We built a new site for a client, but the old site is still ranking/indexed and we can't seem to get rid of it. We setup a 301 from the old site to the new one, as we have done many times before, but even though the old site is no longer live and the hosting package has been cancelled, the old site is still indexed. (The new site is at a completely different host.) We never had access to the old site, so we weren't able to request URL removal through GSC. Any guidance on how to get rid of the old site would be very appreciated. BTW, it's been about 60 days since we took these steps. Thanks, Kirk
Intermediate & Advanced SEO | | kbates0 -
Why did Google cache & index a different domain than my own?
We own www.homemenorca.com, a real estate website based in Spain. Pages from this domain are not being indexed: https://www.google.com/search?q=site%3Awww.homemenorca.com&oq=site%3Awww.homemenorca.com&aqs=chrome..69i57j69i58j69i59l2.3504j0j7&sourceid=chrome&ie=UTF-8Please notice that the URLs are Home Menorca, but the titles are not Home Menorca, they are Fincas Mantolan, a completely different domain and company: http://www.fincasmantolan.com/. Furthermore, when we look at Google's cache of Home Menorca, we see a different website: http://webcache.googleusercontent.com/search?q=cache%3Awww.homemenorca.com%2Fen&oq=cache%3Awww.homemenorca.com%2Fen&aqs=chrome..69i57j69i58j69i59.1311j0j4&sourceid=chrome&ie=UTF-8We reviewed Google Search Console, Google Fetch, the canonical tags, the XML sitemap, and many more items. Google Search Console accepted our XML sitemap, but is only indexing 5-10% of the pages. Google is fetching and rendering the pages properly. However, we are not seeing the correct content being indexed in Google. We have seen issues with page loading times, loading content longer than 4 seconds, but are unsure why Google would be indexing a different domain.If you have suggestions or thoughts, we would very much appreciate it.Additional Language Issue:When a user searches "Home Menorca" from America or the UK with "English" selected in their browser as their default language, they are given a Spanish result. It seems to have accurate hreflang annotations within the head section on the HTML pages, but it is not working properly. Furthermore, Fincas Mantolan's search result is listed immediately below Home Menorca's Spanish result. We believe that if we fix the issue above, we will also fix the language issue. Please let us know any thoughts or recommendations that can help us. Thank you very much!
Intermediate & Advanced SEO | | CassG12340 -
Google Rich Snippets in E-commerce Category Pages
Hello Best Practice for rich snippets / structured data in ecommerce category pages? I put structured markup in the category pages and it seems to have negatively impacted SEO. Webmaster tools is showing about 2.5:1 products to pages ratio. Should I be putting structured data in category Pages at all? Thanks for your time 🙂
Intermediate & Advanced SEO | | s_EOgi_Bear0 -
Only 285 of 2,266 Images Indexed by Google
Only 285 of 2,266 Images Indexed by Google. Images for our site are hosted on Amazons CDN cloud based hosting service. Our Wordpress site is on a virtual private server and has its' own IP address. The number of indexed images has dropped substantially in the last year. Our site is for a real estate brokerage firm. There are about 250 listing pages set to "no-index". Perhaps these contain 400 photos, so they do not account for why so few photos have been indexed. The concern is that the low number of indexed images could be affecting overall ranking. The site URL is www.nyc-officespace-leader.com. Is this issue something that we should be concerned about? Thanks,
Intermediate & Advanced SEO | | Kingalan1
Alan0 -
How to make sure dev site is not index in wordpress and how would it be affected?
hi guys! I'm currently having a dev version of my site (dev.website.com) that once everything is done i would move the dev to the public domain (website.com). But since is a total duplicate content of my real site would it affect the seo? if so, i tried setting the reading privacy in wordpress so google would not index it but im afraid when i live it in the future and revert the setting back to normal it would affect the site seo. any opinion and suggestion on this?
Intermediate & Advanced SEO | | andrewwatson920 -
Investigating Google's treatment of different pages on our site - canonicals, addresses, and more.
Hey all - I hesitate to ask this question, but have spent weeks trying to figure it out to no avail. We are a real estate company and many of our building pages do not show up for a given address. I first thought maybe google did not like us, but we show up well for certain keywords 3rd for Houston office space and dallas office space, etc. We have decent DA and inbound links, but for some reason we do not show up for addresses. An example, 44 Wall St or 44 Wall St office space, we are no where to be found. Our title and description should allow us to easily picked up, but after scrolling through 15 pages (with a ton of non relevant results), we do not show up. This happens quite a bit. I have checked we are being crawled by looking at 44 Wall St TheSquareFoot and checking the cause. We have individual listing pages (with the same titles and descriptions) inside the buildings, but use canonical tags to let google know that these are related and want the building pages to be dominant. I have worked though quite a few tests and can not come up with a reason. If we were just page 7 and never moved it would be one thing, but since we do not show up at all, it almost seems like google is punishing us. My hope is there is one thing that we are doing wrong that is easily fixed. I realize in an ideal world we would have shorter URLs and other nits and nats, but this feels like something that would help us go from page 3 to page 1, not prevent us from ranking at all. Any thoughts or helpful comments would be greatly appreciated. http://www.thesquarefoot.com/buildings/ny/new-york/10005/lower-manhattan/44-wall-st/44-wall-street We do show up one page 1 for this building - http://www.thesquarefoot.com/buildings/ny/new-york/10036/midtown/1501-broadway, but is the exception. I have tried investigating any differences, but am quite baffled.
Intermediate & Advanced SEO | | AtticusBerg10 -
Google webmaster tools showing "no data available" for links to site, why?
In my google webmaster account I'm seeing all the data in other categories except links to my site. When I click links to my site I get a "no data available" message. Does anyone know why this is happening? And if so, what to do to fix it? Thanks.
Intermediate & Advanced SEO | | Nicktaylor10 -
Is google rolling out a huge update this week?
I am seeing some huge shifts in SERPS at the moment, for some keywords such as web design. Nearly every single result on the homepage is a different company than 2 weeks ago. We are seeing some clients have huge jumps in ranks but also some are dropping. Seems like we could have a big Panda/Penguin like update rolling out.
Intermediate & Advanced SEO | | tempowebdesign1