Lots of incorrect URLs indexed - Googlebot found an extremely high number of URLs on your site
-
Hi,
Any assistance would be greatly appreciated.
Basically, our rankings and traffic have been dropping massively recently, and Google sent us a message stating "Googlebot found an extremely high number of URLs on your site".
This first alerted us to the problem: for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish URLs, giving us duplication everywhere, which Google is obviously penalizing us for in the form of dropping rankings.
Our developer is trying to find the root cause of this, but my concern is: how do we get rid of all these bogus URLs? If we use GWT to remove URLs one by one, it's going to take years.
We have just amended our robots.txt file to exclude them going forward, but they have already been indexed. So do we put a 301 redirect on them, or return an HTTP 404 code to tell Google they don't exist? Do we also put a noindex tag on the pages?
What is the best solution?
A couple of examples of our problem are here:
In Google type -
site:bestathire.co.uk inurl:"br"
You will see 107 results. This is one of many sets of URLs we need to get rid of.
Also -
site:bestathire.co.uk intitle:"All items from this hire company"
This shows 25,300 indexed pages we need to get rid of.
Another thing to help tidy this mess up going forward is to improve our pagination work. Our site uses rel=next and rel=prev but no canonical.
As a belt-and-braces approach, should we also put canonical tags on our category pages where there is more than one page? I was thinking of doing it on page 1 of our most important categories, on the view-all page, or on both. What's the general consensus?
Any advice on both points would be greatly appreciated!
thanks
Sarah.
-
Ahhh, I see what you mean now. Yes, good idea.
Will get that implemented too.
Yes, everything is duplicated. It's all the same apart from the URL, which seems to be pulling in two different locations instead of one.
Odd URL generated (notice it has two locations in it):
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250/Alfreton
Correct location-specific URLs:
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Alfreton/250
thanks
Sarah.
-
Since (I assume this is what is happening) your ecommerce platform is duplicating the entire page, code and all, and putting it at these new URLs, having the canonical tag of the original page URL in the code for the right/real page will mean that, when it gets duplicated, the canonical tag will get duplicated as well and point back to the original URL. Make sense?
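To make that concrete, the head of the real page would carry a tag like this (a sketch using one of your real URLs, not your actual markup):
<link rel="canonical" href="http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250" />
When the platform copies that whole page to /rent/Vacuum_cleaners/Walsall/250/Alfreton, this tag gets copied along with it and still points at the /Walsall/250 version, so search engines treat the copy as a duplicate of the original rather than a page in its own right.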
Can you talk to your ecommerce platform provider? This can't be an intended feature!
-
Thanks Ruth for the very comprehensive answer. Greatly appreciated!
Just to clarify your suggestion about the rel=canonical tag: we put it on the preferred pages, and then when the odd duplicate URLs get generated they won't have a canonical tag, so Google will know they are not the original page? Is that correct?
Sorry, I just got a bit confused, as you said the duplicate pages will have a canonical tag as well?
As for the existing pages, they are very recent, so I wouldn't assume they have any PR to warrant a 301 as opposed to a 404, but I guess either would be OK.
Also, adding the meta noindex tag as you suggested sounds very wise, so I will get that done too.
We also can't find out how these URLs were created and then indexed, so we're just hoping a debug file we have created may shed some light.
Will keep you posted....
Many thanks
Sarah
-
Oh how frustrating!
There are a couple of things that you can do. Updating your robots.txt is a good start, since the next time your site is crawled Google should find that and stop crawling the offending pages, which over time should drop at least some of them from the index. I would also go into every page of your site and add a rel=canonical tag pointing to the original version of the URL. That way, even if your ecommerce platform is generating odd versions of the URL, that canonical tag will be on the duplicate versions, letting engines know they're not the original page.
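One caveat: a robots.txt block stops crawling, not indexing, and Google can only see a canonical or noindex tag on a page it is allowed to crawl — so you may want to leave the bad URLs crawlable until they have dropped out of the index. For reference, a blocking rule is just a Disallow line per pattern; the pattern below is only a guess at your URL structure (it would block anything with a fourth path segment under /rent/, like the Walsall/250/Alfreton example), so check it against the real rogue URLs before using anything like it:
User-agent: *
Disallow: /rent/*/*/*/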
For the existing pages, you could just 301 them all back to the original versions, or add the canonical tag pointing back to the original versions. I would also add a noindex meta tag to these pages to let Google know not to include them in the index.
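The noindex tag is just one line in the head of each duplicate page, something like:
<meta name="robots" content="noindex, follow" />
And if you go the 301 route instead, it's one redirect per bad URL at the server level; for example on Apache (purely illustrative — your server or CMS may handle redirects differently):
Redirect 301 /rent/Vacuum_cleaners/Walsall/250/Alfreton http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250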
With pagination and canonicalization there are a few different approaches, and each has its pros and cons. Dr. Pete wrote a really great post on canonicalization that just went out; you can read it here: http://www.seomoz.org/blog/which-page-is-canonical. I also recommend reading Adam Audette's post on pagination options at Search Engine Land: http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284. I hope that helps!
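For what it's worth, rel=next/prev markup on a paginated category looks like this in the head of page 2 (a sketch — the ?page= parameter is just an assumed URL format, so adapt it to however your pages are actually addressed):
<link rel="prev" href="http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250?page=1" />
<link rel="next" href="http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250?page=3" />
If you have a fast-loading view-all page, one common option is to canonical each component page to it — but read the two posts linked above before committing to one approach.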
-
As long as you think the sitemap is done right it should be fine.
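For reference, per-category "mini" sitemaps are normally tied together with a sitemap index file submitted to Webmaster Tools — a minimal sketch with made-up file names:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.bestathire.co.uk/sitemap-categories-1.xml</loc></sitemap>
  <sitemap><loc>http://www.bestathire.co.uk/sitemap-categories-2.xml</loc></sitemap>
</sitemapindex>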
-
Yes, we submitted mini sitemaps to Webmaster Tools originally, a couple of months back; as our site is 60K pages, we broke it down into categories etc.
We have not submitted a new map since finding this problem.
We are in the process of using the sitemap generator to generate a new sitemap to see if it picks up anything unusual.
Are you suggesting we resubmit?
thanks
Sarah
-
In the short term I would definitely use canonicals to let Google know which are the right pages until you can fix your problem. Also, have you submitted a sitemap to Webmasters?