Lots of incorrect URLs indexed - Googlebot found an extremely high number of URLs on your site
-
Hi,
Any assistance would be greatly appreciated.
Basically, our rankings and traffic have been dropping massively recently, and Google sent us a message stating "Googlebot found an extremely high number of URLs on your site".
This first alerted us to the problem: for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish URLs, giving us duplication everywhere, which Google is obviously penalising us for in the form of dropping rankings.
Our developer is trying to find the root cause of this, but my concern is: how do we get rid of all these bogus URLs? If we use GWT to remove URLs one at a time, it's going to take years.
We have just amended our robots.txt file to exclude them going forward, but they have already been indexed, so do we put a 301 redirect on them, or return an HTTP 404 code to tell Google they don't exist? Do we also put a noindex tag on the pages?
What is the best solution?
A couple of examples of our problem are here:
In Google type -
site:bestathire.co.uk inurl:"br"
You will see 107 results. This is one of many sets we need to get rid of.
Also -
site:bestathire.co.uk intitle:"All items from this hire company"
Shows 25,300 indexed pages we need to get rid of.
Another thing that would help tidy this mess up going forward is to improve our pagination. Our site uses rel=next and rel=prev but no canonical.
As a belt-and-braces approach, should we also put canonical tags on our category pages where there is more than one page? I was thinking of doing it on page 1 of our most important pages, or on the view-all page, or both. What's the general consensus?
Any advice on both points greatly appreciated!
thanks
Sarah.
-
Ahhh, I see what you mean now. Yes, good idea.
Will get that implemented too.
Yes, everything is duplicated. It's all the same apart from the URL, which for some reason brings in two different locations instead of one.
Odd URL generated (notice it has two locations in it):
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250/Alfreton
Correct location-specific URLs:
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250
http://www.bestathire.co.uk/rent/Vacuum_cleaners/Alfreton/250
thanks
Sarah.
-
Since (I assume this is what is happening) your ecommerce platform is duplicating the entire page, code and all, and putting it at these new URLs, having a canonical tag pointing to the original URL in the code of the right/real page means that when the page gets duplicated, the canonical tag gets duplicated as well and still points back to the original URL. Make sense?
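In other words, if the real page already carries a canonical tag like the one below, the duplicated copies will carry it too (this sketch uses one of the URLs from this thread):

```html
<!-- In the <head> of the real page -->
<link rel="canonical" href="http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250">
<!-- When the platform duplicates this page at a bogus URL like
     /rent/Vacuum_cleaners/Walsall/250/Alfreton, this tag is copied
     along with the rest of the markup and still points to the original. -->
```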
Can you talk to your ecommerce platform provider? This can't be an intended feature!
-
Thanks Ruth for the very comprehensive answer. Greatly appreciated!
Just to clarify your suggestion about the rel=canonical tag: we put it on the preferred pages, and when the duplicate odd URLs get generated, they won't have a canonical tag, so Google will know they are not the original page? Is that correct?
Sorry, I just got a bit confused, as you said the duplicate pages will have a canonical tag as well?
As for the existing pages, they are very recent, so I wouldn't assume they have any PR to warrant a 301 as opposed to a 404, but I guess either would be OK.
Also, adding the meta noindex tag as you suggested sounds very wise, so will get that done too.
We also can't find out how these URLs were created and then indexed, so we're hoping a debug file we just created may shed some light.
Will keep you posted....
Many thanks
Sarah
-
Oh how frustrating!
There are a couple of things that you can do. Updating your robots.txt is a good start, since the next time your site is crawled Google should find that and drop at least some of the offending pages from the index. I would also go into every page of your site and add a rel=canonical tag pointing to the original version of the URL. That way, even if your ecommerce platform is generating odd versions of the URL, that canonical tag will be on the duplicate versions, letting engines know they're not the original page.
For the existing pages, you could just 301 them all back to the original versions, or add the canonical tag pointing back to the original versions. I would also add the meta noindex tag to these pages to let Google know not to include them in the index.
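A minimal sketch of such a robots.txt rule, assuming the bogus URLs share a recognisable pattern (the path below is purely illustrative; the real rule depends on what the platform is actually generating):

```
User-agent: *
# Illustrative only - replace with whatever pattern the bogus URLs share
Disallow: /some-bogus-pattern/
```

One caveat worth knowing: robots.txt only blocks crawling. If you are also relying on noindex or canonical tags to get already-indexed duplicates dropped, Google has to be able to crawl those URLs to see the tags, so you may want to leave them crawlable until they fall out of the index.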
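For reference, the noindex tag goes in the head of each duplicate page, and a 301 can be handled at the server level. The redirect below assumes an Apache server and uses one of the bogus URLs from this thread as an example:

```html
<!-- In the <head> of a duplicate page you want dropped from the index -->
<meta name="robots" content="noindex, follow">
```

```apache
# .htaccess (Apache, illustrative) - 301 a bogus two-location URL
# back to the real page
Redirect 301 /rent/Vacuum_cleaners/Walsall/250/Alfreton http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250
```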
With pagination and canonicalization there are a few different approaches, and each has its pros and cons. Dr. Pete wrote a really great post on canonicalization that just went out, you can read it here: http://www.seomoz.org/blog/which-page-is-canonical. I also recommend reading Adam Audette's post on pagination options at Search Engine Land: http://searchengineland.com/the-latest-greatest-on-seo-pagination-114284. I hope that helps!
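As a concrete sketch of one common option those posts cover - each paginated page carrying rel=prev/next plus a self-referencing canonical - here is what a hypothetical page 2 of a category might contain (the ?page= parameter is an assumption about the URL structure):

```html
<!-- Hypothetical page 2 of a paginated category (URL structure assumed) -->
<link rel="canonical" href="http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250?page=2">
<link rel="prev" href="http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250">
<link rel="next" href="http://www.bestathire.co.uk/rent/Vacuum_cleaners/Walsall/250?page=3">
```

The alternative approach is canonicalising every page in the series to a view-all page, which only makes sense if the view-all page loads quickly enough to serve users well.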
-
As long as you think the sitemap is done right it should be fine.
-
Yes, we submitted mini sitemaps to Webmaster Tools originally a couple of months back; as our site is 60K pages, we broke it down into categories etc.
We have not submitted a new sitemap since finding this problem.
We are in the process of using the sitemap generator to generate a new sitemap, to see if it picks up anything unusual.
Are you suggesting we resubmit?
thanks
Sarah
-
In the short term I would definitely use canonicals to let Google know which are the right pages until you can fix your problem. Also, have you submitted a sitemap to Webmasters?