URL Parameter Handling In GWT to Treat Overindexation - how aggressive?
-
Hi,
My client recently launched a new site and their index went from about 20K up to about 80K - which is a severe over indexation.
I believe this was caused by parameter handling as some category pages now have 700 pages in the results for "site:domain.com/category1" - and apart from the top result, they are all parameters being indexed.
My question is how active/aggressive should I be in blocking these parameters in Google Webmaster Tools? Currently, everything is set to 'let googlebot decide'.
-
Hi! Did these answers take care of your question, or do you still have some questions?
-
Hey There
I would use a robots meta noindex on them (except for the top page of course) and use rel = prev/next to show they are paginated.
I would prefer to do that than use WMT. Also, WMT crawl settings will stop the crawling, but not remove them from the index. Plus, WMT will only handle Google, not other engines like Bing etc. Not that Bing matters, but always better to have a universal solution.
-Dan
-
Hello Search Guys,
Here is some food for thought taken from: http://www.quora.com/Does-Google-limit-the-number-of-pages-it-indexes-for-a-particular-site
Summary:
"Google says they crawl the web in "roughly decreasing PageRank order" and thus, pages that have not achieved widespread link popularity, particularly on large, deep sites, may not be crawled or indexed."
"Indexation
There is no limit to the number of pages Google may index (meaning available to be served in search results) for a site. But just because your site is crawled doesn't mean it will be indexed.Crawl
The ability, speed and depth for which Google crawls your site and retrieves pages can be dependent on a number of factors: PageRank, XML sitemaps, robots.txt, site architecture, status codes and speed.""For a zero-backlink domain with 80.000+ pages, in conjunction with rel=canonical and an xml-sitemap (You do submit a sitemap, don't you?), after submitting the domain to Google for a crawl, a little less than 10k pages remained in index. A few crawls later this was reduced to a mere 250 (very good job on Google's side).
This leads me to believe the indexation cap for a newer site with low to zero pagerank/authority is around 10k."
Another interesting article: http://searchenginewatch.com/article/2062851/Google-Upping-101K-Page-Index-Limit
Hope this helps, and easy response is to limit crawling to the most needed pages as aggressive as possible to remove the unneeded links leaving only needed ones
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should I disallow all URL query strings/parameters in Robots.txt?
Webmaster Tools correctly identifies the query strings/parameters used in my URLs, but still reports duplicate title tags and meta descriptions for the original URL and the versions with parameters. For example, Webmaster Tools would report duplicates for the following URLs, despite it correctly identifying the "cat_id" and "kw" parameters: /Mulligan-Practitioner-CD-ROM
Intermediate & Advanced SEO | | jmorehouse
/Mulligan-Practitioner-CD-ROM?cat_id=87
/Mulligan-Practitioner-CD-ROM?kw=CROM Additionally, theses pages have self-referential canonical tags, so I would think I'd be covered, but I recently read that another Mozzer saw a great improvement after disallowing all query/parameter URLs, despite Webmaster Tools not reporting any errors. As I see it, I have two options: Manually tell Google that these parameters have no effect on page content via the URL Parameters section in Webmaster Tools (in case Google is unable to automatically detect this, and I am being penalized as a result). Add "Disallow: *?" to hide all query/parameter URLs from Google. My concern here is that most backlinks include the parameters, and in some cases these parameter URLs outrank the original. Any thoughts?0 -
Replicating keywords in the URL - bad?
Our site URL structure used to be (example site) frogsforsale.com/cute-frogs-for-sale/blue-frogs wherefrogsforsale.com/cute-frogs-for-sale/ was in front of every URL on the site. We changed it by removing the for-sale part of the URL to be frogsforsale.com/cute-frogs/blue-frogs. Would that have hurt our rankings and traffic by removing the for-sale? Or was having for-sale in the URL twice (once in domain, again in URL) hurting our site? The business wants to change the URLs again to put for-sale back in, but in a new spot such as frogsforsale.com/cute-frogs/blue-frogs-for-sale as they are convinced that is the cause of the rankings and traffic drop. However the entire site was redesigned at the same time, the site architecture is very different, so it is very hard to say whether the traffic drop is due to this or not.
Intermediate & Advanced SEO | | CFSSEO0 -
How To Organise my URLS - Which is Optimal?
Hi all, I am currently in the process of re-writing my companies website URL structure. Compared to the way the website is structured at the minute, there's going to be a lot more URL's as the previous structure has missed out on a lot of search avenues that i intend to include within the rebuild. one of my issues is basically deciding under which category certain URL's come under, I can think of reasons for both sides but can't quite decide on which is optimal. My company is an automotive/car dealer so we sell cars for certain manufactures as well as offering a number of other services. what I'm curious about is what makes more sense in terms of the category that comes first in the URL. Here's what I am torn between; /(car manufacturer)/servicing OR /servicing/(car-manufacturer) To give you some more info that might influence the decision; In terms of generic keyword targeting, the majority would search in the order of '(car manufacturer) service' as opposed to 'service for (car manufacturer)'. Currently on our site, the sections /(manufacturer) are some of the most authoritative pages that we have on the website, but we've done very little work on /service in the past. For me, this would suggest that naturally the pages flowing from that URL would get an advantage in terms of authority/ranking. With either URL structure, the URL's are eventually going to cross paths - I just need to decide which one is best and should therefore feature first. Hopefully this is somewhat clear. I'd appreciate any suggestions or if you don't quite understand what I'm asking for then general URL advice is also appreciated. Many thanks Sam
Intermediate & Advanced SEO | | Sandicliffe0 -
What are partial urls and why this is causing a sitemap error?
Hi mozzers, I have a client that recorded 7 errors when generating Xml sitemap. One of the errors appear to be coming from partial urls and apparently I would need to exclude them from sitemap. What are they exactly and why would they cause an error in the sitemap. Thanks!
Intermediate & Advanced SEO | | Ideas-Money-Art0 -
Capitals in URLs
Hello Mozzers. I've just been looking at a site with capitals in the URL - capitals are used in the product descriptions, so you'll have a URL structure like this: www.company.com/directory1/Double-Beds-Luxury (such URLs do not work if I lower the case of the capitals). There are 50,000 such products on the site. Clearly one drawback is potential customers might type in, or link to, the lower case of the URL and get a "not found" result (though the urls are relatively long so not that likely I'm thinking). Are there any additional drawbacks with the use of capitals outlined here?
Intermediate & Advanced SEO | | McTaggart0 -
HTML for URL markup
Hi, We are changing our URLs to be more SEO friendly. Is there any negative impact or pitfall of using <base> HTML-tag? Our developers are considering it as a possible solution for relative URLs inside HTML-markup in the Friendly URL context.
Intermediate & Advanced SEO | | theLotter0 -
Webmaster tool parameters
Hey forum, About my site, idealchooser.com. Few weeks ago I've defined a parameter "sort" at the Google Webmaster tool that says effect: "Sorts" and Crawl: "No URLs". The logic is simple, I don't want Google to crawl and index the same pages with a different sort parameter, only the default page without this parameter. The weird thing is that under "HTML Improvement" Google keeps finding "Duplicate Title Tag" for the exact same pages with a different sort parameter. For example: /shop/Kids-Pants/16//shop/Kids-Pants/16/?sort=Price/shop/Kids-Pants/16/?sort=PriceHi These aren't old pages and were flagged by Google as duplicates weeks after the sort parameter was defined. Any idea how to solve it? It seems like Google ignores my parameters handling requests. Thank you.
Intermediate & Advanced SEO | | corwin0 -
URL rewriting with "-" or with a space ?
Hi Which url should i use for my web site ? and why ? 1 : http://www.test.com/how-are-you.html 2 : http://www.test.com/how are you.html thanks
Intermediate & Advanced SEO | | nipponx0