How to stop URLs that include query strings from being indexed by Google
-
Hello Mozzers
Would you use rel=canonical, robots.txt, or Google Webmaster Tools to stop the search engines indexing URLs that include query strings/parameters. Or perhaps a combination?
I guess it would be a good idea to stop the search engines crawling these URLs because the content they display will tend to be duplicate content and of low value to users.
I would be tempted to use a combination of canonicalization and robots.txt for every page I do not want crawled or indexed, yet perhaps Google Webmaster Tools is the best way to go / just as effective??? And I suppose some use meta robots tags too.
Does Google take a position on being blocked from web pages.
Thanks in advance, Luke
-
WIthout a specific example, there are a couple of options here. I am going to assume that you have an ecommerce site where parameters are being used for sort functions on search results or different options on a given product.
I know you may not be able to do this, but using parameters in this case is just a bad idea to start with. If you can (and I know this can be difficult) find a way to rework this so that your site functions without the use of parameters.
You could use canonicals, but then Google would still be crawling all those pages and then go through the process of using the canonical link to find out what page is canonical. That is a big waste of Google's time. Why waste Googlebots time on crawling a bunch of pages that you do not want to have crawled anyway? I would rather Googlebot focus on crawling your most important pages.
You can use the robots.txt file to stop Google from crawling sections of your site. The only issue with this is that if some of your pages with a bunch of parameters in them are ranking, once you tell Google to stop crawling it, you would then lose traffic.
It is not that Google does not "like" robot.txt to block them, or that they do not "like" the use of the canonical tag, it is just that there are directives that Google will follow in a certain way and so if not implemented correctly or in the wrong sequence can cause negative results because you have basically told Google to do something without fully understanding what will happen.
Here is what I would do. Long version for long term success
-
Look at Google Analytics (or other Analytics) and Moz tools and see what pages are ranking and sending you traffic. Make note of your results.
-
Think of the most simple way that you could organize your site that would be logical to your users and would allow Google to crawl every page you deem important. Creating a hierarchical sitemap is a good way to do this. How does this relate to what you found in #1.
-
Rework your URL structure to reflect what you found in #2 without using parameters. If you have to use parameters, then make sure Google can crawl your basic sitemap without using any of the parameters. Use robots.txt to then block the crawling of any parameters on your site. You have now ensured that Google can crawl and will rank pages without parameters and you are not hiding any important pages or page information on a page that uses parameters.
There are other reasons not to use parameters (e.g. easier for users remember, tend to be shorter, etc), so think about if you want to get rid of them.
- 301 redirect all your main traffic pages from the old URL structure to the new URL structure. Show 404s for all the old pages including the ones with parameters. That way all the good pages will move to the new URL structure and the bad ones will go away.
Now, if you are stuck using parameters. I would do a variant of the above. Still see if there are any important or well ranked pages that use parameters. Consider if there is a way to use the canonical on those pages to get Google to the right page to know what should rank. All the other pages I would use the noindex directive to get them out of the Google index, then later use robots to block Google crawling them. You want to do this in sequence as if you block Google first, it will never see the noindex directive.
Now, everything I said above is generally "correct" but depending on your situation, things may need to be tweaked. I hope the information I gave might help with you being able to work out the best options for what works for your site and your customers.
Good luck!
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Old product URLs still indexed and maybe causing problems?
Hi all, Need some expertise here: We recently (3 months ago) launched a newly updated site with the same domain. We also added an SSL and dropped the www (with proper redirects). We went from http://www.mysite.com to https://mysite.com. I joined the company about a week after launch of the new site. All pages I want indexed are indexed, on the sitemap and submitted (submitted in July but processes regularly). When I check site:mysite.com everything is there, but so are pages from the old site that are not on the sitemap. These do have 301 redirects. I am finding our non-product pages are ranking with no problem (including category pages) but our product pages are not, unless I type in the title almost exactly. We 301 redirected all old urls to new comparable product, or if the product is not available anymore to the home page. For better or worse, as it turns out and prior to my arrival, in building the new site the team copied much of the content (descriptions, reviews, etc) from the old site to create the new product pages. After some frustration and research I am finding the old pages are still indexed and possibly causing a duplicate content issue. Now, I gather there is supposedly no "penalty", per se, for duplicate content but a page or site will simply not show in the SERPs. Understandable and this seems to be the case. We also sell a lot of product wholesale and it turns out many dealers are using the same descriptions we have (and have had) on our site. Some are much larger than us so I'd expect to be pushed down a bit but we don't even show in the top 10 pages...for our own product. How long will it take for Google to drop the old and rank the new as unique? I have re-written some pages but much is technical specifications and tough to paraphrase or re-write. I know I could do this in Search Console but I don't have access to the old site any longer. Should I remove the 301s a few at a time and see if the old get dropped faster? Maybe just re-write ALL the content? Wait? As a site note, I'm also on a Drupal CMS with a Shopify ecommerce module so maybe the shop.mysite.com vs mysite.com is throwing it off with the products(?) - (again the Drupal non-product AND category pages rank fine). Thoughts on this would be much appreciated. Thx so much!
Intermediate & Advanced SEO | | mcampanaro0 -
Should I include URLs that are 301'd or only include 200 status URLs in my sitemap.xml?
I'm not sure if I should be including old URLs (content) that are being redirected (301) to new URLs (content) in my sitemap.xml. Does anyone know if it is best to include or leave out 301ed URLs in a xml sitemap?
Intermediate & Advanced SEO | | Jonathan.Smith0 -
Organic Listings showing Google Tag Manager + Google Page Title...?
I'm a bit stumped with this. I optimise all my titles etc for Australia - and now the organic liatings are showing something strange. For example ( we sell health supplements ) Meta title = "My Product , Buy Online Australia" If I type "My Product" - the title in the organic listings says "My Product - My Company Limited" - and the only place I can see it getting that from is a combination of Meta Data used in Google Tag Manager + the Name on my Google places page. This is much more obvious for categories.. but it's a pain in the butt. If I type "My Product Australia" Then the original "My Product , Buy Online Australia" comes up. Any ideas on policy etc? I have taken the "Limited" off the Google business page - so hopefully this will change over time - but I can't find any information on why google would do something like this. If you had shed any light on this - would be much appreciated.
Intermediate & Advanced SEO | | s_EOgi_Bear0 -
Incorrect URL shown in Google search results
Can anyone offer any advice on how Google might get the url which it displays in search results wrong? It currently appears for all pages as: <cite>www.domainname.com › Register › Login</cite> When the real url is nothing like this. It should be: www.domainname.com/product-type/product-name. This could obviously affect clickthroughs. Google has indexed around 3,000 urls on the site and they are all like this. There are links at the top of the page on the website itself which look like this: Register » Login » which presumably could be affecting it? Thanks in advance for any advice or help!
Intermediate & Advanced SEO | | Wagada0 -
Why are some pages indexed but not cached by Google?
The question is simple but I don't understand the answer. I found a webpage that was linking to my personal site. The page was indexed in Google. However, there was no cache option and I received a 404 from Google when I tried using cache:www.thewebpage.com/link/. What exactly does this mean? Also, does it have any negative implication on the SEO value of the link that points to my personal website?
Intermediate & Advanced SEO | | mRELEVANCE0 -
How long does it take before URL's are removed from Google?
Hello, I recently changed our websites url structures removing the .html at the end. I had about 55 301's setup from the old url to the new. Within a day all the new URL's were listed in Google, but the old .html ones still have not been removed a week later. Is there something I am missing? Or will it just take time for them to get de-indexed? As well, so far the Page Authority hasn't transfered from the old pages to the new, is this typical? Thanks!
Intermediate & Advanced SEO | | SeanConroy0 -
How can we get a site reconsidered for Google indexing?
We recently completed a re-design for a site and are having trouble getting it indexed. This site may have been penalized previously. They were having issues getting it ranked and the design was horrible. Any advise on how to get the new site reconsidered to get the rank where it should be? (Yes, Webmaster Tools is all set up with the sitemap linked) Many thanks for any help with this one!
Intermediate & Advanced SEO | | d25kart0 -
Export list of urls in google's index?
Is there a way to export an exact list of urls found in Google's index?
Intermediate & Advanced SEO | | nicole.healthline0