Blocking Certain Site Parameters from Google's Index - Please Help
-
Hello,
We recently used Google Webmaster Tools in an attempt to block certain parameters on our site from showing up in Google's index. One of our site parameters is essentially for user location and accounts for over 500,000 URLs. This parameter does not change page content in any way, and there is no need for Google to index it. We edited the parameter in GWT to tell Google that it does not change site content and should not be indexed. However, after two weeks, all of these URLs are still getting indexed. Why? Maybe there's something we're missing here, or perhaps there is a more effective way to do this. Has anyone else run into this problem?
The path we used to implement this action:
Google Webmaster Tools > Crawl > URL Parameters
Thank you in advance for your help!
-
Thanks! We will probably test this solution.
-
Continuing from EGOL's comment #3: if you do need the parameters for on-site search or categories, then another option (admittedly it relies on Google obeying it) is to disallow the parameters in robots.txt, for example:
Disallow: /*categoryFilter=*
Disallow: /*?utm_
As with any change that could affect the visibility of your site to the search engines, always test first.
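One way to test rules like these before deploying them is a small script. The following is only a sketch: it hand-rolls the wildcard matching Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end of the URL), and the sample paths are made up for illustration. It does not replace checking the live file with Google's own tools.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    match to the end of the URL. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_patterns):
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(
        robots_pattern_to_regex(p).match(path_and_query)
        for p in disallow_patterns
    )


# The two rules from the post above.
RULES = ["/*categoryFilter=*", "/*?utm_"]

# Hypothetical sample URLs (path + query only).
print(is_disallowed("/shoes?categoryFilter=red", RULES))        # True
print(is_disallowed("/shoes?utm_source=mail", RULES))           # True
print(is_disallowed("/shoes?page=2&utm_source=mail", RULES))    # False
print(is_disallowed("/shoes", RULES))                           # False
```

The third case is worth noticing: `/*?utm_` only matches when a `utm_` parameter comes directly after the `?`, so a tracking parameter that appears later in the query string slips through.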
-
Thanks, we have a few thousand parent pages that relate to these 500,000 URLs that have the parameters. Is there a quick way to canonicalise thousands of pages at once? It may not be scalable...
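If the pages are rendered from a shared template, one possible approach is to compute the canonical URL in code instead of editing pages one by one. The sketch below assumes the location parameter is literally named `location` and uses example.com URLs; both are placeholders, since the thread does not give the real names.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that never change page content.
# "location" is a hypothetical name, not taken from the thread.
IGNORED_PARAMS = {"location"}


def canonical_url(url):
    """Return the URL with content-neutral parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> element for the page template's <head>."""
    return '<link rel="canonical" href="{}">'.format(escape(canonical_url(url), quote=True))


print(canonical_link_tag("https://www.example.com/widgets?location=denver"))
# <link rel="canonical" href="https://www.example.com/widgets">
```

Calling `canonical_link_tag` with the requested URL from the one shared template would cover every parameterised variant of every parent page at once.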
-
I recently posted about this problem here.
In summary, I have three points:
1. The parameters control in Google Webmaster Tools is unreliable. It did not work for me. And, it does not work for any other search engine. Find a different solution, is what I recommend.
2. Using rel=canonical relies on Google to obey it. From my experience it works well at present time. But we know that Google says how they are going to do things and then changes their mind without tellin' anybody. I would not rely on this.
3. If you really want to control these parameters, use htaccess to strip them off at the server level. That is doing it where you control it and not relying on what anybody says that they are going to do. Take control.
The only reservation about #3 is that you might need parameters for on-site search or category page sorting on your own site. These can be excluded from being stripped in your htaccess file.
Don't allow search engines to do anything for you that you can do for yourself. They can screw it up or quit doing it at any time and not say anything about it.
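The logic of stripping parameters at the server while sparing the ones the site needs can be sketched outside of htaccess. This is an illustration in Python, not the htaccess rules themselves; the allowed names `q` and `sort` are hypothetical stand-ins for on-site search and category sorting parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the only parameters the site itself needs.
# "q" (on-site search) and "sort" (category sorting) are hypothetical names.
ALLOWED_PARAMS = {"q", "sort"}


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if the URL is already clean.

    Every query parameter that is not on the allow-list is stripped.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key in ALLOWED_PARAMS]
    if len(kept) == len(pairs):
        return None  # nothing to strip, serve the page as usual
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(redirect_target("https://www.example.com/coats?sort=price&location=denver"))
# https://www.example.com/coats?sort=price
print(redirect_target("https://www.example.com/coats?sort=price"))
# None
```

An allow-list is used here rather than a block-list so that any new, unexpected parameter is stripped by default.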
-
-
That was the link I was going to suggest, simply from the title you set this up with.
Have you also canonicalised the pages in question so that Google treats the parent page as the main source? It may help.
More details on setting it up here - Use Canonical URLs