Removing duplicate &var=1 (etc.) URLs from Google

Adsau

Hi, I had a huge drop in traffic around the 11th of July: over 50% down, with no recovery as yet. Organic visits fell from ~5,000 per day to barely over 2,500.

I fixed a problem, introduced by one script, that had been causing high bounce rates.

Now I have identified that Google has indexed the entire news section four times: the same content, but with var=0, var=1, var=2, var=3 and so on, around 40,000 URLs in total.

This must be causing problems.
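To see how many distinct pages hide behind those ~40,000 URLs, a minimal sketch like this one collapses each crawled URL to a canonical form by dropping the duplicate-producing parameter and groups the variants together. It assumes the parameter is literally named `var` and uses a hypothetical `example.com/news.php?id=…` URL shape; the function names are illustrative.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: the query parameter that produced the duplicate URLs.
DUPLICATE_PARAM = "var"


def canonical_url(url: str, param: str = DUPLICATE_PARAM) -> str:
    """Return the URL with every occurrence of `param` removed from its query.

    Note: urlencode may re-encode some characters (e.g. spaces become '+').
    """
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


def group_duplicates(urls):
    """Map each canonical URL to the list of crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)
```

Fed the four `var=0` to `var=3` variants of one article, `group_duplicates` returns a single key, `http://example.com/news.php?id=7`, with all four URLs listed under it.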

I have fixed the problem and those URLs now return 404. There's no need for 301s, as they are not linked to from anywhere.

How can I get them out of the index? I can't remove them one by one with the URL removal request, and I can't remove a directory with the URL removal tool because the regular content is still there.

If I block those URLs in robots.txt, won't Google stop trying to crawl them, and so never discover that they return 404?

These URLs are no longer linked to from anywhere, so how can Google ever reach them by crawling and find that they return 404?

RyanKent

Yes.

Adsau

Hi, thanks. So if it can't find a page and finds no more links to it, does that mean the page should drop out of the index within a month?

RyanKent

The definition of a 404 page is a page that cannot be found. So in that sense, no, Google can't find the page.

Google's crawlers follow links. If there is not a link to the page, then there is no issue. If Google locates a link, they will attempt to follow that link.
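A toy model of what "crawlers follow links" means for discovery: a breadth-first walk over a link graph only ever reaches pages that something links to. The page names are hypothetical, and the sketch deliberately models link-following discovery only; it leaves out how a search engine re-crawls URLs it already knows about.

```python
from collections import deque


def discover(start, links):
    """Breadth-first traversal: return every page reachable from `start`.

    `links` maps a page to the pages it links to. A page that nothing
    links to is never reached, however long the traversal runs.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: the &var=1 page links out, but nothing links to it.
site = {
    "/": ["/news?id=7"],
    "/news?id=7": ["/"],
    "/news?id=7&var=1": ["/"],
}
```

Here `discover("/", site)` returns `{"/", "/news?id=7"}`; the orphaned `&var=1` page is never visited.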

Adsau

Hi, thanks. So if a page returns 404 but isn't linked to from anywhere, will Google still find it?

RyanKent

Hi Adam.

The preferred method to handle this issue would have been to offer only one version of the URL. Once you realized the other versions were active, you had a couple of options for dealing with the problem:

Use a 301 to redirect all versions of the page to the main URL. This method would have allowed your existing Google links to keep working: users would still find the correct page, and Google would have noticed the 301 and adjusted its links.
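The 301 approach can be sketched as a small piece of middleware. This assumes a Python WSGI stack (the thread doesn't say what the site runs on), plain ASCII paths, and a hypothetical duplicate parameter named `var`; any request carrying it is permanently redirected to the same path and query without it, and everything else passes through untouched.

```python
from urllib.parse import parse_qsl, urlencode

DUPLICATE_PARAM = "var"  # hypothetical: the parameter that created the duplicates


def strip_param(query, param=DUPLICATE_PARAM):
    """Return (query without `param`, whether anything was removed)."""
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != param]
    return urlencode(kept), len(kept) != len(pairs)


def redirect_middleware(app, param=DUPLICATE_PARAM):
    """Wrap a WSGI app so requests carrying `param` get a 301 to the URL without it."""
    def wrapped(environ, start_response):
        query, changed = strip_param(environ.get("QUERY_STRING", ""), param)
        if not changed:
            # Not a duplicate URL: serve the page normally.
            return app(environ, start_response)
        path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "/")
        location = path + ("?" + query if query else "")
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    return wrapped
```

A request for `/news.php?id=7&var=2` gets a `301` with `Location: /news.php?id=7`, while `/news.php?id=7` itself is served by the wrapped app as before.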

Another option to consider, IF the pages were helpful, would be to keep them and use the canonical tag to indicate the URL of the primary page. This method would offer the same advantages mentioned above.
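For the canonical-tag route, each variant page would emit a `<link rel="canonical">` element pointing at the parameter-free URL. A minimal sketch, again assuming the duplicate parameter is named `var` and a hypothetical `example.com` URL; the function name is illustrative.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

DUPLICATE_PARAM = "var"  # hypothetical parameter name


def canonical_link_tag(page_url, param=DUPLICATE_PARAM):
    """Build a <link rel="canonical"> element for the URL without `param`."""
    parts = urlsplit(page_url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    # The fragment is dropped: it is never sent to the server anyway.
    href = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    # Escape so that '&' between remaining parameters becomes '&amp;' in HTML.
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'
```

On `http://example.com/news.php?id=7&var=1` this yields `<link rel="canonical" href="http://example.com/news.php?id=7">`, to be placed in the page's `<head>`.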

By removing the pages and letting them 404, everyone loses for the next month. Users who click on a search result will land on a 404 page rather than finding the content they seek. Google won't be offering the search results users are seeking. You will experience a high bounce rate, as many users do not like 404 pages, and it will take a month for an average site to be fully crawled and the issue corrected.

If you block the pages in robots.txt, Google won't attempt to crawl those URLs, so it won't see that they return 404. In general, your robots.txt should not be used in this manner.

My recommendation is to fix this issue with the proper 301s. If that is not an option, make sure your 404 page is as helpful and user-friendly as possible: include a site search option along with your main navigation. Google crawls a small percentage of your site each day, so you will notice the number of 404 links diminish over time.
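A helpful 404 page can still return a real `404` status (rather than a `200`), so crawlers drop the URL while visitors get a way forward. A minimal WSGI sketch, assuming a Python stack; the navigation links and the `/search` endpoint are hypothetical placeholders for the site's own.

```python
from html import escape

# Hypothetical navigation and search endpoint; substitute the site's own.
NAV_LINKS = [("/", "Home"), ("/news/", "News")]
SEARCH_ACTION = "/search"


def not_found_app(environ, start_response):
    """Return a 404 status with a site-search form and the main navigation."""
    path = escape(environ.get("PATH_INFO", "/"))
    nav = "".join(f'<li><a href="{escape(href)}">{escape(label)}</a></li>'
                  for href, label in NAV_LINKS)
    body = (
        "<!DOCTYPE html><html><head><title>Page not found</title></head><body>"
        f"<h1>Sorry, we couldn't find {path}</h1>"
        f'<form action="{SEARCH_ACTION}" method="get">'
        '<input type="search" name="q" aria-label="Search this site">'
        '<button type="submit">Search</button></form>'
        f"<ul>{nav}</ul></body></html>"
    ).encode("utf-8")
    # Keep the real 404 status so the URL is treated as gone.
    start_response("404 Not Found", [("Content-Type", "text/html; charset=utf-8"),
                                     ("Content-Length", str(len(body)))])
    return [body]
```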
