Removing duplicate &var=1 etc var name urls from google
-
Hi I had a huge drop in traffic around the 11th of july over 50% down with no recovery as yet... ~5000 organic visits per day down to barley over 2500.
I fixed up a problem that one script was introducing that had caused high bounce rates.
Now i have identified that google has indexed the entire news section 4 times, same content but with var=0 var=1 2 3 etc around 40,000 urls in total.
Now this would have to be causing problems.
I have fixed the problem and those url's 404 now, no need for 301's as they are not linked to from anywhere.
How can I get them out of the index? I cant do it one by one with the url removal request.. I cant remove a directory from url removal tool as the reuglar content is still there..
If I ban it in robots.txt those urls, wont it never try to index them again and thus not ever discover they are 404ing?
These urls are no longer linked to from anywhere, so how can google ever reach them by crawling to find them 404ing?
-
yes
-
Hi thanks, so if it cant find a page and finds no more links to a page. does that mean that it should drop out of the index within a month?
-
The definition of a 404 page is a page which cannot be found. So in that sense, no Google can't find the page.
Google's crawlers follow links. If there is not a link to the page, then there is no issue. If Google locates a link, they will attempt to follow that link.
-
Hi Thanks, so if a page is 404'ing but not linked to from anywhere google will still find it?
-
Hi Adam.
The preferred method to handle this issue would have been to only offer one version of the URL. Once you realized the other versions were active, you have a couple options to deal with the problem:
Use a 301 to redirect all the versions of the page to the main URL. This method would have allowed your existing Google links to work. Users would still find the correct page. Google would have noticed the 301 and adjusted their links.
Another option to consider IF the pages were helpful would be to keep them and use the canonical tag to indicate the URL of the primary page. This method would offer the same advantages mentioned above.
By removing the pages and allowing them to 404, everyone loses for the next month. Users who click on a search result will be taken to a 404 page rather then finding the content they seek. Google wont be offering the search results users are seeking. You will experience a high bounce rate as many users do not like 404 pages, and it will take a month for an average site to be fully crawled and the issue corrected.
If you block the pages in robots.txt, then Google wont attempt to crawl the links. In general, your robots.txt should not be used in this manner.
My recommendation is to fix this issue either with the proper 301s. If that is not an option, be sure your 404 page is helpful and as user friendly as possible. Include a site search option along with your main navigation. Google will crawl a small percent of your site each day. You will notice the number of 404 links diminish over time.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL Structure On Site - Currently it's domain/product-name NOT domain/category/product name is this bad?
I have a eCommerce site and the site structure is domain/product-name rather than domain/product-category/product-name Do you think this will have a negative impact SEO Wise? I have seen that some of my individual product pages do get better rankings than my categories.
Technical SEO | | the-gate-films0 -
Hashtag in url seems to remove the google plus one
My site has a catalogue page (catalog in US) with #anchors so that customers can get straight to them. I even link from other pages to the #anchor on the catalogue page. so I have for example: www.example.co.uk/catalogue.htm www.example.co.uk/catalogue.htm#blueitems www.example.co.uk/catalogue.htm#redtems I understand google doesn't index after the #, here is the post I found: http://moz.com/community/q/hashtag-anchor-text-within-content#reply_91192 So I shouldn't have an seo problem. BUT, if I navigate to www.example.co.uk/catalogue.htm and plusone the page it will show the plusone and then I navigate to www.example.co.uk/catalogue.htm#blueitems the plus one is gone. The same happens in reverse, if i plusone www.example.co.uk/catalogue.htm#redtems, then that plusone doesn't show in www.example.co.uk/catalogue.htm. I added rel=canonical and that fixed the plusone problem, now if you plus one /catalogue.htm#redtems it still shows on catalogue.htm This seems a bit extreme and did I do the right things??
Technical SEO | | Peter24680 -
My site was Not removed from google, but my most visited page was. what does that mean?
Help. My most important page http://hoodamath.com/games/ has disappeared from google, why the rest of my site still remains. i can't find anything about this type of ban. any help would be appreciated ( i would like to sleep tonight)
Technical SEO | | hoodamath0 -
New EMD update effected my mom's legit author page? From page 1 in SERP to nowhere for her name
I think my mom's site, MargaretTerry.com was hit by this update for her name "Margaret Terry". Went from bouncing around the first page on google.com and .ca all the time to nowhere on the index. The results are now very strange, a mix of Youtube, linked in, and small book stores that she has done events at recently to promote her first book. I was checking after some of my SEO buddys were freaking out about their EMD's getting hit on Sunday. She is an aspiring author with a book coming out this month. There is obviously no ads or spam content on the site... I have never done SEO for it either except a bit of on page I guess. It sucks that people might be grabbing her book soon and when they Google her name nothing shows up. This couldn't have really happened at a worse time. Not to mention the hours spent building the site to her liking, free of charge of course 🙂 Is there anyone I can contact there to help me out? Shouldn't and EMD that is someones name still rank when you search their name?
Technical SEO | | Operatic0 -
How to remove crawl errors in google webmaster tools
In my webmaster tools account it says that I have almost 8000 crawl errors. Most of which are http 403 errors The urls are http://legendzelda.net/forums/index.php?app=members§ion=friends&module=profile&do=remove&member_id=224 http://legendzelda.net/forums/index.php?app=core&module=attach§ion=attach&attach_rel_module=post&attach_id=166 And similar urls. I recently blocked crawl access to my members folder to remove duplicate errors but not sure how i can block access to these kinds of urls since its not really a folder thing. Any idea on how to?
Technical SEO | | NoahGlaser780 -
Different TLD's same content - duplicate content? - And a problem in foreign googles?
Hi, Operating from the Netherlands with customers troughout Europe we have for some countries the same content. In the netherlands and Belgium Dutch is spoken and in Germany and Switserland German is spoken. For these countries the same content is provided. Does Google see this as duplicate content? Could it be possible that a german customer gets the Swiss website as a search result when googling in the German Google? Thank you for your assistance! kind regards, Dennis Overbeek Dennis@acsi.eu
Technical SEO | | SEO_ACSI0 -
How do I get Google to display categories instead of the URL in results?
I've seen that for some domains Google will show a nice clickable site heirarchy in place of the actual URL of a search result. See attached for an example. How do I go about achieving this type of results? categorized.png
Technical SEO | | Carlito-2569610