Duplicate content from development website
-
Hi all - I've been trawling for duplicate content and stumbled across a development URL, set up by a previous web developer, which nearly mirrors the current site (there have been a few content and structure changes since then, but otherwise it's virtually the same). The developer didn't take it down when the site was launched.
I'm guessing the best thing to do is tell him to take down the development URL (which is specific to the pizza joint, btw) immediately. Is there anything else I should ask him to do?
Thanks, Luke
-
Well, when I did it I put in one removal request for the whole domain and also put a disallow in the robots.txt for the whole site. Matt appears to be referring to putting in too many removal requests, but if you want your whole site removed you only need one, so this wouldn't be an issue: you just enter your domain URL. When you say your page has no snippet, have you checked what your meta description is? It can help influence your snippet text. I would work on getting your development site removed a.s.a.p. and then see what happens with your snippet; I think there is a good chance it is down to duplicate content issues. Have you checked what the cache for your homepage is in Google's results?
-
Hello Max!
Thank you very much for your answer!
First of all... no, I didn't have Analytics or Webmaster Tools on the development site; I only set up Google Webmaster Tools yesterday to submit the removal request. There are ~1800 pages from the dev site indexed, and I was removing them one by one when I found this article by Matt Cutts, so I stopped:
http://www.mattcutts.com/blog/overdoing-url-removals/
Do you think it would be a good idea to keep doing it?
As far as I have seen, the development site is not outranking the main site, but my concern is that the main site's home page is showing up in the SERPs with no snippet, so I'm wondering if it's somehow related to the duplicate content issue.
Regarding your suggestion, DEFINITELY... that's the type of thing you assume the development company would take care of... I have already asked them to add HTTP authentication to the development site!
I really hope Google picks up the change soon!
Thank you very much for your help, I really appreciate it!
Warm regards
-
Hi Max
A couple of questions to understand your situation better: do you have both Google Analytics and Google Webmaster Tools installed on your development site? Is your development site outranking your main site for any of your key terms?
In my experience, unless your development site is outranking your main site, I would add a robots.txt file to disallow all bot access and then also put in a removal request for your domain in Google Webmaster Tools. I found this fix very quick: within a matter of days everything was fixed.
However, if you are getting traffic to your development site and it is outranking your main site, and you have therefore decided that the rel canonical option is best, I would still remove your development site once the rankings swap around (as Marie pointed out, this took a week or so for her).
With regard to your development site, I would always aim to have it removed from the index, and once your issues are sorted I would password-protect the whole site so that nobody can access it without the password. This lets you use your development site to its full potential without worrying about competitors who have found the URL monitoring it, even when it is de-indexed!
BTW, when I had this issue I had several thousand pages from my development site indexed in Google. Unfortunately I can't tell you exactly how long it will take to fix, as it depends on the current crawl rates for your sites.
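As a rough illustration of the password idea, here is a minimal Python sketch of how a server could check HTTP Basic credentials before serving a development page. The function name, username and password are hypothetical, and in practice you would normally switch on authentication in your web server or hosting control panel rather than write it yourself.

```python
import base64
import hmac


def is_authorized(auth_header, user, password):
    """Return True only if the Authorization header carries the expected Basic credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        # The part after "Basic " is base64("user:password").
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    expected = f"{user}:{password}"
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


# Hypothetical credentials, for illustration only.
header = "Basic " + base64.b64encode(b"dev:s3cret").decode("ascii")
print(is_authorized(header, "dev", "s3cret"))  # True
print(is_authorized(None, "dev", "s3cret"))    # False
```

A request that fails this check would get a 401 response, so neither crawlers nor competitors see the page content.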
Hope this helps!
-
I'm having a very similar problem... the development site got crawled and has 1700+ pages indexed in Google. I'm working on redirecting every page of the development site to its equivalent on the production site.
There's something else that I don't understand... the home page of the production site is not showing any snippet in the SERPs. Do you think this can be caused by the duplication issue with the development site?
After redirecting from development to production, how long do you think it will take Google to reindex everything and understand that there's no duplicate content anymore?
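To make the page-by-page redirect concrete, here is a small Python sketch that works out the 301 response for a development URL. It assumes the development and production sites share the same paths; the host names and the function name are placeholders, not details from my actual setup.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_for(dev_url, production_host="www.example.com"):
    """Return (status, headers) for a permanent redirect to the same path on production."""
    parts = urlsplit(dev_url)
    # Keep path and query string, swap scheme and host, drop any fragment.
    location = urlunsplit(("https", production_host, parts.path or "/", parts.query, ""))
    return 301, {"Location": location}


status, headers = redirect_for("http://dev.example.com/menu/pizzas?size=large")
print(status, headers["Location"])
# 301 https://www.example.com/menu/pizzas?size=large
```

A 301 (permanent) status is used here because the development URLs are never meant to come back.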
I would really appreciate your opinions!
Warm regards
-
Thanks so much Matt, Keri & Marie - brilliant advice there - really brilliant. With your help it's all removed now.
Blimey, that discovery sure set my heart racing (eeeek.)
-
Thanks Keri, great advice on the use of a code monitor - I have known the situation to occur where code changes have been made to development sites and the robots.txt has been changed or removed by mistake causing the development site to be indexed again. Monitoring this would have helped react to this situation so much quicker!
-
I had a similar situation where I had developed a site for a landscaping client. I thought I had gotten rid of the files but somehow Google found them. My development site ranked #1 for his terms and his site was on something like page 6 because it was duplicate content. Here's what I did:
I didn't want to take down my site right away because his company was ranking #1 for his keywords. (Even though they landed on the development site they still had his phone number to call.)
I added a rel canonical to the development site that told Google that the correct site to index was actually the client's site.
Within a week or so, the proper site was ranking #1. At that point I deleted the files for the development site.
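For anyone wanting to try the same thing, here is a hedged Python sketch of the kind of tag involved: it builds a rel canonical link element pointing a development page at the matching page on the live site. The host names and function name are made up for illustration, and it assumes both sites use the same paths.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_tag(dev_url, production_host="www.example.com", scheme="https"):
    """Build a <link rel="canonical"> tag pointing at the same path on the live site."""
    parts = urlsplit(dev_url)
    target = urlunsplit((scheme, production_host, parts.path or "/", parts.query, ""))
    # Escape the URL so it is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'


print(canonical_tag("http://dev.example.com/services/lawns"))
# <link rel="canonical" href="https://www.example.com/services/lawns">
```

The tag goes in the head of each development page, so every duplicate names its own live equivalent rather than all pointing at the home page.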
-
Excellent advice here. If it's on a subdomain, the subdomain can be claimed in GWT as its own site. You can put a robots.txt on the subdomain then request the entire subdomain be removed from the index.
You may want to go one step further and use something like PolePosition's Code Monitor, which checks the code of any page once per day and alerts you if there's a change. In a similar situation, I had it monitor the robots.txt for the live site and all the development sites where I was working, so I knew if the developers changed something and could react quickly.
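If you'd rather roll your own check, the general idea can be sketched in a few lines of Python. This is not how Code Monitor itself works (I don't know its internals); it is just a minimal example that fingerprints a robots.txt file and reports whether it differs from a known-good copy. The function names are my own.

```python
import hashlib
from urllib.request import urlopen


def fetch(url, timeout=10.0):
    """Download a file such as http://dev.example.com/robots.txt (placeholder URL)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fingerprint(content):
    """Return a SHA-256 hex digest of the file contents."""
    return hashlib.sha256(content).hexdigest()


def has_changed(content, known_fingerprint):
    """True if the content no longer matches the stored fingerprint."""
    return fingerprint(content) != known_fingerprint


# Store this once, while the file is known to be correct.
baseline = fingerprint(b"User-agent: *\nDisallow: /\n")

print(has_changed(b"User-agent: *\nDisallow: /\n", baseline))  # False
print(has_changed(b"User-agent: *\nDisallow:\n", baseline))    # True
```

Run daily from a scheduler with `fetch()` supplying the content, and send yourself an alert whenever `has_changed` returns True.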
-
Hi Luke
I had the same problem and this is how I fixed it: I registered the development domain with GWT and then put in a removal request. I also got our developers to set up a robots.txt file to tell search engines not to index any of the site. The contents of the robots.txt file are as follows:
User-agent: *
Disallow: /
This soon fixed the issue for us. Hope this helps. Obviously you don't need the robots.txt if you are just going to take the site down completely, as there will be no worry of people finding it in search engines and mistaking it for your live site, or of search engines finding duplicate content. I used this strategy as we still use the development site for testing etc. before going live.
Can I just check: is the URL on a separate domain? If it isn't, and it is instead part of your existing domain, you can still block that URL using either a robots.txt file or a noindex, nofollow meta tag. You can also request removal of specific URLs within a site in GWT.
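If you want to double-check that a robots.txt really does shut out all crawlers, Python's standard library can parse it for you. A minimal sketch, with the probe URL and function name being placeholders of mine:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def blocks_all_crawlers(robots_txt, probe_url="http://dev.example.com/any/page"):
    """Return True if the rules forbid a generic crawler from fetching probe_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", probe_url)


print(blocks_all_crawlers(ROBOTS_TXT))                       # True
print(blocks_all_crawlers("User-agent: *\nDisallow:\n"))     # False
```

Note that an empty Disallow line allows everything, which is why the second call returns False: one missing slash is all it takes to open the site back up.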