Help With Preferred Domain Settings, 301 and Duplicate Content
-
I've seen some good threads on this topic in the Q&A archives, but I feel it deserves a fresh perspective, as many of those discussions are almost 4 years old.
My Webmaster Tools preferred domain setting is currently non-www. I didn't set it this way; it was like this when I first started using WM Tools.
However, I have built the majority of my links with the www, which I've always viewed as part of the web address.
When I put my site into an SEOmoz campaign, it recognized the www version as a subdomain, which I thought was strange, but now I realize it's due to the www vs. non-www preferred domain distinction.
A look at site:mysite.com shows that Google is indexing both the www and non www version of the site. My site appears healthy in terms of traffic, but my sense is that a few technical SEO items are holding me back from a breakthrough.
QUESTION to the SEOmoz community:
What the hell should I do? Change the preferred domain settings? 301 redirect from non www domain to the www domain?
Google suggests this: "Once you've set your preferred domain, you may want to use a 301 redirect to redirect traffic from your non-preferred domain, so that other search engines and visitors know which version you prefer."
Any insight would be greatly appreciated.
-
The worst thing you can do is nothing.
Above are 5 examples of URLs which COULD all lead to the same page. There are numerous other possibilities as well. If you don't let Google know which version of the page is correct, you will suffer the consequences of duplicate content.
What happens is that Google doesn't know which page is correct. It will pick one of the non-www versions, because that is what your Google WMT is set up to do. Meanwhile, your links point at the other versions of the pages.
You are sending your link juice to a page, but it is a complete waste, as that page is not being considered by Google for the SERPs. You MUST resolve this issue if you care about SEO at all.
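To make the duplicate-URL point concrete, here is a minimal Python sketch of how several address variants can collapse onto one preferred URL. Everything in it is an assumption for illustration: the five variants are stand-ins (not the examples from the post above), `canonical_url` is a hypothetical helper, and www is assumed to be the preferred host.

```python
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "mysite.com"           # placeholder domain
PREFERRED_HOST = "www.mysite.com"  # assumption: the www version is preferred


def canonical_url(url):
    """Map common variants of a URL onto one preferred form (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www host as the same site.
    if host == BARE_HOST:
        host = PREFERRED_HOST
    # An empty path and "/" both address the site root.
    path = parts.path or "/"
    # Assumption: index.html is the default document for its directory.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


# Illustrative variants that could all serve the same home page.
variants = [
    "http://mysite.com",
    "http://mysite.com/",
    "http://www.mysite.com",
    "http://www.mysite.com/",
    "http://WWW.mysite.com/index.html",
]
unique = {canonical_url(u) for u in variants}
print(unique)
```

All five variants end up as a single URL here. Without a redirect or canonical hint, a search engine has no such mapping and may treat each variant as a separate page.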
-
Thanks Ryan. So, if most of the links (including all internal links) are built with the www format then it is wise to change preferred domain settings to www and redirect the non www to the www domain?
Am I likely to damage rankings/traffic by doing this? What happens if I just leave it as is?
-
You are welcome to do so. Go to Google WMT, change your current option to the www, then adjust your .htaccess file as Steven suggested.
Also, canonicalize your pages to help ensure this issue can't happen again. Your .htaccess changes will work as long as the file is there, but things happen so it's better to be covered.
-
Guys,
Thanks for the input. I just want to do what is best for traffic and the site. I don't want to do anything that is going to tank my rankings and visitors.
I don't get a lot of type-in traffic.
www is the main way the links have been built, why not just redirect those to the non www version?
-
As Ryan said, make a decision. The easiest way to make sure your decision sticks is to use an .htaccess file and rewrite to your preferred version.
If using the www version:
# Redirect requests for the bare domain (or a raw IP address) to the www host
RewriteEngine On
RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3} [OR]
RewriteCond %{HTTP_HOST} ^mydomain\.com [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [L,R=301]
If using the non-www version:
# Redirect requests for the www host to the bare domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com [NC]
RewriteRule ^(.*)$ http://mydomain.com/$1 [L,R=301]
A few other questions to keep in mind:
Do you get a lot of type-in traffic?
Do they tend to type the www?
In the SERPs it is easier to read a domain name without the www when you are looking for a specific domain. Do you have a brand built where people just say your domain name?
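For anyone who wants to reason about the rewrite rules before touching a live server, the www variant above can be modeled in a few lines of Python. This is only a sketch of the matching logic, not how Apache evaluates rules: `redirect_target` is a hypothetical function, mydomain.com is the same placeholder used in the rules, and the host patterns are written with literal dots.

```python
import re

# Host patterns mirroring the two RewriteCond lines of the www variant.
IP_HOST = re.compile(r"^[0-9]+(\.[0-9]+){3}")
BARE_HOST = re.compile(r"^mydomain\.com", re.IGNORECASE)


def redirect_target(host, path):
    """Return the 301 target for a request, or None if no redirect applies."""
    if IP_HOST.search(host) or BARE_HOST.search(host):
        # In .htaccess the captured path has no leading slash, so strip it here.
        return "http://www.mydomain.com/" + path.lstrip("/")
    return None


print(redirect_target("mydomain.com", "/about"))
print(redirect_target("www.mydomain.com", "/about"))
```

A request that already uses the www host returns None, which corresponds to the rule not firing and the page being served normally.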
-
You need to make a decision. Do you want your site address to be seen with or without the www?
Try to assess which version of your URL would require the fewest redirects. You mentioned that the links you built mostly include the www. Take a look at all of your links, though: you may have a higher number of organic links without the www. Evaluate all the links, then make a decision.
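One rough way to run that evaluation is to tally the hostnames in an exported list of link targets. The sketch below assumes you already have such a list as plain URLs; the sample data is made up and `count_hosts` is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_hosts(link_targets):
    """Tally how many link targets point at each hostname (hypothetical helper)."""
    return Counter(urlsplit(url).netloc.lower() for url in link_targets)


# Made-up sample; in practice this would come from a backlink export.
sample_links = [
    "http://www.mysite.com/",
    "http://www.mysite.com/about",
    "http://mysite.com/",
    "http://www.mysite.com/contact",
]
counts = count_hosts(sample_links)
print(counts.most_common())
```

Whichever host collects the larger share of links is the version that would need fewer redirects.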
Once you make a decision, stick with it. Canonicalize all your pages with the correct version of the URL. Search your site for all internal links and standardize them.
While you are on this project, also standardize whether your URLs end in a "/". For any page below the root, www.mysite.com/page is not the same URL as www.mysite.com/page/ (only the bare root, www.mysite.com versus www.mysite.com/, is treated as identical). I make this suggestion because if you are going through the painful process of standardizing your site for the www issue, you should resolve all such issues at once.
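The internal-link cleanup described above can be partly automated. As a sketch only, the Python below scans a page's HTML and flags internal links that use the wrong host or lack a trailing slash. It assumes www and trailing slashes were chosen as the standard, that URLs whose last segment contains a dot are files and should be left alone, and that links are absolute; `LinkCollector` and `nonstandard_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

PREFERRED_HOST = "www.mysite.com"  # assumption: www chosen as the standard
SITE_HOSTS = {"mysite.com", "www.mysite.com"}


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def nonstandard_links(html):
    """Return internal absolute links that break the chosen convention."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        parts = urlsplit(href)
        host = parts.netloc.lower()
        if host not in SITE_HOSTS:
            continue  # external or relative link: not checked in this sketch
        wrong_host = host != PREFERRED_HOST
        path = parts.path or "/"  # an empty path is the site root
        last_segment = path.rsplit("/", 1)[-1]
        # Assumption: a dot in the last segment means a file such as logo.png.
        missing_slash = not path.endswith("/") and "." not in last_segment
        if wrong_host or missing_slash:
            flagged.append(href)
    return flagged


page = """
<a href="http://www.mysite.com/about/">ok</a>
<a href="http://mysite.com/about/">wrong host</a>
<a href="http://www.mysite.com/contact">no slash</a>
<a href="http://www.mysite.com/logo.png">file</a>
<a href="http://example.org/">external</a>
"""
print(nonstandard_links(page))
```

Running something like this over each template or page gives a list of links to fix by hand, which is safer than a blind search and replace.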