Noindexing Duplicate (non-unique) Content
-
When "noindex" is added to a page, does this ensure Google does not count the page as part of its analysis of the unique vs. duplicate content ratio on a website? Example: I have a real estate business and I have noindex on my MLS pages. However, is there a chance that even though Google does not index these pages, Google will still see them and think "ah, these are duplicate MLS pages, we are going to let those pages drag down the value of the entire site and lower the ranking of even the unique pages"? I would like to just use "noindex, follow" on those MLS pages, but would it be safer to add the pages to robots.txt as well? Would that, in theory, increase the likelihood that Google will not see such MLS pages as duplicate content on my website?
On another note: I had these MLS pages indexed and added "noindex, follow" 3-4 weeks ago. However, they are all still indexed and there are no signs that Google is noindexing them yet.....
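Before waiting on Google, it may be worth confirming that the tag is really in the HTML being served. A minimal sketch in Python, assuming the directive is delivered as a robots meta tag in the page markup; the sample page and the helper name `robots_directives` are made up for illustration:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            parts = attr.get("content", "").split(",")
            self.directives += [p.strip().lower() for p in parts if p.strip()]


def robots_directives(html):
    """Return the robots directives found in an HTML string (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical markup for a paginated MLS result page.
sample_page = """
<html><head>
<title>Oahu condos, page 2</title>
<meta name="robots" content="noindex, follow">
</head><body></body></html>
"""

print(robots_directives(sample_page))
```

An empty list would mean no robots meta tag was found in the markup that was checked.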
-
Canonical pages don't have to be the same.
It will merge the content to look like one page.
Good luck
-
thx, Alan. I am already using rel=next/prev. However, that means all those paginated pages will still be indexed. I am adding "noindex, follow" to pages 2-n and only leaving page 1 indexed. Canonical: I don't think that will work. Each page in the series shows different properties, which means pages 1-n are all different......
-
OK, if you use follow, that will be fine, but I would be looking at canonical or next/previous first.
-
I am trying to rank for those duplicate-looking MLS pages, since that is what users want (they don't want my guide pages with lots of unique data when they are searching "....for sale"). I will add unique data to page 1 of these MLS result pages. However, pages 2-50 will NOT change (they will stay duplicate-looking). If I have pages 1-50 indexed, the unique content on page 1 may look like a drop in the ocean to G, and that is why I feel including "noindex, follow" on pages 2-50 may make sense.
-
That's correct.
You won't rank for duplicate pages, but unless most of your site is duplicate you won't be penalized.
-
http://moz.com/blog/handling-duplicate-content-across-large-numbers-of-urls - that is Rand's Whiteboard Friday from a few weeks ago, and I quote from the transcript:
"So what happens, basically, is you get a page like this. I'm at BMO's Travel Gadgets. It's a great website where I can pick up all sorts of travel supplies and gear. The BMO camera 9000 is an interesting one because the camera's manufacturer requires that all websites which display the camera contain a lot of the same information. They want the manufacturer's description. They have specific photographs that they'd like you to use of the product. They might even have user reviews that come with those.
Because of this, a lot of the folks, a lot of the e-commerce sites who post this content find that they're getting trapped in duplicate content filters. Google is not identifying their content as being particularly unique. So they're sort of getting relegated to the back of the index, not ranking particularly well. They may even experience problems like Google Panda, which identifies a lot of this content and says, "Gosh, we've seen this all over the web and thousands of their pages, because they have thousands of products, are all exactly the same as thousands of other websites' other products."
-
There is nothing wrong with having duplicate content. It becomes a problem when you have a site that is all or almost all duplicate or thin content.
Having a page that is on every other competitor's site will not harm you; you just may not rank for it.
But no-indexing can cause a loss of link juice, as all links pointing to non-indexed pages waste their link juice. Using noindex,follow will return most of this, but there is still no need to no-index.
-
http://www.honoluluhi5.com/oahu-condos/ - this is an "MLS result page". That URL will soon have some statistics and it will be unique (I will include it in the index). The paginated pages (2 to n) hardly have any unique content. It is a great layout and users love it (in my AdWords campaign the average user spends 9 min and views 16 pages on the site), but since these are MLS listings (shared amongst thousands of Realtors) Google will see "ah, these are duplicate pages, nothing unique". That is why I plan to index page 1 (the URL I list) but keep all paginated pages (like http://www.honoluluhi5.com/oahu-condos/page-2) as "noindex, follow". Also, I want to rank for this URL: http://www.honoluluhi5.com/oahu/honolulu-condos/ which is a sub-category of the first URL, and 100% of its content is exactly the same as on the 1st URL. So, I will focus on indexing just the 1st page and not the paginated pages. Unfortunately, G cannot see value in layout and design and I can see how keeping all pages indexed could hurt my site.
Would be happy to hear your thoughts on this. I launched the site 4 months ago, with more unique and quality content than 99% of the other firms I am up against, yet nothing has happened ranking-wise. I suspect all these MLS pages are the issue. Time will tell!
-
If you no-index, I don't think next/previous will have any effect.
If they are different, and if the keywords are all important, why no-index?
-
Thx, Philip. I am already using it, but I thought adding "noindex, follow" to those paginated pages (on top of rel=next/prev) would increase the likelihood that G will NOT see all those MLS result pages as a bunch of duplicate content. Page 1 may look thin, but with some statistical data I will soon include, it is unique, and that uniqueness may offset the lack of indexed MLS result pages.....not sure if my reasoning is sound. Would be happy to hear if you feel differently.
-
Sounds like you should actually be using rel=next and rel=prev.
More info here: http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html
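For anyone wondering what that markup looks like across a series, here is a rough Python sketch that builds the rel=prev / rel=next link tags for one page. It assumes page 1 lives at the category URL and page N at that URL plus "page-N" (the pattern used on the site in this thread); the function name is made up for illustration:

```python
def pagination_links(base_url, page, last_page):
    """Return the rel=prev / rel=next <link> tags for one page of a series.

    Assumes page 1 lives at base_url and page N at base_url + "page-N".
    """
    def url_for(n):
        return base_url if n == 1 else f"{base_url}page-{n}"

    tags = []
    if page > 1:
        # Every page except the first points back to its predecessor.
        tags.append(f'<link rel="prev" href="{url_for(page - 1)}">')
    if page < last_page:
        # Every page except the last points forward to its successor.
        tags.append(f'<link rel="next" href="{url_for(page + 1)}">')
    return tags


for tag in pagination_links("http://www.honoluluhi5.com/oahu-condos/", 2, 50):
    print(tag)
```

The first page only gets a "next" tag and the last page only gets a "prev" tag.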
-
Hi Alan, thx for your comment. Let me give you an example, and if you have a thought that'd be great:
1) Condos on Island: http://www.honoluluhi5.com/oahu-condos/
2) Condos in City: http://www.honoluluhi5.com/oahu/honolulu-condos/
3) Condos in Region: http://www.honoluluhi5.com/oahu/honolulu/metro-condos/
Properties on the result page for 3) are all in 2), and all properties within 2) are within 1). Furthermore, for each of those URLs, the paginated pages (2 to n) are all different, since each property is different, so using canonical tags would not be accurate. 1 + 2 + 3 are all important keywords.
Here is what I am planning: add some unique content to the first page in the series for each of those URLs and include just the 1st page in the series in the index, but keep "noindex, follow" on pages 2 to n. The argument could be "your MLS result pages will look too thin and not rank", but the other way of looking at it is "with potentially 500 or more properties on each URL, a bit of stats on page 1 will not offset all the MLS duplicate data, so even though the page may look thin, only indexing page 1 is the best way forward".
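The plan above (index page 1, "noindex, follow" on pages 2 to n) boils down to a one-line rule per page. A minimal sketch in Python, assuming the page templates can call a helper with the current page number; `robots_meta` is a hypothetical name, not part of any CMS:

```python
def robots_meta(page):
    """Robots meta tag for page number `page` of a paginated result series."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 carries the unique content and stays indexable;
    # every later page is kept out of the index but its links are followed.
    content = "index, follow" if page == 1 else "noindex, follow"
    return f'<meta name="robots" content="{content}">'


for page in (1, 2, 50):
    print(page, robots_meta(page))
```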
-
Remember that if you no-index pages, any link you have on your site pointing to those pages is wasting its link juice.
This looks like a job for Canonical tag
-
lol - good answer Philip. I hear you. What makes it difficult is the lack of crystal clear guidelines from search engines....it is almost like they don't know themselves and each case is sort of on a "what feels right" basis.....
-
Good find. I've never seen this part of the help section. The recurring reasoning behind all of the examples seems to be "You don’t need to manually remove URLs; they will drop out naturally over time."
I have never had an issue, nor have I ever heard of anyone having an issue, removing URLs with the Removal Tool. I guess if you don't feel safe doing it, you can wait for Google's crawler to catch up, although it could take over a month. If you're comfortable waiting it out, have no reasons to rush it, AND feel like playing it super safe... you can disregard everything I've said.
We all learn something new every day!
-
Based on Google's own guidelines, it appears to be a bad idea to use the removal tool under normal circumstances (which I believe my site falls under): https://support.google.com/webmasters/answer/1269119
It starts with: "The URL removal tool is intended for pages that urgently need to be removed—for example, if they contain confidential data that was accidentally exposed. Using the tool for other purposes may cause problems for your site."
-
thx, Philip. Most helpful. I will get on it
-
Yes. It will remove /page-52 and EVERYTHING that exists in /oahu/honolulu/metro/waikiki-condos/. It will also remove everything that exists in /page-52/ (if anything). It trickles down as far as the folders in that directory will go.
Go to Google search and type this in: site:honoluluhi5.com/oahu/honolulu/metro/waikiki-condos/
That will show you everything that's going to be removed from the index.
-
Yep, you got it.
You can think of it exactly like Windows folders, if that helps you stay focused. If you have C:\Website\folder1 and C:\Website\folder12, "noindexing" \folder1\ would leave \folder12\ alone because they're not in the same directory.
-
For some MLS result pages I have a BUNCH of pages and I want to remove them from the index with 1 click as opposed to having to include each paginated page. Example: http://www.honoluluhi5.com/oahu/honolulu/metro/waikiki-condos/page-52 I simply include "/oahu/honolulu/metro/waikiki-condos/" and that will ALSO remove this page from the index: http://www.honoluluhi5.com/oahu/honolulu/metro/waikiki-condos/page-52 - is that correct?
-
Removing directory "/oahu/waianae-makaha-condos/" will NOT remove "/oahu/waianae-makaha/maili-condos/" because the silos "waianae-makaha" and "waianae-makaha-condos" are different.
HOWEVER,
removing directory "/oahu/waianae-makaha/maili-condos/" will remove "/oahu/waianae-makaha/maili-condos/page-2" because they share the silo "waianae-makaha". Is that correctly understood?
-
Yep. Just last week I had an entire website deindexed (on purpose, it's a staging website) by entering just / into the box and selecting directory. By the next morning the entire website was gone from the index.
It works for folders/directories too. I've used it many times.
-
so I will remove directory for "/oahu/waianae-makaha/maili-condos/" and that will ensure removal of "/oahu/waianae-makaha/maili-condos/page-2" as well?
-
thx, Philip. So you are saying that if I use the directory option, that will ensure the paginated pages will also be taken out of the index, like this page: /oahu/waianae-makaha/maili-condos/page-2?
-
I'm not 100% sure Google will understand you if you leave off the slashes. I've always added them and have never had a problem, so you want to type: /oahu/waianae-makaha-condos/
Typing that would NOT include the neighborhood URL, in your example. It will only remove everything that exists in the /waianae-makaha-condos/ folder (including that main category page itself).
edit >> To remove the neighborhood URL and everything in that folder as well, type /oahu/waianae-makaha/maili-condos/ and select the option for "directory".
edit #2 >> I just want to add that you should be very careful with this. You don't want to use the directory option unless you're 100% sure there's nothing in that directory that you want to stay indexed.
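To double-check which URLs a directory entry would sweep up before submitting it, the folder logic described in this thread can be modelled as a simple prefix check. This is only a sketch of the behaviour as described here, not Google's actual implementation, and both function names are made up:

```python
from urllib.parse import urlsplit


def removal_path(url_or_path):
    """Reduce a full URL to the part after the domain, with a leading slash.

    Input without a scheme (http://...) is treated as a path already.
    """
    path = urlsplit(url_or_path).path
    return path if path.startswith("/") else "/" + path


def directory_covers(directory, url_or_path):
    """True if the page sits inside `directory` or is the directory page itself."""
    folder = removal_path(directory).rstrip("/") + "/"
    page = removal_path(url_or_path)
    return page.rstrip("/") + "/" == folder or page.startswith(folder)


region = "/oahu/waianae-makaha-condos/"
neighborhood = "http://www.honoluluhi5.com/oahu/waianae-makaha/maili-condos/"

# The region folder and the neighborhood folder are siblings, not parent and child.
print(directory_covers(region, neighborhood))
# Paginated pages live inside the neighborhood folder.
print(directory_covers(neighborhood, neighborhood + "page-2"))
```

Running the list of URLs you want to keep indexed through `directory_covers` first is a cheap way to catch a directory entry that is broader than intended.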
-
thx. I have a URL like this for a REGION: http://www.honoluluhi5.com/oahu/waianae-makaha-condos/ and for a "NEIGHBORHOOD" I have this: http://www.honoluluhi5.com/oahu/waianae-makaha/maili-condos/
As you can see Region has "waianae-makaha-condos" directory, whereas the Neighborhood has "waianae-makaha" without the "condos" for that region directory part.
Question: when I go to GWT and remove can I simply type "oahu/waianae-makaha-condos" and select the directory option and that will ALSO exclude the neighborhood URL? OR, since the region part in the URL within the neighborhood URL is different I have to submit individually?
-
Yep! After you remove the URL or directory of URLs, there is a "Reinclude" button you can get to. You just need to switch your "Show:" view so it shows URLs removed. The default is to show URLs PENDING removal. Once they're removed, they will disappear from that view.
-
good one, Philip. Last BIG question: if I remove URLs in GWT, is it possible to "unremove" them without issue? I am planning to index some of these MLS pages in the future when I have more unique content on them.
-
When "noindex" is added to a page, does this ensure Google does not count the page as part of its analysis of the unique vs. duplicate content ratio on a website? Yes, that will tell Google that you understand the pages don't belong in the index. They will not penalize your site for duplicate content if you're explicitly telling Google to noindex them.
Is there a chance that even though Google does not index these pages, Google will still see them and think "ah, these are duplicate MLS pages, we are going to let those pages drag down the value of the entire site and lower the ranking of even the unique pages"? No, there's no chance these will hurt you if they're set to noindex. That is exactly what the noindex tag is for. You're doing what Google wants you to do.
I would like to just use "noindex, follow" on those MLS pages, but would it be safer to add the pages to robots.txt as well? Would that, in theory, increase the likelihood that Google will not see such MLS pages as duplicate content on my website? You could add them to your robots.txt, but that won't make Google any less likely to penalize you, because there is already no worry about being penalized for pages that are not indexed.
On another note: I had these MLS pages indexed and added "noindex, follow" 3-4 weeks ago. However, they are all still indexed and there are no signs that Google is noindexing them yet.....
Donna's advice is perfect here. Use the Remove URLs tool. Every time I've used the tool, Google has removed the URLs from the index in less than 12-24 hours. Of course, I made sure to have a noindex tag in place first. Just make sure you enter everything AFTER the TLD (.com, .net, etc.) and nothing before it. Example: You'd want to ask Google to remove /mls/listing122 but not example.com/mls/listing122. The latter will not work properly because Google automatically adds "example.com" to it (they just don't make this very clear).
-
thx, Donna. My question was mainly around whether Google will NOT consider MLS pages as duplicate content when I place "noindex" on them. We can all guess, but does anyone have anything concrete on this, to help me understand the reality? Can we say with 90% certainty "yes, if you place noindex on a duplicate content page, then Google will not consider that duplicate content, hence it will not count towards how Google views duplicate vs unique site content"? This is the big question: if we are left in uncertainty, then the only way forward may be to password-protect such pages and not offer them to users who have not created an account.....
Removal in GWT: I plan to index some of these MLS pages in the future (when I get more unique content on them) and I am concerned that once they are submitted to GWT for removal, it will be tough to get such pages indexed again.
-
Hi khi5,
I think excluding those MLS listings from your site using the robots.txt file would be overkill.
As I'm sure you well know, Google does what it wants. I think tagging the pages you don't want indexed with "noindex follow" AND adding them to the robots.txt file doesn't make the likelihood that Google will respect your wishes any higher. You might want to consider canonicalizing them though, so links to and bookmarks and shares of said pages get credited to your site.
As to how long it takes for Google to deindex said pages, it can take a very long time. In my experience, "a very long time" can run 6-8 months. You do have the option however, of using Google Webmaster Tools > Google Index > Remove URLs to ask to have them deindexed faster. Again, no guarantees that Google will do as you ask, but I've found them to be pretty responsive when I use the tool.
I'd love to hear if anyone else feels differently.
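On the robots.txt point, it may help to see what a Disallow rule would actually block. The sketch below uses Python's standard `urllib.robotparser`, which is a simple parser and not Googlebot's own; the rule shown is hypothetical and would block only the paginated pages of one category:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block the paginated pages, leave page 1 alone.
rules = [
    "User-agent: *",
    "Disallow: /oahu-condos/page-",
]

parser = RobotFileParser()
parser.parse(rules)


def can_crawl(url, agent="Googlebot"):
    """True if a crawler obeying the rules above may fetch `url`."""
    return parser.can_fetch(agent, url)


print(can_crawl("http://www.honoluluhi5.com/oahu-condos/"))
print(can_crawl("http://www.honoluluhi5.com/oahu-condos/page-2"))
```

One thing to keep in mind when combining the two approaches: a crawler that is not allowed to fetch a page cannot read a "noindex, follow" tag placed on that page.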