Should I robots block site directories with primarily duplicate content?
-
Our site, CareerBliss.com, primarily offers unique content in the form of company reviews and exclusive salary information. As a means of driving revenue, we also have a lot of job listings in ouir /jobs/ directory, as well as educational resources (/career-tools/education/) in our. The bulk of this information are feeds, which exist on other websites (duplicate). Does it make sense to go ahead and robots block these portions of our site?
My thinking is in doing so, it will help reallocate our site authority helping the /salary/ and /company-reviews/ pages rank higher, and this is where most of the people are finding our site via search anyways.
ie.
http://www.careerbliss.com/jobs/cisco-systems-jobs-812156/
http://www.careerbliss.com/jobs/jobs-near-you/?l=irvine%2c+ca&landing=true
http://www.careerbliss.com/career-tools/education/education-teaching-category-5/
-
Personally I'm a fan of don't block them via robots. Because if you do and you don't remove the URLs from index (remove the directory, Matt said here ), they still will be indexed.
There was a good posting time ago at seomoz blog and you will see, your pages won't leave the index.
I think, you should set these pages "noindex, follow". It worked good for me.
Patrick
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content across different domains
Hi Guys, Looking for some advice regarding duplicate content across different domains. I have reviewed some previous Q&A on this topic e.g. https://moz.com/community/q/two-different-domains-exact-same-content but just want to confirm if I'm missing anything. Basically, we have a client which has 1 site (call this site A) which has solids rankings. They have decided to build a new site (site B), which contains 50% duplicate pages and content from site A. Our recommendation to them was to make the content on site B as unique as possible but they want to launch asap, so not enough time. They will eventually transfer over to unique content on the website but in the short-term, it will be duplicate content. John Mueller from Google has said several times that there is no duplicate content penalty. So assuming this is correct site A should be fine, no ranking losses. Any disagree with this? Assuming we don't want to leave this to chance or assume John Mueller is correct would the next best thing to do is setup rel canonical tags between site A and site B on the pages with duplicate content? Then once we have unique content ready, execute that content on the site and remove the canonical tags. Any suggestions or advice would be very much appreciated! Cheers, Chris
Intermediate & Advanced SEO | | jayoliverwright0 -
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | | YairSpolter0 -
Joomla Duplicate Page content fix for mailto component?
Hi, I am currently working on my site and have the following duplicate page content issues: My Uni Essays http://www.myuniessays.co.uk/component/mailto/?tmpl=component&template=it_university&link=2631849e33 My Uni Essays http://www.myuniessays.co.uk/component/mailto/?tmpl=component&template=it_university&link=2edd30f8c6 This happens 15 times Any ideas on how to fix this please? Thank you
Intermediate & Advanced SEO | | grays01800 -
Duplicate Page Content / Titles Help
Hi guys, My SEOmoz crawl diagnostics throw up thousands of Dup Page Content / Title errors which are mostly from the forum attached to my website. In-particular it's the forum user's profiles that are causing the issue, below is a sample of the URLs that are being penalised: http://www.mywebsite.com/subfolder/myforum/pop_profile.asp?mode=display&id=1308 I thought that by adding - http://www.mywebsite.com/subfolder/myforum/pop_profile.asp to my robots.txt file under 'Ignore' would cause the bots to overlook the thousands of profile pages but the latest SEOmoz crawl still picks them up. My question is, how can I get the bots to ignore these profile pages (they don't contain any useful content) and how much will this be affecting my rankings (bearing in mind I have thousands of errors for dup content and dup page titles). Thanks guys Gareth
Intermediate & Advanced SEO | | gaz33420 -
How do I fix the error duplicate page content and duplicate page title?
On my site www.millsheating.co.uk I have the error message as per the question title. The conflict is coming from these two pages which are effectively the same page: www.millsheating.co.uk www.millsheating.co.uk/index I have added a htaccess file to the root folder as I thought (hoped) it would fix the problem but I doesn't appear to have done so. this is the content of the htaccess file: Options +FollowSymLinks RewriteEngine On RewriteCond %{HTTP_HOST} ^millsheating.co.uk RewriteRule (.*) http://www.millsheating.co.uk/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/ RewriteRule ^index\.html$ http://www.millsheating.co.uk/ [R=301,L] AddType x-mapp-php5 .php
Intermediate & Advanced SEO | | JasonHegarty0 -
How are they avoiding duplicate content?
One of the largest stores in USA for soccer runs a number of whitelabel sites for major partners such as Fox and ESPN. However, the effect of this is that they are creating duplicate content for their products (and even the overall site structure is very similar). Take a look at: http://www.worldsoccershop.com/23147.html http://www.foxsoccershop.com/23147.html http://www.soccernetstore.com/23147.html You can see that practically everything is the same including: product URL product title product description My question is, why is Google not classing this as duplicate content? Have they coded for it in a certain way or is there something I'm missing which is helping them achieve rankings for all sites?
Intermediate & Advanced SEO | | ukss19840 -
10,000 New Pages of New Content - Should I Block in Robots.txt?
I'm almost ready to launch a redesign of a client's website. The new site has over 10,000 new product pages, which contain unique product descriptions, but do feature some similar text to other products throughout the site. An example of the page similarities would be the following two products: Brown leather 2 seat sofa Brown leather 4 seat corner sofa Obviously, the products are different, but the pages feature very similar terms and phrases. I'm worried that the Panda update will mean that these pages are sand-boxed and/or penalised. Would you block the new pages? Add them gradually? What would you recommend in this situation?
Intermediate & Advanced SEO | | cmaddison0 -
Duplicate block of text on category listings
Fellows, We are deciding whether we should include our category description on all pages of the category listing - for example; page 1, page 2, page 3... The category description is currently a few paragraphs of text that sits on page 1 of the category only at present. It also includes an image (linked to a large version of it) with appropriate ALT text. Would we benefit from including this introductory text on the rest of the pages in the category? Or should we leave it on the first page only? Would it flag up duplicate signals? Ideas please! Thanks.
Intermediate & Advanced SEO | | Peter2640