Block subdomain directory in robots.txt
-
Instead of blocking an entire subdomain (fr.sitegeek.com) with robots.txt, we would like to block a single directory (fr.sitegeek.com/blog).
'fr.sitegeek.com/blog' and 'www.sitegeek.com/blog' contain the same articles in one language; only the labels are changed for the 'fr' version, and we suspect that the duplicate content causes problems for SEO. We would like the 'www.sitegeek.com/blog' articles to be crawled and indexed, not those on 'fr.sitegeek.com/blog'. So, how can we block a single subdomain directory (fr.sitegeek.com/blog) with robots.txt?
This applies only to the blog directory of the 'fr' version; all other directories and pages of the 'fr' version should still be crawled and indexed.
Thanks,
Rajiv
-
Hi Rajiv,
If you post the same content on both the FR & EN versions:
-
If both are written in English (or mainly in English), the best option would be to have a canonical pointing to the EN version.
Example: https://fr.sitegeek.com/category/shared-hosting - most of the content is in English, so in this case I would point a canonical to the EN version.
-
If the FR version is in French, you can use the hreflang tag. You can use this tool to generate them, check here for common mistakes and double-check the final result here.
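The two options above can be sketched as a small helper that emits the `<link>` tags for a page on the FR subdomain. This is only an illustration, not something from the site itself: the function names are hypothetical, and it assumes the EN version lives on www.sitegeek.com under the same path as the FR page.

```python
from html import escape
from urllib.parse import urlsplit

# Assumption: the EN site is www.sitegeek.com and mirrors the FR paths.
EN_HOST = "www.sitegeek.com"


def swap_host(url: str, host: str) -> str:
    """Return the same URL with its host replaced."""
    return urlsplit(url)._replace(netloc=host).geturl()


def head_tags(fr_url: str, translated: bool) -> list:
    """Build the <head> link tags for a page on the FR subdomain.

    translated=False: page is (mostly) English, so point a canonical
                      at the EN version.
    translated=True:  page is really in French, so declare both
                      language versions with hreflang.
    """
    en_url = swap_host(fr_url, EN_HOST)
    if not translated:
        return [f'<link rel="canonical" href="{escape(en_url, quote=True)}">']
    return [
        f'<link rel="alternate" hreflang="en" href="{escape(en_url, quote=True)}">',
        f'<link rel="alternate" hreflang="fr" href="{escape(fr_url, quote=True)}">',
    ]


if __name__ == "__main__":
    for tag in head_tags("https://fr.sitegeek.com/category/shared-hosting", translated=False):
        print(tag)
```

Note that hreflang annotations are expected to be reciprocal, so the EN page would need to carry the same pair of `alternate` tags.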
Just some remarks:
-
partially translated pages offer little value for users - so it's best to fully translate them or only refer to the EN version
-
I have a strong impression that the FR version was machine translated from the EN version (e.g. French sites never use 'Maison' to link to the homepage; they use 'Accueil'). Be aware that Google is perfectly capable of detecting auto-translated pages and considers this bad practice (check this video of Matt Cutts - starts at 1:50). So you might want to invest in proper translation or proofreading by a native French speaker.
rgds
Dirk
-
-
Thanks Dirk,
We will fix the issue as you suggested.
Could you explain more about duplicate content if we post articles on both the 'FR' and 'EN' versions?
Thanks,
Rajiv
-
Just to add to this, if your subdomain has more than /blog on it, and you only want to block /blog, change Dirk's robots.txt to:
User-agent: Googlebot
Disallow: /blog
or, to block more than just Google:
User-agent: *
Disallow: /blog
-
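The `/blog` rules suggested in the reply above can be sanity-checked before they go live. Below is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `allowed` and the sample URLs are made up for illustration, and the check only reflects how this parser reads the rules, which may differ in details from a given search engine.

```python
from urllib.robotparser import RobotFileParser

# The rules from the reply above. They only take effect if this file is
# served at fr.sitegeek.com/robots.txt, since robots.txt applies per host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog
"""


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The parser matches on the path only; the host part of `url` is ignored.
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://fr.sitegeek.com/blog",
        "https://fr.sitegeek.com/blog/some-article",       # hypothetical URL
        "https://fr.sitegeek.com/category/shared-hosting",
    ):
        print(url, "->", "allowed" if allowed(ROBOTS_TXT, url) else "blocked")
```

Because `Disallow` values are matched as path prefixes, `/blog` would also cover a path such as `/blog-archive`; writing `Disallow: /blog/` limits the rule to the directory itself.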
The easiest way would be to put the robots.txt in the root of your subdomain & block access for search engines:
User-agent: Googlebot
Disallow: /
If your subdomain & the main domain share the same root, this option is not possible. In that case, rather than working with robots.txt, I would add a canonical on each page pointing to the main domain, or block all pages in the header (if this is technically possible).
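One way to read "block all pages in the header" is to send a noindex signal with each response from the FR blog, for instance via the `X-Robots-Tag` HTTP header (a `<meta name="robots" content="noindex">` tag in the HTML head is the in-page equivalent). The sketch below is an assumption about how that could be wired in when both hosts share one codebase: the function name is hypothetical, and how the returned headers get attached depends on the server or framework in use.

```python
from typing import Dict

# Assumptions for illustration: only the blog on the FR host is excluded.
BLOCKED_HOST = "fr.sitegeek.com"
BLOCKED_PREFIX = "/blog"


def extra_headers(host: str, path: str) -> Dict[str, str]:
    """Return extra response headers for a request to `host` + `path`.

    FR blog pages get a noindex signal; everything else, including the
    rest of the FR subdomain and the main domain, gets nothing extra.
    """
    hostname = host.split(":")[0].lower()  # drop an optional port
    in_blog = path == BLOCKED_PREFIX or path.startswith(BLOCKED_PREFIX + "/")
    if hostname == BLOCKED_HOST and in_blog:
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    print(extra_headers("fr.sitegeek.com", "/blog/some-article"))
    print(extra_headers("www.sitegeek.com", "/blog/some-article"))
```

A crawler can only see this header if it is allowed to fetch the page, so this approach should not be combined with a robots.txt rule that blocks the same URLs.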
You could also check these similar questions: http://moz.com/community/q/block-an-entire-subdomain-with-robots-txt and http://moz.com/community/q/blocking-subdomain-from-google-crawl-and-index - but the answers given are the same as the options above.
Apart from the technical question, given the fact that only the labels are translated, these pages make little sense for human users. It would probably make more sense to link to the normal (English) version of the blog and put "(en anglais)" next to the link.
rgds,
Dirk