Block subdomain directory in robots.txt
-
Instead of block an entire sub-domain (fr.sitegeek.com) with robots.txt, we like to block one directory (fr.sitegeek.com/blog).
'fr.sitegeek.com/blog' and 'wwww.sitegeek.com/blog' contain the same articles in one language only labels are changed for 'fr' version and we suppose that duplicate content cause problem for SEO. We would like to crawl and index 'www.sitegee.com/blog' articles not 'fr.sitegeek.com/blog'.so, suggest us how to block single sub-domain directory (fr.sitegeek.com/blog) with robot.txt?
This is only for blog directory of 'fr' version even all other directories or pages would be crawled and indexed for 'fr' version.
Thanks,
Rajiv -
Hi Rajiv,
If you post the same content on both FR & EN version:
-
if both are written in English (or mainly written in English) - best option would be to have a canonical pointing to the EN version
Example: https://fr.sitegeek.com/category/shared-hosting - most of the content is in English - so in this case I would point a canonical to the EN version -
if the FR version is in French - you can use the HREF lang tag - you can use this tool to generate them, check here for common mistakes and doublecheck the final result here.
Just some remarks:
-
partially translated pages offer little value for users - so it's best to fully translate them or only refer to the EN version
-
I have a strong impression that the EN version was machine translated to the FR version. (ex. French sites never use 'Maison' to link to the Homepage - they use Acceuil). Be aware that Google is perfectly capable to detect auto-translated pages and they consider it to be bad practice (check this video of Matt Cutts - starts at 1:50). So you might want to invest in proper translation or proofreading by a native French speaker.
rgds
Dirk
-
-
Thanks Dirk,
we will fix the issue as you suggested.
Could you explain more on duplicate content if we post articles on both 'FR' and 'EN' versions?
Thanks,
Rajiv
-
Just to add to this, if your subdomain has more than /blog on it, and you only want to block /blog, change Dirk's robots.txt to:
User-agent: Googlebot
Disallow: /blogor to block more than just google:
User-agent:*
Disallow: /blog -
The easiest way would be to put the robots.txt in the root of your subdomain & block access for search engines
User-agent: Googlebot
Disallow: /If you subdomain & the main domain are sharing the same root - this option is not possible. In that case, rather than working with robots.txt I would add a canonical on each page pointing to the main domain, or block all pages in the header (if this is technically possible)
You could also check these similar questions: http://moz.com/community/q/block-an-entire-subdomain-with-robots-txt and http://moz.com/community/q/blocking-subdomain-from-google-crawl-and-index - but the answers given are the same as the options above.
Apart from the technical question, qiven the fact that only the labels are translated, these pages make little sense for human users. It would probably make more sense to link to the normal (English) version of the blog (and put (en Anglais) next to the link.
rgds,
Dirk
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google blocks certain articles on my website ... !
Hello I have a website with more than 350 unique articles, Most of them are crawled by Google without a problem, but I find out certain articles are never indexed by Google. I tried to rewrite them, adding fresh images and optimizing them but it gets me nowhere. Lately, I rewrite an article of those and tried to (fetch and render) through Google Webmasters, and I found this result, can you tell me if there is anything to do to fix that? BMVh4
Intermediate & Advanced SEO | | Evcindex0 -
Paid for Yahoo Directory months ago, yet does not display in OSE
Hi Mozzers, How long does it normally take for your yahoo directory link to show up in Open Site Explorer? We signed up and got our link on dir.yahoo early January and still see nothing in OSE. Should I call them up and ask them to fetch the page for Google?
Intermediate & Advanced SEO | | Travis-W
Get a refund? Helps!
Thanks!0 -
How to do geo targeting for domain and sub directories in Webmaster tool?
Hello All, How can i do geo targeting in multiple countries on my ** root domain and sub **directories in Webmaster tool. My domain is "abc.com" and i want to target three countries UAE , Kuwait, Saudi arabia. So, Can i assign geo targeting in Webmaster tool , Root domain for UAE country and make other two sub directories for Kuwait and saudi ? abc.com - UAE (geo targeting) abc.com/kw - Kuwait (geo targeting) abc.com/sa - Saudi (geo targeting) Or Root doamain should be not assign for any country and Make three sub directories for UAE, Kuwait , and saudi and targeting them there geo locations. abc.com - Unlisted (geo targeting) abc.com/uae/ - UAE (geo targeting) abc.com/kw/ - Kuwait (geo targeting) abc.com/sa/ - Saudi (geo targeting)
Intermediate & Advanced SEO | | rahul110 -
Why should I add URL parameters where Meta Robots NOINDEX available?
Today, I have checked Bing webmaster tools and come to know about Ignore URL parameters. Bing webmaster tools shows me certain parameters for URLs where I have added META Robots with NOINDEX FOLLOW syntax. I can see canopy_search_fabric parameter in suggested section. It's due to following kind or URLs. http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1728 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1729 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1730 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=2239 But, I have added META Robots NOINDEX Follow to disallow crawling. So, why should it happen?
Intermediate & Advanced SEO | | CommercePundit0 -
Will links to a subdomain help it rank?
I have an affiliate subdomain on a larger company's domain. (For example I have: www.victor.company.com on www.company.com). Would working to attain backlinks to the subdomain help it rank or will I just be putting forth my effort and helping the domain rank?
Intermediate & Advanced SEO | | VictorVC0 -
URL Structure for Directory Site
We have a directory that we're building and we're not sure if we should try to make each page an extension of the root domain or utilize sub-directories as users narrow down their selection. What is the best practice here for maximizing your SERP authority? Choice #1 - Hyphenated Architecture (no sub-folders): State Page /state/ City Page /city-state/ Business Page /business-city-state/
Intermediate & Advanced SEO | | knowyourbank
4) Location Page /locationname-city-state/ or.... Choice #2 - Using sub-folders on drill down: State Page /state/ City Page /state/city Business Page /state/city/business/
4) Location Page /locationname-city-state/ Again, just to clarify, I need help in determining what the best methodology is for achieving the greatest SEO benefits. Just by looking it would seem that choice #1 would work better because the URL's are very clear and SEF. But, at the same time it may be less intuitive for search. I'm not sure. What do you think?0 -
Block all but one URL in a directory using robots.txt?
Is it possible to block all but one URL with robots.txt? for example domain.com/subfolder/example.html, if we block the /subfolder/ directory we want all URLs except for the exact match url domain.com/subfolder to be blocked.
Intermediate & Advanced SEO | | nicole.healthline0 -
Duplicate Content On A Subdomain
Hi, We have a client who is currently close to completing a site specifically aimed at the UK market (they're doing this in-house so we've had no say in how it will work). The site will almost be a duplicate (in terms of content, targeted keywords etc.) of a section of the main site (that sits on the root domain) - the main site is targeted toward the US. The only difference will be certain spellings and currency type. If this new UK site were to sit on a sub domain of the main site, which is a .com, will this cause duplicate content issues? I know that there wouldn't be an issue if the new site were to be on a separate .co.uk domain (according to Matt Cutts), but it looks like the client wants it to be on a sub domain. Any help/advice would be greatly appreciated.
Intermediate & Advanced SEO | | jasarrow0