Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt in subfolders and hreflang issues
-
A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations:
UK site - https://www.clientname.com/uk/ - robots.txt location: https://www.clientname.com/uk/robots.txt
US site - https://www.clientname.com/us/ - robots.txt location: https://www.clientname.com/us/robots.txt
We've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US.
They have the following hreflang tags across all pages:
We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously).
Search Console says there are no hreflang tags at all.
Additionally, we have a robots.txt file on each site which has a link to the corresponding sitemap files, but when viewing the robots.txt tester on Search Console, each property shows the robots.txt file for https://www.clientname.com only, even though when you actually navigate to this URL (https://www.clientname.com/robots.txt) you’ll get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location.
Any suggestions how we can remove UK listings from Google US and vice versa?
-
Hi there!
OK, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. There should be one robots.txt file per host (subdomain), served from the root of that host; crawlers do not look for robots.txt files inside sub-folders:
"A robots.txt file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlers"
From Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
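To make the "root of the host" rule concrete, here is a minimal Python sketch. The helper name `robots_txt_url` is made up for illustration; it just strips the path from a page URL to show the one robots.txt location a crawler would consult for that page:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url.

    Hypothetical helper for illustration: crawlers look for robots.txt at
    the root of the scheme + host (+ port), never inside a sub-folder.
    """
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in (
    "https://www.clientname.com/uk/",
    "https://www.clientname.com/us/some-page/",
):
    print(url, "->", robots_txt_url(url))
```

Both the /uk/ and /us/ URLs map to https://www.clientname.com/robots.txt, which is why the files at /uk/robots.txt and /us/robots.txt are not picked up.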
You shouldn't be blocking Google from either site, and attempting to do so may be why your hreflang directives are not being detected. You should move to a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your UK and US sitemap files.
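As a rough sketch of that layout, the Python below builds a root robots.txt and a sitemap index. The sitemap file names (`/sitemap_index.xml`, `/uk/sitemap.xml`, `/us/sitemap.xml`) are assumptions, so swap in whatever your WordPress installs actually generate. The empty `Disallow:` line means nothing is blocked.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Assumed file names; the real sitemap URLs depend on the WordPress setup.
SITEMAP_INDEX_URL = "https://www.clientname.com/sitemap_index.xml"
CHILD_SITEMAPS = [
    "https://www.clientname.com/uk/sitemap.xml",
    "https://www.clientname.com/us/sitemap.xml",
]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document listing each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_robots_txt(sitemap_index_url):
    """Return a robots.txt that blocks nothing and points at the index."""
    lines = [
        "User-agent: *",
        "Disallow:",  # empty value: no URLs are disallowed
        "",
        f"Sitemap: {sitemap_index_url}",
        "",
    ]
    return "\n".join(lines)


print(build_robots_txt(SITEMAP_INDEX_URL))
print(build_sitemap_index(CHILD_SITEMAPS))
```

The robots.txt output would be served at https://www.clientname.com/robots.txt (without the location-based redirect), and the index saved as UTF-8 at the assumed index URL.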
You should ensure you have hreflang directives for every page. Hopefully after these changes you will see things start to get better. Good luck!
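For illustration only, here is a small Python sketch of what "hreflang on every page" might look like. The language codes (`en-gb`, `en-us`), and the idea that each page lives at the same relative path under /uk/, /us/ and the .com root, are assumptions rather than details from your site, and `hreflang_tags` is a hypothetical helper. The point is that every version of a page carries the same full set of tags, including one pointing at itself.

```python
from html import escape

# Assumed mapping of hreflang value -> site root; adjust codes and roots to the real setup.
ALTERNATES = {
    "en-gb": "https://www.clientname.com/uk/",
    "en-us": "https://www.clientname.com/us/",
    "x-default": "https://www.clientname.com/",
}


def hreflang_tags(relative_path=""):
    """Return the <link> tags to place in the <head> of every version of a page.

    Assumes the page exists at the same relative path under each root.
    """
    tags = []
    for lang, base in ALTERNATES.items():
        href = escape(base + relative_path.lstrip("/"), quote=True)
        tags.append(f'<link rel="alternate" hreflang="{lang}" href="{href}" />')
    return tags


for tag in hreflang_tags("about/"):
    print(tag)
```

The same three tags would go on the UK, US and x-default versions of the page, so each alternate references the others and itself.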