Yup, an "empty" robots.txt file should look like this:
User-Agent: *
Disallow:
Looks like the site is missing a robots.txt file; that could cause some issues (hat tip, Tim Holmes). I'd get one in place, just for housekeeping - search engines have been known to behave oddly when they can't find a robots.txt file.
I'm struggling to see anything obvious that would cause this behaviour, but equally, I wouldn't be too concerned about it - the page is clearly indexed and discoverable, so this may just be a quirk.
This sounds like the best approach. Typically, most crawlers won't save or interact with cookies, so Google etc will just see the site they've requested.
Hi Sam,
Apologies for the slow response. Your question slipped through the net.
This is an interesting case!
In an ideal world, you'd specify the relationship between all of those pages, in each direction. That's 150+ tags per page, though, which is going to cause some headaches. Even if you shift the tagging to an XML sitemap, that's a _lot_ of weight and processing.
Anecdotally, I know that hreflang tagging starts to break at those kinds of scales - even more so on very large sites, where the resultant XML sitemaps can reach many gigabytes in size, or where Google is crawling faster than it's processing the hreflang directives - so tagging everything isn't going to be a viable approach.
I'd suggest picking out and implementing hreflang for _only_ the primary combinations, as you suggest, and reducing the site-wide mapping to the primary variant in each case.
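As a rough sketch of what that reduced mapping might look like in the <head> of each primary variant (the example.com URLs and language/country codes here are hypothetical - substitute your own structure):
<!-- hypothetical primary combinations only; every primary variant lists the same set -->
<link rel="alternate" hreflang="en-gb" href="https://www.example.com/en-gb/widgets/" />
<link rel="alternate" hreflang="de-de" href="https://www.example.com/de-de/widgets/" />
<link rel="alternate" hreflang="fr-fr" href="https://www.example.com/fr-fr/widgets/" />
<link rel="alternate" hreflang="x-default" href="https://www.example.com/widgets/" />
That keeps the tag count per page down to the handful of primary combinations, rather than 150+.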
For the atypical variants, I think that you have a few options:
1. Use meta robots (or x-robots) tags to set noindex attributes. This will keep them out of the index, but doesn't guarantee that you're effectively managing/consolidating value across near-duplicates - you may be quietly harming performance without realising it, as those pages represent points of crawl and value wastage/leakage.
2. Use robots.txt to prevent Google from accessing the atypical variants. That won't necessarily stop them from showing up in search results, though, and isn't without problems - you risk creating crawl dead-ends, writing off the value of any inbound links to those pages, and other issues.
3. Use canonical URLs on all of the atypical variations, referencing the nearest primary version, to attempt to consolidate value/relevance etc. However, that risks the wrong language/content showing up in the wrong country, as you're explicitly _un_optimising the location component.
I think that #1 is the best approach, as per your thinking. That removes the requirement to do anything clever or manipulative with hreflang tagging, and fits neatly with the idea that the atypical combinations aren't useful/valuable enough to warrant their own identities - Google should be smart enough to fall back to the nearest 'generic' equivalent.
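As a rough sketch of that approach (illustrative only - the exact placement will depend on your stack), each atypical variant would carry a noindex directive, either as a meta tag in the <head>:
<!-- on each atypical country/language combination -->
<meta name="robots" content="noindex, follow">
or as an HTTP header, if editing the templates isn't practical:
X-Robots-Tag: noindex, follow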
I'd also take care to set up your Google Search Console country targeting for each country-level folder, to reduce the risk of people ending up in the wrong sections.
Ah, a very interesting question!
I'd not be too concerned; you're loading the content in through a data attribute rather than directly as text. However, there are definitely a few options you could consider:
I'd be keen to explore #2 - feels like you should be able to achieve the effect you're after with an image which isn't ridiculously huge.
Yup, that makes sense to me.
It's a bit of a grey area and an unusual case, but I think that this approach makes more sense - otherwise you're actively trying to stop people who aren't in the 'correct' country for a phone number to find/access that page.
Hey Thomas,
Did you have a chance to think about this?
Hmm. This is potentially a little complex.
It sounds like what you're describing _isn't_ a case for hreflang / internationalisation.
If I understand correctly, you have one website, which has information about (things in) different regions, but your website isn't explicitly targeting users in different regions?
Where does language come into this?
What happens if I'm a user in Spain who's Googled a phone number from Germany? What does the current experience look like (what pages might they see, in what language), and what does the optimal experience look like?
Ah, all very helpful, thanks!
Some interesting bits to pull apart here, I think:
hreflang tagging is unlikely to improve your rankings for any given language/territory/page/keyword; rather, it's more likely to prevent the wrong content from showing up in the wrong territory. Make sense?
I'd try to manage expectations around "restoring" rankings. What if your performance dropped because of strong competitor activity, changing consumer behaviour, or other factors? Whilst your international setup is part of that picture, the reality is much more complex. I'd try to shift the conversation away from looking for (or waiting on) a magic-bullet fix to "restore" your rankings, and towards the many strategies and tactics you might deploy to improve them gradually moving forwards.
I'd be _really_ nervous of outsourced link building. If you're handing money to a third party to get you links, you're only a small amount of semantics away from buying links outright. What are they doing, exactly? Are you producing exceptionally high quality, useful information and resources, which they're helping to shine a spotlight on - or are you paying them a fee to magically acquire links? It feels like, of all the possible risks and causes of your problems, this is the area I'd want to scrutinise the most; and in the meantime, start looking into ways in which your pages can earn links without having to pay a mysterious third party!
Yikes, I didn't see your responses - sorry for the delay!
Your understanding is correct. It doesn't matter whether the content is duplicated within a single domain, across multiple domains, or multiple subdomains - if it's duplicate but targeting different languages and/or territories, then you need to use hreflang tags to manage that. If the pages are the same in all but phone number (where the phone number is targeting a different territory), that's exactly the kind of situation where you should be using hreflang tags.
Separately, all pages should use canonical tags which reference the correct version of themselves (note, not the alternate/hreflang equivalent), as a way of declaring that they're the correct version of that localised page. The hreflang tags take care of the relationship between the multilingual versions, without the need to try and do anything clever with pointing canonical references between them.
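As a rough sketch (with hypothetical example.com and example.de URLs), the UK version of a page might carry something like this in its <head>:
<!-- the canonical points at this page's own URL, not at the German equivalent -->
<link rel="canonical" href="https://www.example.com/en-gb/contact/" />
<link rel="alternate" hreflang="en-gb" href="https://www.example.com/en-gb/contact/" />
<link rel="alternate" hreflang="de-de" href="https://www.example.de/kontakt/" />
The German page would list the same two hreflang alternates, but its canonical tag would reference its own URL instead.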
The blog question is more complex. Are your audiences/products/services different enough in those markets to warrant a distinct content strategy? Does that make sense, strategically?
If the content is different, then you don't want hreflang tags on those individual pages, because you've no equivalent resource/page to reference. Challenging!
So, in a nutshell:
Use hreflang tags everywhere where you've very-similar/duplicate pages which only differ in terms of which territory they serve - regardless of which domain they're on.
Use canonical tags on all pages/templates, to reference the correct version of that page/template - but this shouldn't try and do anything clever in terms of its relationship with the hreflang tags.
Where content isn't the same between multiple sites/pages, don't use hreflang tags (but do still use canonical tags).
Make sense?
Hey Alessia,
Just wanted to ping you a note to let you know that I'm looking at and thinking about this, and intending to get back to you tomorrow with a structured answer and some direction.