Tom-Anthony
@Tom-Anthony
Job Title: VP Product
Company: SearchPilot
I'm VP Product at SearchPilot (formerly DistilledODN), where I help develop our SEO A/B Testing platform. I've been building websites for 20 years, enjoy security research as a hobby (and have been rewarded by Google for BlackHat SEO research), have a PhD in AI, and am father to three smart & funny girls.
Favorite Thing about SEO
Constantly learning new stuff!
Latest posts made by Tom-Anthony
- RE: How to deal with filter pages - Shopify (posted in On-Page Optimization)
Are those main category pages (like /collections/living-room-furniture) or are they different?
- RE: Are Expires Headers Detrimental to SEO Health? (posted in Technical SEO)
Hi Dana,
Expires headers and other caching headers can help improve site performance (as you said), and that will be a good thing for SEO. There is no reason to be concerned - they are common headers and there isn't much they could do to have any negative impact on SEO.
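For example, caching rules like these are often set in .htaccess (a minimal sketch, assuming Apache with mod_expires enabled; the file types and lifetimes are illustrative placeholders, not a recommendation from the original answer):

```
<IfModule mod_expires.c>
  ExpiresActive On
  # Long-lived static assets
  ExpiresByType image/png "access plus 1 month"
  ExpiresByType text/css "access plus 1 week"
  ExpiresByType application/javascript "access plus 1 week"
  # Keep HTML short-lived so content changes show up quickly
  ExpiresByType text/html "access plus 10 minutes"
</IfModule>
```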
Good luck!
Tom
- RE: Robots.txt in subfolders and hreflang issues (posted in Technical SEO)
Hi there!
Ok, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. robots.txt files should be one per subdomain, and cannot exist inside sub-folders:
"A **robots.txt** file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlers."
From Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
You shouldn't be blocking Google from either site, and attempting to do so may be why your hreflang directives are not being detected. You should move to a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your two UK & US sitemap files, as sketched below.
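A minimal sketch of that setup (the sitemap filenames are made up for illustration, and clientname.com stands in for the real domain):

```
# https://www.clientname.com/robots.txt
User-agent: *
Disallow:

Sitemap: https://www.clientname.com/sitemap-index.xml
```

```
<?xml version="1.0" encoding="UTF-8"?>
<!-- https://www.clientname.com/sitemap-index.xml -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.clientname.com/sitemap-uk.xml</loc></sitemap>
  <sitemap><loc>https://www.clientname.com/sitemap-us.xml</loc></sitemap>
</sitemapindex>
```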
You should ensure you have hreflang directives for every page (see the sketch just below). Hopefully after these changes you will see things start to get better. Good luck!
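The hreflang directives would be link tags along these lines on every page (a sketch - the /uk/ and /us/ URL pattern is an assumption, not taken from the actual site):

```
<link rel="alternate" hreflang="en-gb" href="https://www.clientname.com/uk/some-page/" />
<link rel="alternate" hreflang="en-us" href="https://www.clientname.com/us/some-page/" />
```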
- RE: 6 .htaccess Rewrites: Remove index.html, Remove .html, Force non-www, Force Trailing Slash (posted in Intermediate & Advanced SEO)
Hey NeatIT!
I see you have a working solution there. Did you have a specific question about the setup?
I did notice that your setup can sometimes result in chaining 301 redirects, which is one area for possible improvement.
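For instance, a request for http://www.example.com/folder/index.html can bounce through the www rule and then the index.html rule as two separate 301s. A minimal sketch of collapsing that into a single hop (assuming Apache and a placeholder example.com domain, not your exact rules):

```
RewriteEngine On

# Handle www + index.html together in one 301
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*/)?index\.html$ http://example.com/$1 [R=301,L]

# index.html on the bare domain
RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L]

# Force non-www for everything else
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```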
Let me know how we can help!

- RE: What can I do to rank higher than low-quality low-content sites? (posted in Local SEO)
If you have many URLs from the old site in the index that are all in the same directory (or a handful of directories) you can quickly and easily remove whole directories of URLs from the index via Google Search Console. We have found it to work very quickly.
- Go into Search Console and select ‘Remove URLs’ under ‘Google Index’ in the left-hand menu.
- Add the page or folder you want to remove, and click next. If you add the homepage, that's the same as all pages on the site. If you add a folder you'll get three options under the ‘Reason’ drop down. One of those options is ‘Remove directory’. Select that.
- RE: Lazy Loading of products on an E-Commerce Website - Options Needed (posted in Intermediate & Advanced SEO)
Ok, cool. To reiterate - with escaped_fragment you are just serving the same content in a tweaked format and Google recommend it rather than frown upon it. Good to be sure though.
See you at SearchLove!

- RE: Lazy Loading of products on an E-Commerce Website - Options Needed (posted in Intermediate & Advanced SEO)
Hi,
I am not sure I follow your concerns around serving an alternative version of the page to search engines - is that based on a worry that it will be frowned upon, or on technical concerns?
Using the escaped_fragment methodology would work for your purposes, and would be the best approach. If you have technical concerns around creating the HTML snapshots you could look at a service such as https://prerender.io/ which helps manage this process.
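For context, the escaped_fragment mechanism works roughly like this (a sketch with a made-up example.com URL, not your actual site):

```
<!-- On a page that lazy-loads content without a #! URL, this meta tag opts in: -->
<meta name="fragment" content="!">

<!-- Googlebot then requests the pre-rendered HTML snapshot from: -->
<!-- https://www.example.com/products?_escaped_fragment_= -->

<!-- Hash-bang URLs map the same way, e.g. -->
<!-- https://www.example.com/products#!page=2  is fetched as -->
<!-- https://www.example.com/products?_escaped_fragment_=page=2 -->
```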
If that doesn't answer your question, please give more information so we can understand more specifically where your concerns are.

- RE: "Null" appearing as top keyword in "Content Keywords" under Google index in Google Search Console (posted in Intermediate & Advanced SEO)
It seems like the issue is a bug in the way Google handle data from your site ('null' being computer speak for 'empty', and often appearing after buggy handling of data). However, it seems that the indication from Umar is correct and that this buggy data handling is likely prompted by a crawling issue, so that is the best place to start.
- RE: What would cause the wrong category page to come up? (posted in Intermediate & Advanced SEO)
Is it possible for you to give a clearer description of the categories? You say they are different products, but also that one is a second category of the other.
Does the page you want to rank show up for any other searches? In your analytics are you getting any traffic from Google to that page?
- RE: Pitfalls when implementing the “VARY User-Agent” server response (posted in Intermediate & Advanced SEO)
So, there are lots of 'ifs' here, but the primary problem I see with your plan is that the CDN will return the content to Googlebot without the request hitting your server, so you won't have the option to serve different headers to Googlebot.
Remember that every page is the main HTML content (which may be static or dynamically generated for every request), and then a whole bunch of other resources (Javascript and CSS files, images, font files etc.). These other resources are typically static and lend themselves far better to being cached.
Are your pages static or dynamic? If they are dynamic then you are possibly not benefitting from them being cached anyway, so you could use the 'vary' header on just these pages, and not on any static resources. This would ensure your static resources are cached by your CDN and give you a lot of the benefit of the CDN, and only the dynamic HTML content is served directly from the server.
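A minimal sketch of that selective approach (assuming Apache with mod_headers; the file extensions are placeholders for however the dynamic HTML is served):

```
<IfModule mod_headers.c>
  # Only the dynamic HTML responses vary by user agent;
  # static assets (CSS, JS, images) stay fully cacheable.
  <FilesMatch "\.(html|php)$">
    Header append Vary "User-Agent"
  </FilesMatch>
</IfModule>
```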
If most of your pages are static you could still use this approach, but just without the full benefit of the CDN, which sucks.
Some of the CDNs are already working on this (see http://www.computerworld.com/s/article/9225343/Akamai_eyes_acceleration_boost_for_mobile_content and http://orcaman.blogspot.co.uk/2013/08/cdn-caching-problems-vary-user-agent.html) to try and find better solutions.
I hope some of this helps!

Best posts made by Tom-Anthony
- RE: Repeat Keyword Phrase or Use Variations (posted in On-Page Optimization)
I would say a bit of both. It is fine to repeat your primary keyword phrase several times on the page; the number of times depends upon the amount of content. SEOmoz's On Page tool recommends 4 repetitions. However, you should also try to use some synonyms and secondary target keyword phrases.
A good resource I saw posted today which might be of interest:
- RE: Robots.txt in subfolders and hreflang issues (posted in Technical SEO)
Hi there!
Ok, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. robots.txt files should be one per subdomain, and cannot exist inside sub-folders:
"A **robots.txt** file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlers."
From Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
You shouldn't be blocking Google from either site, and attempting to do so may be why your hreflang directives are not being detected. You should move to a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your two UK & US sitemap files.
You should ensure you have hreflang directives for every page. Hopefully after these changes you will see things start to get better. Good luck!
- RE: Lazy Loading of products on an E-Commerce Website - Options Needed (posted in Intermediate & Advanced SEO)
Ok, cool. To reiterate - with escaped_fragment you are just serving the same content in a tweaked format and Google recommend it rather than frown upon it. Good to be sure though.
See you at SearchLove!

- RE: Thousands of 301 redirections - .htaccess alternatives? (posted in White Hat / Black Hat SEO)
Putting aside server load and config issues, and looking at it purely from the SEO point of view:
No, you shouldn't have any major issues with that many 301s. However, what you might find is that, depending on the size of your site and the frequency of Googlebot's visits, some of these pages take a long time (months) to drop out of the index and be replaced by their newer alternatives. This normally isn't cause for alarm.
In some instances you might end up with pages that now have no links to them (as their parent categories were all redirected also) and so seem to get stuck and never get recrawled by Google to update. In a couple of instances I have had success using XML sitemap files that include just these 'blocked' pages (the old URLs still in the index) to prompt Google to recrawl them.
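Something along these lines (a sketch - the URLs are placeholders for whatever old URLs are stuck in the index):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Old, already-redirected URLs that are still lingering in the index -->
  <url><loc>http://www.example.com/old-category/old-product-1/</loc></url>
  <url><loc>http://www.example.com/old-category/old-product-2/</loc></url>
</urlset>
```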
Also, there is a Google Webmaster Tools feature to 'crawl as Googlebot', which then prompts you to 'submit to index'; you can use this to prompt recrawls on a per-page basis (but you have a limited number of credits here, so it should only be for the more important pages).
Best of luck!
- RE: Best approach to launch a new site with new urls - same domain (posted in Intermediate & Advanced SEO)
Just to chime in on this, albeit maybe a little late now... I had the same thought as I was reading through this: use rel=canonical to point the old pages to the new for now, so the search engines don't have any duplicate content issues until a 301 redirect can take over when the new site is fully launched.
However, depending on your rollout schedule, this would mean that the SERPs would soon be indexing only the new pages. You'd need to ensure that the traffic diverter you are using would handle this. Otherwise you could put the rel=canonical on the new pages for now, which would avoid the duplicate content until you are fully launched. Then you'd remove it and 301 redirect the old pages to the new.
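In either direction, the mechanics are simple (a sketch with placeholder URLs on example.com, not the actual site):

```
<!-- On the page you want treated as the duplicate, pointing at the version
     that should be indexed while both are live: -->
<link rel="canonical" href="https://www.example.com/new-url/" />
```

```
# Later, once the new site is fully launched (Apache sketch):
Redirect 301 /old-url/ https://www.example.com/new-url/
```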
Just something you maybe want to think about! Hopefully your traffic diverter can handle this though.

- RE: Online Sitemap Generator (posted in Intermediate & Advanced SEO)
I will just add that you need to be very careful. Maybe try out the ones that everyone has kindly suggested and then use SEOmoz Campaigns or other tools to check the quality of the result.
This Whiteboard Friday post from last week has one of Bing's team pointing out how it is very important that your sitemap is of a high quality (no 404s, 302s or 301s for example), or it could be ignored completely:
- RE: What can I do to rank higher than low-quality low-content sites? (posted in Local SEO)
If you have many URLs from the old site in the index that are all in the same directory (or a handful of directories) you can quickly and easily remove whole directories of URLs from the index via Google Search Console. We have found it to work very quickly.
- Go into Search Console and select ‘Remove URLs’ under ‘Google Index’ in the left-hand menu.
- Add the page or folder you want to remove, and click next. If you add the homepage, that's the same as all pages on the site. If you add a folder you'll get three options under the ‘Reason’ drop down. One of those options is ‘Remove directory’. Select that.
- RE: Do search engines understand special/foreign characters? (posted in Intermediate & Advanced SEO)
Hi David,
Google/Bing etc. have very few problems recognising such characters in the Latin alphabet. It looks like you are mainly concerned with umlauts, which Google handles intelligently. For example...
- Google will identify the difference between a search for "Küchen" (kitchens in German) and "Kuchen" (cake in German) and offer up relevant results. This is true in Google US and Google UK, not just localised Googles.
- Search suggestions work just fine with these characters, and even with the standardised way of rewriting them when there is no accessible way to type them (for umlauts this is with an e following the letter). For example, in Google.de, type "Kue" and you will be given the suggestion "Küchen".
You are mainly concerned with brands, which muddies the waters a little because many people in English speaking markets won't bother/know how to type the umlauts. However, Google normally handles this well and recognises the intent.
I would recommend you ensure you consistently use the brand name with the foreign characters, as intended. Google/Bing and co. shouldn't have any problems. Which HTML encoding you use is by the by, in my opinion, as long as the characters are rendering correctly.
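For example, these two render identically to users and to search engines (the brand/product name is a placeholder):

```
<!-- Raw UTF-8 character: -->
<title>Küchen von ExampleBrand</title>

<!-- Equivalent HTML entity: -->
<title>K&uuml;chen von ExampleBrand</title>
```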
- RE: What is the best way to refresh a webpage of a news site, SEO wise? (posted in Technical SEO)
Hi Panos,
I don't necessarily disagree with Eric's answer, but I wanted to answer from a different point of view. I'm going to assume you really want or need some refresh mechanism built into the page.
In which case I'd agree that a Javascript approach using AJAX is probably a better solution. It will mean that users only need to load the new article headlines, and not the whole page, so the strain on your servers should be reduced. Furthermore, I find it a neater solution all around anyway - you could provide a notice 'new headlines available' that people click to refresh the articles list. This might be the best of both worlds?
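A minimal sketch of that notice-based approach (the endpoint and element ID are made up for illustration, not from the site in question):

```
// Poll a small JSON endpoint for new headlines and show a notice,
// rather than reloading the whole page on a timer.
let knownCount = null;

async function checkForNewHeadlines() {
  const res = await fetch('/latest-headlines.json');   // hypothetical endpoint
  const headlines = await res.json();                  // assume an array of headlines
  if (knownCount === null) {
    knownCount = headlines.length;
  } else if (headlines.length > knownCount) {
    // Reveal a "new headlines available" notice; clicking it refreshes the list
    document.querySelector('#new-headlines-notice').hidden = false;
  }
}

setInterval(checkForNewHeadlines, 5 * 60 * 1000);      // check every five minutes
```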
Either way, meta refresh isn't as flexible, isn't as clean, and will put more strain on your servers.
Good luck!
-Tom
- RE: Pitfalls when implementing the “VARY User-Agent” server response (posted in Intermediate & Advanced SEO)
So, there are lots of 'ifs' here, but the primary problem I see with your plan is that the CDN will return the content to Googlebot without the request hitting your server, so you won't have the option to serve different headers to Googlebot.
Remember that every page is the main HTML content (which may be static or dynamically generated for every request), and then a whole bunch of other resources (Javascript and CSS files, images, font files etc.). These other resources are typically static and lend themselves far better to being cached.
Are your pages static or dynamic? If they are dynamic then you are possibly not benefitting from them being cached anyway, so you could use the 'vary' header on just these pages, and not on any static resources. This would ensure your static resources are cached by your CDN and give you a lot of the benefit of the CDN, and only the dynamic HTML content is served directly from the server.
If most of your pages are static you could still use this approach, but just without the full benefit of the CDN, which sucks.
Some of the CDNs are already working on this (see http://www.computerworld.com/s/article/9225343/Akamai_eyes_acceleration_boost_for_mobile_content and http://orcaman.blogspot.co.uk/2013/08/cdn-caching-problems-vary-user-agent.html) to try and find better solutions.
I hope some of this helps!
