Tom-Anthony
@Tom-Anthony
Job Title: VP Product
Company: SearchPilot
I'm VP Product at SearchPilot (formerly DistilledODN), where I help develop our SEO A/B Testing platform. I've been building websites for 20 years, enjoy security research as a hobby (and have been rewarded by Google for BlackHat SEO research), have a PhD in AI, and am father to three smart & funny girls.
Favorite Thing about SEO
Constantly learning new stuff!
Latest posts made by Tom-Anthony
- RE: How to deal with filter pages - Shopify (posted in On-Page Optimization)
Are those main category pages (like /collections/living-room-furniture) or are they different?
- RE: Are Expires Headers Detrimental to SEO Health? (posted in Technical SEO)
Hi Dana,
Expires headers and other caching headers can help improve site performance (as you said), and that will be a good thing for SEO. There is no reason to be concerned - they are common headers and there isn't much they could do to have any negative impact on SEO.
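For example, caching rules like these are often set in .htaccess (a minimal sketch, assuming Apache with mod_expires enabled; the file types and lifetimes are illustrative placeholders, not a recommendation from the original answer):

```
<IfModule mod_expires.c>
  ExpiresActive On
  # Long-lived static assets
  ExpiresByType image/png "access plus 1 month"
  ExpiresByType text/css "access plus 1 week"
  ExpiresByType application/javascript "access plus 1 week"
  # Keep HTML short-lived so content changes show up quickly
  ExpiresByType text/html "access plus 10 minutes"
</IfModule>
```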
Good luck!
Tom
- RE: Robots.txt in subfolders and hreflang issues (posted in Technical SEO)
Hi there!
Ok, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. robots.txt files should be one per subdomain, and cannot exist inside sub-folders:
"A **robots.txt** file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlers."
From Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
You shouldn't be blocking Google from either site, and attempting to do so may be why your hreflang directives are not being detected. You should move to a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your two UK & US sitemap files, as sketched below.
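A minimal sketch of that setup (the sitemap filenames are made up for illustration, and clientname.com stands in for the real domain):

```
# https://www.clientname.com/robots.txt
User-agent: *
Disallow:

Sitemap: https://www.clientname.com/sitemap-index.xml
```

```
<?xml version="1.0" encoding="UTF-8"?>
<!-- https://www.clientname.com/sitemap-index.xml -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.clientname.com/sitemap-uk.xml</loc></sitemap>
  <sitemap><loc>https://www.clientname.com/sitemap-us.xml</loc></sitemap>
</sitemapindex>
```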
You should ensure you have hreflang directives for every page (see the sketch just below). Hopefully after these changes you will see things start to get better. Good luck!
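The hreflang directives would be link tags along these lines on every page (a sketch - the /uk/ and /us/ URL pattern is an assumption, not taken from the actual site):

```
<link rel="alternate" hreflang="en-gb" href="https://www.clientname.com/uk/some-page/" />
<link rel="alternate" hreflang="en-us" href="https://www.clientname.com/us/some-page/" />
```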
- RE: 6 .htaccess Rewrites: Remove index.html, Remove .html, Force non-www, Force Trailing Slash (posted in Intermediate & Advanced SEO)
Hey NeatIT!
I see you have a working solution there. Did you have a specific question about the setup?
I did notice that your setup can sometimes result in chaining 301 redirects, which is one area for possible improvement.
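For instance, a request for http://www.example.com/folder/index.html can bounce through the www rule and then the index.html rule as two separate 301s. A minimal sketch of collapsing that into a single hop (assuming Apache and a placeholder example.com domain, not your exact rules):

```
RewriteEngine On

# Handle www + index.html together in one 301
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*/)?index\.html$ http://example.com/$1 [R=301,L]

# index.html on the bare domain
RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L]

# Force non-www for everything else
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```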
Let me know how we can help!

- RE: What can I do to rank higher than low-quality low-content sites? (posted in Local SEO)
If you have many URLs from the old site in the index that are all in the same directory (or a handful of directories) you can quickly and easily remove whole directories of URLs from the index via Google Search Console. We have found it to work very quickly.
- Go into Search Console and select ‘Remove URLs’ under ‘Google Index’ in the left-hand menu.
- Add the page or folder you want to remove, and click next. If you add the homepage, that's the same as all pages on the site. If you add a folder you'll get three options under the ‘Reason’ drop down. One of those options is ‘Remove directory’. Select that.
- RE: Lazy Loading of products on an E-Commerce Website - Options Needed (posted in Intermediate & Advanced SEO)
Ok, cool. To reiterate - with escaped_fragment you are just serving the same content in a tweaked format and Google recommend it rather than frown upon it. Good to be sure though.
See you at SearchLove!

- RE: Lazy Loading of products on an E-Commerce Website - Options Needed (posted in Intermediate & Advanced SEO)
Hi,
I am not sure I follow your concerns around serving an alternative version of the page to search engines - is that based on a worry that it will be frowned upon, or on technical concerns?
Using the escaped_fragment methodology would work for your purposes, and would be the best approach. If you have technical concerns around creating the HTML snapshots you could look at a service such as https://prerender.io/ which helps manage this process.
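For context, the escaped_fragment mechanism works roughly like this (a sketch with a made-up example.com URL, not your actual site):

```
<!-- On a page that lazy-loads content without a #! URL, this meta tag opts in: -->
<meta name="fragment" content="!">

<!-- Googlebot then requests the pre-rendered HTML snapshot from: -->
<!-- https://www.example.com/products?_escaped_fragment_= -->

<!-- Hash-bang URLs map the same way, e.g. -->
<!-- https://www.example.com/products#!page=2  is fetched as -->
<!-- https://www.example.com/products?_escaped_fragment_=page=2 -->
```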
If that doesn't answer your question, please give more information so we can understand more specifically where your concerns are.

- RE: "Null" appearing as top keyword in "Content Keywords" under Google index in Google Search Console (posted in Intermediate & Advanced SEO)
It seems like the issue is a bug in the way Google handle data from your site ('null' being computer speak for 'empty', and often appearing after buggy handling of data). However, it seems that the indication from Umar is correct and that this buggy data handling is likely prompted by a crawling issue, so that is the best place to start.
- RE: What would cause the wrong category page to come up? (posted in Intermediate & Advanced SEO)
Is it possible for you to give a clearer description of the categories? You say they are different products, but also that one is a second category of the other.
Does the page you want to rank show up for any other searches? In your analytics are you getting any traffic from Google to that page?
- RE: Pitfalls when implementing the “VARY User-Agent” server response (posted in Intermediate & Advanced SEO)
So, there are lots of 'ifs' here, but the primary problem I see with your plan is that the CDN will return the content to Googlebot without the request hitting your server, so you won't have the option to serve different headers to Googlebot.
Remember that every page is the main HTML content (which may be static or dynamically generated for every request), and then a whole bunch of other resources (Javascript and CSS files, images, font files etc.). These other resources are typically static and lend themselves far better to being cached.
Are your pages static or dynamic? If they are dynamic then you are possibly not benefitting from them being cached anyway, so you could use the 'vary' header on just these pages, and not on any static resources. This would ensure your static resources are cached by your CDN and give you a lot of the benefit of the CDN, and only the dynamic HTML content is served directly from the server.
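A minimal sketch of that selective approach (assuming Apache with mod_headers; the file extensions are placeholders for however the dynamic HTML is served):

```
<IfModule mod_headers.c>
  # Only the dynamic HTML responses vary by user agent;
  # static assets (CSS, JS, images) stay fully cacheable.
  <FilesMatch "\.(html|php)$">
    Header append Vary "User-Agent"
  </FilesMatch>
</IfModule>
```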
If most of your pages are static you could still use this approach, but just without the full benefit of the CDN, which sucks.
Some of the CDNs are already working on this (see http://www.computerworld.com/s/article/9225343/Akamai_eyes_acceleration_boost_for_mobile_content and http://orcaman.blogspot.co.uk/2013/08/cdn-caching-problems-vary-user-agent.html) to try and find better solutions.
I hope some of this helps!

Best posts made by Tom-Anthony
- RE: Repeat Keyword Phrase or Use Variations (posted in On-Page Optimization)
I would say a bit of both. It is fine to repeat your primary keyword phrase several times on the page; the number of times depends upon the amount of content. SEOmoz's On Page tool recommends 4 repetitions. However, you should also try to use some synonyms and secondary target keyword phrases.
A good resource I saw posted today which might be of interest:
- RE: Robots.txt in subfolders and hreflang issues (posted in Technical SEO)
Hi there!
Ok, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. robots.txt files should be one per subdomain, and cannot exist inside sub-folders:
"A **robots.txt** file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlers."
From Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
You shouldn't be blocking Google from either site, and attempting to do so may be why your hreflang directives are not being detected. You should move to a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your two UK & US sitemap files.
You should ensure you have hreflang directives for every page. Hopefully after these changes you will see things start to get better. Good luck!
- RE: Lazy Loading of products on an E-Commerce Website - Options Needed (posted in Intermediate & Advanced SEO)
Ok, cool. To reiterate - with escaped_fragment you are just serving the same content in a tweaked format and Google recommend it rather than frown upon it. Good to be sure though.
See you at SearchLove!

- RE: Thousands of 301 redirections - .htaccess alternatives? (posted in White Hat / Black Hat SEO)
Putting aside server load and config issues, and looking at it purely from the SEO point of view:
No, you shouldn't have any major issues with that many 301s. However, what you might find is that, depending on the size of your site and the frequency of Googlebot's visits, some of these pages take a long time (months) to drop out of the index and be replaced by their newer alternatives. This normally isn't cause for alarm.
In some instances you might end up with pages that now have no links to them (as their parent categories were all redirected also) and so seem to get stuck and never get recrawled by Google to update. In a couple of instances I have had success using XML sitemap files that include just these 'blocked' pages (the old URLs still in the index) to prompt Google to recrawl them.
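Something along these lines (a sketch - the URLs are placeholders for whatever old URLs are stuck in the index):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Old, already-redirected URLs that are still lingering in the index -->
  <url><loc>http://www.example.com/old-category/old-product-1/</loc></url>
  <url><loc>http://www.example.com/old-category/old-product-2/</loc></url>
</urlset>
```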
Also, there is a Google Webmaster Tools feature to 'crawl as Googlebot', which then prompts you to 'submit to index'; you can use this to prompt recrawls on a per-page basis (but you have a limited number of credits here, so it should only be for the more important pages).
Best of luck!
- RE: Best approach to launch a new site with new urls - same domain (posted in Intermediate & Advanced SEO)
Just to chime in on this, albeit maybe a little late now... I had the same thought as I was reading through this: use rel=canonical to point the old pages to the new for now, so the search engines don't have any duplicate content issues until a 301 redirect can take over when the new site is fully launched.
However, depending on your rollout schedule, this would mean that the SERPs would soon be indexing only the new pages. You'd need to ensure that the traffic diverter you are using would handle this. Otherwise you could put the rel=canonical on the new pages for now, which would avoid the duplicate content until you are fully launched. Then you'd remove it and 301 redirect the old pages to the new.
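In either direction, the mechanics are simple (a sketch with placeholder URLs on example.com, not the actual site):

```
<!-- On the page you want treated as the duplicate, pointing at the version
     that should be indexed while both are live: -->
<link rel="canonical" href="https://www.example.com/new-url/" />
```

```
# Later, once the new site is fully launched (Apache sketch):
Redirect 301 /old-url/ https://www.example.com/new-url/
```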
Just something you maybe want to think about! Hopefully your traffic diverter can handle this though.

- RE: Online Sitemap Generator (posted in Intermediate & Advanced SEO)
I will just add that you need to be very careful. Maybe try out the ones that everyone has kindly suggested and then use SEOmoz Campaigns or other tools to check the quality of the result.
This Whiteboard Friday post from last week has one of Bing's team pointing out how it is very important that your sitemap is of a high quality (no 404s, 302s or 301s for example), or it could be ignored completely:
- RE: What can I do to rank higher than low-quality low-content sites? (posted in Local SEO)
If you have many URLs from the old site in the index that are all in the same directory (or a handful of directories) you can quickly and easily remove whole directories of URLs from the index via Google Search Console. We have found it to work very quickly.
- Go into Search Console and select ‘Remove URLs’ under ‘Google Index’ in the left-hand menu.
- Add the page or folder you want to remove, and click next. If you add the homepage, that's the same as all pages on the site. If you add a folder you'll get three options under the ‘Reason’ drop down. One of those options is ‘Remove directory’. Select that.
- RE: Do search engines understand special/foreign characters? (posted in Intermediate & Advanced SEO)
Hi David,
Google/Bing etc. have very few problems recognising such characters in the Latin alphabet. It looks like you are mainly concerned with umlauts, which Google handles intelligently. For example...
- Google will identify the difference between a search for "Küchen" (kitchens in German) and "Kuchen" (cake in German) and offer up relevant results. This is true in Google US and Google UK, not just localised Googles.
- Search suggestions work just fine with these characters, and even with the standardised way of rewriting them when there is no accessible way to type them (for umlauts this is with an e following the letter). For example, in Google.de, type "Kue" and you will be given the suggestion "Küchen".
You are mainly concerned with brands, which muddies the waters a little because many people in English speaking markets won't bother/know how to type the umlauts. However, Google normally handles this well and recognises the intent.
I would recommend you ensure you consistently use the brand name with the foreign characters, as intended. Google/Bing and co. shouldn't have any problems. Which HTML encoding you use is by the by, in my opinion, as long as the characters are rendering correctly.
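For example, these two render identically to users and to search engines (the brand/product name is a placeholder):

```
<!-- Raw UTF-8 character: -->
<title>Küchen von ExampleBrand</title>

<!-- Equivalent HTML entity: -->
<title>K&uuml;chen von ExampleBrand</title>
```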
- RE: What is the best way to refresh a webpage of a news site, SEO wise? (posted in Technical SEO)
Hi Panos,
I don't necessarily disagree with Eric's answer, but I wanted to answer from a different point of view. I'm going to assume you really want or need some refresh mechanism built into the page.
In which case I'd agree that a Javascript approach using AJAX is probably a better solution. It will mean that users only need to load the new article headlines, and not the whole page, so the strain on your servers should be reduced. Furthermore, I find it a neater solution all around anyway - you could provide a notice 'new headlines available' that people click to refresh the articles list. This might be the best of both worlds?
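A minimal sketch of that notice-based approach (the endpoint and element ID are made up for illustration, not from the site in question):

```
// Poll a small JSON endpoint for new headlines and show a notice,
// rather than reloading the whole page on a timer.
let knownCount = null;

async function checkForNewHeadlines() {
  const res = await fetch('/latest-headlines.json');   // hypothetical endpoint
  const headlines = await res.json();                  // assume an array of headlines
  if (knownCount === null) {
    knownCount = headlines.length;
  } else if (headlines.length > knownCount) {
    // Reveal a "new headlines available" notice; clicking it refreshes the list
    document.querySelector('#new-headlines-notice').hidden = false;
  }
}

setInterval(checkForNewHeadlines, 5 * 60 * 1000);      // check every five minutes
```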
Either way, meta refresh isn't as flexible, isn't as clean, and will put more strain on your servers.
Good luck!
-Tom
- RE: Pitfalls when implementing the “VARY User-Agent” server response (posted in Intermediate & Advanced SEO)
So, there are lots of 'ifs' here, but the primary problem I see with your plan is that the CDN will return the content to Googlebot without the request hitting your server, so you won't have the option to serve different headers to Googlebot.
Remember that every page is the main HTML content (which may be static or dynamically generated for every request), and then a whole bunch of other resources (Javascript and CSS files, images, font files etc.). These other resources are typically static and lend themselves far better to being cached.
Are your pages static or dynamic? If they are dynamic then you are possibly not benefitting from them being cached anyway, so you could use the 'vary' header on just these pages, and not on any static resources. This would ensure your static resources are cached by your CDN and give you a lot of the benefit of the CDN, and only the dynamic HTML content is served directly from the server.
If most of your pages are static you could still use this approach, but just without the full benefit of the CDN, which sucks.
Some of the CDNs are already working on this (see http://www.computerworld.com/s/article/9225343/Akamai_eyes_acceleration_boost_for_mobile_content and http://orcaman.blogspot.co.uk/2013/08/cdn-caching-problems-vary-user-agent.html) to try and find better solutions.
I hope some of this helps!
