Struggling with Google Bot Blocks - Please help!
-
I own a site called www.wheretobuybeauty.com.au
After months and months, we still have a serious issue with all pages having blocked URLs according to Google Webmaster Tools.
The 404 errors are returning a 200 header code, according to the email below. Do you agree that the 404.php code should be changed? Can you do that, please?
The current state:
Google Webmaster Tools Index Status shows:
26,000 pages indexed
44,000 pages blocked by robots.
In late March, we implemented changes recommended by an SEO expert: he provided a new robots.txt file and advised that we should amend sitemap.xml, along with other changes. We implemented those changes and then set up a re-index of the site by Google. The number of blocked URLs eventually dropped to 1,000 in May and June for a few days – but now the problem has rapidly returned.
The number of pages displayed for a Google search on www.google.com.au with the query 'site:wheretobuybeauty.com.au' is 37,000.
This new site has been re-crawled over the last 4 weeks.
About the site
This is a Linux PHP site and has the following:
55,000 URLs in sitemap.xml submitted successfully to Webmaster Tools
robots.txt file has been modified several times:
At first we had none.
Then we created one, but were advised that it needed to have this current content:
User-agent: *
Disallow:
-
No problem, my friend. You are most welcome. Here at Moz, you will not only be able to get almost all of your SEO-related queries addressed and solved, you will also learn a great deal about digital marketing. I highly recommend that every aspiring digital marketer be active in a community like Moz, and I bet they will save a great deal of time and money as well. Wish you all the very best.
Regards,
Devanur Rafi.
-
Thanks Devanur - trying out everything you have suggested.
-
Hi Alex,
Sorry if I was not clear in my previous post. I meant that, in general, pages with cleaner code will have an edge over similar pages with bad code when it comes to SEO.
Just an example: page A has cleaner code compared to page B, with all other SEO factors being equal. In a scenario like this, page B might not be favored by Google because of issues arising from bad code, such as poor page-loading performance, poor rendering in browsers, etc.
The issue at hand might not be caused by your pages failing W3C validation, but it's not a bad idea to have cleaner code on your website.
Best regards,
Devanur Rafi.
-
Hi Devanur
My understanding is that Google does not have a problem with invalid XHTML or pages that do not pass W3C validation. Please see a comment on this at SEOmoz:
-
Hi Alex,
I did a code validation check for the following URL:
It gave 238 Errors and 538 Warnings!!
Search engines like Google favor pages with cleaner code, so I strongly recommend having the code on the website cleaned up.
Here is where you can run the validation check:
Best regards,
Devanur Rafi.
-
Hi Alex,
If the underscores appear in only about 4% of the total URLs, then this can safely be set aside for the purposes of the current issue.
The same goes for the keyword repetition in the page titles and URLs. However, if it is possible for you to revisit your URL structure and have it like the following, you should go for it:
www.wheretobuybeauty.com.au/<brand name>/<product name>, e.g.
http://www.wheretobuybeauty.com.au/floris/royal-arms-diamond-edition-eau-de-parfum-spray-100ml-34oz
The same applies to the page titles.
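To illustrate the idea, here is a rough sketch of how such a brand/product slug could be generated; the helper functions and values below are hypothetical examples, not your actual code:

```php
<?php
// Rough sketch: build a /<brand>/<product> URL without repeating the brand
// inside the product part of the slug. Values below are just examples.
function slugify($text) {
    $text = strtolower(trim($text));
    $text = preg_replace('/[^a-z0-9]+/', '-', $text); // non-alphanumerics -> hyphens
    return trim($text, '-');                          // no leading/trailing hyphens
}

function product_url($brand, $product) {
    $brandSlug   = slugify($brand);
    $productSlug = slugify($product);
    // Drop a leading repeat of the brand from the product slug.
    if (strpos($productSlug, $brandSlug . '-') === 0) {
        $productSlug = substr($productSlug, strlen($brandSlug) + 1);
    }
    return 'http://www.wheretobuybeauty.com.au/' . $brandSlug . '/' . $productSlug;
}

echo product_url('Floris', 'Floris Royal Arms Diamond Edition Eau de Parfum Spray 100ml/3.4oz');
// Prints: http://www.wheretobuybeauty.com.au/floris/royal-arms-diamond-edition-eau-de-parfum-spray-100ml-3-4oz
?>
```

If you do change the URL structure, remember to 301-redirect the old URLs to the new ones so you do not lose existing rankings.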
Now we are left with two things: page performance and URL canonicalization. Please have them fixed as early as possible.
Also, I checked your IP address, and you have gone for shared hosting. This is not at all recommended if you are a serious online business owner. Your IP, 103.9.170.75, is being shared by at least 250 other domains, including some bad websites.
Though there are different views about bad IP neighborhoods and their impact on SEO, I have always been an advocate of a clean IP and have always recommended it to my clients. You can go for a dedicated IP, which is very cheap these days, or better yet a VPS.
For more about this, please check out the "Oops, your IP is either dirty or virtual" section on the following page:
http://www.bruceclay.com/in/seo-tech-tips/techtips.htm
And also, this section, "A Strong Foundation for Your Site to Operate On" on the following page:
http://www.bruceclay.com/blog/2011/04/the-seo-bucket-list-3-things-to-do-before-your-site-dies/
Lastly, I checked your domain's DNS health; here are the results:
http://intodns.com/wheretobuybeauty.com.au
Though these might not be causing the current issue, it's good to sort everything out, as we should leave no stone unturned in making the website a better one.
Best regards,
Devanur Rafi.
-
Hey Devanur
Please see our responses below:
Hi Alex,
Thanks for the info. Here are a few issues that I observed with the website, and I am very confident that if you can address and fix these, you will come out of this issue with flying colors:
1. URL canonicalization issue: Both the www and non-www versions of your website's URLs return an HTTP header status code 200. You should ideally redirect all non-www URLs to their respective www versions via a 301 permanent redirect immediately.
Response: We are asking the developer to correct this.
2. Inconsistent URL structure: Your website is still using underscores (_) in the URLs as word separators, alongside the recommended hyphens (-). This inconsistent usage can sometimes lead to issues, so please replace all the underscores with hyphens.
Response: This problem only occurs on a few pages where special characters have been replaced with underscores – probably 4% of product pages. I can't see that this has an impact on SEO?
3. Google PageSpeed check: When I ran the Google PageSpeed test on some of the URLs from your website, along with the ones that you gave, I found the score varying between 28 and 60. Please look at the recommendations that the PageSpeed tool gives and try to address the issues (especially ones like "Reduce blocking resources"; for more, see https://developers.google.com/speed/docs/best-practices/rtt#PreferAsyncResources).
I suggest you run the Google PageSpeed check for some of the URLs yourself.
Note: The URLs from your website that are present in Google's index may also show similar issues when run through the PageSpeed test. That should not stop you from addressing these issues.
Response: We will ask the developers to improve performance specifically with the highest value things that are showing up in Google PageSpeed check.
4. Heavy pages leading to higher page loading times and response times:
Many of the pages that I checked are more than 1.3 MB in size, which is very large. This can be a big problem that not only hurts your website from the search engines' perspective but also leads to a bad user experience, which ultimately affects the SEO of your website. You can use tools like gtmetrix.com and fix the issues they report.
Response: We will ask the developers to improve performance specifically with the highest value things that are showing up in gtmetrix.com suggestions.
5. Repetition of keywords or phrases in page titles and URLs:
This issue might look like an over-optimization effort and should be fixed as early as possible.
For example: www.wheretobuybeauty.com.au/acqua-di-parma/acqua-di-parma-acqua-di-parma-collezione-barbiere-shaving-cream-75ml_25oz
If you look at the above page, the phrase 'acqua-di-parma' is present twice in both the URL and the page title. This is something that you need to review seriously, as it looks like keyword repetition, which is not good from an SEO standpoint.
Response: This occurs with approximately 300 product pages out of 40,000, so a very small percentage. We will clean this up when we update our data. I can't see that this has any impact on SEO considering the small number? Note, however, that every product page is constructed as follows:
http://www.wheretobuybeauty.com.au/floris/floris-royal-arms-diamond-edition-eau-de-parfum-spray-100ml_34oz
Is there some risk that this will look like over-optimisation?
By the way, your robots.txt file is clean and it should not be causing these issues.
Please have the issues mentioned above fixed as soon as possible, and you should be out of the woods soon after that.
I wish you good luck Alex.
Best regards,
Devanur Rafi.
-
Hi Alex,
Thanks for the info. Here are a few issues that I observed with the website, and I am very confident that if you can address and fix these, you will come out of this issue with flying colors:
1. URL canonicalization issue: Both the www and non-www versions of your website's URLs return an HTTP header status code 200. You should ideally redirect all non-www URLs to their respective www versions via a 301 permanent redirect immediately (a sketch follows after this list).
2. Inconsistent URL structure: Your website is still using underscores (_) in the URLs as word separators, alongside the recommended hyphens (-). This inconsistent usage can sometimes lead to issues, so please replace all the underscores with hyphens.
3. Google PageSpeed check: When I ran the Google PageSpeed test on some of the URLs from your website, along with the ones that you gave, I found the score varying between 28 and 60. Please look at the recommendations that the PageSpeed tool gives and try to address the issues (especially ones like "Reduce blocking resources"; for more, see https://developers.google.com/speed/docs/best-practices/rtt#PreferAsyncResources).
I suggest you run the Google PageSpeed check for some of the URLs yourself.
Note: The URLs from your website that are present in Google's index may also show similar issues when run through the PageSpeed test. That should not stop you from addressing these issues.
4. Heavy pages leading to higher page loading times and response times:
Many of the pages that I checked are more than 1.3 MB in size, which is very large. This can be a big problem that not only hurts your website from the search engines' perspective but also leads to a bad user experience, which ultimately affects the SEO of your website. You can use tools like gtmetrix.com and fix the issues they report.
5. Repetition of keywords or phrases in page titles and URLs:
This issue might look like an over-optimization effort and should be fixed as early as possible.
For example: www.wheretobuybeauty.com.au/acqua-di-parma/acqua-di-parma-acqua-di-parma-collezione-barbiere-shaving-cream-75ml_25oz
It could have been like: www.wheretobuybeauty.com.au/acqua-di-parma/collezione-barbiere-shaving-cream-75ml-25oz
If you look at the above page, the phrase 'acqua-di-parma' is present twice in both the URL and the page title. This is something that you need to review seriously, as it looks like keyword repetition, which is not good from an SEO standpoint.
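Regarding point 1 above, here is a minimal sketch of how the non-www to www redirect could be handled at the PHP level, assuming it runs before any output at the top of a common include (this is illustrative, not your actual code; an equivalent Apache rewrite rule in .htaccess would work just as well):

```php
<?php
// Rough sketch: 301-redirect the bare domain to the www version.
// Must run before any HTML output is sent.
$host = isset($_SERVER['HTTP_HOST']) ? strtolower($_SERVER['HTTP_HOST']) : '';

if ($host === 'wheretobuybeauty.com.au') {
    $target = 'http://www.wheretobuybeauty.com.au' . $_SERVER['REQUEST_URI'];
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $target);
    exit;
}
?>
```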
By the way, your robots.txt file is clean and it should not be causing these issues.
Please have the issues mentioned above fixed as soon as possible, and you should be out of the woods soon after that.
I wish you good luck Alex.
Best regards,
Devanur Rafi.
-
Thanks Devanur
I put this to my partner, and he said he is addressing it, but the main issue still remains.
This is the critical issue: only a few pages are visible to Google search, as almost all are blocked from Googlebot. I am re-stating the problem in this email for you.
Can you please take a look at the whole problem and see if you can tell what is causing it?
Is robots.txt causing this? It is the only change that we have made; at one point the problem was corrected, but it has now returned. I have read everything that I can about robots.txt on the Google site and in forums.
Here are two examples (out of 44,000) that are blocked. It is easy to find other examples – simply test any of the product pages – only 200 out of 44,000 return any result.
Try searching using www.google.com.au and using the search query
Abercrombie & Fitch 1892 Cobalt Eau De Cologne Spray 50ml/1.7oz site:wheretobuybeauty.com.au
Second example:
Try searching using:
Acqua Di Parma Collezione Barbiere Shaving Cream 75ml/2.5oz site:wheretobuybeauty.com.au
The current state:
Google Webmaster Tools Index Status shows:
26,000 pages indexed
44,000 pages blocked by robots.
In late March, we implemented changes recommended by an SEO expert, Harmeen: he provided a new robots.txt file and advised that we should amend sitemap.xml, along with other changes. We implemented those changes and then set up a re-index of the site by Google. The number of blocked URLs eventually dropped to 1,000 in May and June for a few days – but now the problem has rapidly returned.
This new site has been re-crawled over the last 4 weeks.
About the site
55,000 URLs in sitemap.xml submitted successfully to Webmaster Tools
robots.txt file has been modified several times:
At first we had none; then we created one, but were advised that it needed to have this current content:
User-agent: *
Disallow:
Sitemap: http://www.wheretobuybeauty.com.au/sitemap.xml
I put this into robots.txt, but was then advised yesterday that there should be no blank lines between these lines, so I removed them.
-
Hi Alex,
Without diving into the increased number of 404 errors being reported in your Webmaster Tools account, let us first look at the core issue: pages for non-existent resources (404 pages) that return an HTTP header status code 200. These are called 'soft 404 errors'. Ideally, all non-existent resources on the website should return an HTTP header status code of 404 or 410, as the situation warrants, and not a status 200, which is very confusing for search engines and a bad practice. This should be fixed immediately. Please have all such pages return a 404 and not a 200 as soon as possible.
Here is more about soft 404 errors:
https://support.google.com/webmasters/answer/181708?hl=en
and here is more about when to return a 404 status code:
https://support.google.com/webmasters/answer/2409439?hl=en
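Since the site runs on PHP, a minimal sketch of what a custom 404.php could look like is below, assuming the header is sent before any other output (this is illustrative, not your actual template):

```php
<?php
// Rough sketch: make the custom 404 template send a real 404 status
// instead of the default 200. Must run before any output.
header('HTTP/1.1 404 Not Found'); // or http_response_code(404) on PHP 5.4+
?>
<html>
<head><title>Page not found</title></head>
<body>
    <h1>Sorry, we could not find that page.</h1>
    <p>Please try searching for the product or browsing by brand from the home page.</p>
</body>
</html>
```

You can verify the fix by requesting a made-up URL and checking the response headers (for example with curl -I); it should now report 404 instead of 200.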
Best regards,
Devanur Rafi.