Google can't access/crawl my site!

granitgash

Hi

I've been dealing with this problem for a few days. In fact, I didn't realize it was this serious until today, when I saw most of my site "de-indexed" and losing most of its rankings.

[URL Errors: 1st photo]

On 8/21/14 there were only 42 errors, but on 8/22/14 the number went to 272, and it just keeps going up.

The site I'm talking about is gazetaexpress.com (news media, custom CMS), with lots of pages.

After doing some research, I came to the conclusion that the problem is the firewall, which might be blocking Googlebot from accessing the site. But the server administrator says this isn't true and that no Google bots have been blocked.

Also, when I go to WMT and try Fetch as Google on the site, this is what I get:

[Fetch as Google: 2nd photo]

Out of more than 60 tries, only 2-3 showed Complete (and only for the homepage, never for articles).

What could the problem be? Can I get Google to crawl my site properly, and is there a chance that I will lose my previous rankings?

Thanks a lot
Granit


Travis_Bailey

What did you do specifically to mitigate the problem? You can PM me, if you would like.

Travis_Bailey

This applies to the guy from Albania.

Oh, this IS the guy from Albania. Never mind.

KeriMorgret

Great, thanks for letting us know what happened with this!

granitgash

Hi all

Just wanted to let you know that we fixed the problem. We disabled CloudFlare, which we found out was blocking Google bots. More about this issue can be found at: https://support.cloudflare.com/hc/en-us/articles/200169806-I-m-getting-Google-Crawler-Errors-What-should-I-do-

granitgash

Hi Travis, thank you for your time.

Great for your friend. I also suggest you visit Kosovo someday; you will have a great time here, for sure.

Back to the issue:

Here is an interesting issue that is happening with the crawler.

Our own CMS uses .htaccess for rewrite purposes. I created 2 new files that are independent of the CMS and tried to fetch them with WMT, and it worked like a charm.

These 2 independent files are:

www.gazetaexpress.com/test_manaferra.php

www.gazetaexpress.com/xhezidja.php

Then I created an ajax page with our CMS, which contains only plain text, and tried to fetch it with WMT. Strangely enough, it didn't work. To make sure that the .htaccess file was not affecting this behavior, I deleted the .htaccess and tried to fetch the page again, but it still didn't work.

The ajax page is: www.gazetaexpress.com/page/xhezidja/?pageSEO=false

The site works perfectly for humans who access it via the browser.

I'm more than confused now!

ac857dfbf02a316d92d378bc48f9c395.png

Travis_Bailey

A friend of mine just got back from Kosovo. It was the last stop on a tour of the Balkans. He had a pretty good time. Moving along...

I crawled about 12K URLs and hit almost 90 Internal Server Errors (500). It's probably not your core problem, but it's something to look at. Here are a few examples:

http://www.gazetaexpress.com/blihet/?search_category_id=1&searchFilter=1

http://www.gazetaexpress.com/shitet/?category_id=134&searchFilter=1

http://www.gazetaexpress.com/me-qera/?category_id=131&searchFilter=1

There was one actual page that threw a 500 at the time of crawl:

http://www.gazetaexpress.com/mistere/edhe-kesaj-i-thuhet-veze-22591/

The edhe kesaj page now resolves fine. (I'm not even going to pretend to understand or write Albanian.)

So there may be some issues with the server or hosting. If you haven't already, try this troubleshooter from Cloudflare.
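One way to re-check URLs like the ones listed above is to request each one with a Googlebot-style User-Agent and tally the status codes. The following is only a minimal sketch using the Python standard library; the function names `fetch_status` and `tally_statuses` are made up for illustration. Note the big assumption: the request still comes from your own IP address, so a firewall or CDN that filters by IP may treat it differently than the real crawler.

```python
from collections import Counter
import urllib.error
import urllib.request

# A Googlebot User-Agent string. The request still originates from your own
# IP address, so this only approximates what the real crawler experiences.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the connection fails."""
    request = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses are raised as exceptions
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, and so on


def tally_statuses(urls, fetch=fetch_status):
    """Count how many URLs returned each status code (None means no response)."""
    return Counter(fetch(url) for url in urls)
```

Calling `tally_statuses` with a list of URLs returns a `Counter` keyed by status code, so a cluster of 500s or failed connections stands out quickly. The `fetch` parameter exists only so the tallying can be tried without touching the network.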

Andy.Drinkwater

Ah OK - well keep us updated with what you find. Someone else will chip in with other info if they have some

-Andy

granitgash

We suspect that CloudFlare might be causing these troubles. We are trying everything; in the meantime, I'm looking here to see if anyone has had a similar experience or has an idea for a solution.

As for warnings, the only warning we had was the one last week (8/23/14) saying that Googlebot can't access our site:

Over the last 24 hours, Googlebot encountered 316 errors while attempting to connect to your site. Your site's overall connection failure rate is 7.5%.
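As a rough sanity check on that message, the two figures together hint at how many connection attempts Googlebot made. This is a back-of-the-envelope sketch only: it assumes the failure rate is simply errors divided by attempts over the same 24 hours, which the message does not actually state. The function name is hypothetical.

```python
def implied_attempts(errors, failure_rate):
    """Estimate total connection attempts from an error count and a failure rate."""
    if not 0 < failure_rate <= 1:
        raise ValueError("failure_rate must be a fraction between 0 and 1")
    return errors / failure_rate


# Figures from the Webmaster Tools message: 316 errors at a 7.5% failure rate.
attempts = implied_attempts(316, 0.075)
print(round(attempts))
```

Under that assumption the result is on the order of a few thousand attempts in a day, so most connections were still succeeding at that point.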

-Granit

Andy.Drinkwater

It doesn't look like a firewall, as I can crawl it with Screaming Frog. However, the server logs will be able to answer that one for you.
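A minimal sketch of that kind of log check, assuming the server writes the common Apache/Nginx "combined" access log format (the thread doesn't say which format this site uses). It matches on the User-Agent string only, which can be spoofed, and the function name and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines):
    """Count status codes for requests whose User-Agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[int(match.group("status"))] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [23/Aug/2014:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [23/Aug/2014:10:00:05 +0000] "GET /page/ HTTP/1.1" 403 312 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [23/Aug/2014:10:00:09 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1) Firefox/31.0"',
]
print(googlebot_status_counts(sample))
```

If requests are blocked upstream of the web server (for example by a CDN or firewall in front of it), they may never reach these logs at all, so a sudden drop in Googlebot lines can be as telling as a run of error codes.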

Without looking in depth, I'm not seeing anything that stands out to me - do you think that there have been changes to the server that could cause issues? What firewall is the server running? Also, if there were errors in crawling the site, you would see a warning about this.

-Andy

granitgash

In mid-March the website changed its CMS, but I don't think that could be the reason, because until this week everything was working perfectly. I don't think it could have been compromised either. I still suspect it could be the firewall blocking bots from crawling the site, but the server administrator couldn't find any evidence of this.

Andy.Drinkwater

Hi Granit,

Has any work been done to the site in the last 2-3 months? Have you had any warnings in webmaster tools at all? I did once see a strange problem where Google wasn't crawling a site correctly because it had been compromised, but after checking, there is nothing like this on yours.

-Andy

granitgash

No problem. Thanks a lot for your time. Let's just hope that someone in the community will help with a solution.

KeriMorgret

Unfortunately, I don't have a quick answer for you. Looking forward to seeing what other community members have to say on this one!

granitgash

I'm looking at the http version in GWT

KeriMorgret

If I do a site:gazetaexpress.com in Google, I get some results that are http, and some results that are https. The https ones say there is an SSL connection error.

Are you looking at the http or https version in GWT?
