Google can't access/crawl my site!
-
Hi
I've been dealing with this problem for a few days. In fact, I didn't realize how serious it was until today, when I saw that most of my site had been de-indexed and had lost most of its rankings.
[URL Errors: 1st photo]
On 8/21/14 there were only 42 errors, but on 8/22/14 the number jumped to 272, and it keeps going up.
The site I'm talking about is gazetaexpress.com (news media, custom CMS), which has lots of pages.
After some research, I concluded that the problem is the firewall, which might be blocking Googlebot from accessing the site. But the server administrator says this isn't true and that no Google bots have been blocked.
Also, when I go to WMT and try Fetch as Google on the site, this is what I get:
[Fetch as Google: 2nd photo]
Out of more than 60 tries, it showed Complete only 2-3 times (and only for the homepage, never for articles).
What could the problem be? Can I get Google to crawl my site properly, and is there a chance that I will lose my previous rankings?
Thanks a lot
Granit -
What did you do specifically to mitigate the problem? You can PM me, if you would like.
-
This applies to the guy from Albania.
Oh, this IS the guy from Albania. Never mind.
-
Great, thanks for letting us know what happened with this!
-
Hi all
Just wanted to let you know that we fixed the problem. We disabled CloudFlare, which we found was blocking Googlebot. More about this issue can be found at: https://support.cloudflare.com/hc/en-us/articles/200169806-I-m-getting-Google-Crawler-Errors-What-should-I-do-
-
Hi Travis, thank you for your time.
Glad to hear about your friend. I also suggest you visit Kosovo someday; you will have a great time here, for sure.
Back to the issue:
Here is an interesting issue that is happening with the crawler.
Our own CMS uses .htaccess for rewrite purposes. I created 2 new files that are independent of the CMS and tried to fetch them with WMT, and it worked like a charm.
These 2 independent files are:
www.gazetaexpress.com/test_manaferra.php
www.gazetaexpress.com/xhezidja.php
Then I created an AJAX page with our CMS that contains only plain text and tried to fetch it with WMT, and strangely enough it didn't work. To make sure that the .htaccess file was not affecting this behavior, I deleted the .htaccess file and tried to fetch the page again, but it still didn't work.
The ajax page is: www.gazetaexpress.com/page/xhezidja/?pageSEO=false
The site works perfectly for humans who access it via a browser.
I'm more than confused now!
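One way to narrow down a "works in the browser, fails for Google" symptom is to request the same URL twice, once with a browser User-Agent and once with a Googlebot User-Agent, and compare the results. The sketch below is only a rough check: the function names and the browser User-Agent string are assumptions made for illustration, and a spoofed User-Agent does not come from a Google IP address, so a proxy or firewall may still treat it differently from the real crawler.

```python
import urllib.error
import urllib.request

# The Googlebot string is the commonly published desktop one; the browser
# string is an arbitrary example chosen for this sketch.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0"


def build_request(url, user_agent):
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code, or a short error string if the fetch fails."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code
    except (urllib.error.URLError, OSError) as err:
        # DNS failures, refused connections, timeouts and similar.
        return f"failed: {err}"


def compare(url):
    """Fetch one URL as a browser and as Googlebot and return both results."""
    return {
        "browser": fetch_status(url, BROWSER_UA),
        "googlebot": fetch_status(url, GOOGLEBOT_UA),
    }
```

Calling compare("http://www.gazetaexpress.com/page/xhezidja/?pageSEO=false") and then the same function on one of the standalone test files would show whether the two User-Agents get different answers for CMS pages only.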
-
A friend of mine just got back from Kosovo. It was the last stop on a tour of the Balkans. He had a pretty good time. Moving along...
I crawled about 12K URLs and hit almost 90 Internal Server Errors (500). It's probably not your core problem, but it's something to look at. Here are a few examples:
http://www.gazetaexpress.com/blihet/?search_category_id=1&searchFilter=1
http://www.gazetaexpress.com/shitet/?category_id=134&searchFilter=1
http://www.gazetaexpress.com/me-qera/?category_id=131&searchFilter=1
There was one actual page that threw a 500 at the time of crawl:
http://www.gazetaexpress.com/mistere/edhe-kesaj-i-thuhet-veze-22591/
The edhe kesaj page now resolves fine. (I'm not even going to pretend to understand or write Albanian.)
So there may be some issues with the server or hosting. If you haven't already, try this troubleshooter from Cloudflare.
-
Ah OK - well keep us updated with what you find. Someone else will chip in with other info if they have some
-Andy
-
We suspect that CloudFlare might be causing these troubles. We are trying everything; in the meantime, I'm looking here to see if anyone has had a similar experience or has an idea for a solution.
As for warnings, the only one we had was last week (8/23/14), saying that Googlebot can't access our site:
Over the last 24 hours, Googlebot encountered 316 errors while attempting to connect to your site. Your site's overall connection failure rate is 7.5%.
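As a rough sanity check on those two figures, and assuming the failure rate is simply errors divided by connection attempts (the message does not say how it is computed), the numbers imply roughly this many attempts in the 24-hour window:

```python
# Figures quoted in the warning message.
errors = 316
failure_rate = 0.075  # 7.5%

# Assumption: failure_rate = errors / attempts, so attempts = errors / failure_rate.
estimated_attempts = round(errors / failure_rate)
print(estimated_attempts)  # 4213
```

So under that assumption most connection attempts were still succeeding at that point, which would fit an intermittent block rather than a total one.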
-Granit
-
It doesn't look like a firewall, as I can crawl it with Screaming Frog. However, the server logs will be able to answer that one for you.
Without looking in depth, I'm not seeing anything that stands out to me - do you think that there have been changes to the server that could cause issues? What firewall is the server running? Also, if there were errors in crawling the site, you would see a warning about this.
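As a sketch of what checking the server logs could look like: the snippet below counts HTTP status codes for requests whose User-Agent mentions Googlebot. It assumes the access log uses the Apache/Nginx "combined" format; the function name and the sample lines are invented for illustration and are not taken from the real server.

```python
import re
from collections import Counter

# Combined Log Format:
# ip - user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines):
    """Count status codes for requests whose User-Agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[match.group("status")] += 1
    return counts


# Hypothetical sample lines, made up for this example.
SAMPLE = [
    '66.249.66.1 - - [23/Aug/2014:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [23/Aug/2014:10:00:05 +0000] "GET /lajme/ HTTP/1.1" 403 310 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [23/Aug/2014:10:00:09 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0"',
]

print(googlebot_status_counts(SAMPLE))
```

For a real file, passing an open file object (for example open("access.log")) works the same way. One caveat: if a proxy or CDN sits in front of the server, requests it stops may never reach these logs at all, so an absence of Googlebot errors here does not rule out blocking upstream.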
-Andy
-
In mid-March the website changed its CMS, but I don't think that could be the reason, because until this week everything was working perfectly. I don't think it has been compromised either. I still suspect the firewall could be blocking bots from crawling the site, but the server administrator couldn't find any evidence of this.
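When looking for evidence of blocked bots, it helps to separate real Googlebot traffic from requests that merely claim to be Googlebot. Google's documented approach is a reverse DNS lookup on the IP, a check that the hostname is under googlebot.com or google.com, and then a forward lookup to confirm the hostname maps back to the same IP. The following is a minimal sketch of that idea; the function name and the injectable lookup parameters are assumptions added so the logic can be exercised without network access, and it handles IPv4 only.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes the reverse-then-forward DNS check for Googlebot.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    default to the standard socket calls; they can be swapped out for testing.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        # The hostname must belong to one of Google's crawler domains.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # The hostname must resolve back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

Running this over the client IPs found in firewall or access logs would show whether any rejected requests came from genuine Googlebot addresses.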
-
Hi Granit,
Has any work been done to the site in the last 2-3 months? Have you had any warnings in webmaster tools at all? I did once see a strange problem where Google wasn't crawling a site correctly because it had been compromised, but after checking, there is nothing like this on yours.
-Andy
-
No problem. Thanks a lot for your time. Let's just hope that someone in the community will help with a solution.
-
Unfortunately, I don't have a quick answer for you. Looking forward to seeing what other community members have to say on this one!
-
I'm looking at the http version in GWT
-
If I do a site:gazetaexpress.com in Google, I get some results that are http, and some results that are https. The https ones say there is an SSL connection error.
Are you looking at the http or https version in GWT?