Google can't access/crawl my site!
-
Hi
I'm dealing with this problem for a few days. In fact i didn't realize it was this serious until today when i saw most of my site "de-indexed" and losing most of the rankings.
[URL Errors: 1st photo]
8/21/14 there were only 42 errors but in 8/22/14 this number went to 272 and it just keeps going up.
The site i'm talking about is gazetaexpress.com (media news, custom cms) with lot's of pages.
After i did some research i came to the conclusion that the problem is to the firewall, who might have blocked google bots from accessing the site. But the server administrator is saying that this isn't true and no google bots have been blocked.
Also when i go to WMT, and try to Fetch as Google the site, this is what i get:
[Fetch as Google: 2nd photo]
From more than 60 tries, 2-3 times it showed Complete (and this only to homepage, never to articles).
What can be the problem? Can i get Google to crawl properly my site and is there a chance that i will lose my previous rankings?
Thanks a lot
Granit -
What did you do specifically to mitigate the problem? You can PM me, if you would like.
-
This applies to the guy from Albania.
Oh, this IS the guy from Albania. Never mind.
-
Great, thanks for letting us know what happened with this!
-
Hi all
Just wanted to let you know that we fixed the problem. We disabled CloudFlare which we found out was blocking Google bots. More about this issue can be found at: https://support.cloudflare.com/hc/en-us/articles/200169806-I-m-getting-Google-Crawler-Errors-What-should-I-do-
-
Hi Travis, thank you for your time.
Great for your friend, I also suggest to visit Kosovo someday, you will have great time here, for sure
Back to the issue:
Here is an interesting issue that is happening with the crawler.
Our own cms uses htaccess for rewrite purposes. I created 2 new files that are independent from CMS and tried to fetch them with WMT, and it worked like a charm.
These 2 independent files are:
www.gazetaexpress.com/test_manaferra.php
www.gazetaexpress.com/xhezidja.php
Then, I created an ajax page with our CMS, which contains only plain text, tried to fetch it by WMT and strangely enough it didn't work. To make sure that the .htaccess file is not affecting this behavior, I deleted the htaccess and tried to fetch it, but it didn't worked.
The ajax page is: www.gazetaexpress.com/page/xhezidja/?pageSEO=false
The site works perfectly for humans which access it via the browser.
I'm more than confused now!
-
A friend of mine just got back from Kosovo. It was the last stop on a tour of the Balkans. He had a pretty good time. Moving along...
I crawled about 12K URLs and hit almost 90 Internal Server Errors (500). It's probably not your core problem, but it's something to look at. Here are a few examples:
http://www.gazetaexpress.com/blihet/?search_category_id=1&searchFilter=1
http://www.gazetaexpress.com/shitet/?category_id=134&searchFilter=1
http://www.gazetaexpress.com/me-qera/?category_id=131&searchFilter=1
There was one actual page that threw a 500 at the time of crawl:
http://www.gazetaexpress.com/mistere/edhe-kesaj-i-thuhet-veze-22591/
The edhe kesaj page now resolves fine. (I'm not even going to pretend to understand or write Albanian.)
So there may be some issues with the server or hosting. If you haven't already, try this troubleshooter from Cloudflare.
-
Ah OK - well keep us updated with what you find. Someone else will chip in with other info if they have some
-Andy
-
We are suspecting that CloudFlare might be causing these troubles. We are trying everything, in the meantime i'm looking here to see if anyone has any similar experience or an idea for solution.
As for warnings, the only warning we had was the one last week (8/23/14) saying that Google bot can't acces our site:
Over the last 24 hours, Googlebot encountered 316 errors while attempting to connect to your site. Your site's overall connection failure rate is 7.5%.
-Granit
-
It doesn't look like a firewall, as I can crawl it with Screaming Frog. However, the server logs will be able to answer that one for you.
Without looking in depth, I'm not seeing anything that stands out to me - do you think that there have been changes to the server that could cause issues? What firewall is the server running? Also, if there were errors in crawling the site, you would see a warning about this.
-Andy
-
In mid-march website changed it's CMS but i don't think that could be the reason because until this week everything was working perfectly. I don't think it could have been compromised too. I'm still suspecting it could be the firewall blocking bots from crawling the site, but the server administrator couldn't find any evidence of this.
-
Hi Granit,
Has any work been done to the site in the last 2-3 months? Have you had any warnings in webmaster tools at all? I did once see a strange problem where Google wasn't crawling a site correctly because it had been compromised, but after checking, there is nothing like this on yours.
-Andy
-
No prb. Thanks a lot for your time. Let just hope that someone in the community will help with a solution
-
Unfortunately, I don't have a quick answer for you. Looking forward to seeing what other community members have to say on this one!
-
I'm looking at the http version in GWT
-
If I do a site:gazetaexpress.com in Google, I get some results that are http, and some results that are https. The https ones say there is an SSL connection error.
Are you looking at the http or https version in GWT?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexed Site A's Content On Site B, Site C etc
Hi All, I have an issue where the content (pages and images) of Site A (www.ericreynolds.photography) are showing up in Google under different domains Site B (www.fastphonerepair.com), Site C (www.quarryhillvet.com), Site D (www.spacasey.com). I believe this happened because I installed an SSL cert on Site A but didn't have the default SSL domain set on the server. You were able to access Site B and any page from Site A and it would pull up properly. I have since fixed that SSL issue and am now doing a 301 redirect from Sites B, C and D to Site A for anything https since Sites B, C, D are not using an SSL cert. My question is, how can I trigger google to re-index all of the sites to remove the wrong listings in the index. I have a screen shot attached so you can see the issue clearer. I have resubmitted my site map but I'm not seeing much of a change in the index for my site. Any help on what I could do would be great. Thanks
Intermediate & Advanced SEO | | cwscontent
Eric TeVM49b.png qPtXvME.png1 -
When searching for related:katom.com on google, why isn't our website coming up?
A lot of our competitors come up but we aren't coming up. What do we need to do so that google considers us related? Our website is culinarydepotinc.com And I believe not being related to those big competitors affects our SEO, is that correct?
Intermediate & Advanced SEO | | Sammyh2 -
Is it necessary to use Google's Structured Data Markup or alternative for my B2B site?
Hi, We are in the process of going through a re-design for our site. Am trying to understand if we need to use some sort of structured data either from Google Structured data or schema. org?
Intermediate & Advanced SEO | | Krausch0 -
Google don't index .ee version of a website
Hello, We have a problem with our clients website .ee. This website was developed by another company and now we don't know what is wrong with it. If i do a Google search "site:.ee" it only finds konelux.ee homepage and nothing else. Also homepage title tag and meta dec is in Finnish language not in Estonian language. If i look at .ee/robots.txt it looks like robots.txt don't block Google access. Any ideas what can be wrong here? BR, T
Intermediate & Advanced SEO | | sfinance0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
Can Google index PDFs with flash?
Does anyone know if Google can index PDF with Flash embedded? I would assume that the regular flash recommendations are still valid, even when embedded in another document. I would assume there is a list of the filetype and version which Google can index with the search appliance, but was not able to find any. Does anyone have a link or a list?
Intermediate & Advanced SEO | | andreas.wpv0 -
My New(ish) Site Isn't Ranking Well And Recently Fell
I launched my site (jesfamilylaw.com) at the beginning of January. Since then, I've been trying to build high quality back links. I have a few back links with keyword targeted anchor text from some guest posts I've published (maybe 3 or so) and I have otherwise signed up for business directories and industry-specific directories. I have a few social media profiles and some likes on Facebook, both for the company page and some posts. Despite this, I've had a lot of trouble cracking Google's top ten for any term, long or tall tail. I was starting to climb for Evanston Family Law, which is the key term I believe I am best optimized for, but took a dive yesterday. I fell from maybe the 14th result to somewhere on the 4th page. For all my other target terms, I don't know if I've gotten into the 20s yet. To further complicate matters, my Google Places listing isn't showing and is on the second page of results for Places searches, after businesses that aren't located in the same city. The night before I fell, I resubmitted my site to Google because Webmaster tools was showing duplicate title tags when I had none. I had also made a couple changes to some internal links and title tags, but only for a small fraction of the site. Long story short, I don't know what's going on. I don't know why I fell in the rankings and why my site isn't competitive for some of my target key phrases. I've read so many horror stories about Penguin that I fear my onsite optimization may be hurting my rankings or my back links are insufficient. I've done plenty of competitor research and the sites that are beating me have very aggressive onsite optimization and few back links. In short, I am very confused. Any help would be immensely appreciated.
Intermediate & Advanced SEO | | JESFamilyLaw0 -
Strategies to compete with a new domain/site
Hi all, What would be ( highlights ) your strategy in order to rank and compete with a new domain against competitors that have an average of 50% domain authority and around 2000 root domain linking to them, if you would start with a completely new website/domain? How long would you estimate the new site to be competitive? In the retail area. Working on it a month full time I would go with On page SEO off course, detailling each products and building the internal link structure Get back links, backlinks, backlinks and... backlinks... Build the social media network feed a blog Thanks for your input Considering working on the site for a month full time, I would estimate a ranking after a month or 2 although the competitions very high. Your thoughts ?
Intermediate & Advanced SEO | | Derek_A0