Google can't access/crawl my site!
-
Hi
I'm dealing with this problem for a few days. In fact i didn't realize it was this serious until today when i saw most of my site "de-indexed" and losing most of the rankings.
[URL Errors: 1st photo]
8/21/14 there were only 42 errors but in 8/22/14 this number went to 272 and it just keeps going up.
The site i'm talking about is gazetaexpress.com (media news, custom cms) with lot's of pages.
After i did some research i came to the conclusion that the problem is to the firewall, who might have blocked google bots from accessing the site. But the server administrator is saying that this isn't true and no google bots have been blocked.
Also when i go to WMT, and try to Fetch as Google the site, this is what i get:
[Fetch as Google: 2nd photo]
From more than 60 tries, 2-3 times it showed Complete (and this only to homepage, never to articles).
What can be the problem? Can i get Google to crawl properly my site and is there a chance that i will lose my previous rankings?
Thanks a lot
Granit -
What did you do specifically to mitigate the problem? You can PM me, if you would like.
-
This applies to the guy from Albania.
Oh, this IS the guy from Albania. Never mind.
-
Great, thanks for letting us know what happened with this!
-
Hi all
Just wanted to let you know that we fixed the problem. We disabled CloudFlare which we found out was blocking Google bots. More about this issue can be found at: https://support.cloudflare.com/hc/en-us/articles/200169806-I-m-getting-Google-Crawler-Errors-What-should-I-do-
-
Hi Travis, thank you for your time.
Great for your friend, I also suggest to visit Kosovo someday, you will have great time here, for sure
Back to the issue:
Here is an interesting issue that is happening with the crawler.
Our own cms uses htaccess for rewrite purposes. I created 2 new files that are independent from CMS and tried to fetch them with WMT, and it worked like a charm.
These 2 independent files are:
www.gazetaexpress.com/test_manaferra.php
www.gazetaexpress.com/xhezidja.php
Then, I created an ajax page with our CMS, which contains only plain text, tried to fetch it by WMT and strangely enough it didn't work. To make sure that the .htaccess file is not affecting this behavior, I deleted the htaccess and tried to fetch it, but it didn't worked.
The ajax page is: www.gazetaexpress.com/page/xhezidja/?pageSEO=false
The site works perfectly for humans which access it via the browser.
I'm more than confused now!
-
A friend of mine just got back from Kosovo. It was the last stop on a tour of the Balkans. He had a pretty good time. Moving along...
I crawled about 12K URLs and hit almost 90 Internal Server Errors (500). It's probably not your core problem, but it's something to look at. Here are a few examples:
http://www.gazetaexpress.com/blihet/?search_category_id=1&searchFilter=1
http://www.gazetaexpress.com/shitet/?category_id=134&searchFilter=1
http://www.gazetaexpress.com/me-qera/?category_id=131&searchFilter=1
There was one actual page that threw a 500 at the time of crawl:
http://www.gazetaexpress.com/mistere/edhe-kesaj-i-thuhet-veze-22591/
The edhe kesaj page now resolves fine. (I'm not even going to pretend to understand or write Albanian.)
So there may be some issues with the server or hosting. If you haven't already, try this troubleshooter from Cloudflare.
-
Ah OK - well keep us updated with what you find. Someone else will chip in with other info if they have some
-Andy
-
We are suspecting that CloudFlare might be causing these troubles. We are trying everything, in the meantime i'm looking here to see if anyone has any similar experience or an idea for solution.
As for warnings, the only warning we had was the one last week (8/23/14) saying that Google bot can't acces our site:
Over the last 24 hours, Googlebot encountered 316 errors while attempting to connect to your site. Your site's overall connection failure rate is 7.5%.
-Granit
-
It doesn't look like a firewall, as I can crawl it with Screaming Frog. However, the server logs will be able to answer that one for you.
Without looking in depth, I'm not seeing anything that stands out to me - do you think that there have been changes to the server that could cause issues? What firewall is the server running? Also, if there were errors in crawling the site, you would see a warning about this.
-Andy
-
In mid-march website changed it's CMS but i don't think that could be the reason because until this week everything was working perfectly. I don't think it could have been compromised too. I'm still suspecting it could be the firewall blocking bots from crawling the site, but the server administrator couldn't find any evidence of this.
-
Hi Granit,
Has any work been done to the site in the last 2-3 months? Have you had any warnings in webmaster tools at all? I did once see a strange problem where Google wasn't crawling a site correctly because it had been compromised, but after checking, there is nothing like this on yours.
-Andy
-
No prb. Thanks a lot for your time. Let just hope that someone in the community will help with a solution
-
Unfortunately, I don't have a quick answer for you. Looking forward to seeing what other community members have to say on this one!
-
I'm looking at the http version in GWT
-
If I do a site:gazetaexpress.com in Google, I get some results that are http, and some results that are https. The https ones say there is an SSL connection error.
Are you looking at the http or https version in GWT?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How necessary is it to disavow links in 2017? Doesn't Google's algorithm take care of determining what it will count or not?
Hi All, So this is a obvious question now. We can see sudden fall or rise of rankings; heavy fluctuations. New backlinks are contributing enough. Google claims it'll take care of any low quality backlinks without passing pagerank to website. Other end we can many scenarios where websites improved ranking and out of penalty using disavow tool. Google's statement and Disavow tool, both are opposite concepts. So when some unknown low quality backlinks are pointing and been increasing to a website? What's the ideal measure to be taken?
Intermediate & Advanced SEO | | vtmoz0 -
Google v bing/yahoo
Had a penguin unnatural link building penalty was lifted last June. Since then no significant recovery in rankings. Saying that why do we still rank on page 1 on Bing and Yahoo but rank nowhere in Google. Any suggestions. Thanks David
Intermediate & Advanced SEO | | Archers0 -
Https://www.mywebsite.com/blog/tag/wolf/ setting tag pages as blog corner stone article?
We do not have enough content rich page to target all of our keywords. Because of that My SEO guy wants to set some corner stone blog articles in order to rank them for certain key words on Google. He is asking me to use the following rule in our article writing(We have blog on our website):
Intermediate & Advanced SEO | | AlirezaHamidian
For example in our articles when we use keyword "wolf", link them to the blog page:
https://www.mywebsite.com/blog/tag/wolf/
It seems like a good idea because in the tag page there are lots of material with the Keyword "wolf" . But the problem is when I search for keyword "wolf" for example on the Google, some other blog pages are ranked higher than this tag page. But he tells me in long run it is a better strategy. Any idea on this?0 -
What NAP format do I use if the USPS can't even find my client's address?
My client has a site already listed on Google+Local under "5208 N 1st St". He has some other NAPs, e.g., YellowPages, under "5208 N First Street". The USPS finds neither of these, nor any variation that I can possibly think of! Which is better? Do I just take the one that Google has accepted and make all the others like it as best I can? And doesn't it matter that the USPS doesn't even recognize the thing? Or no? Local SEO wizards, thanks in advance for your guidance!
Intermediate & Advanced SEO | | rayvensoft0 -
How to remove wrong crawled domain from Google index
Hello, I'm running a Wordpress multisite. When I create a new site for a client, we do the preparation using the multisite domain address (ex: cameleor.cobea.be). To keep the site protected we use the "multisite privacy" plugin which allows us to restrict the site to admin only. When site is ready we a domain mapping plugin to redirect the client domain to the multisite (ex: cameleor.com). Unfortunately, recently we switched our domain mappin plugin by another one and 2 sites got crawled by Google on their multsite address as well. So now when you type "cameleor" in Google you get the 2 domains in SERPS (see here http://screencast.com/t/0wzdrYSR). It's been 2 weeks or so that we fixed the plugin issue and now cameleor.cobea.be is redirected to the correct address cameleor.com. My question: how can I get rid of those wrong urls ? I can't remove it in Google Webmaster Tools as they belong to another domain (cf. cameleor.cobea.be for which I can't get authenticated) and I wonder if will ever get removed from index as they still redirect to something (no error to the eyes of Google)..? Does anybody has an idea or a solution for me please ? Thank you very much for your help Regards Jean-Louis
Intermediate & Advanced SEO | | JeanlouisSEO0 -
I'm pulling my hair out trying to figure out why google stopped crawling.. any help is appreciated
This is going to be kind of long, simply because there is a background to the domain name that is not typical to anybody in the world really and I'm not sure if its possible that it was penalized or ranked lower because of that or not. Because of that I'm going to include it with the hopes that giving the full picture some nice soul in the world who has more knowledge in this than me see's something or knows something and can point me in the right direction. Our site has been around for a few years, at one point the domain was seized by homeland security ICE, and then they had to give it back in Dec. which sparked a lot of the SOPA PIPA stuff and we became the poster child so to speak. The site had previously been up since 2008, but due to that whole mess the site was down for 13 months on the dreaded seized server with a scary warning graphic and site title which caused quite obviously a bunch of 404 errors and who knows what else damage to anything we'd had before that as far as page rank and incoming links. we had a lot of incoming links from high quality sites. We were advised upon getting the domain back to pretty much scrap all the old content that was on the site prior and just start fresh.. which we did. Googlebot started crawling slowly, but then as we started getting back into the swing of things people started linking to us,some with high page rank, we were getting indexed quite frequently and ranking high on search results in our niche.. Then something happened on March 4th, we had arguably our best day with google traffic, we'd been linked back by places like Huff Post etc for content in our niche.. and the next day literally it was a freefall. Darn near nothing. I've attached a screen shot from webmaster tools so you can see how drastic it was. I went crazy, trying to figure out what was wrong, searching obsessively through webmaster tools looking for any indication of a problem, searched the site on google site:dajaz1.com and what comes up is page 2 page 3 page 45 page 46. It's also taken to indexing our category and tag pages and even our search pages. I've now set those all to noindex follow but when I look at where the googlebots are at on the site, they're on the categories, pages, author pages, and tags. Some of our links are still getting indexed, but doing a search just of our site name and we're ranking below many of the media sites that have written about our legal issues, when a month ago we were at least top result for our own name. I've racked my brain trying to figure out the issue. I've disabled plugins, I'm on fetch as google bot all the time making sure our stuff is at least coming out as 200 (we had 2 days where we were getting 403 errors due to a super-cache issue, but once fixed googlebot returned like it never left) I've literally watched 1000 videos, read 100 forums, added in SEO plugins, tried to optimize the site to the point I'm worried I'm over doing it.. and still they've barely begun to crawl. As you can see there is some activity in the last 2-3 days, but even submitting a new site map once I changed the theme out of desperation it's only indexed 16. I've looked for errors all through webmaster tools and I can't find anything to tell me why that happened, how to fix it, and how to get googlebot to like us again. I'm pulling my hair out here. The links we have incoming are high quality links like huffington post , spin, complex, etc. Those haven't slowed down at all, we do outgoing links to sites we trust and are high quality as well. I've got interns working on how they're writing titles and such, I've gone through and attempted to fix duplicate pages and titles.. I've been going through and re-writing meta description tags What am I missing? I'm pulling my hair out trying to figure out what the issue is. Eternally grateful for any help provided. jnzb6.png
Intermediate & Advanced SEO | | malady0 -
What is the best tool to crawl a site with millions of pages?
I want to crawl a site that has so many pages that Xenu and Screaming Frog keep crashing at some point after 200,000 pages. What tools will allow me to crawl a site with millions of pages without crashing?
Intermediate & Advanced SEO | | iCrossing_UK0 -
Can't find my site on Bing, since ages
Hi Guys, Well, the problem seems normal but I guess it's not. I have tried many things, and nothing changed it, now I give it last try... ask so maybe you will help me. The problem is.. I can't find my site nowhere in Bing, I mean nowhere by not in first 20 pages for my keywords "beauty tips" and the site is: http://www.beauty-tips.net/. In my opinion it should be pretty high... maybe it's too high so I can't see it ;). I never had special problems with Bing, was easier to be there "somewhere" than in google, but with this one is totally opposite. Any ideas? Thanks for your time!
Intermediate & Advanced SEO | | Luke220