Google can't access/crawl my site!
-
Hi
I've been dealing with this problem for a few days. In fact, I didn't realize it was this serious until today, when I saw most of my site "de-indexed" and most of its rankings lost.
[URL Errors: 1st photo]
On 8/21/14 there were only 42 errors, but on 8/22/14 this number went to 272, and it just keeps going up.
The site I'm talking about is gazetaexpress.com (media news, custom CMS), with lots of pages.
After doing some research, I came to the conclusion that the problem is the firewall, which might have blocked Google bots from accessing the site. But the server administrator says this isn't true and that no Google bots have been blocked.
Also, when I go to WMT and try to Fetch as Google the site, this is what I get:
[Fetch as Google: 2nd photo]
Out of more than 60 tries, it showed Complete only 2-3 times (and only for the homepage, never for articles).
What could be the problem? Can I get Google to crawl my site properly, and is there a chance that I will lose my previous rankings?
Thanks a lot
Granit
-
What did you do specifically to mitigate the problem? You can PM me if you would like.
-
This applies to the guy from Albania.
Oh, this IS the guy from Albania. Never mind.
-
Great, thanks for letting us know what happened with this!
-
Hi all
Just wanted to let you know that we fixed the problem. We disabled CloudFlare, which we found out was blocking Google bots. More about this issue can be found at: https://support.cloudflare.com/hc/en-us/articles/200169806-I-m-getting-Google-Crawler-Errors-What-should-I-do-
-
Hi Travis, thank you for your time.
Great for your friend! I also suggest visiting Kosovo someday; you will have a great time here, for sure.
Back to the issue:
Here is an interesting issue that is happening with the crawler.
Our own CMS uses .htaccess for rewrite purposes. I created 2 new files that are independent of the CMS and tried to fetch them with WMT, and it worked like a charm.
These 2 independent files are:
www.gazetaexpress.com/test_manaferra.php
www.gazetaexpress.com/xhezidja.php
Then I created an AJAX page with our CMS, which contains only plain text, and tried to fetch it with WMT; strangely enough, it didn't work. To make sure that the .htaccess file was not affecting this behavior, I deleted the .htaccess and tried to fetch it again, but it still didn't work.
The AJAX page is: www.gazetaexpress.com/page/xhezidja/?pageSEO=false
The site works perfectly for humans who access it via a browser.
I'm more than confused now!
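A quick way to see whether the server answers a Googlebot-identified request differently from a browser is to request the same URL with both user-agent strings and compare the status codes. Below is a minimal Python sketch; the user-agent constants and function names are illustrative assumptions, not something from the thread. Note that it only changes the User-Agent header: the request still comes from your own IP, so a firewall or CDN that checks source IPs (CloudFlare may do this) can treat it differently from the real Googlebot. A clean result therefore does not rule out blocking.

```python
import urllib.error
import urllib.request

# Illustrative user-agent strings (assumed; real browsers send longer ones).
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str, timeout: float = 10.0):
    """Return the HTTP status code, or a short error string if no response came back."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code (e.g. 500, 403).
        return err.code
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout, TLS failure, etc.
        return f"error: {err}"


def compare(url: str) -> dict:
    """Fetch the same URL as a 'browser' and as 'Googlebot' and return both results."""
    return {
        "browser": fetch_status(url, BROWSER_UA),
        "googlebot": fetch_status(url, GOOGLEBOT_UA),
    }
```

For example, `print(compare("http://www.gazetaexpress.com/page/xhezidja/?pageSEO=false"))` would show whether the two requests get different answers from your location. Differing results point at user-agent-based filtering; identical results leave IP-based filtering as a possibility.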
-
A friend of mine just got back from Kosovo. It was the last stop on a tour of the Balkans. He had a pretty good time. Moving along...
I crawled about 12K URLs and hit almost 90 Internal Server Errors (500). It's probably not your core problem, but it's something to look at. Here are a few examples:
http://www.gazetaexpress.com/blihet/?search_category_id=1&searchFilter=1
http://www.gazetaexpress.com/shitet/?category_id=134&searchFilter=1
http://www.gazetaexpress.com/me-qera/?category_id=131&searchFilter=1
There was one actual page that threw a 500 at the time of crawl:
http://www.gazetaexpress.com/mistere/edhe-kesaj-i-thuhet-veze-22591/
The edhe kesaj page now resolves fine. (I'm not even going to pretend to understand or write Albanian.)
So there may be some issues with the server or hosting. If you haven't already, try this troubleshooter from Cloudflare.
-
Ah, OK. Well, keep us updated with what you find. Someone else will chip in with other info if they have any.
-Andy
-
We suspect that CloudFlare might be causing these troubles. We are trying everything; in the meantime, I'm looking here to see if anyone has had a similar experience or has an idea for a solution.
As for warnings, the only one we had was last week (8/23/14), saying that Googlebot can't access our site:
Over the last 24 hours, Googlebot encountered 316 errors while attempting to connect to your site. Your site's overall connection failure rate is 7.5%.
-Granit
-
It doesn't look like a firewall, as I can crawl it with Screaming Frog. However, the server logs will be able to answer that one for you.
Without looking in depth, I'm not seeing anything that stands out to me. Do you think there have been changes to the server that could cause issues? What firewall is the server running? Also, if there were errors in crawling the site, you would see a warning about this.
-Andy
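To check the server logs for this, one option is to count the status codes returned to requests whose user agent mentions Googlebot. A minimal sketch, assuming the logs use the common Apache/Nginx "combined" format (adjust the regex otherwise); `googlebot_status_counts` is a hypothetical helper name. If the site sits behind a CDN such as CloudFlare, the logged IPs may belong to the CDN, and requests blocked at the edge will never reach the origin logs at all, so an absence of Googlebot lines is itself a signal.

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format (assumed; adjust if your server logs differ).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines):
    """Count HTTP status codes for log lines whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        # Lines that don't parse, or aren't from a Googlebot UA, are skipped.
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    return counts
```

Usage (the file name is hypothetical): `with open("access.log") as f: print(googlebot_status_counts(f))`. A spike in 5xx codes, or Googlebot requests disappearing after a certain date, would be worth comparing with the dates in the WMT error graph. Keep in mind that anyone can send a Googlebot user agent, so these counts include fake bots too.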
-
In mid-March the website changed its CMS, but I don't think that could be the reason, because until this week everything was working perfectly. I don't think it could have been compromised either. I still suspect it could be the firewall blocking bots from crawling the site, but the server administrator couldn't find any evidence of this.
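One way to gather that evidence is to check whether the IPs a firewall is dropping or challenging actually belong to Google. Google's documented approach is a reverse DNS lookup on the IP, a check that the hostname ends in googlebot.com or google.com, and then a forward lookup confirming that the hostname resolves back to the same IP. A minimal Python sketch using only the standard library; the function names are illustrative, and the IPv4-only forward lookup is a simplification.

```python
import socket

# Hostname suffixes Google documents for its crawlers' reverse DNS.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname: str) -> bool:
    """True if the PTR hostname sits under one of Google's crawler domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve it back."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        # gethostbyname_ex returns IPv4 addresses only; IPv6 crawler IPs
        # would need socket.getaddrinfo instead.
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    # The forward lookup must include the original IP, or the PTR record is spoofed.
    return ip in addresses
```

For example, running `is_verified_googlebot(ip)` over the IPs in the firewall's block list returns True only when both lookups agree. Any verified Google IP on that list would be direct evidence that Googlebot is being blocked.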
-
Hi Granit,
Has any work been done to the site in the last 2-3 months? Have you had any warnings in Webmaster Tools at all? I did once see a strange problem where Google wasn't crawling a site correctly because it had been compromised, but after checking, there is nothing like this on yours.
-Andy
-
No prob. Thanks a lot for your time. Let's just hope that someone in the community can help with a solution.
-
Unfortunately, I don't have a quick answer for you. Looking forward to seeing what other community members have to say on this one!
-
I'm looking at the http version in GWT.
-
If I do a site:gazetaexpress.com in Google, I get some results that are http, and some results that are https. The https ones say there is an SSL connection error.
Are you looking at the http or https version in GWT?
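To reproduce the SSL connection error outside Google, you can attempt a verified TLS handshake against the host. A minimal Python sketch; `check_https` is a hypothetical helper, and it relies on the standard library's default context, which checks both the certificate chain and the hostname.

```python
import socket
import ssl


def check_https(host: str, port: int = 443, timeout: float = 5.0):
    """Try a verified TLS handshake; return (ok, detail)."""
    context = ssl.create_default_context()  # verifies certificate chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                return True, f"{tls.version()}, expires {cert.get('notAfter')}"
    except ssl.SSLCertVerificationError as err:
        # Expired, self-signed, or wrong-hostname certificates end up here.
        return False, f"certificate problem: {err.verify_message}"
    except OSError as err:
        # Refused connection, timeout, or other handshake failure (ssl.SSLError).
        return False, f"connection problem: {err}"
```

For example, `check_https("www.gazetaexpress.com")` would show whether the https version fails for ordinary clients too. If it does, the https URLs in the index are likely to keep erroring until the certificate is fixed or those URLs are redirected to http.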