Why don't Moz OSE, Ahrefs, Majestic, and so on change their user agent while crawling?
-
Some blackhat websites, PBNs, and other "cheaters" use various methods to effectively block third-party backlink-checker bots (OSE, Ahrefs, Majestic...): robots.txt rules, IP blocking, and such.
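For context, a typical way such sites do this is a robots.txt that singles out the backlink checkers' published user-agent tokens while leaving Googlebot alone. The tokens below (rogerbot for Moz/OSE, AhrefsBot, MJ12bot for Majestic) are the commonly documented ones; treat this as an illustrative sketch:

```text
# robots.txt — block third-party backlink checkers, allow Google
User-agent: rogerbot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: Googlebot
Allow: /
```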
A simple solution for those bots would be to mimic Google by using its user-agent string, for example.
Or, if that's not legally permitted (which I doubt), use some kind of randomness in user-agent strings, URLs, and IPs in order to prevent blocking. This shouldn't be a big deal IMHO; am I missing something obvious?
-
The ethics of the Internet dictate that you:
- crawl politely,
- obey robots.txt, and
- properly identify yourself.
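The "obey robots.txt" point above can be sketched with Python's standard-library parser: a polite crawler checks permission under its own, honestly declared user-agent token before fetching anything (the token and paths here are illustrative):

```python
from urllib import robotparser

# Parse a site's robots.txt and check whether our (honestly identified)
# crawler is allowed to fetch a given URL before requesting it.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ExampleCrawler",
    "Disallow: /private/",
])

print(rp.can_fetch("ExampleCrawler", "https://example.com/private/page"))  # False
print(rp.can_fetch("ExampleCrawler", "https://example.com/public/page"))   # True
```

In a real crawler you would call `rp.set_url(...)` and `rp.read()` against the live robots.txt instead of passing lines directly.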
This isn't a new issue. Link networks and sites have blocked crawlers and manipulated Google for years. Fortunately, it's only a small fraction of the web. Also, it's unlikely that links from those networks have much value, so their crawl priority would be super low anyway.
Actually, it could be viewed as beneficial when blackhat sites block OSE and Ahrefs: those sites often get penalized by Google, but third-party crawlers have no way to know this, so blocking effectively keeps them out of the indexes.
-
Well, I think bot blocking is an obvious problem even now, and it will be more important tomorrow with all the private networks, as you can imagine.
Moz (and others) should find and implement the best possible solution. I see no problem with TAGFEE as long as you are transparent about the fact that your bots are undetectable.
I understand that what I'm proposing is maybe not the best nor the most wanted solution, but the problem must be addressed or OSE will soon have no value at all.
What do you propose?
-
I agree with George here -- we'd hear a huge outcry if we pretended to be Googlebot or a different bot. We'd also likely get blocked, as sometimes people only let in a certain few known bots/IPs to crawl their site. If we changed user agents and IPs regularly, it would not be cool or TAGFEE.
-
What about using different user agents and IPs regularly in order to avoid detection?
Is there any other acceptable solution?
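Purely to illustrate what this question proposes (not an endorsement — other answers in this thread argue it breaks crawler-identification norms), a rotation scheme with hypothetical user-agent strings might look like:

```python
import random

# Hypothetical illustration of user-agent rotation: pick a different
# user-agent string per request so that per-UA blocking rules fail to match.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBot/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) ExampleBot/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBot/3.0",
]

def pick_user_agent(rng=random):
    """Return a randomly chosen user-agent header value."""
    return rng.choice(USER_AGENTS)

# The chosen value would go into the request's User-Agent header.
headers = {"User-Agent": pick_user_agent()}
print(headers["User-Agent"])
```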
-
The reputation and integrity of the major players would be at stake here. If they changed their user-agent identification (to spoof Googlebot or Bing or whatever), that could be detected, and they would be castigated: the crawler's IP address and its user-agent string would be out of sync, since a reverse DNS lookup of the IP would not resolve to a Google or Bing hostname.
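That "out of sync" detection is concrete: Google documents verifying Googlebot via forward-confirmed reverse DNS, so a spoofed user agent coming from a non-Google IP is trivially caught. A minimal sketch (the helper names are mine, not from any library):

```python
import socket

# Hostname suffixes Google documents for its crawler infrastructure.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(hostname: str) -> bool:
    """Check whether a reverse-DNS hostname belongs to Google's crawl domains."""
    return hostname.endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS: resolve the IP to a hostname, check the
    domain, then resolve the hostname back and confirm it yields the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)      # reverse lookup
        if not hostname_is_google(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]  # forward confirm
    except OSError:
        return False
```

A site that runs `verify_googlebot()` on any request claiming to be Googlebot will reject a backlink crawler wearing Google's user-agent string, which is exactly why spoofing would fail.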