Why don't Moz OSE, Ahrefs, Majestic and so on change their user agent while crawling?
-
Some blackhat websites, PBNs and other "cheaters" use various methods to block third-party backlink checker bots (OSE, Ahrefs, Majestic...): robots.txt rules, IP blocking and so on.
A simple solution for those bots would be to mimic Google, for example by using its user agent string.
Or, if that is not legally permitted (which I doubt), they could randomize user agent strings, URLs and IPs to prevent blocking. This should not be a big deal IMHO. Am I missing something obvious?
-
The ethics of the Internet dictate that you
- crawl politely,
- obey robots.txt and
- properly identify yourself
This isn't a new issue. Link networks and sites have blocked crawlers and manipulated Google for years. Fortunately, it's only a small fraction of the web. Also, it's unlikely that links from those networks have much value, so crawl priority would be super low anyway.
Actually, it could be viewed as beneficial when blackhat sites block OSE and Ahrefs: those sites often get penalized by Google, but third-party crawlers have no way to know this, so the blocking effectively keeps them out of the indexes.
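To make the three rules above concrete, here is a minimal Python sketch of a crawler that identifies itself honestly and checks robots.txt before fetching. The robots.txt content, the example.com URLs and the helper names are hypothetical. The bot tokens (rogerbot, AhrefsBot, MJ12bot) are only meant to illustrate the kind of blocking described in the question, not any particular site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a site that shuts out backlink checkers
# but leaves everyone else (including Googlebot) mostly alone.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, agent_token: str, url: str) -> bool:
    """Return True if the bot, under its real name, is allowed to fetch url."""
    return parser.can_fetch(agent_token, url)


parser = build_parser(ROBOTS_TXT)

# An honest backlink crawler is turned away at the door...
rogerbot_allowed = may_fetch(parser, "rogerbot", "https://example.com/")
# ...while a client calling itself Googlebot gets in, with a crawl delay.
googlebot_allowed = may_fetch(parser, "Googlebot", "https://example.com/")
googlebot_delay = parser.crawl_delay("Googlebot")
```

A polite crawler simply skips the URL when `may_fetch` returns False and waits for the crawl delay between requests. The last two lines also show why spoofing is tempting: the same rules let through anything that claims to be Googlebot.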
-
Well, I think bot blocking is an obvious problem even now, and will be more important tomorrow with all the private networks, as you can imagine.
Moz (and others) should find and implement the best possible solution. I see no problem with TAGFEE as long as you are transparent about the fact that your bots are undetectable.
I understand that what I'm proposing may be neither the best nor a wanted solution, but the problem must be addressed or OSE will soon have no value at all.
What do you propose?
-
I agree with George here -- we'd hear a huge outcry if we pretended to be Googlebot or a different bot. We'd also likely get blocked, as sometimes people only let in a certain few known bots/IPs to crawl their site. If we changed user agents and IPs regularly, it would not be cool or TAGFEE.
-
What about using different user agents and IPs regularly in order to avoid detection?
Is there any other acceptable solution?
-
The reputation and integrity of the major players would be at stake here. If they changed their user agent identification (to spoof Googlebot or Bing or whatever) that could be detected, and they would be castigated. The crawler IP address and its user agent ID would be out of sync...
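As a rough illustration of how that mismatch can be detected, here is a Python sketch of a forward-confirmed reverse DNS check, which is the general technique site owners use to verify a visitor claiming to be Googlebot. The function names are hypothetical and the hostname suffixes are an assumption that should be checked against Google's current documentation. The forward lookup used here only covers IPv4.

```python
import socket
from typing import Callable, List

# Assumed hostname suffixes for genuine Google crawlers; verify before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """IP -> primary hostname (PTR record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """Hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def dns_confirms_google(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True if the IP reverse-resolves to a Google host that resolves back to it."""
    try:
        host = reverse(ip)
        if not host.lower().endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(host)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False


def is_spoofed_googlebot(
    ip: str,
    user_agent: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True if the request claims to be Googlebot but DNS does not back it up."""
    claims_googlebot = "googlebot" in user_agent.lower()
    return claims_googlebot and not dns_confirms_google(ip, reverse, forward)
```

The resolvers are passed in as parameters so the check can be tried with stub data instead of live DNS. A crawler that sends Googlebot's user agent from its own IP range would fail this test on the first request.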