Why don't Moz OSE, Ahrefs, Majestic and so on change their user agent while crawling?
-
Some blackhat websites, PBNs and other "cheaters" use various methods to block third-party backlink checker bots (OSE, Ahrefs, Majestic...): robots.txt rules, IP blocking and so on.
A simple solution for those bots would be to mimic Google, for example by using its user agent string.
Or, if that is not legally permitted (which I doubt), they could randomize user agent strings, URLs and IPs to prevent blocking. This should not be a big deal IMHO. Am I missing something obvious?
-
The ethics of the Internet dictate that you
- crawl politely,
- obey robots.txt and
- properly identify yourself
This isn't a new issue. Link networks and sites have blocked crawlers and manipulated Google for years. Fortunately, it's only a small fraction of the web. Also, it's unlikely that links from those networks have much value, so crawl priority would be super low anyway.
Actually, it could be viewed as beneficial when blackhat sites block OSE and Ahrefs, because those sites often get penalized by Google, but 3rd party crawlers have no way to know this, so blocking effectively keeps them out of the indexes.
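For anyone wondering what "obey robots.txt and properly identify yourself" looks like in practice, here is a minimal sketch using Python's standard urllib.robotparser. The bot name ExampleLinkBot, the robots.txt content and the example.com URLs are all made up for illustration; this is not how any particular crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a site that blocks one named link crawler
# and keeps /private/ off limits for everyone else.
ROBOTS_TXT = """\
User-agent: ExampleLinkBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler checks under its real name and skips the URL if refused.
blocked = can_crawl(ROBOTS_TXT, "ExampleLinkBot", "https://example.com/page")
allowed = can_crawl(ROBOTS_TXT, "OtherBot", "https://example.com/page")
```

The point is that the rules are keyed on the name the bot announces, so a crawler that identifies itself honestly is exactly the one a site owner can choose to block.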
-
Well, I think bot blocking is an obvious problem even now, and it will matter more in the future as private networks spread.
Moz (and others) should find and implement the best possible solution. I see no problem with TAGFEE as long as you are transparent about the fact that your bots are undetectable.
I understand that what I'm proposing may be neither the best nor the preferred solution, but the problem must be addressed or OSE will soon have no value at all.
What do you propose?
-
I agree with George here -- we'd hear a huge outcry if we pretended to be Googlebot or a different bot. We'd also likely get blocked, as sometimes people only let in a certain few known bots/IPs to crawl their site. If we changed user agents and IPs regularly, it would not be cool or TAGFEE.
-
What about using different user agents and IPs regularly in order to avoid detection?
Is there any other acceptable solution?
-
The reputation and integrity of the major players would be at stake here. If they changed their user agent identification (to spoof Googlebot or Bing or whatever) that could be detected, and they would be castigated. The crawler IP address and its user agent ID would be out of sync...
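To illustrate how the mismatch between IP and user agent gets caught: the usual check for a visitor claiming to be Googlebot is a reverse DNS lookup on the IP, a check that the hostname ends in googlebot.com or google.com, and then a forward lookup to confirm the hostname resolves back to the same IP. A rough sketch in Python follows. The function name is_genuine_googlebot and the injectable lookup parameters are my own for illustration, and the addresses and hostname in the demo are invented documentation values, not real Google data.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot.

    The lookup functions can be swapped out (e.g. for testing); by default
    they query real DNS through the socket module.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # IP does not belong to a Google hostname
        return ip in forward_lookup(host)  # hostname must map back to the IP
    except OSError:
        return False  # lookup failed, treat the claim as unverified


# Offline demo with invented DNS data (hypothetical hostnames and IPs).
FAKE_PTR = {"192.0.2.10": "crawl-demo.googlebot.com", "203.0.113.5": "host.example.net"}
FAKE_A = {"crawl-demo.googlebot.com": ["192.0.2.10"]}

genuine = is_genuine_googlebot("192.0.2.10", FAKE_PTR.__getitem__, FAKE_A.__getitem__)
spoofed = is_genuine_googlebot("203.0.113.5", FAKE_PTR.__getitem__, FAKE_A.__getitem__)
```

A third-party crawler sending Googlebot's user agent from its own IP range would fail the suffix check, which is why spoofing is easy to detect.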