Why don't Moz OSE, Ahrefs, Majestic, and so on change their user agent while crawling?
-
Some blackhat websites, PBNs, and other "cheaters" use various methods to effectively block third-party backlink-checker bots (OSE, Ahrefs, Majestic...): robots.txt rules, IP blocking, and such.
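For context, a typical way such sites do this is a robots.txt that singles out the backlink checkers' published user-agent tokens while leaving Googlebot alone. The tokens below (rogerbot for Moz/OSE, AhrefsBot, MJ12bot for Majestic) are the commonly documented ones; treat this as an illustrative sketch:

```text
# robots.txt — block third-party backlink checkers, allow Google
User-agent: rogerbot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: Googlebot
Allow: /
```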
A simple solution for those bots would be to mimic Google by using its user-agent string, for example.
Or, if that's not legally permitted (which I doubt), use some kind of randomness in user-agent strings, URLs, and IPs in order to prevent blocking. This shouldn't be a big deal IMHO; am I missing something obvious?
-
The ethics of the Internet dictate that you:
- crawl politely,
- obey robots.txt, and
- properly identify yourself.
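The "obey robots.txt" point above can be sketched with Python's standard-library parser: a polite crawler checks permission under its own, honestly declared user-agent token before fetching anything (the token and paths here are illustrative):

```python
from urllib import robotparser

# Parse a site's robots.txt and check whether our (honestly identified)
# crawler is allowed to fetch a given URL before requesting it.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ExampleCrawler",
    "Disallow: /private/",
])

print(rp.can_fetch("ExampleCrawler", "https://example.com/private/page"))  # False
print(rp.can_fetch("ExampleCrawler", "https://example.com/public/page"))   # True
```

In a real crawler you would call `rp.set_url(...)` and `rp.read()` against the live robots.txt instead of passing lines directly.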
This isn't a new issue. Link networks and sites have blocked crawlers and manipulated Google for years. Fortunately, it's only a small fraction of the web. Also, it's unlikely that links from those networks have much value, so their crawl priority would be super low anyway.
Actually, it could be viewed as beneficial when blackhat sites block OSE and Ahrefs: those sites often get penalized by Google, but third-party crawlers have no way to know this, so blocking effectively keeps them out of the indexes.
-
Well, I think bot blocking is an obvious problem even now, and it will be more important tomorrow with all the private networks, as you can imagine.
Moz (and others) should find and implement the best possible solution. I see no problem with TAGFEE as long as you are transparent about the fact that your bots are undetectable.
I understand that what I'm proposing is maybe not the best nor the most wanted solution, but the problem must be addressed or OSE will soon have no value at all.
What do you propose?
-
I agree with George here -- we'd hear a huge outcry if we pretended to be Googlebot or a different bot. We'd also likely get blocked, as sometimes people only let in a certain few known bots/IPs to crawl their site. If we changed user agents and IPs regularly, it would not be cool or TAGFEE.
-
What about using different user agents and IPs regularly in order to avoid detection?
Is there any other acceptable solution?
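Purely to illustrate what this question proposes (not an endorsement — other answers in this thread argue it breaks crawler-identification norms), a rotation scheme with hypothetical user-agent strings might look like:

```python
import random

# Hypothetical illustration of user-agent rotation: pick a different
# user-agent string per request so that per-UA blocking rules fail to match.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBot/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) ExampleBot/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBot/3.0",
]

def pick_user_agent(rng=random):
    """Return a randomly chosen user-agent header value."""
    return rng.choice(USER_AGENTS)

# The chosen value would go into the request's User-Agent header.
headers = {"User-Agent": pick_user_agent()}
print(headers["User-Agent"])
```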
-
The reputation and integrity of the major players would be at stake here. If they changed their user-agent identification (to spoof Googlebot or Bing or whatever), that could be detected, and they would be castigated: the crawler's IP address and its user-agent string would be out of sync, since a reverse DNS lookup of the IP would not resolve to a Google or Bing hostname.
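That "out of sync" detection is concrete: Google documents verifying Googlebot via forward-confirmed reverse DNS, so a spoofed user agent coming from a non-Google IP is trivially caught. A minimal sketch (the helper names are mine, not from any library):

```python
import socket

# Hostname suffixes Google documents for its crawler infrastructure.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(hostname: str) -> bool:
    """Check whether a reverse-DNS hostname belongs to Google's crawl domains."""
    return hostname.endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS: resolve the IP to a hostname, check the
    domain, then resolve the hostname back and confirm it yields the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)      # reverse lookup
        if not hostname_is_google(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]  # forward confirm
    except OSError:
        return False
```

A site that runs `verify_googlebot()` on any request claiming to be Googlebot will reject a backlink crawler wearing Google's user-agent string, which is exactly why spoofing would fail.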