Some bots excluded from crawling client's domain
-
Hi all!
My client is in healthcare in the US and for HIPAA reasons, blocks traffic from most international sources.
a. I don't think this is good for SEO
b. The site won't allow Moz bot or Screaming Frog bot to crawl it. It's so frustrating.
We can't figure out what mechanism they are using to do this. Any help as we start down the rabbit hole to fix it is much appreciated.
Thank you!
-
The main reason it's not good is that Google crawls from different data centers around the world. So one day they may think the site is up, and the next they may think it is down.
Typically you use a user-agent lance to pierce these kinds of setups. In Screaming Frog, for example, you can pre-select from a variety of user-agents (including 'googlebot' and Chrome), but you can also write your own user-agent.
Write a long one that looks like an encryption key. Tell your client the user-agent you have defined and let them create an exemption for it within their spam-defense system. Then insert the user-agent (which no one else has or uses) into Screaming Frog so the crawler can pierce the defense grid.
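To make that concrete, here is a rough Python sketch of the idea. It is only an illustration: the `ClientAuditCrawler` prefix and both function names are made up for this example, and the client's real spam-defense system will have its own way of defining an exemption. The point is a long random string plus an exact match on the client's side.

```python
import hmac
import secrets


def make_user_agent(prefix: str = "ClientAuditCrawler") -> str:
    """Build a user-agent ending in 64 random hex characters.

    The prefix is a hypothetical name chosen for this sketch.
    """
    return f"{prefix}/1.0 ({secrets.token_hex(32)})"


def is_exempt(request_user_agent: str, allowed_user_agent: str) -> bool:
    """Return True only when the header matches the agreed string exactly.

    This stands in for whatever rule the client's defense system uses;
    compare_digest does the comparison in constant time.
    """
    return hmac.compare_digest(
        request_user_agent.encode(), allowed_user_agent.encode()
    )


# Generate the string once, share it with the client, and paste it into
# Screaming Frog's custom user-agent setting.
shared_ua = make_user_agent()
print(shared_ua)
```

Bear in mind the user-agent travels in every request header, so treat it like a shared password: send it to the client privately and rotate it if it leaks.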
Typically you would want to exempt 'Googlebot' (as a user-agent) from these defense systems, but that comes with a risk. Anyone with basic scripting knowledge, or who knows how to install Chrome extensions, can easily alter the user-agent of their script or web browser (it's under the user's control). It is widely known that many sites make an exception for 'Googlebot', so it becomes a common vulnerability. For example, lots of publishers create URLs which Google can access and index, yet if you are a bog-standard user they ask you to turn off ad-blockers or pay a fee.
Download the Chrome User-Agent extension, set your user-agent to "googlebot" and sail right through. Not ideal from a defense perspective
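The same trick takes a couple of lines in a script. The sketch below only builds a request object (nothing is sent), and `naive_is_googlebot` is a made-up stand-in for the sort of user-agent-only check many sites rely on. `example.com` is a placeholder URL.

```python
from urllib.request import Request

# The user-agent string Googlebot publishes for itself; anyone can copy it.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def naive_is_googlebot(user_agent: str) -> bool:
    """Hypothetical check that trusts the header alone."""
    return "googlebot" in user_agent.lower()


# Build (but do not send) a request that claims to be Googlebot.
request = Request("https://example.com/", headers={"User-Agent": GOOGLEBOT_UA})

# urllib stores header names capitalized, hence "User-agent" here.
claimed = request.get_header("User-agent")
print(naive_is_googlebot(claimed))
```

Any check that looks only at the header accepts this request, which is why a 'Googlebot' exemption on its own is weak.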
For this reason I have often wished (and I am really hoping someone from Google might be reading) that in Search Console you could define a custom user-agent string and give it to Google. You could then exempt that string, safe in the knowledge that no one else knows it, and Google would use it to identify themselves when accessing your site and content. Then everyone could be safe, indexable and happy.
We're not there yet