Some bots excluded from crawling client's domain
-
Hi all!
My client is in healthcare in the US and for HIPAA reasons, blocks traffic from most international sources.
a. I don't think this is good for SEO
b. The site won't allow Moz bot or Screaming Frog bot to crawl it. It's so frustrating.
We can't figure out what mechanism they are utilizing to execute this. Any help as we start down the rabbit hole to remedy is much appreciated.
Thank you!
-
The main reason it's not good is that Google crawls from different data centers around the world. So one day it may think the site is up, and the next it may think the site is down or gone.
Typically you use a user-agent 'lance' to pierce these kinds of setups. In Screaming Frog, for example, you can pre-select from a variety of user-agents (including 'googlebot' and Chrome), but you can also write your own user-agent.
Write a long one that looks like an encryption key. Tell your client the user-agent you have defined and let them create an exemption for it within their spam-defense system. Then insert that user-agent (which no one else has or uses) into Screaming Frog so the crawler can pierce the defense grid.
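As a rough illustration of the "long one that looks like an encryption key" idea, here is a minimal Python sketch. The function names, the `AuditCrawler` prefix and the exact-match check are all hypothetical, made up for this example. How the exemption is really configured depends on whatever defense system the client runs, which we don't know yet.

```python
import hmac
import secrets


def make_private_user_agent(prefix: str = "AuditCrawler", n_bytes: int = 32) -> str:
    """Build a user-agent string ending in a long random hex token.

    The prefix and the overall format are assumptions for illustration only.
    """
    token = secrets.token_hex(n_bytes)  # 2 hex characters per random byte
    return f"{prefix}/1.0 ({token})"


def is_exempt(request_user_agent: str, allowed_user_agent: str) -> bool:
    """Return True only when the incoming user-agent exactly matches the agreed one.

    This stands in for whatever allow-list rule the client's system supports.
    The comparison runs in constant time so response timing leaks nothing
    about the secret string.
    """
    return hmac.compare_digest(
        request_user_agent.encode("utf-8"),
        allowed_user_agent.encode("utf-8"),
    )


if __name__ == "__main__":
    private_ua = make_private_user_agent()
    print(private_ua)  # paste this into Screaming Frog's custom user-agent setting
    print(is_exempt(private_ua, private_ua))   # the agreed string passes
    print(is_exempt("googlebot", private_ua))  # a spoofed 'googlebot' does not
```

Generate the string once, share it with the client over a private channel, and treat it like a password: if it leaks, generate a new one and have the exemption updated.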
Typically you would want to exempt 'Googlebot' (as a user-agent) from these defense systems, but it comes with a risk. Anyone with basic scripting knowledge, or who knows how to install Chrome extensions, can alter the user-agent of their script or web browser with ease (it's under the user's control). It is widely known that many sites make an exception for 'Googlebot', so it becomes a common vulnerability. For example, lots of publishers create URLs which Google can access and index, yet if you are a bog-standard user they ask you to turn off ad blockers or pay a fee.
Download the Chrome User-Agent extension, set your user-agent to "googlebot" and sail right through. Not ideal from a defense perspective.
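The same trick takes only a couple of lines in a script, which is why a plain 'Googlebot' exemption is weak. A minimal Python sketch using only the standard library follows. The helper name and the `example.com` URL are placeholders for illustration, and the code only builds the request without sending it.

```python
import urllib.request


def build_spoofed_request(url: str, user_agent: str = "googlebot") -> urllib.request.Request:
    """Build (but do not send) a request that claims an arbitrary user-agent.

    The header is entirely under the client's control, so a server cannot
    trust it on its own.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_spoofed_request("https://example.com/")  # placeholder URL
    # urllib stores header names capitalized as "User-agent"
    print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would send it with the claimed user-agent. Any exemption keyed only on a well-known string is open to anyone who tries this.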
For this reason I have often wished (and I am really hoping someone from Google might be reading) that in Search Console you could define a custom user-agent string and give it to Google. You could then exempt that string, safe in the knowledge that no one else knows it, and Google would use your custom string to identify itself when accessing your site and content. Then everyone could be safe, indexable and happy.
We're not there yet