Block Googlebot from submit button
-
Hi,
I have a website where Googlebot runs many searches on our internal search engine. We can add noindex to the results page, but we want to stop the bot from triggering the AJAX search button (a GET form), because each search passes a request to an external API that charges fees.
So we want to stop Googlebot from crawling the form button without noindexing the search page itself. The "nofollow" attribute doesn't seem to apply to a submit button.
Any suggestions?
-
Hey Olivier,
You could detect the user agent and hide the button. The difference isn't substantial enough to be called cloaking.
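A minimal sketch of the user agent idea, assuming the form markup is built server-side. The function names, the `/search` action, and the `q` field are all made up for illustration, so swap in whatever your page really uses. Keep in mind that a User-Agent header can be spoofed, so this only filters out crawlers that identify themselves.

```typescript
// Hypothetical helpers: names and markup are illustrative, not taken from the site in question.
const CRAWLER_PATTERN = /googlebot/i;

/** Returns true when the User-Agent header claims to be Googlebot. */
function isGooglebot(userAgent: string | undefined): boolean {
  return userAgent !== undefined && CRAWLER_PATTERN.test(userAgent);
}

/** Builds the search form markup, leaving out the submit button for Googlebot. */
function renderSearchForm(userAgent: string | undefined): string {
  const button = isGooglebot(userAgent)
    ? ""
    : '<button type="submit">Search</button>';
  // The "/search" action and "q" field name are assumptions.
  return (
    '<form method="get" action="/search">' +
    '<input type="text" name="q">' +
    button +
    "</form>"
  );
}
```

You would call `renderSearchForm` with the User-Agent header of the incoming request and drop the result into the page template.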
Or you could make the button not actually a button tag, but another tag that traps clicks with a JS event. I'm not sure Google's headless browser is smart enough to automate that. I would try this first and, if it doesn't work, switch to the user agent detection idea.
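Here is a rough sketch of the non-button approach, assuming the page has a text input and a plain element (say a `span`) in place of the form and submit button. The `/api/search` path, the `q` parameter, and every name below are placeholders. Whether Googlebot will actually leave the click handler alone is untested, so treat this as something to try and then check in your logs.

```typescript
// Minimal structural types so the sketch does not depend on a particular DOM typing setup.
interface Clickable {
  addEventListener(type: string, listener: () => void): void;
}
interface TextField {
  value: string;
}

/** Builds the URL for the AJAX search call (the "/api/search" path is an assumption). */
function buildSearchUrl(query: string): string {
  return "/api/search?q=" + encodeURIComponent(query.trim());
}

/**
 * Traps clicks on a non-button element. There is no <form> or <button>
 * in the markup, so the search only runs when the click handler fires.
 */
function attachSearchTrigger(
  trigger: Clickable,
  input: TextField,
  runSearch: (url: string) => void
): void {
  trigger.addEventListener("click", () => {
    const query = input.value.trim();
    if (query === "") return; // never hit the paid API for an empty query
    runSearch(buildSearchUrl(query));
  });
}
```

In the browser you would pass in the two elements (for example from `document.getElementById`) and a `runSearch` callback that makes the request with `fetch` and renders the results.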
Let us know how it goes!
-Mike
-
You can always do it in a program the bots can't use, or hide it behind a login field, etc.
I'd also offer the following for reading:
http://moz.com/blog/12-ways-to-keep-your-content-hidden-from-the-search-engines
Good luck!
-
Hi Bernard
Are you able to provide a link to the web form containing the submit button?
Peter