Block Googlebot from submit button
-
Hi,
I have a website where Googlebot runs many searches on our internal search engine. We can noindex the results page, but we want to stop the bot from calling the AJAX search button (a GET form), because each search passes a request to an external API with associated fees.
So we want to stop the form button from being crawled without noindexing the search page itself. The "nofollow" attribute doesn't seem to apply to a button's submit.
Any suggestions?
-
Hey Olivier,
You could detect the user agent and hide the button. The difference isn't substantial enough to be called cloaking.
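For example, a minimal client-side sketch of that idea (untested TypeScript; the "search-submit" id is just a placeholder for whatever your button actually uses):

// Hide the internal-search submit button when the visitor identifies as Googlebot.
// "search-submit" is a hypothetical id - swap in your own markup.
const BOT_PATTERN = /googlebot/i;

function hideSearchButtonForBots(): void {
  const button = document.getElementById("search-submit");
  if (button && BOT_PATTERN.test(navigator.userAgent)) {
    button.remove(); // the rendered DOM then contains no submit control for the bot
  }
}

document.addEventListener("DOMContentLoaded", hideSearchButtonForBots);

The same check could also live server-side in your template, so the button is never rendered at all for bot user agents.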
Or you could make the button not actually a button tag, but another tag that traps clicks with a JS event. I'm not sure Google's headless browser is smart enough to automate that. I would try this first, and if it doesn't work, switch to the user-agent detection idea.
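A rough sketch of that second idea (again untested TypeScript; the "search-trigger", "search-query" and "search-results" ids and the /internal-search endpoint are hypothetical stand-ins for your own markup and API call):

// No <button type="submit"> and no form action - just a plain element
// (e.g. a <span> styled as a button) that fires the search from a click handler.
const trigger = document.getElementById("search-trigger");
const queryInput = document.getElementById("search-query") as HTMLInputElement | null;
const resultsBox = document.getElementById("search-results");

trigger?.addEventListener("click", async () => {
  const query = queryInput?.value.trim();
  if (!query || !resultsBox) return;
  // The search URL only ever exists inside this handler, never as a
  // crawlable href or form action in the page's HTML.
  const response = await fetch("/internal-search?q=" + encodeURIComponent(query));
  resultsBox.innerHTML = await response.text();
});

Since Google's renderer does execute JavaScript, treat this as reducing the chance the search fires rather than guaranteeing it; combining it with the user-agent check above is the belt-and-braces option.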
Let us know how it goes!
-Mike
-
You can always do it in a programme the bots can't use, or hide it behind a login field, etc.
I'll also give you the following for consumption:
http://moz.com/blog/12-ways-to-keep-your-content-hidden-from-the-search-engines
Good luck!
-
Hi Bernard
Are you able to provide a link to the web form containing the submit button?
Peter
Related Questions
-
Robots blocked by pages webmasters tools
A mistake was made in the software. How can I solve the problem quickly? Help me.
Intermediate & Advanced SEO | mihoreis0 -
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
Intermediate & Advanced SEO | Gabriele_Layoutweb0 -
Eliminate render-blocking JavaScript and CSS recommendation?
Our site's last red-flag issue is the "eliminate render-blocking JavaScript and CSS" message. I don't know how to do that, and while I'm not sure I could make progress by spending hours/days cutting, pasting, and guessing, I'd rather not try. Does anyone know of a plugin that will just do this? Or, if not, how much would it cost to get a web developer to do this? Also, if there is no plugin (and it didn't look like there was when I looked), how long do you think this would take someone who knows what they are doing to complete? The site is: www.kempruge.com Thanks for any tips and/or suggestions, Ruben
Intermediate & Advanced SEO | KempRugeLawGroup0 -
Block Level Link Juice
I need a better understanding of how links in different parts of the page pass juice. Much has been written about how footer links pass less juice than other parts of the page. The question I have is: if a page has a hypothetical 1,000 points of link juice and can pass on +/-800 points via links, and I have one and only one link in the footer to another page, does it pass the full 800 points? Or, since footers only pass a small fraction of link juice, does it pass, let's say, 80 points, with the other 720 points staying locked up on the page? This question is hypothetical - I'm just trying to understand the relationships. I don't know if I've explained the question well, but if someone could answer it, or point me in the right direction, I would appreciate it.
Intermediate & Advanced SEO | CsmBill0 -
How to Block Google Preview?
Hi, Our site is very good for JavaScript-on users, but many pages are loaded via AJAX and are inaccessible with JS off. I'm looking to make this content available with JS off so search engines can access it; however, we don't have the dev time to make it 'pretty' for JS-off users. The idea is to make the pages accessible with JS off, but when requested by a user with JS on, the user is forwarded to the 'pretty' AJAX version. The content (text, images, links, videos etc.) is exactly the same, but it's an enormous amount of effort to make the JS-off version 'pretty' and I can't justify the development time to do this. The problem is that Googlebot will index this page and show the ugly JS-off page in the preview on their results - which isn't good for the brand. Is there a way or meta code that can be used to stop the preview but still have the page cached? My current options are to use meta noarchive or "Cache-Control" content="no-cache" to ask Google to stop caching the page completely, but I wanted to know if there is a better way of doing this. Any ideas, guys and girls? Thanks, FashionLux
Intermediate & Advanced SEO | FashionLux0 -
Submitting URLs multiple times in different sitemaps
We have a very dynamic site, with a large number of pages. We use a sitemap index file, that points to several smaller sitemap files. The question is: Would there be any issue if we include the same URL in multiple sitemap files? Scenario: URL1 appears on sitemap1. 2 weeks later, the page at URL1 changes and we'd like to update it on a sitemap. Would it be acceptable to add URL1 as an entry in sitemap2? Would there be any issues with the same URL appearing multiple times? Thanks.
Intermediate & Advanced SEO | msquare0 -
10,000 New Pages of New Content - Should I Block in Robots.txt?
I'm almost ready to launch a redesign of a client's website. The new site has over 10,000 new product pages, which contain unique product descriptions, but do feature some similar text to other products throughout the site. An example of the page similarities would be the following two products: Brown leather 2 seat sofa Brown leather 4 seat corner sofa Obviously, the products are different, but the pages feature very similar terms and phrases. I'm worried that the Panda update will mean that these pages are sand-boxed and/or penalised. Would you block the new pages? Add them gradually? What would you recommend in this situation?
Intermediate & Advanced SEO | cmaddison0 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price, also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 There are literally thousands of these URL variations being indexed, so I'd like to use robots.txt to disallow them. Question: Is this a wise thing to do? Or does Google take layered navigation links into account by default, so I don't need to worry? To implement, I was going to do the following in robots.txt: User-agent: * Disallow: /*? Disallow: /*= ...which would prevent any dynamic URL with a '?' or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | AndrewY1