Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How to stop Search Bot from crawling through a submit button
-
On our website http://www.thefutureminders.com/, we have a form with three pull-down fields for month, day, and year. This is creating duplicate pages during indexing. How do we tell the search bot to index the page but not crawl through the submit button?
Thanks
Naren
-
Hi Dan
What is happening is this: since the form fields contain all the months [12], all the dates [31], and all the years [1921 through 2011], the robot seems to be stepping through these values incrementally and then using the submit button. After the submit button, the user is presented with a registration page. While we do want the search bot to index the rest of the page and crawl through the rest of the page's links, we do not want it to crawl through that submit button. I hope I am making sense.
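To put a rough number on that, here is a minimal Python sketch. It assumes the form submits with GET, so that every combination of values becomes its own URL. The example.com domain, the /register path, and the parameter names month, day, and year are made up for illustration and are not taken from the real site.

```python
from itertools import product
from urllib.parse import urlencode

# Option counts taken from the post: 12 months, 31 dates, years 1921 through 2011.
months = range(1, 13)
days = range(1, 32)
years = range(1921, 2012)


def form_urls(base="http://www.example.com/register"):
    """Yield one URL per month/day/year combination (hypothetical URL scheme)."""
    for month, day, year in product(months, days, years):
        query = urlencode({"month": month, "day": day, "year": year})
        yield base + "?" + query


# Count every URL a bot could reach by stepping through all the options.
total = sum(1 for _ in form_urls())
print(total)  # 12 * 31 * 91 = 33852
```

Under those assumptions a single form can expose tens of thousands of near-identical URLs, which would explain the duplicate pages.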
Naren
-
The advantage of blocking a page from being indexed via a meta tag is that it is less likely to have unexpected consequences. I have often seen cases where an incorrectly modified robots.txt file led to a site being blocked by accident.
-
Hi
To my knowledge, you don't stop it from crawling through the button (as you would with a nofollowed link). Instead, you block the robot at the page it ends up on after clicking submit.
Say the user hits submit and it takes them to mydomain.com/confirm.html. On that page you'll want to add:
<meta name="robots" content="noindex, follow">
...if you want it to NOT index the page but follow the links on it.
or
<meta name="robots" content="noindex, nofollow">
...if you want it to NOT index and NOT follow the links on that page.
It's generally advised to do this with the meta tag rather than in robots.txt.
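One way to confirm that the landing page really carries the intended directive is to parse its HTML and read the robots meta tag. The following is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and the sample HTML is a stand-in for whatever the confirmation page actually serves.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list such as "noindex, follow".
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives present in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Stand-in for the HTML of the page reached after submit.
sample_page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Registration</body></html>"
)
print(robots_directives(sample_page))
```

A page that should stay out of the index but still pass on its links would report both noindex and follow. A page with no robots meta tag yields an empty set, which search engines treat as the default (index and follow).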
Hopefully I've understood the question correctly!
-Dan
-
Block the pages/folders you do not wish to be indexed with robots.txt file:
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
OR you can add canonical tags to the other pages which are creating duplicate content.
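Before deploying rules like these, it can help to test them locally, since a mistake in robots.txt can block more than intended. Here is a minimal sketch using Python's standard urllib.robotparser. The example.com URLs are placeholders, and /folder1/ and /folder2/ are the illustrative paths from the rules above.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, one directive per line.
rules = """\
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(url, agent="*"):
    """Return True if the given user agent may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)


print(is_allowed("http://www.example.com/folder1/page.html"))  # False
print(is_allowed("http://www.example.com/index.html"))         # True
```

Note that this only checks crawl permission. A URL that is disallowed in robots.txt can still appear in search results if other pages link to it, which is one reason the meta tag approach is often preferred for keeping pages out of the index.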