Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How to stop Search Bot from crawling through a submit button
-
On our website http://www.thefutureminders.com/, we have a form with three pull-down fields for month, day, and year. This is creating duplicate pages in the index. How do we tell the search bot to index the page but not crawl through the submit button?
Thanks
Naren
-
Hi Dan
What is happening is this: since we have all the months [12], all the dates [31] and all the years [1921 through 2011] in the form fields, the robot seems to be stepping through these incrementally and then using the submit button. After the submit button, the user is presented with a registration page. While we do want the search bot to index the rest of the page and crawl through the rest of the page's links, we do not want it to crawl through that submit button. I hope I am making sense.
Naren
-
The advantage of blocking a page from being indexed via a meta tag is that it is less likely to have unexpected consequences. I have often seen cases where an incorrectly modified robots.txt file led to a site being blocked by accident.
-
Hi
To my knowledge, you don't stop it from crawling through the button (as you would with a nofollowed link); rather, you block the robot at the page it ends up on after clicking submit.
Say the user hits submit and it takes them to mydomain.com/confirm.html. On that page you'll want to add:
<meta name="robots" content="noindex, follow">
...if you want it to NOT index the page but follow the links on it,
or
<meta name="robots" content="noindex, nofollow">
...if you want it to NOT index and NOT follow the links on that page.
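As a rough way to confirm that the page reached after submit really carries the robots meta tag described above, here is a minimal Python sketch using only the standard library. The sample markup, the `RobotsMetaParser` class, and the `robots_directives` helper are all made up for illustration; in practice you would feed it the HTML of your own confirmation page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None if omitted.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html_text):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Hypothetical markup for the page shown after the form is submitted.
CONFIRM_PAGE = """
<html><head>
<title>Register</title>
<meta name="robots" content="noindex, follow">
</head><body>Registration form goes here</body></html>
"""

if __name__ == "__main__":
    found = robots_directives(CONFIRM_PAGE)
    print("noindex present:", "noindex" in found)
    print("nofollow present:", "nofollow" in found)
```

If `noindex` is missing from the result, the tag is either absent or malformed on that page.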
It's advised to do this with the meta tag rather than in robots.txt.
Hopefully I've understood the question correctly!
-Dan
-
Block the pages/folders you do not wish to be indexed with a robots.txt file:
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
Or you can add canonical tags to the pages that are creating duplicate content.
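Since a mistake in robots.txt can block more than intended, it may be worth testing the rules before uploading them. The sketch below feeds the robots.txt rules from this answer to Python's standard `urllib.robotparser` and checks a few URLs. The domain `www.example.com` and the helper names `build_parser` and `is_allowed` are placeholders for illustration, not anything from the original site.

```python
from urllib.robotparser import RobotFileParser

# The rules from the answer above, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/
Disallow: /folder2/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, user_agent="*"):
    """Return True if the given user agent may crawl the URL under these rules."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # Placeholder URLs; substitute the pages you care about.
    for url in (
        "http://www.example.com/",
        "http://www.example.com/folder1/page.html",
        "http://www.example.com/folder2/",
    ):
        print(url, "allowed" if is_allowed(rules, url) else "blocked")
```

Keep in mind that a Disallow rule only stops crawling. If a page is blocked in robots.txt, the crawler never fetches it and so cannot see a noindex meta tag on it, which means the two approaches should not be combined on the same URL.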