Best practices for robots.txt -- allow one page but not the others?
-
We have a search page at domain.com/searchhere, but its result pages, which look like domain.com/searchhere?query1, are being crawled and shouldn't be. If I block /searchhere? in robots.txt, will that also block crawlers from the single page /searchhere? I still want that page to be indexed.
What is the recommended best practice for this?
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice none of the search results actually appear in the HTML of the page. For example, load the view-source URL and perform a find (CTRL+F) for "testing" which is the subject of the search. There are no results. Since the results are not in the page's HTML, they would not get indexed.
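A small sketch of why that happens, using the search URL shared in the question. It assumes the widget reads the search term from the part of the URL after the `#` (the fragment), which the URL's shape suggests but which is not confirmed here. Browsers do not send the fragment to the server, so the server returns the same HTML whatever the search term is:

```python
from urllib.parse import urlsplit

# URL shared in the question; everything after "#" is the fragment.
url = "http://www.seomoz.org/pages/search_results#stq=testing&stp=1"
parts = urlsplit(url)

print(parts.path)      # /pages/search_results
print(parts.query)     # empty: there is no "?query" part at all
print(parts.fragment)  # stq=testing&stp=1 (handled in the browser only)
```

Because the search term lives in the fragment rather than in the path or query string, every search shares one crawlable URL, and the results themselves are filled in by script after the page loads.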
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
-
Also, because Google can currently crawl these search result pages, a number of soft 404 pages are popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does SEOmoz keep search results from being indexed? They don't block search results with robots.txt, and it doesn't appear that they add the noindex tag to the search result pages (ex: view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1).
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Is new content or changes indexed in a timely manner? If so, that would indicate your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On the page you shared, find the discussion related to robots.txt. There is a link which will lead you to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
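For the X-Robots-Tag HTTP header mentioned in the Google quote above, here is a minimal sketch of a WSGI application that sends the header only for search result URLs. The path `/searchhere` comes from the original question; the `app` function and the response body are hypothetical placeholders, not anyone's actual setup:

```python
def app(environ, start_response):
    """Hypothetical WSGI app: noindex result pages, leave the search page alone."""
    headers = [("Content-Type", "text/html; charset=utf-8")]

    # Result pages are the ones with a non-empty query string,
    # e.g. /searchhere?query1. The bare /searchhere page gets no header.
    if environ.get("PATH_INFO") == "/searchhere" and environ.get("QUERY_STRING"):
        headers.append(("X-Robots-Tag", "noindex"))

    start_response("200 OK", headers)
    return [b"<html><body>search page placeholder</body></html>"]
```

Note that a crawler can only see this header if it is allowed to fetch the page, so the same URLs should not also be blocked in robots.txt.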
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, webmaster guidelines say "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
What about using the canonical URL tag?
You can put that tag in /searchhere?page.
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
Typically speaking, the best robots.txt file is a blank one. The file should only be used as a last resort with respect to blocking content.
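As a rough sketch of the noindex approach, the page template could emit the robots meta tag only when the URL carries a query string. The function name `robots_meta_tag` and the example domain are illustrative assumptions; only the `/searchhere` path comes from the question:

```python
from urllib.parse import urlsplit

SEARCH_PATH = "/searchhere"  # path taken from the question


def robots_meta_tag(url: str) -> str:
    """Return a noindex meta tag for search result URLs, else an empty string."""
    parts = urlsplit(url)
    # /searchhere?query1 has a query string and is a result page;
    # the bare /searchhere page has none and stays indexable.
    if parts.path == SEARCH_PATH and parts.query:
        return '<meta name="robots" content="noindex">'
    return ""


print(robots_meta_tag("http://domain.com/searchhere?query1"))
print(robots_meta_tag("http://domain.com/searchhere"))
```

The first call prints the meta tag, which would go in the page's head section; the second prints an empty line, leaving the main search page indexable.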
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
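A minimal sketch of that matching logic, assuming the simplest form of robots.txt matching: a rule applies when the URL's path plus query string starts with the Disallow value. The helper `is_blocked` is hypothetical and ignores Allow rules, wildcards, and user-agent groups, which real crawlers also take into account:

```python
def is_blocked(path_and_query: str, disallow_rules: list[str]) -> bool:
    """Return True if any non-empty Disallow value is a prefix of the URL."""
    return any(
        bool(rule) and path_and_query.startswith(rule)
        for rule in disallow_rules
    )


rules = ["/searchhere?"]  # i.e. "Disallow: /searchhere?" in robots.txt

print(is_blocked("/searchhere", rules))         # False: no "?" so no match
print(is_blocked("/searchhere?query1", rules))  # True: starts with the rule
```

Keep in mind that a disallowed URL can still end up in the index if other pages link to it, which is why the noindex approach is suggested elsewhere in this thread.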