Best practices for robots.txt -- allow one page but not the others?
-
So, we have a page, like domain.com/searchhere, but its results pages are being crawled (and shouldn't be). The results look like domain.com/searchhere?query1. If I block /searchhere? in robots.txt, will that also block crawlers from the single page /searchhere? I still want that page to be indexed.
What is the recommended best practice for this?
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice that none of the search results actually appear in the HTML of the page. For example, load the view-source URL and search (CTRL+F) for "testing", which is the subject of the search. There are no matches. Since the results are not in the page's HTML, they would not get indexed.
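One way to see why the results never reach the page source: in the URL you shared, the search term sits after the "#", and that fragment part of a URL is handled by the browser and not sent to the server. A quick sketch with Python's standard library, using the URL from this thread (the variable names are just for illustration):

```python
from urllib.parse import urlsplit

# The example URL from this thread.
url = "http://www.seomoz.org/pages/search_results#stq=testing&stp=1"
parts = urlsplit(url)

# Drop the fragment to get what is actually requested from the server.
requested = parts._replace(fragment="").geturl()

print(repr(parts.query))   # '' (there is no ?query part at all)
print(parts.fragment)      # stq=testing&stp=1
print(requested)           # http://www.seomoz.org/pages/search_results
```

So every search on that page resolves to the same requested URL, and the results are presumably filled in by script afterwards.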
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
-
And, because Google can currently crawl these search result pages, a number of soft 404 pages are popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does SEOmoz keep search results from being indexed? They don't block search results with robots.txt, and it doesn't appear that they add the noindex tag to the search result pages (ex: view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1).
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Is new content or changes indexed in a timely manner? If so, that would indicate your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On the page you shared, find the discussion related to robots.txt. There is a link which will lead you to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
"robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead."
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
-
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, webmaster guidelines say "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
What about using the "Canonical URL" tag?
You can put this code:
in
/searchhere?page.
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
Typically speaking, the best robots.txt file is a blank one. The file should only be used as a last resort with respect to blocking content.
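For what it's worth, the noindex signal can be sent either as a `<meta name="robots" content="noindex">` tag in the page's head or as an `X-Robots-Tag: noindex` HTTP header. Here is a rough sketch of the header route, assuming your server lets you add response headers per request. The function name `robots_headers` and the `SEARCH_PATH` constant are made up for illustration, and the URLs are the ones from the question:

```python
from urllib.parse import urlsplit

SEARCH_PATH = "/searchhere"  # the search page path from the question


def robots_headers(request_url: str) -> dict:
    """Return extra response headers for a request (hypothetical helper).

    Search result URLs (search path plus a non-empty query string) get a
    noindex header; the bare /searchhere page gets nothing extra.
    """
    parts = urlsplit(request_url)
    is_search_result = parts.path == SEARCH_PATH and parts.query != ""
    if is_search_result:
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_headers("http://domain.com/searchhere"))
print(robots_headers("http://domain.com/searchhere?query1"))
```

Keep in mind a crawler has to be able to fetch a page to see the noindex, so this only works if those URLs are not also blocked in robots.txt.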
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
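To illustrate why, here is a deliberately simplified sketch of how a Disallow rule is matched: as a prefix of the URL path plus query string. It ignores wildcards, Allow rules and user-agent groups, and the function name `is_disallowed` is my own, so treat it as a mental model and not as a real robots.txt parser:

```python
def is_disallowed(url_path: str, disallow_rules: list) -> bool:
    """Simplified check: blocked if the path (with query) starts with a rule.

    An empty Disallow rule blocks nothing, so empty strings are skipped.
    """
    return any(rule != "" and url_path.startswith(rule) for rule in disallow_rules)


# Equivalent to a robots.txt line "Disallow: /searchhere?"
rules = ["/searchhere?"]

print(is_disallowed("/searchhere", rules))          # False
print(is_disallowed("/searchhere?query1", rules))   # True
```

The bare page does not start with "/searchhere?" (there is no question mark in it), so it stays crawlable, while every URL with a query string after it matches the rule.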