Best practices for robots.txt -- allow one page but not the others?
-
So, we have a page, like domain.com/searchhere, but its result pages are being crawled (and shouldn't be). The results look like domain.com/searchhere?query1. If I block /searchhere? will that also block crawlers from the single page /searchhere? I still want that page to be indexed.
What is the recommended best practice for this?
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice none of the search results actually appear in the HTML of the page. For example, load the view-source URL and perform a find (CTRL+F) for "testing" which is the subject of the search. There are no results. Since the results are not in the page's HTML, they would not get indexed.
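One detail that fits with this: in that URL the search term sits after the `#`, which makes it a URL fragment. Browsers do not send the fragment to the server, so the server returns the same HTML whatever the search term is. That the results are then filled in by JavaScript is my assumption, not something I have confirmed. A minimal sketch using only Python's standard library to split the URL:

```python
from urllib.parse import urlsplit, parse_qs

# The search results URL from this thread.
url = "http://www.seomoz.org/pages/search_results#stq=testing&stp=1"
parts = urlsplit(url)

# The path and query are what the server receives; the fragment stays in the browser.
print(parts.path)                 # /pages/search_results
print(repr(parts.query))          # '' (no query string at all)
print(parse_qs(parts.fragment))   # {'stq': ['testing'], 'stp': ['1']}
```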
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
-
And, because Google can currently crawl these search result pages, a number of soft 404 pages are popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does SEOmoz keep search results from being indexed? They don't block search results with robots.txt, and it doesn't appear that they add the noindex tag to the search result pages (ex: view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1).
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Is new content or changes indexed in a timely manner? If so, that would indicate your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On the page you shared, find the discussion related to robots.txt. There is a link which will lead you to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
-
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, webmaster guidelines say "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
What about using a "Canonical URL" tag?
You can put this code:
in
/searchhere?page.
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
Typically speaking, the best robots.txt file is a blank one. The file should only be used as a last resort with respect to blocking content.
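As a rough illustration of that rule, here is a minimal sketch of the server-side decision. It assumes the search page lives at /searchhere as in the question and that any non-empty query string means a results page. The function name `robots_headers` is hypothetical and not tied to any framework. It returns the X-Robots-Tag header, which has the same effect as putting `<meta name="robots" content="noindex">` in the page's head.

```python
from urllib.parse import urlsplit

SEARCH_PATH = "/searchhere"  # path from the question; adjust for a real site


def robots_headers(url: str) -> dict:
    """Return extra response headers: noindex for result pages only."""
    parts = urlsplit(url)
    # Assumption: a non-empty query string on the search path is a results page.
    if parts.path == SEARCH_PATH and parts.query:
        return {"X-Robots-Tag": "noindex"}
    # The bare search page (and every other page) gets no extra header.
    return {}


print(robots_headers("http://domain.com/searchhere"))         # {}
print(robots_headers("http://domain.com/searchhere?query1"))  # {'X-Robots-Tag': 'noindex'}
```

Keep in mind that a crawler can only see a noindex directive on pages it is allowed to fetch, so the result pages must not also be blocked in robots.txt.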
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
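The reasoning is that a Disallow rule is matched as a prefix of the requested path plus query string. Here is a simplified sketch of that prefix check. It is not a full robots.txt parser: it ignores Allow lines, wildcards and user-agent groups, and the function name `is_disallowed` is just for illustration.

```python
# Simplified model of the robots.txt rule being discussed:
#   User-agent: *
#   Disallow: /searchhere?

def is_disallowed(path_and_query: str, disallow_rules: list) -> bool:
    """Return True if any non-empty Disallow rule is a prefix of the request."""
    # An empty Disallow value blocks nothing, so empty rules are skipped.
    return any(bool(rule) and path_and_query.startswith(rule)
               for rule in disallow_rules)


rules = ["/searchhere?"]

print(is_disallowed("/searchhere", rules))         # False: the search page stays crawlable
print(is_disallowed("/searchhere?query1", rules))  # True: result pages are blocked
```

Since "/searchhere" is shorter than the rule "/searchhere?", the rule cannot be a prefix of it, which is why the top-level page is unaffected.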