Best practices for robots.txt -- allow one page but not the others?
-
So, we have a page like domain.com/searchhere, but its result pages are being crawled (and shouldn't be); results look like domain.com/searchhere?query1. If I block /searchhere?, will it also block crawlers from the single page /searchhere? I still want that page to be indexed.
What is the recommended best practice for this?
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice that none of the search results actually appear in the HTML of the page. For example, load the view-source URL and do a find (CTRL+F) for "testing", the subject of the search. There are no matches. Since the results are not in the page's HTML, they would not get indexed.
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
-
And, because Google can currently crawl these search result pages, a number of soft 404 pages are popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does SEOmoz keep search results from being indexed? They don't block search results with robots.txt, and it doesn't appear that they add the noindex tag to the search result pages (ex: view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1)
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Is new content or changes indexed in a timely manner? If so, that would indicate your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On the page you shared, find the discussion related to robots.txt. There is a link which will lead you to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
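As a sketch of the header-based alternative Google mentions, a response for a search result URL might carry the noindex directive like this (assuming the server is configured to add the header only for query-string URLs):

```http
HTTP/1.1 200 OK
Content-Type: text/html
X-Robots-Tag: noindex
```

Unlike a robots.txt block, the crawler can still fetch the page, see the directive, and drop it from the index.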
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
-
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, webmaster guidelines say "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
What about using the canonical URL tag? You could put the rel=canonical code in the /searchhere? result pages.
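A sketch of such a canonical tag (domain.com stands in for the actual site), placed in the head of each result page:

```html
<!-- On /searchhere?query1 (and other result URLs), point the canonical at the base page -->
<link rel="canonical" href="http://domain.com/searchhere">
```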
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
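As a sketch (assuming access to the search result page templates), the tag would sit in the head of each result page:

```html
<!-- In the <head> of each /searchhere?... result page, NOT /searchhere itself -->
<meta name="robots" content="noindex">
```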
Generally speaking, the best robots.txt file is an empty one. The file should only be used as a last resort for blocking content.
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
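A toy sketch of that matching behavior (a literal prefix check, as robots.txt rule paths are matched; this is an illustration, not a full robots.txt parser):

```python
# "Disallow: /searchhere?" matches URLs as a literal prefix, so the "?"
# only matches paths that carry a query string.
DISALLOWED_PREFIX = "/searchhere?"

def is_crawl_allowed(url_path: str) -> bool:
    """Return True if url_path is NOT covered by the Disallow rule."""
    return not url_path.startswith(DISALLOWED_PREFIX)

print(is_crawl_allowed("/searchhere"))         # True: the search page stays crawlable
print(is_crawl_allowed("/searchhere?query1"))  # False: result pages are blocked
```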