Best practices for robotx.txt -- allow one page but not the others?
-
So, we have a page, like domain.com/searchhere, but results are being crawled (and shouldn't be), results look like domain.com/searchhere?query1. If I block /searchhere? will it block users from crawling the single page /searchere (because I still want that page to be indexed).
What is the recommended best practice for this?
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice none of the search results actually appear in the HTML of the page. For example, load the view-source URL and perform a find (CTRL+F) for "testing" which is the subject of the search. There are no results. Since the results are not in the page's HTML, they would not get indexed.
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
-
And, because google can currently crawl these search result pages, there are a number of soft 404 pages popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does seomoz keep search results from being indexed? They don't block search results with robots.txt and it doesn't appear that they add the noindex tag to the search result pages.(ex: view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1)
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Is new content or changes indexed in a timely manner? If so, that would indicate your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On the page you shared, find the discussion related to robots.txt. There is a link which will lead you to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
-
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, webmaster guidelines say "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
what about if you use "<a title="Click for Help!">Canonical URL" tag ?</a>
You can put this code:
in
/searchhere?page.
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
Typically speaking, the best robots.txt file is a blank one. The file should only be used as a last resort with respect to blocking content.
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Prioritise a page in Google/why is a well-optimised page not ranking
Hello I'm new to Moz Forums and was wondering if anyone out there could help with a query. My client has an ecommerce site selling a range of pet products, most of which have multiple items in the range for difference size animals i.e. [Product name] for small dog
Intermediate & Advanced SEO | | LauraSorrelle
[Product name] for medium dog
[Product name] for large dog
[Product name] for extra large dog I've got some really great rankings (top 3) for many keyword searches such as
'[product name] for dogs'
'[product name]' But these rankings are for individual product pages, meaning the user is taken to a small dog product page when they might have a large dog or visa versa. I felt it would be better for the users (and for conversions and bounce rates), if there was a group page which showed all products in the range which I could target keywords '[product name]', '[product name] for dogs'. The page would link through the the individual product pages. I created some group pages in autumn last year to trial this and, although they are well-optimised (score of 98 on Moz's optimisation tool), they are not ranking well. They are indexed, but way down the SERPs. The same group page format has been used for the PPC campaign and the difference to the retention/conversion of visitors is significant. Why are my group pages not ranking? Is it because my client's site already has good rankings for the target term and Google does not want to show another page of the site and muddy results?
Is there a way to prioritise the group page in Google's eyes? Or bring it to Google's attention? Any suggestions/advice welcome. Thanks in advance Laura0 -
Optimizing for Two Keywords - H Tag Best Practices?
Hey Everyone, I have to do a local SEO campaign. My landing pages need to target two keywords. I was wondering if you could look over this proposed H tags I've written and give me your thoughts. Houses for Sale and Commercial Real Estate in Houston, TX Houses for Sale in Houston, TX Commercial Real Estate in Houston, TX Am I heading in the right or wrong direction?
Intermediate & Advanced SEO | | Charles_Murdock0 -
Do I eventually 301 a page on our site that "expires," to a page that's related, but never expires, just to utilize the inbound link juice?
Our company gets inbound links from news websites that write stories about upcoming sporting events. The links we get are pointing to our event / ticket inventory pages on our commerce site. Once the event has passed, that event page is basically a dead page that shows no ticket inventory, and has no content. Also, each “event” page on our site has a unique url, since it’s an event that will eventually expire, as the game gets played, or the event has passed. Example of a url that a news site would link to: mysite.com/tickets/soldier-field/t7493325/nfc-divisional-home-game-chicago bears-vs-tbd-tickets.aspx Would there be any negative ramifications if I set up a 301 from the dead event page to another page on our site, one that is still somewhat related to the product in question, a landing page with content related to the team that just played, or venue they play in all season. Example, I would 301 to: mysite.com/venue/soldier-field tickets.aspx (This would be a live page that never expires.) I don’t know if that’s manipulating things a bit too much.
Intermediate & Advanced SEO | | Ticket_King1 -
301 redirect for page 2, page 3 etc of an article or feed
Hey guys, We're looking to move a blog feed we have to a new static URL page. We are using 301 redirects but I'm unsure of what to regarding page 2, page 3 etc. of the feed. How do I make sure those urls are being redirected as well? For example: Moving FloridaDentist.com/blog/dental-tips/ to a new page url FloridaDentist.com/dental-tips. So, we are using a 301 on that old url to the new one. My questions is what to do with the other pages like FloridaDentist.com/blog/dental-tips/page/3. How do we make sure that page is also 301'd to the new main url?
Intermediate & Advanced SEO | | RickyShockley0 -
I want to Disavow some more links - but I'm only allowed one .txt file?
Hey guys, Wondering if you good people could help me out on this one? A few months back (June 19) I disavowed some links for a client having uploaded a .txt file with the offending domains attached. However, recently I've noticed some more dodgy-looking domains being indexed to my client's site so went about creating a new "Disavow List". When I went to upload this new list I was informed that I would be replacing the existing file. So, my question is, what do I do here? Make a new list with both old and new domains that I plan on disavowing and replace the existing one? Or; Just replace the existing .txt file with the new file because Google has recognised I've already disavowed those older links?
Intermediate & Advanced SEO | | Webrevolve0 -
Need to know best practices of Search Engine Optimization 2013
I want to know best practices of Search Engine Optimization 2013 and also need best possible sources. Thanks
Intermediate & Advanced SEO | | GM0070 -
Splitting one page into two
Good day everyone! If you have a page that ranks well for two highly competitive, yet mutually irrelevant, terms, but that the page will be split into two as part of a website redesign, would you 301 it to term X or term Y? What criteria do you use? Are there any other things I should do to avoid the wrong page ranking for the wrong term? I don't want users searching for term X to end up in page Y. Thanks!
Intermediate & Advanced SEO | | andrep0 -
Amount of pages indexed for classified (number of pages for the same query)
I've notice that classified usually has a lots of pages indexed and that's because for each query/kw they index the first 100 results pages, normally they have 10 results per page. As an example imagine the site www.classified.com, for the query/kw "house for rent new york" there is the page www.classified.com/houses/house-for-rent-new-york and the "index" is set for the first 100 SERP pages, so www.classified.com/houses/house-for-rent-new-york www.classified.com/houses/house-for-rent-new-york-1 www.classified.com/houses/house-for-rent-new-york-2 ...and so on. Wouldn't it better to index only the 1st result page? I mean in the first 100 pages lots of ads are very similar so why should Google be happy by indexing lots of similar pages? Could Google penalyze this behaviour? What's your suggestions? Many tahnks in advance for your help.
Intermediate & Advanced SEO | | nuroa-2467120