Best practices for robots.txt -- allow one page but not the others?
-
So, we have a page like domain.com/searchhere, but its search results are being crawled (and shouldn't be); result URLs look like domain.com/searchhere?query1. If I block /searchhere?, will it also block crawlers from the single page /searchhere (because I still want that page to be indexed)?
What is the recommended best practice for this?
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice that none of the search results actually appear in the HTML of the page. For example, load the view-source URL and perform a find (CTRL+F) for "testing," which is the subject of the search. There are no results. Since the results are not in the page's HTML, they would not get indexed.
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
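For reference, a minimal sketch of that tag, which would go in the <head> of each search result page (but not on /searchhere itself):

    <meta name="robots" content="noindex">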
-
And, because Google can currently crawl these search result pages, a number of soft 404 pages are popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does SEOmoz keep search results from being indexed? They don't block search results with robots.txt, and it doesn't appear that they add the noindex tag to the search result pages (e.g. view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1).
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Are new content and changes indexed in a timely manner? If so, that would indicate your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On the page you shared, find the discussion related to robots.txt. There is a link which will lead you to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
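If you go the header route that quote mentions, the HTTP response for each search result URL would just need to include a line like this (a sketch of the raw response header):

    X-Robots-Tag: noindex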
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, webmaster guidelines say "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
What about if you use the canonical URL tag? You could put a canonical link element on the /searchhere? pages, pointing back to /searchhere, as sketched below.
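A sketch of that element, which would sit in the <head> of each /searchhere? result page (the href is an assumption based on the URLs in the question):

    <link rel="canonical" href="http://domain.com/searchhere">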
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
Generally speaking, the best robots.txt file is a blank one. The file should only be used as a last resort with respect to blocking content.
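For what it's worth, an allow-everything robots.txt is equivalent to this (a minimal sketch):

    User-agent: *
    Disallow: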
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
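In robots.txt terms, that would look something like this (a sketch using the paths from your question):

    User-agent: *
    # Blocks /searchhere?query1 and any other /searchhere?... URL
    Disallow: /searchhere?

Since /searchhere on its own doesn't match the /searchhere? prefix, it stays crawlable.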
Related Questions
-
What are the best practices with website redesign & redirects?
I have a website that is not very pretty but has great rankings. I want to redesign the website and lose as few rankings as possible while still cleaning up the navigation. What are the best practices? Thanks in advance.
Intermediate & Advanced SEO | JHSpecialty
-
Putting "noindex" on a page that's in an iframe... what will that mean for the parent page?
If I've got a page that is being called in an iframe on my homepage, and I don't want that called page to be indexed, so I put a noindex tag on the called page (but not on the homepage), what might that mean for the homepage? Nothing? Will Google, Bing, Yahoo, or anyone else potentially see that as a noindex tag on my homepage?
Intermediate & Advanced SEO | Philip-DiPatrizio
-
Best way to handle page filters and sorts
Hello Mozzers, I have a question about the best way to handle filters and sorts with Googlebot. I have a page that returns a list of widgets. I have a "root" page about widgets, and then filter and sort functionality that shows basically the same content but adds parameters to the URL. For example, if you filter the page of 10 widgets by color, the page returns 3 red widgets on the top and 7 non-red widgets on the bottom. If you sort by size, the page shows the same 10 widgets sorted by size. We use traditional PHP URL parameters to pass filters and sorts, so obviously Google views each variation as a separate URL. Right now we don't really do anything special for Google, but I have noticed in the SERPs that sometimes if I search for "Widgets," my "Widgets" and "Widgets - Blue" pages both rank close to each other, which tells me Google basically (rightly) thinks these are all just pages about Widgets. Ideally, though, I'd just want to rank for my "Widgets" root page. What is the best way to structure this setup for Googlebot? I think it's maybe one or many of the following, but I'd love any advice:
- put a rel canonical tag on all of the pages with parameters and point to the "root"
- use the Google parameter tool and have it not crawl any URLs with my parameters
- put a meta robots noindex on the parameter pages
Thanks!
Intermediate & Advanced SEO | jcgoodrich
-
I have removed 2,000+ pages but Google still says I have 3,000+ pages indexed
Good afternoon, I run an office equipment website called top4office.co.uk. My predecessor decided that he would make an exact copy of the content on our existing site, top4office.com, and place it on the top4office.co.uk domain, which included over 2k thin pages. Since coming in, I have hired a copywriter who has rewritten all the important content, and I have removed over 2k thin pages. I have set up 301s, blocked the thin pages using robots.txt, and then used Google's removal tool to remove the pages from the index, which was done successfully. But although they were removed and can no longer be found in Google, when I use site:top4office.co.uk I still have over 3k indexed pages (originally I had 3,700). Does anyone have any ideas why this is happening and, more importantly, how I can fix it? Our ranking on this site is woeful in comparison to what it was in 2011. I have a deadline and was wondering how quickly, in your opinion, all these changes will impact my SERP rankings. Look forward to your responses!
Intermediate & Advanced SEO | apogeecorp
-
Web pages fighting over rank for one keyword. Can it be stopped?
Hey, see attachment. Website is Omega Red. The page I want to rank for seems like it is being held back by other closely related pages with similar titles. I am looking to rank for "electrical earthing" with this page. The graph shows how the other pages have interacted over a period of time on the website, and how, if they drop out of the top 50, this page then moves up in Google. I don't really want to canonicalise the other pages into one, but maybe this is what needs to happen? Any suggestions?
Intermediate & Advanced SEO | Hughescov
-
Retail Store Detail Page and Local SEO Best Practices
We are working with a large retailer that has specific pages for each store they run. We are interested in leveraging the best practices that are out there specifically for local search. Our current issue is around URL design for the store pages themselves. Currently, we have store URLs such as /store/12584. The number is a GUID-like identifier that means nothing to search engines or, frankly, humans. Is there a better way we could model this URL for increased relevancy for local retail search? For example:
- adding the store name: www.domain.com/store/1st-and-denny-new-york-city/23421 (example: http://www.apple.com/retail/universityvillage/)
- a fully explicit URI: www.domain.com/store/us/new-york/new-york-city/10027/bronx/23421 (example: http://www.patagonia.com/us/patagonia-san-diego-2185-san-elijo-avenue-cardiff-by-the-sea-california-92007?assetid=5172)
The idea with the second version is that we'd make the URL structure richer and more detailed, which might help for local search. Would there be a best practice or recommendation as to how we should model this URL? We are also working on on-page optimization, but we're specifically interested in local SEO strategy and URL design.
Intermediate & Advanced SEO | mongillo
-
Best Practices for Pagination on E-commerce Site
One of my e-commerce clients has a script enabled on their category pages that allows more products to automatically be displayed as you scroll down. They use this instead of page 1, 2, and a view all. I'm trying to decide if I want to insist that they change back to the traditional method of multiple pages with a view all button, and then implement rel="next", rel="prev", etc. I think the current auto method is disorienting for the user, but I can't figure out if it's the same for the spiders. Does anyone have any experience with this, or thoughts? Thanks!
Intermediate & Advanced SEO | smallbox
-
Redirecting One Page of Content on Domain A to Domain B
Let's say I have a nice page of content on Domain A, which is a strong domain. That page has a nice number of links from other websites and ranks on the first page of the SERPs for some good keywords. However, I would like to move that single page of content to Domain B using a 301 redirect. Domain B is a slightly weaker domain, however, it has better assets to monetize the traffic that visits this page of content. I expect that the rankings might slip down a few places but I am hoping that I will at least keep some of the credit for the inbound links from other websites. Has anyone ever done this? Did it work as you expected? Did the content hold its rankings after being moved? Any advice or philosophical opinions on this? Thank you!
Intermediate & Advanced SEO | EGOL