Best practices for robots.txt -- allow one page but not the others?
-
So, we have a page like domain.com/searchhere, but its search results are being crawled (and shouldn't be); the result URLs look like domain.com/searchhere?query1. If I block /searchhere? in robots.txt, will it also block crawlers from the single page /searchhere? I still want that page to be indexed.
What is the recommended best practice for this?
-
SEOmoz used to use Google Search for the site. I am confident Google has a solid method for keeping their own results clean.
It appears SEOmoz recently changed their search widget. If you examine the URL you shared, notice that none of the search results actually appear in the HTML of the page; they are presumably loaded client-side with JavaScript. For example, load the view-source URL and do a find (CTRL+F) for "testing", the subject of the search. There are no matches. Since the results are not in the page's HTML, they will not get indexed.
-
If Google is viewing the search result pages as soft 404s, then yes, adding the noindex tag should resolve the problem.
-
Also, because Google can currently crawl these search result pages, a number of soft 404 pages are popping up. Would adding a noindex tag to these pages fix the issue?
-
Thanks for the links and help.
How does SEOmoz keep search results from being indexed? They don't block search results with robots.txt, and it doesn't appear that they add the noindex tag to the search result pages (ex: view-source:http://www.seomoz.org/pages/search_results#stq=testing&stp=1).
-
Yeah, but Ryan's answer is the best one if you can go that route.
-
Hi Michelle,
The concept of crawl efficiency is highly misunderstood. Are all your site's pages being indexed? Are new content and changes indexed in a timely manner? If so, your site is being crawled efficiently.
Regarding the link you shared, you are on the right track but need to dig a bit deeper. On that page, find the discussion related to robots.txt. It links to the following page:
https://developers.google.com/webmasters/control-crawl-index/docs/faq#h01
There you will find a more detailed explanation along with several examples of when not to use robots.txt.
robots.txt: Use it if crawling of your content is causing issues on your server. For example, you may want to disallow crawling of infinite calendar scripts. You should not use the robots.txt to block private content (use server-side authentication instead), or handle canonicalization (see our Help Center). If you must be certain that a URL is not indexed, use the robots meta tag or X-Robots-Tag HTTP header instead.
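To illustrate the header option mentioned there, a noindex delivered over HTTP (rather than in the page's HTML) would simply appear in the response like this:

HTTP/1.1 200 OK
X-Robots-Tag: noindex

The header approach is useful for non-HTML resources such as PDFs, where there is no <head> to hold a meta tag.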
SEOmoz offers a great guide on this topic as well: http://www.seomoz.org/learn-seo/robotstxt
If you desire to go beyond the basic Google and SEOmoz explanation and learn more about this topic, my favorite article related to robots.txt, written by Lindsay, can be found here: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
-
Hi Ryan,
Wouldn't that cause issues with crawl efficiency?
Also, Google's Webmaster Guidelines say: "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines."
-
Thank you. Are you sure about that?
-
What about using the canonical URL tag? You could put the canonical link code in the /searchhere? result pages, pointing back to /searchhere (see the sketch below).
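A minimal sketch of that tag, assuming the search page lives at http://domain.com/searchhere; it would go in the <head> of each result page:

<link rel="canonical" href="http://domain.com/searchhere" />

Keep in mind that rel=canonical is a hint for consolidating duplicate URLs, not a directive for keeping pages out of the index.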
-
The best practice would be to add the noindex tag to the search result pages but not the /searchhere page.
Generally speaking, the best robots.txt file is a blank one. The file should only be used as a last resort for blocking content.
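A sketch of that noindex tag, placed in the <head> of each search result page (but not on /searchhere itself):

<meta name="robots" content="noindex">

Note that crawlers can only see this tag if they are allowed to fetch the page, so the result pages should not also be blocked in robots.txt.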
-
What you outlined sounds to me like it should work. Disallowing /searchhere? shouldn't disallow the top-level search page at /searchhere, but should disallow all the search result pages with queries after the ?.
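For reference, a minimal robots.txt sketch of that rule; robots.txt patterns are prefix matches, so this blocks every URL beginning with /searchhere? while leaving /searchhere itself crawlable:

User-agent: *
Disallow: /searchhere?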