Block all search results (dynamic) in robots.txt?

rhutchings

I know that Google does not want to index "search result" pages for a lot of reasons (duplicate content, dynamic URLs, etc.). I recently optimized the entire IA of my sites to use search-friendly URLs, which includes the search result pages. So, my search result pages changed from:

/search?12345&productblue=true&id789

to

/product/search/blue_widgets/womens/large

As a result, Google started indexing these pages as if they were static (no opposition from me :)), but I started getting WMT messages saying it is finding a "high number of urls being indexed" on these sites. Should I just block them altogether, or let it work itself out?

onwebtoday

You can block the URLs that contain "/product/search/". This is easily done by adding the following to robots.txt:

User-agent: *
Disallow: /product/search/
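A quick way to sanity-check a rule like this before deploying it is Python's standard-library `urllib.robotparser`, which applies robots.txt rules to a given URL. This is a minimal sketch: `example.com` is a placeholder host, the product URL `/product/blue-widget` is hypothetical, and Google's own matcher may treat edge cases (such as wildcards) differently from the stdlib parser.

```python
from urllib.robotparser import RobotFileParser

# The rule suggested above, as it would appear in robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /product/search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host; /product/blue-widget is a hypothetical product page
base = "https://example.com"
for path in ["/product/search/blue_widgets/womens/large", "/product/blue-widget"]:
    allowed = parser.can_fetch("Googlebot", base + path)
    print(path, "allowed" if allowed else "blocked")
```

Disallow rules are prefix matches, so the search URLs are blocked while product pages stay crawlable. Because of the trailing slash, a bare `/product/search` URL would not match; drop the slash if that page should be blocked too.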

Hope this helps...

komeksimas

As you said, "The increasing number of pages indexed will dilute the link juice of the entire site."

Can you give more examples? Or just a tip on where to look for this kind of information?

Thank you.

CleverPhD

I agree with BKSearch. You want to minimize what Google has to crawl (I know this sounds backwards) so that Google focuses on the pages you want to rank.

Long term, why would you waste Googlebot's time on pages that don't matter as much? What if you updated a more important page while Googlebot was too busy crawling this infinite loop of pages?

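At this point, I would use the noindex meta tag rather than robots.txt, so that Google can still crawl the pages, see the tag, and remove the URLs from the index. Once they are gone, you can add the rule to robots.txt so Google stops crawling them. If you block them in robots.txt first, Google can no longer fetch the pages to see the noindex tag, and you may end up with a lot of junk left in the index.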
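The order matters because a crawler that obeys robots.txt never fetches a blocked page, so it never sees a noindex tag on it. The sketch below models just that crawl decision with the stdlib `urllib.robotparser`; `robots_meta_tag`, `crawler_sees_noindex`, `SEARCH_PREFIX`, and `example.com` are hypothetical names for illustration, not part of any CMS or Google API, and it does not model how Google actually processes its index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical: the path prefix used by the site's search result pages
SEARCH_PREFIX = "/product/search/"

def robots_meta_tag(path: str) -> str:
    """Hypothetical helper: the robots meta tag to render in a page's <head>."""
    if path.startswith(SEARCH_PREFIX):
        return '<meta name="robots" content="noindex">'
    return ""  # other pages stay indexable by default

def crawler_sees_noindex(robots_txt: str, path: str) -> bool:
    """True if a crawler obeying robots_txt can fetch `path` and read a noindex tag."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host
    if not parser.can_fetch("Googlebot", "https://example.com" + path):
        return False  # blocked: the page, and its meta tag, is never fetched
    return "noindex" in robots_meta_tag(path)

page = "/product/search/blue_widgets/womens/large"
noindex_only = ""  # phase 1: no robots.txt rule yet, only the noindex tag
blocked = "User-agent: *\nDisallow: /product/search/\n"  # robots.txt rule in place

print(crawler_sees_noindex(noindex_only, page))  # True: the tag can be seen
print(crawler_sees_noindex(blocked, page))       # False: the tag is never seen
```

In other words, the robots.txt rule should only go in after the search URLs have dropped out of the index; added first, it hides the very tag meant to remove them.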

BKSearch

My answer might differ a little from some of the others, but I would recommend that you exclude these pages from being indexed.

My reasons are:

You know it is largely duplicate content that leads to the same pages as your categories.

Google has stated that they would prefer to not have it indexed.

The increasing number of pages indexed will dilute the link juice of the entire site.

There is also the possibility that people editing these URLs in their browser's URL bar will greatly increase the number of pages indexed, since any combination they type becomes a new page.

A competitor could create thousands of links to these pages and build a huge footprint of search pages in the index.

And finally, I like having product pages ranking highly if at all possible.

I would do this with both the robots.txt file and a GWMT exclusion on the /product/search/ directory.

Good Luck!

KeriMorgret

Hi! We're going through some of the older unanswered questions and seeing if people still have questions or if they've gone ahead and implemented something and have any lessons to share with us. Can you give an update, or mark your question as answered?

Thanks!

rhutchings

As a follow-up, or for further info: it's been about 5 months since the change. I do get some traffic from these indexed pages (not a ton, but enough that I would rather not block them if there is no negative impact). The search engines' response seems to be confusion: they index the content, but also recognize that something may not be right. So I am wondering if anyone else has done something similar or is trying this.

Admittedly, this is what I wanted the new URL structure to do, as an experiment. I'm just looking for anyone else who has done or is doing something similar.
