Block all search results (dynamic) in robots.txt?
-
I know that Google does not want to index "search result" pages for a number of reasons (duplicate content, dynamic URLs, and so on). I recently reworked the entire IA of my sites to use search-friendly URLs, which includes the search result pages. So, my search result pages changed from:
- /search?12345&productblue=true&id789
to
- /product/search/blue_widgets/womens/large
As a result, Google started indexing these pages, thinking they were static (no opposition from me :)), but I started getting Webmaster Tools (WMT) messages saying it is finding a "high number of urls being indexed" on these sites. Should I just block them altogether, or let it work itself out?
-
You can block the URLs whose path starts with "/product/search/". This is easily done by adding the following to robots.txt:
User-agent: *
Disallow: /product/search/
Hope this helps...
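One way to sanity-check a rule like this before deploying it is Python's standard-library robots.txt parser. This is only a rough sketch: the helper name `is_blocked` and the product URL `/product/blue_widget_123` are made up for illustration, and the standard-library parser does plain prefix matching, so it is not guaranteed to mirror every detail of how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# The rule suggested above, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product/search/
"""


def is_blocked(path, user_agent="Googlebot"):
    """Return True if the rules above disallow crawling of `path`.

    `is_blocked` is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, path)


# A search result URL from the question: covered by the Disallow rule.
print(is_blocked("/product/search/blue_widgets/womens/large"))  # True
# A made-up product page URL: not covered, so it stays crawlable.
print(is_blocked("/product/blue_widget_123"))  # False
```

Keep in mind that this only tells you whether a crawler is allowed to fetch a URL. It says nothing about whether a URL that is already indexed will drop out of the index.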
-
As you said: "The increasing number of pages indexed will dilute the link juice of the entire site."
Can you give more examples? Or just a tip on where to look for this kind of information?
Thank you.
-
I would agree with BK Search. You want to minimize what Google has to crawl (I know this sounds backwards) so that Google focuses on the pages that you want to rank.
Long term, why would you waste Googlebot's time on pages that don't matter as much? What if you had an update on a more important page and Googlebot was too busy crawling this infinite loop of pages?
At this point, I would use the noindex meta tag rather than robots.txt, so that Google can still crawl the pages and remove all the URLs from the index. Later, you can add the path to robots.txt so that Google stops crawling it. If you block crawling first, Google never sees the noindex tag, and you may end up with a lot of junk left in the index.
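To make the first step concrete, here is a minimal sketch of how a site's templates might decide when to emit the noindex tag. The function name `robots_meta_tag`, the constant `SEARCH_PREFIX`, and the product URL are assumptions for illustration only; how the tag actually gets into the page `<head>` depends on your platform.

```python
# Path prefix of the search result pages from the question.
SEARCH_PREFIX = "/product/search/"


def robots_meta_tag(path):
    """Return a noindex meta tag for search result pages, else an empty string.

    Hypothetical helper: the returned string is meant to be placed
    inside the page's <head>.
    """
    if path.startswith(SEARCH_PREFIX):
        return '<meta name="robots" content="noindex">'
    return ""


print(robots_meta_tag("/product/search/blue_widgets/womens/large"))
print(robots_meta_tag("/product/blue_widget_123"))  # made-up product URL
```

While this tag is doing its work, /product/search/ should not yet be disallowed in robots.txt, since a crawler that cannot fetch the page cannot read the tag.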
-
My answer may differ a little from some of the others, but I would recommend that you exclude these pages from the index.
My reasons are:
You know it is largely duplicate content and goes down to the same pages as your categories.
Google has stated that they would prefer to not have it indexed.
The increasing number of pages indexed will dilute the link juice of the entire site.
There is also the possibility that people typing into their browser's URL bar will greatly increase the number of pages indexed.
A competitor could create thousands of links to these pages, giving your site a huge footprint made up of search pages.
And finally, I like having product pages ranking highly if at all possible.
I would do this with both the robots.txt file and a GWMT (Google Webmaster Tools) exclusion on the /product/search/ directory.
Good Luck!
-
Hi! We're going through some of the older unanswered questions and seeing if people still have questions or if they've gone ahead and implemented something and have any lessons to share with us. Can you give an update, or mark your question as answered?
Thanks!
-
As a follow-up: it's been about 5 months since the change. I do get some traffic from these indexed pages (not a ton, but enough that I would prefer not to block them if there is no negative impact). The search engines seem confused: they index the content, but also recognize that something may not be right. So I am wondering if anyone else has done something similar or is trying this.
Admittedly, this is what I wanted the new URL structure to do, as an experiment. I'm just looking for anyone else who has done, or is doing, something similar.