Skip indexing the search pages

mtthompsons

Hi,

I want all search pages like this one skipped from indexing:

www.somesite.com/search/node/

So I have this in robots.txt (Disallow: /search/)

Now any posts whose URLs start with search are being blocked, and in Google I see this message:

A description for this result is not available because of this site's robots.txt – learn more.

How can I handle this, and how can I find all the URLs that Google is blocking from showing?

Thanks

Mark_Ginsberg

Sure. You have URLs that are being blocked by robots.txt because of this line in your file:

Disallow: /questions/search

That line prevents any URL inside the questions folder whose path starts with the word search from being crawled. What are you trying to accomplish with this block? If you only want to block the search folder within questions, the rule should be /questions/search/.
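To see the difference the trailing slash makes, here is a minimal sketch using Python's standard urllib.robotparser, which applies Disallow rules as plain path prefixes. The helper name is_blocked and the two test paths are made up for illustration, and Google's own matcher has extra features (wildcards, for example), so treat this as an approximation of the prefix behaviour only.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(disallow_rule: str, url: str) -> bool:
    """Return True if a robots.txt holding only this Disallow rule blocks the URL."""
    parser = RobotFileParser()
    # Build a one-rule robots.txt that applies to every crawler.
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return not parser.can_fetch("*", url)


# Hypothetical URLs, used only to illustrate prefix matching.
base = "http://www.somesite.com"
for rule in ("/questions/search", "/questions/search/"):
    for path in ("/questions/search/node/foo", "/questions/search-engine-tips"):
        print(f"{rule!r} blocks {path!r}: {is_blocked(rule, base + path)}")
```

Without the trailing slash, both paths are blocked. With it, only the one inside the search folder is.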

And the other warning is telling you these pages take a long time to load - check your server or these individual pages and see why that is taking so long.

mtthompsons

Thanks a lot. I assumed this because of the 2 screenshots below.

The sitemap shows warnings. Can you help identify why we get these errors? The 2 images explain more.

[Screenshots: ojpbkJO, PLlTbxW]

Mark_Ginsberg

As Saijo said above, the meta robots noindex tag is the way to go. When you block a folder via robots.txt, you prevent Google from visiting and crawling that folder and any content within it. If Google has already crawled the content, they won't remove it from their index just because you block it with robots.txt. The old version of the page stays in their index, and they just can't show an updated snippet of the page because of the robots.txt block.

To remove the pages from the index completely, you can do one of 2 things:

1. In webmaster tools, go to the URL removal section and remove that folder from the index. This will only work while the folder is blocked via robots.txt.
2. Add a meta robots noindex tag to the pages or page template, and remove the robots.txt block. You need to remove the robots.txt block so the search engines can recrawl the pages, see the meta robots directive, and act on the noindex to remove the page.
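If you go the noindex route, it's worth checking that the tag really ends up in the rendered page. Below is a rough sketch, using only Python's standard html.parser, that looks for a meta robots tag containing noindex. The class and function names and the sample HTML are made up for illustration, and it does not check the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list, e.g. "noindex, follow".
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a meta robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical search page template with the tag in place.
sample = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(has_noindex(sample))
```

In practice you would feed it the HTML of one of your search pages, fetched after the robots.txt block has been removed.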

In general, I would recommend using the meta robots noindex directive over a robots.txt block, because it should work for all search engines, and you won't have to go into webmaster tools for each one. You will also ensure that you don't accidentally block other URLs.

From your example above, if you only blocked the folder /search/, a page that includes the word search in its URL but isn't in that folder shouldn't be blocked from the search engines by that line. I would check the robots.txt section in webmaster tools, because based on your robots.txt file it doesn't look to me like every URL with search in it should be blocked.
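For a quick local check of which URLs a given robots.txt would block, something like the sketch below can help. It runs a list of URLs through Python's standard urllib.robotparser. The function name blocked_urls, the example URLs, and the robots.txt content are assumptions based on the rule quoted in the question. It only evaluates the rules locally and says nothing about what Google has actually indexed.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, based on the rule quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the subset of urls that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs, e.g. taken from a sitemap.
candidates = [
    "http://www.somesite.com/search/node/drupal",
    "http://www.somesite.com/search-tips-for-beginners",
    "http://www.somesite.com/blog/search-engine-basics",
]
for url in blocked_urls(ROBOTS_TXT, candidates):
    print("blocked:", url)
```

Only the URL inside the /search/ folder is reported. The other two contain the word search but are not in that folder, so the rule does not apply to them.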

Good luck,

Mark