Robots.txt pattern matching

STPseo

Hola fellow SEO peoples!

Site: http://www.sierratradingpost.com

robots.txt: http://www.sierratradingpost.com/robots.txt

Please see the following line: Disallow: /keycodebypid~*

We are trying to block URLs like this:

http://www.sierratradingpost.com/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/

but we still find them in the Google index.

1. We are not sure whether we need to tell the robot to use pattern matching.

2. We are not sure whether the format is correct. Should we use Disallow: /keycodebypid*/ or /*keycodebypid/ or even /*keycodebypid~/?

What is even more confusing is that the meta robots tag on these pages says "noindex", yet they still show up: <meta name="robots" content="noindex, follow, noarchive" />

Thank you!

SEOSHARK

OK, not sure if this was shared already: Matt Cutts talking about this same subject.

www.youtube.com/watch?v=I2giR-WKUfY

STPseo

John, the article was a real eye-opener! Thanks again!

john4math

Somehow Google is finding these pages, but you're disallowing the Googlebot from reading the page, so it doesn't know anything about the meta noindex tag on the page. If you have meta noindex tags on all of these pages, you can remove that line in your robots.txt preventing bots from reading these pages, and as Google crawls these pages, they should remove them from their SERPs.
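
To make the conflict concrete, here is a minimal Python sketch of the reasoning above. It is only a toy model, not how Googlebot is actually implemented: the names MetaRobotsParser and crawler_sees_noindex are made up for illustration, and the blocked_by_robots_txt flag is an assumption standing in for whatever your robots.txt rules decide.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow, noarchive" into individual directives.
            self.directives.update(d.strip().lower() for d in content.split(","))


def crawler_sees_noindex(blocked_by_robots_txt: bool, html: str) -> bool:
    """Hypothetical model: a blocked page is never fetched, so its tags go unread."""
    if blocked_by_robots_txt:
        return False
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" in parser.directives


# The meta tag quoted in the original question.
PAGE = '<html><head><meta name="robots" content="noindex, follow, noarchive" /></head></html>'

print(crawler_sees_noindex(True, PAGE))   # False: disallow hides the noindex tag
print(crawler_sees_noindex(False, PAGE))  # True: the crawler can read and honor it
```

With the disallow line in place the noindex tag is never read, which is why dropping the disallow is the step that lets the tag take effect.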

STPseo

Great point! I will remember that. However, I have both the disallow line in the robots.txt file and the noindex meta tag, yet Google shows 3000 of them!?

http://www.google.com/search?q=site%3Awww.sierratradingpost.com+keycodebypid

cfguti

Well done John!!!

cfguti

Hi,

So you have both the robots.txt rule and the meta tag. I think the meta tag is the better option (http://www.seomoz.org/learn-seo/robotstxt).

Do you have Webmaster Tools set up for your site? You can test your robots.txt file there (http://www.google.com/support/webmasters/bin/answer.py?answer=156449).

john4math

Here's a good SEOMoz post about this: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts. What's most likely happening is that the disallow in robots.txt is preventing the bots from crawling the page, so they never get to see the meta noindex tag. And if people link to one of these pages externally, the disallow in robots.txt does not prevent the URL from appearing in search results.

The robots.txt syntax you're using now looks correct to me for what you're trying to do.
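
If you want to check the candidate patterns yourself, here is a rough Python sketch. It assumes Google-style matching, where a rule is a prefix match on the URL path, `*` matches any run of characters, and a trailing `$` anchors the end of the URL. The function name rule_matches is made up for this example, and this is not Google's actual matcher.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern matches the URL path (assumed semantics)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each "*" match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start only, which gives the prefix behavior.
    return re.match(regex, path) is not None


# Path of the example URL from the question.
PATH = "/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/"

for pattern in ["/keycodebypid~*", "/keycodebypid~", "/keycodebypid*/",
                "/*keycodebypid/", "/*keycodebypid~/"]:
    print(pattern, rule_matches(pattern, PATH))
# /keycodebypid~* True
# /keycodebypid~ True
# /keycodebypid*/ True
# /*keycodebypid/ False
# /*keycodebypid~/ False
```

Under these assumptions the current rule does match the example URL, and the trailing `*` is harmless but redundant since rules are already prefix matches. The last two candidates would not match, because the path never contains "keycodebypid/" or "keycodebypid~/".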
