Robots.txt pattern matching

STPseo

Hola fellow SEO peoples!

Site: http://www.sierratradingpost.com

robots.txt: http://www.sierratradingpost.com/robots.txt

Please see the following line: Disallow: /keycodebypid~*

We are trying to block URLs like this:

http://www.sierratradingpost.com/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/

but we still find them in the Google index.

1. We are not sure whether we need to tell the robot explicitly to use pattern matching.

2. We are not sure whether the format is correct. Should we use Disallow: /keycodebypid*/ or /*keycodebypid/ or even /*keycodebypid~/?

What is even more confusing is that the meta robots tag on these pages says "noindex", yet they still show up: <meta name="robots" content="noindex, follow, noarchive" />

Thank you!

SEOSHARK

OK, so I'm not sure whether this was shared already: Matt Cutts talking on this same subject.

www.youtube.com/watch?v=I2giR-WKUfY

STPseo

John, the article was a real eye-opener! Thanks again!

john4math

Somehow Google is finding these pages, but you're disallowing the Googlebot from reading the page, so it doesn't know anything about the meta noindex tag on the page. If you have meta noindex tags on all of these pages, you can remove that line in your robots.txt preventing bots from reading these pages, and as Google crawls these pages, they should remove them from their SERPs.
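To make the interaction concrete, here is a toy model of the reasoning above. It is only a sketch of the logic described in this thread, not how Google actually works internally; the function name, parameters, and result strings are all made up for illustration.

```python
def index_outcome(blocked_by_robots: bool,
                  has_meta_noindex: bool,
                  discovered_via_links: bool) -> str:
    """Toy model (hypothetical names) of robots.txt vs. meta noindex."""
    if blocked_by_robots:
        # The crawler never fetches the page, so it cannot read any
        # meta robots tag in the HTML. The noindex is effectively invisible.
        if discovered_via_links:
            return "may appear as URL-only result"
        return "not crawled"
    if has_meta_noindex:
        # The crawler can fetch the page, sees noindex, and drops it.
        return "removed after next crawl"
    return "indexed"


# Current setup in this thread: disallowed AND noindex, but still discovered.
print(index_outcome(True, True, True))
# After removing the Disallow line so the noindex tag can be read.
print(index_outcome(False, True, True))
```

The point of the sketch is that the second argument is never consulted while the first one is true.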

STPseo

Great point! I will remember that. However I have both the disallow line in the robots.txt file and I also have the noindex meta command. Yet Google shows 3000 of them!?!?!?!

http://www.google.com/search?q=site%3Awww.sierratradingpost.com+keycodebypid

cfguti

Well done John!!!

cfguti

Hi,

So you have both the robots.txt rule and the meta tag. I think the meta tag is the better option (http://www.seomoz.org/learn-seo/robotstxt)

Do you have Webmaster Tools set up for your site? You can test your robots.txt file there (http://www.google.com/support/webmasters/bin/answer.py?answer=156449)

john4math

Here's a good SEOMoz post about this: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts. What's most likely happening is that the disallow in robots.txt is preventing the bots from crawling the page, so they never get to see the meta noindex tag. And if people link to one of these pages externally, the disallow in robots.txt does not prevent the URL from appearing in search results.

The robots.txt syntax you're using now looks correct to me for what you're trying to do.
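If you want to sanity-check the variants from the original question, here is a rough Python sketch of Google-style wildcard matching, assuming the documented semantics: rules match from the start of the path, `*` matches any run of characters, and a trailing `$` anchors the end of the URL. The helper name `rule_matches` is made up for this post, and this is not Googlebot's actual code.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: does a Disallow pattern match a URL path?"""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, so without '$' this is
    # a prefix match, which is how Disallow rules behave.
    return re.match(regex, path) is not None


path = "/keycodebypid~8855/for-the-home~d~3/kitchen~d~24/"
for rule in ["/keycodebypid~*", "/keycodebypid*/",
             "/*keycodebypid/", "/*keycodebypid~/"]:
    print(rule, rule_matches(rule, path))
```

Under those assumptions the first two rules match the example URL and the last two do not, because the path never contains the literal text "keycodebypid/" or "keycodebypid~/". Since rules are prefix matches anyway, the trailing `*` in /keycodebypid~* is harmless but not needed.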
