Robots.txt Question

BMPIRE

For our company website, faithology.com, we are attempting to block any URLs that contain a question mark (?) to keep Google from seeing some pages as duplicates.

Our robots.txt is as follows:

User-Agent: *
Disallow: /*?

User-agent: rogerbot
Disallow: /community/

Is the above correct? We want crawlers to skip any URL with a "?" in it, but we don't want to harm our SEO.

Thanks for your help!

Dr-Pete

You can use wild-cards, in theory, but I haven't tested "?" and that could be a little risky. I'd just make sure it doesn't over-match.

Honestly, though, Robots.txt isn't as reliable as I'd like. It can be good for preventing content from being indexed, but once that content has been crawled, it's not great for removing it from the index. You might be better off with META NOINDEX or using the rel=canonical tag.
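To see which of these two signals a page already carries, something like the following could be run against the page source. It is only a sketch: the class name, the helper, and both sample pages are made up for illustration, and it uses nothing beyond Python's standard html.parser.

```python
from html.parser import HTMLParser


class IndexingHintParser(HTMLParser):
    """Collects a meta robots noindex flag and a rel=canonical URL, if present."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            # content may look like "noindex, follow"
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")


def read_hints(html):
    """Return (noindex, canonical_url) for one HTML document."""
    parser = IndexingHintParser()
    parser.feed(html)
    return parser.noindex, parser.canonical


# Hypothetical pages: one asks not to be indexed, one points at a preferred URL.
NOINDEX_PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
CANONICAL_PAGE = '<html><head><link rel="canonical" href="http://faithology.com/articles/"></head></html>'

print(read_hints(NOINDEX_PAGE))
print(read_hints(CANONICAL_PAGE))
```

Keep in mind that a crawler has to be able to fetch a page to see either tag, so a URL that is also disallowed in robots.txt never gets the chance to show them.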

It depends a lot on what parameters you're trying to control, what value these pages have, whether they have links, etc. A wholesale block of everything with "?" seems really dangerous, IMO.
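One way to find out which parameters are actually in play is to tally them from a list of crawled URLs. This is a rough sketch using only the standard library; the function name and the three URLs are hypothetical stand-ins for a real crawl export.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_query_parameters(urls):
    """Count how often each query parameter name appears across the URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values so that "?print=" style flags are still counted
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


# Hypothetical URLs, for illustration only.
urls = [
    "http://faithology.com/articles/?page=2",
    "http://faithology.com/articles/?page=3&sort=new",
    "http://faithology.com/search?q=faith",
]

counts = count_query_parameters(urls)
for name, total in counts.most_common():
    print(name, total)
```

A short list like this makes it easier to decide parameter by parameter, rather than blocking every URL with a "?" in one go.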

If you want to give a few example URLs, maybe we could give you more specific advice.

BlueprintMarketing

If I were you, I would want to be 100% sure I got it right. This tool has never let me down, and the way you have rogerbot set up, it may end up blocked.
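For a quick check of what the rogerbot group does, Python's built-in urllib.robotparser can parse the file from the question. This is a sketch under assumptions: it assumes rogerbot picks its group the way the standard library parser does (a crawler with its own named group follows that group instead of the * group), and the /search?q=faith path is invented. The standard library parser does not interpret wildcards, so only the rogerbot group is checked here.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, unchanged.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /*?

User-agent: rogerbot
Disallow: /community/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# rogerbot matches its own group, so only /community/ is off limits for it.
rogerbot_community = parser.can_fetch("rogerbot", "http://faithology.com/community/")
# Hypothetical URL with a query string; the "*" group is not applied to rogerbot.
rogerbot_query = parser.can_fetch("rogerbot", "http://faithology.com/search?q=faith")

print("rogerbot may fetch /community/:", rogerbot_community)
print("rogerbot may fetch /search?q=faith:", rogerbot_query)
```

If that assumption holds, rogerbot is kept out of /community/ only and is not covered by the "?" rule at all. To apply both rules to rogerbot, the Disallow: /*? line would have to be repeated inside its group.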

Why not use a free tool from a very reputable company to make your robots.txt perfect?

http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/

http://www.searchenginepromotionhelp.com/m/robots-text-tester/

Then, lastly, to make sure everything is perfect, I recommend one of my favorite tools. It is free for up to 500 pages, as many times as you want; beyond that it costs, I believe, $70 a year.

http://www.screamingfrog.co.uk/seo-spider/

It's one of the best tools on the planet.

While you're on the Internet Marketing Ninjas website, look at their other tools. They have loads of excellent ones that are recommended here.

http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/

Sincerely,

Thomas

BlueprintMarketing

Yes, you can.

Robots.txt Wildcard Matching

Google and Microsoft's Bing allow the use of wildcards in robots.txt files.

To block access to all URLs that include a question mark (?), you could use the following entry:

User-agent: *
Disallow: /*?

You can use the $ character to match the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:

User-agent: Googlebot
Disallow: /*.asp$
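To check what these two patterns catch before going live, the matching can be approximated in a few lines of Python. This is a sketch, not any search engine's actual matcher. It assumes Google-style semantics: * matches any run of characters, a trailing $ anchors the end of the URL, and everything else is literal and matched from the start of the path. The function names and sample paths are made up.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' is a wildcard, a trailing '$' anchors the end,
    and all other characters (including '?') are literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """True if the path (with its query string) matches the Disallow pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


# Hypothetical paths, for illustration only.
samples = [
    "/articles/",
    "/articles/?page=2",
    "/search?q=faith",
    "/old/page.asp",
    "/old/page.asp?id=3",
]

for path in samples:
    print(path, is_blocked("/*?", path), is_blocked("/*.asp$", path))
```

Note that /*? catches every sample with a query string, no matter which parameter it carries, while /*.asp$ stops matching as soon as anything follows .asp.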

More background on wildcards is available from Google and Yahoo! Search.

More:

http://tools.seobook.com/robots-txt/

Hope I was of help,

Tom
