Question about Syntax in Robots.txt

DRSearchEngOpt

If I want to block any URL that contains a particular parameter from being indexed, what is the best way to put this in the robots.txt file?

Currently I have:
Disallow: /attachment_id

Where "attachment_id" is the parameter. The problem is that I still see these URLs indexed, and this rule has been in the robots.txt for over a month. I am wondering if I should just use

Disallow: attachment_id or Disallow: attachment_id= but figured I would ask you guys first.

Thanks!

Andy.Drinkwater

That's excellent Chris.

Use the Remove Page function as well - it might help speed things up for you.

-Andy

DRSearchEngOpt

I don't know how, but I completely forgot I could just pop those URLs into GWT and see whether they were blocked, and sure enough, Google says they are. I guess this is just a matter of waiting. Thanks much!

DRSearchEngOpt

I have previously looked into both of those documents, and the issue remains that they don't exactly address how best to block parameters. I could do this through GWT, but I am curious about the correct and preferred syntax for the robots.txt as well. I guess I could just look at sites like Amazon or other big sites to see what the common practices are. Thanks though!

Andy.Drinkwater

"Problem is I still see these URL's indexed and this has been in the robots now for over a month. I am wondering if I should just do..."

It can take Google some time to remove pages from the index.

The best way to test whether this has worked is to hop into Webmaster Tools and use the Test Robots.txt function. If it has blocked the required pages, then you know it's just a case of waiting. You can also remove pages from within Webmaster Tools, although this isn't immediate.

-Andy

PatrickDelehanty

Hi there

Take a look at Google's resource on robots.txt, as well as Moz's. You can get all the information you need there. You can also let Google know which URLs to exclude from its crawls via Search Console.

Hope this helps! Good luck!

ATP

I'm not a robots.txt expert by a long shot, but I found this article. It is a little dated, but it explained the topic in terms I could understand.

https://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/

There is also a feature in Google Webmaster Tools called URL Parameters that lets you block URLs with set parameters, for reasons such as avoiding duplicate content. I haven't used it myself, but it may be worth looking into.

https://support.google.com/webmasters/answer/6080550?hl=en
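
On the syntax question itself, here is a rough Python sketch of how the wildcard matching that Google documents for robots.txt behaves: a rule is matched from the start of the path, "*" stands for any run of characters, and a trailing "$" anchors the end. The function name rule_matches and the sample URLs are made up for illustration, and the thread does not confirm what the real attachment URLs look like, so treat this as a way to reason about a rule rather than a replacement for the Webmaster Tools tester.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    url_path is the path plus query string, e.g. "/?attachment_id=123".
    Hypothetical helper: a simplified model of wildcard matching, not Google's code.
    """
    # A trailing '$' anchors the pattern to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; every other character is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # Rules are matched from the start of the path, so re.match is a prefix match.
    return re.match(regex, url_path) is not None


# Assumed example URLs (not taken from the thread).
print(rule_matches("/attachment_id", "/?attachment_id=123"))          # False
print(rule_matches("/attachment_id", "/attachment_id/5"))             # True
print(rule_matches("/*attachment_id=", "/?attachment_id=123"))        # True
print(rule_matches("/*attachment_id=", "/post/?p=1&attachment_id=9")) # True
```

Under those assumptions, "Disallow: /attachment_id" only covers paths that literally begin with /attachment_id, while "Disallow: /*attachment_id=" covers the parameter wherever it appears in the URL. A rule written without the leading slash, such as "Disallow: attachment_id", is not the documented form.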
