Question about Syntax in Robots.txt
-
If I want to block any URL that contains a particular parameter from being indexed, what is the best way to put this in the robots.txt file?
Currently I have:
Disallow: /attachment_id
where "attachment_id" is the parameter. The problem is that I still see these URLs indexed, and this has been in the robots.txt for over a month now. I am wondering if I should just do
Disallow: attachment_id
or
Disallow: attachment_id=
but figured I would ask you guys first.
Thanks!
-
That's excellent, Chris.
Use the Remove Page function as well - it might help speed things up for you.
-Andy
-
I don't know how, but I completely forgot I could just pop those URLs into GWT and see whether they were blocked or not, and sure enough, Google says they are. I guess this is just a matter of waiting. Thanks much!
-
I have previously looked into both of those documents, and the issue remains that they don't exactly address how best to block parameters. I could do this through GWT, but I am also curious about the correct and preferred syntax for the robots.txt. I guess I could just look at Amazon or other big sites to see what the common practices are. Thanks though!
-
"Problem is I still see these URLs indexed and this has been in the robots now for over a month. I am wondering if I should just do..."
It can take Google some time to remove pages from the index.
The best way to test whether this has worked is to hop into Webmaster Tools and use the Test Robots.txt function. If it has blocked the required pages, then you know it's just a case of waiting. You can also remove pages from within Webmaster Tools, although this isn't immediate.
-Andy
-
Hi there
Take a look at Google's resource on robots.txt, as well as Moz's. You can get all the information you need there. You can also let Google know which URLs to exclude from its crawls via Search Console.
Hope this helps! Good luck!
-
I'm not a robots.txt expert by a long shot, but I found this post. It is a little dated, but it explained things in terms I could understand:
https://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
There is also a feature in Google Webmaster Tools called URL Parameters that lets you block URLs with set parameters, for all sorts of reasons (avoiding duplicate content, etc.). I haven't used it myself, but it may be worth looking into.
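For what it's worth, here is a minimal sketch of how a query-string parameter is usually blocked. It assumes the crawler supports the `*` wildcard (Googlebot does, but not every crawler will) and that "attachment_id" appears as a query-string parameter rather than as part of the path, so adjust it to match how your URLs actually look. Rules are expected to start with a `/`, so the `Disallow: attachment_id` form without the leading slash is probably not what you want.

```
User-agent: *
# Matches URLs where attachment_id is the first query parameter,
# e.g. /some-page/?attachment_id=123 (example URL is hypothetical)
Disallow: /*?attachment_id=
# Matches URLs where attachment_id follows another parameter,
# e.g. /some-page/?foo=bar&attachment_id=123 (example URL is hypothetical)
Disallow: /*&attachment_id=
```

By contrast, `Disallow: /attachment_id` only matches URLs whose path begins with /attachment_id, so it is worth running a few real URLs through the robots.txt tester after any change.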