Question about Syntax in Robots.txt
-
So if I want to block any URL from being indexed that contains a particular parameter what is the best way to put this in the robots.txt file?
Currently I have-
Disallow: /attachment_idWhere "attachment_id" is the parameter. Problem is I still see these URL's indexed and this has been in the robots now for over a month. I am wondering if I should just do
Disallow: attachment_id or Disallow: attachment_id= but figured I would ask you guys first.
Thanks!
-
That's excellent Chris.
Use the Remove Page function as well - it might help speed things up for you.
-Andy
-
I don't know how but I completely forgot I could just pop those URL's in GWT and see if they were blocked or not and sure enough, Google says they are. I guess this is just a matter of waiting.... Thanks much!
-
I have previously looked into both of those documents and the issue remains that they don't exactly address how best to block parameters. I could do this through GWT but just am curious about the correct and preferred syntax for the robots.txt as well. I guess I could just look at sites like Amazon or other big sites to see what the common practices are. Thanks though!
-
Problem is I still see these URL's indexed and this has been in the robots now for over a month. I am wondering if I should just do
It can take Google some time to remove pages from the index.
The best way to test if this has worked is hop into Webmaster Tools and use the Test Robots.txt function. If it has blocked the required pages, then you know it's just a case of waiting - you can also remove pages from within Webmaster Tools as well, although this isn't immediate.
-Andy
-
Hi there
Take a look at Google's resource on robots.txt, as well as Moz's. You can get all the information you need there. You can also let Google know about what URLs to exclude from it's crawls via Search Console.
Hope this helps! Good luck!
-
Im not a robots.txt expert by a long shot, but I found this, which is a little dated, which explained it to me in terms i could understand.
https://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
there is also a feature in Google Webmaster tools called URL parameters that lets you block URLs with set parameters for all sorts of reason to avoid duplicate content etc. I havn't used it myself but may be work looking into
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is robots met tag a more reliable than robots.txt at preventing indexing by Google?
What's your experience of using robots meta tag v robots.txt when it comes to a stand alone solution to prevent Google indexing? I am pretty sure robots meta tag is more reliable - going on own experiences, I have never experience any probs with robots meta tags but plenty with robots.txt as a stand alone solution. Thanks in advance, Luke
Intermediate & Advanced SEO | | McTaggart1 -
URL Structure Question
Am starting to work with a new site that has a domain name contrived to help it with a certain kind of long tail search. Just for fictional example sake, let's call it WhatAreTheBestRestaurantsIn.com. The idea is that people might do searches for "what are the best restaurants in seattle" and over time they would make some organic search progress. Again, fictional top level domain example, but the real thing is just like that and designed to be cities in all states. Here's the question, if you were targeting searches like the above and had that domain to work with, would you go with... whatarethebestrestaurantsin.com/seattle-washington whatarethebestrestaurantsin.com/washington/seattle whatarethebestrestaurantsin.com/wa/seattle whatarethebestrestaurantsin.com/what-are-the-best-restaurants-in-seattle-wa ... or what and why? Separate question (still need the above answered), would you rather go with a super short (4 letter), but meaningless domain name, and stick the longtail part after that? I doubt I can win the argument the new domain name, so still need the first question answered. The good news is it's pretty good content. Thanks... Darcy
Intermediate & Advanced SEO | | 945010 -
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to file that the robots.txt file had been updated and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that that vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now ok. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly has anyone had any experience with this sort of situation and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
Question about Google Search Results
I have a question regarding google search results. I have a website www.911signalusa.com when you type this into google search box the URL comes up repeatedly. I have several competitors here is one of them www.emergencycity.com when you type in their name it only come up as the first result. How did our SEO guys make this happen? I have another site tha when we type in the URL it only comes up as the first result. However when you do site:www.------.com All of these site are indexed in Google. It is not causing any problem we knoe of but it appears to me that our 1 site has it better? Or is it that maybe there are very minimal links to the site? Thank you for your time and consideration in answering my quesiton.
Intermediate & Advanced SEO | | scamper0 -
Question regarding error url while checking in open-site explorer tool
Hello friends, My website home url & inner page url shows error while checking the open-site site explorer tool from SEOMoz, for a website** eg.www.abc.com website as**
Intermediate & Advanced SEO | | zco_seo
**"Oh Hey! It looks like that URL redirects to www.abc.com/error.aspx?aspxerrorpath=/default.aspx. Would you like to see data for that URL instead?"**May I know the reason, why this url showing this result while checking back link report from the tool?**May I know on what basis this tool is evaluating the website url as well?****May I know, Will this affect the Google SERPs for this website?**Thanks0 -
Another Guest Blogging Question!
If you had 1 blog with good mozbar stats but hosted in the US and another with lets say 75% of the mozbar stats of the first but hosted in the UK, and your website is hosted in the UK which one would benefit SEO the most?
Intermediate & Advanced SEO | | activitysuper0 -
New server update + wrong robots.txt = lost SERP rankings
Over the weekend, we updated our store to a new server. Before the switch, we had a robots.txt file on the new server that disallowed its contents from being indexed (we didn't want duplicate pages from both old and new servers). When we finally made the switch, we somehow forgot to remove that robots.txt file, so the new pages weren't indexed. We quickly put our good robots.txt in place, and we submitted a request for a re-crawl of the site. The problem is that many of our search rankings have changed. We were ranking #2 for some keywords, and now we're not showing up at all. Is there anything we can do? Google Webmaster Tools says that the next crawl could take up to weeks! Any suggestions will be much appreciated.
Intermediate & Advanced SEO | | 9Studios0 -
Block all search results (dynamic) in robots.txt?
I know that google does not want to index "search result" pages for a lot of reasons (dup content, dynamic urls, blah blah). I recently optimized the entire IA of my sites to have search friendly urls, whcih includes search result pages. So, my search result pages changed from: /search?12345&productblue=true&id789 to /product/search/blue_widgets/womens/large As a result, google started indexing these pages thinking they were static (no opposition from me :)), but i started getting WMT messages saying they are finding a "high number of urls being indexed" on these sites. Should I just block them altogether, or let it work itself out?
Intermediate & Advanced SEO | | rhutchings0