Question about Syntax in Robots.txt
-
So if I want to block any URL that contains a particular parameter from being indexed, what is the best way to put this in the robots.txt file?
Currently I have-
Disallow: /attachment_id
where "attachment_id" is the parameter. The problem is I still see these URLs indexed, and this rule has been in the robots.txt for over a month now. I am wondering if I should just do
Disallow: attachment_id or Disallow: attachment_id= instead, but I figured I would ask you guys first.
Thanks!
-
That's excellent, Chris.
Use the Remove Page function as well - it might help speed things up for you.
-Andy
-
I don't know how, but I completely forgot I could just pop those URLs into GWT and see if they were blocked or not, and sure enough, Google says they are. I guess this is just a matter of waiting... Thanks much!
-
I have previously looked into both of those documents, and the issue remains that they don't exactly address how best to block parameters. I could do this through GWT, but I am just curious about the correct and preferred syntax for robots.txt as well. I guess I could just look at sites like Amazon or other big sites to see what the common practice is. Thanks though!
-
"The problem is I still see these URLs indexed, and this rule has been in the robots.txt for over a month now. I am wondering if I should just do..."
It can take Google some time to remove pages from the index.
The best way to test if this has worked is to hop into Webmaster Tools and use the Test Robots.txt function. If it has blocked the required pages, then you know it's just a case of waiting - you can also remove pages from within Webmaster Tools, although this isn't immediate.
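If you want a quick sanity check outside of Webmaster Tools, something like Python's built-in urllib.robotparser can approximate how a crawler reads the file. A rough sketch only: the standard-library parser follows the original robots.txt spec and ignores Google-style * wildcards, so it is only reliable for plain path prefixes like your current Disallow: /attachment_id, and the example.com domain below is just a placeholder.

    import urllib.robotparser

    # Point this at your own robots.txt; example.com is a placeholder.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()

    # Check a couple of representative URLs against the live rules.
    for url in [
        "https://www.example.com/attachment_id/123",
        "https://www.example.com/some-post/?attachment_id=456",
    ]:
        status = "blocked" if not rp.can_fetch("*", url) else "allowed"
        print(status, url)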
-Andy
-
Hi there
Take a look at Google's resource on robots.txt, as well as Moz's. You can get all the information you need there. You can also let Google know which URLs to exclude from its crawls via Search Console.
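For the parameter case specifically, the pattern those docs describe would look something along these lines - a sketch that assumes Google's documented support for the * wildcard in robots.txt (swap in your own parameter name if it differs):

    User-agent: *
    # Block any URL whose query string contains the attachment_id parameter,
    # whether it appears first (?attachment_id=) or later on (&attachment_id=)
    Disallow: /*?attachment_id=
    Disallow: /*&attachment_id=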
Hope this helps! Good luck!
-
I'm not a robots.txt expert by a long shot, but I found this article, which is a little dated, that explained it to me in terms I could understand.
https://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/
There is also a feature in Google Webmaster Tools called URL Parameters that lets you block URLs with set parameters for all sorts of reasons, such as avoiding duplicate content. I haven't used it myself, but it may be worth looking into.
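If I remember that article right, the general idea it walks through is something like the following - a rough sketch assuming Googlebot's wildcard handling and longest-match rule ordering, where the page parameter is just a made-up example of one you might want to keep crawlable:

    User-agent: Googlebot
    # Block every URL that carries a query string...
    Disallow: /*?
    # ...but the longer, more specific Allow rule wins for pagination URLs
    Allow: /*?page=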