Robots.txt Question
-
For our company website, faithology.com, we are attempting to block any URLs that contain a question mark (?) to keep Google from seeing some pages as duplicates.
Our robots.txt is as follows:
User-agent: *
Disallow: /*?

User-agent: rogerbot
Disallow: /community/

Is the above correct? We want crawlers to skip any URL with a "?" in it, but we don't want to harm our own SEO. Thanks for your help!
-
You can use wild-cards, in theory, but I haven't tested "?" and that could be a little risky. I'd just make sure it doesn't over-match.
Honestly, though, robots.txt isn't as reliable as I'd like. It can be good for preventing content from being indexed, but once that content has been crawled, it's not great for removing it from the index. You might be better off with META NOINDEX or the rel=canonical tag.
It depends a lot on what parameters you're trying to control, what value these pages have, whether they have links, etc. A wholesale block of everything with "?" seems really dangerous to me.
If you want to give a few example URLs, maybe we could give you more specific advice.
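To make the rel=canonical suggestion above a bit more concrete, here is a minimal sketch in Python of how a page template might emit a canonical tag that points a parameterized URL back to its parameter-free version. The function name and the example URL are made up for illustration, and it assumes every query parameter on the site is non-essential, which may not be true for faithology.com.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url):
    """Build a rel=canonical tag pointing at the URL without its query string.

    Assumption: no query parameter changes the main content of the page.
    If some parameters do (for example, pagination), they would need to be
    kept rather than stripped.
    """
    parts = urlsplit(url)
    # Rebuild the URL with an empty query string and an empty fragment.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return '<link rel="canonical" href="{}">'.format(escape(clean, quote=True))


# Hypothetical URL, used only to show the call.
print(canonical_link_tag("http://faithology.com/example-page?sort=new"))
```

Unlike a robots.txt block, this leaves the parameterized pages crawlable, so search engines can still see the tag and any links pointing at those pages.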
-
If I were you, I would want to be 100% sure I got it right, and the way you have rogerbot set up, it may be blocked.
Why not use a free tool from a very reputable company to make your robots.txt perfect? This tool has never let me down:
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
http://www.searchenginepromotionhelp.com/m/robots-text-tester/
Then, lastly, to make sure everything is perfect, I recommend one of my favorite tools. It is free for up to 500 pages, as many times as you want; the paid version costs, I believe, $70 a year.
http://www.screamingfrog.co.uk/seo-spider/
It is one of the best tools on the planet.
While you're on the Internet Marketing Ninjas website, look at their other tools; they have loads of excellent ones that are recommended here.
Sincerely,
Thomas
-
Yes, you can.
Robots.txt Wildcard Matching
Google and Microsoft's Bing allow the use of wildcards in robots.txt files.
To block access to all URLs that include a question mark (?), you could use the following entry:
User-agent: *
Disallow: /*?
You can use the $ character to specify matching the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$
More background on wildcards is available from Google and Yahoo! Search.
More
http://tools.seobook.com/robots-txt/
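One way to check that a wildcard rule like the ones above does not over-match is to test it against a handful of real paths before deploying it. The following is a rough sketch in Python, not how any search engine actually implements matching: it translates a Disallow pattern into a regular expression, treating * as "any run of characters" and a trailing $ as "end of URL". The helper names and the sample paths are made up for illustration, and the sketch ignores Allow rules and per-crawler group selection.

```python
import re
from typing import Iterable


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the URL. Every other character, including '?', is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(path, disallow_patterns: Iterable[str]):
    """Return True if any Disallow pattern matches from the start of path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


# Hypothetical paths, used only to show which ones each rule would catch.
for path in ["/articles", "/articles?sort=new", "/index.asp", "/index.asp?id=1"]:
    print(path, is_blocked(path, ["/*?"]), is_blocked(path, ["/*.asp$"]))
```

Running a list of the site's important URLs through a check like this, alongside the testing tools linked in this thread, should show quickly whether the "?" rule blocks anything you wanted to keep.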
Hope I was of help,
Tom