Robots.txt Question
-
For our company website faithology.com we are attempting to block out any urls that contain a ? mark to keep google from seeing some pages as duplicates.
Our robots.txt is as follows:
User-Agent: * Disallow: /*? User-agent: rogerbot Disallow: /community/ Is the above correct? We are wanting them to not crawl any url with a "?" inside, however we don't want to harm ourselves in seo. Thanks for your help!
-
You can use wild-cards, in theory, but I haven't tested "?" and that could be a little risky. I'd just make sure it doesn't over-match.
Honestly, though, Robots.txt isn't as reliable as I'd like. It can be good for preventing content from being indexed, but once that content has been crawled, it's not great for removing it from the index. You might be better off with META NOINDEX or using the rel=canonical tag.
It depends a lot on what parameters you're trying to control, what value these pages have, whether they have links, etc. A wholesale block of everything with "?" seems really dangerous to me, IMO.
If you want to give a few example URLs, maybe we could give you more specific advice.
-
if I were you I would want to be 100% sure I got it right. This tool has never let me down and the way you have Roger bot he may be blocked.
Why not use a free tool from a very reputable company to make your robot text perfect
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
http://www.searchenginepromotionhelp.com/m/robots-text-tester/
then lastly to make sure everything is perfect I recommend one of my favorite free tools up to 500 pages is as many times as you want that costs I believe $70 a year
http://www.screamingfrog.co.uk/seo-spider/
his one of the best tools on the planet
while you're at Internet marketing ninjas website look for other tools they have loads of excellent tools that are recommend here.
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
Sincerely,
Thomas
-
Yes you can
Robots.txt Wildcard Matching
Google and Microsoft's Bing allow the use of wildcards in robots.txt files.
To block access to all URLs that include a question mark (?), you could use the following entry:
User-agent: *
Disallow: /*?You can use the $ character to specify matching the end of the URL. For instance, to block an URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$More background on wildcards available from Google and Yahoo! Search.
More
http://tools.seobook.com/robots-txt/
hope I was of help,
Tom
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Question on AMP
I'd like to utilize AMP for faster loading for one of my clients. However, it is essential that this client have chat. My developer is having trouble incorporating chat with AMP, and he claims that it isn't possible to integrate the two. Can anyone advise me as to whether this is accurate? If it is true that AMP and chat aren't compatible, are there any solutions to this issue? I'd appreciate any leads on this. Thanks!
Intermediate & Advanced SEO | | Joseph-Green-SEO0 -
Robots.txt Disallowed Pages and Still Indexed
Alright, I am pretty sure I know the answer is "Nothing more I can do here." but I just wanted to double check. It relates to the robots.txt file and that pesky "A description for this result is not available because of this site's robots.txt". Typically people want the URL indexed and the normal Meta Description to be displayed but I don't want the link there at all. I purposefully am trying to robots that stuff outta there.
Intermediate & Advanced SEO | | DRSearchEngOpt
My question is, has anybody tried to get a page taken out of the Index and had this happen; URL still there but pesky robots.txt message for meta description? Were you able to get the URL to no longer show up or did you just live with this? Thanks folks, you are always great!0 -
2 menus Responsive website Seo question
Hi there Thanks for reading my post. İ am fairly new to SEO and dont know much coding. İ purchased an opencart theme and am working with a developer to modify it to make it more user friendly. The website is responsive and İ modified the menu for the desktop but now it doesnt have categories but just products. So it doesnt have URL for categories but just filter. So the developer recommended to add the mobile menu which has categories, and subcategories back to the desktop menu. İm not sure if this is a kosher approach to seo. Here is the link: As you can see there is a menu in the top menu and menu on the Main Menu. THoughts? will this be problem for duplicate content? The Main menu keywords are crucial and is what the website is revolving around. New website
Intermediate & Advanced SEO | | socratic-goat7770 -
Redirection question
How would I redirect this URL: http://www.members.mysite.com/ to this URL: http://www.mysite.com/ ? I cant figure it out
Intermediate & Advanced SEO | | JohnPeters0 -
301 redirect or Robots.txt on an interstatial page
Hey guys, I have an affiliate tracking system that works like this : an affiliate puts up a certain code on his site, for example : www.domain.com/track/aff_id This url leads to a page where the hit is counted, analysed and then 302 redirects to my sales page with the affiliates ID in the url : www.mysalespage.com/?=aff_id. However, we've noticed recently that one affiliate seems to be ranking for our own name and the url google indexed was his tracking url (domain.com/track/aff_id). Which is strange because there is absolutely nothing on that page, its just an interstatial page so that our stats tracking software can properly filter hits. To remove the affiliate's url from showing up in the serps, I've come up with 2 solutions : 1 - Change the redirect to a 301 redirect on his track page. 2 - Change our robots.txt page to block all domain.com/track/ pages from being indexed. My question is : if I 301 redirect instead of 302, will I keep the affiliates from outranking me for my own name AND pass on link juice or should I simply block google from crawling the interstatial tracking pages?
Intermediate & Advanced SEO | | CrakJason0 -
Worldwide Stores - Duplicate Content Question
Hello, We recently added new store views for our primary domain for different countries. Our primary url: www.store.com Different Countries URLS: www.store.com/au www.store.com/uk www.store.com/nz www.store.com/es And so forth and so on. This resulted in an almost immediate rankings drop for several keywords which we feel is a result of duplicate content creation. We've thousands of pages on our primary site. We've assigned a "no follow" tags to all store views for now, and trying to roll back the changes we did. However, we've seen some stores launching in different countries with same content, but with a country specific extensions like .co.uk, .co.nz., .com.au. At this point, it appears we have three choices: 1. Remove/Change duplicate content in country specific urls/store views. 2. Launch using .co.uk, .com.au with duplicate content for now. 3. Launch using .co.uk, .com.au etc with fresh content for all stores. Please keep in mind, option 1, and 3 can get very expensive keeping hundreds of products in untested territories. Ideally, we would like test first and then scale. However, we'd like to avoid any duplicate penalties on our main domain. Thanks for your help and answers on the same.
Intermediate & Advanced SEO | | globaleyeglasses0 -
What Questions Should I Be Asking?
I just read a discussion that was originally posted by Steve Ollington on May 22, 2011 where he states that many people are asking the wrong types of questions on this forum. He said that he wonders if he will see a shift from people asking questions on "how to rank" to questions dealing with "how to work out the best KPIs" (Key Performance Indicators - yes I had to google it). I was once told we learn more by asking questions about a topic than by just listening. I've also been told that sometimes the right question to ask is, "What questions should I be asking?" So here is my question, what types of questions should I be asking to be better at SEO? Perhaps these are some of them: Is it possible to be good at SEO when it is not a full-time job? It is very tempting to look for easy answers when you only have limited time. What are considered KPI's? Are they different for every industry? How do you know what is junk information vs what is truly good SEO advice? Is it just simply trial and error? It seems to me that if people find truly good SEO information, they aren't going to be sharing it so easily. It's the whole, "You get what you pay for". Maybe some of you can tell me more of the questions I should be asking.
Intermediate & Advanced SEO | | kadesmith1 -
Search Engine Blocked by robots.txt for Dynamic URLs
Today, I was checking crawl diagnostics for my website. I found warning for search engine blocked by robots.txt I have added following syntax to robots.txt file for all dynamic URLs. Disallow: /*?osCsid Disallow: /*?q= Disallow: /*?dir= Disallow: /*?p= Disallow: /*?limit= Disallow: /*review-form Dynamic URLs are as follow. http://www.vistastores.com/bar-stools?dir=desc&order=position http://www.vistastores.com/bathroom-lighting?p=2 and many more... So, Why should it shows me warning for this? Does it really matter or any other solution for these kind of dynamic URLs.
Intermediate & Advanced SEO | | CommercePundit0