Robots.txt Question
-
For our company website, faithology.com, we are trying to block any URLs that contain a question mark (?) to keep Google from seeing some pages as duplicates.
Our robots.txt is as follows:
User-Agent: *
Disallow: /*?

User-agent: rogerbot
Disallow: /community/

Is the above correct? We want crawlers not to crawl any URL with a "?" in it, but we don't want to harm ourselves in SEO. Thanks for your help!
-
You can use wildcards, in theory, but I haven't tested "?" and that could be a little risky. I'd just make sure it doesn't over-match.
Honestly, though, Robots.txt isn't as reliable as I'd like. It can be good for preventing content from being indexed, but once that content has been crawled, it's not great for removing it from the index. You might be better off with META NOINDEX or using the rel=canonical tag.
It depends a lot on what parameters you're trying to control, what value these pages have, whether they have links, etc. A wholesale block of everything with "?" seems really dangerous to me.
If you want to give a few example URLs, maybe we could give you more specific advice.
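One rough way to check for over-matching before deploying the rule is to run a list of your known URLs through a quick script. The sketch below is only an illustration: the example.com URLs and the function name are made up, and it assumes a crawler treats `/*?` as "any URL that contains a ?" once the fragment has been dropped.

```python
def blocked_by_question_mark_rule(url: str) -> bool:
    """Return True if a blanket 'Disallow: /*?' rule would catch this URL.

    Assumption: the crawler ignores the #fragment and matches on a literal '?'.
    """
    without_fragment = url.split("#", 1)[0]
    return "?" in without_fragment


# Hypothetical URLs, standing in for an export of your own site's pages.
sample_urls = [
    "https://example.com/articles/",
    "https://example.com/articles?page=2",
    "https://example.com/search?q=faith&sort=new",
    "https://example.com/about#team?",
]

would_block = [u for u in sample_urls if blocked_by_question_mark_rule(u)]

for url in would_block:
    print("would be blocked:", url)
```

If anything on the resulting list is a page you want crawled, the blanket rule is too broad for your site.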
-
If I were you, I would want to be 100% sure I got it right. This tool has never let me down, and the way you have rogerbot set up, it may be blocked.
Why not use a free tool from a very reputable company to make your robots.txt perfect?
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
http://www.searchenginepromotionhelp.com/m/robots-text-tester/
Then, lastly, to make sure everything is perfect, I recommend one of my favorite tools. It is free for up to 500 pages, as many times as you want; beyond that it costs, I believe, $70 a year.
http://www.screamingfrog.co.uk/seo-spider/
It is one of the best tools on the planet.
While you're on the Internet Marketing Ninjas website, look at their other tools; they have loads of excellent tools that are recommended here.
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
Sincerely,
Thomas
-
Yes, you can.
Robots.txt Wildcard Matching
Google and Microsoft's Bing allow the use of wildcards in robots.txt files.
To block access to all URLs that include a question mark (?), you could use the following entry:
User-agent: *
Disallow: /*?
You can use the $ character to specify matching the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$
More background on wildcards is available from Google and Yahoo! Search.
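To see how the two wildcard characters behave, here is a minimal sketch that turns a Disallow path into a regular expression. It assumes the semantics described above (`*` matches any run of characters, a trailing `$` anchors the end of the URL, and everything else is a prefix match). The function names are my own, and this is not how any particular crawler is implemented.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path with '*' and a trailing '$' into a regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    # Without '$' the rule is a prefix match, so the end is left open.
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(rule_path: str, url_path: str) -> bool:
    """Return True if url_path (path plus query string) matches the rule."""
    return rule_to_regex(rule_path).match(url_path) is not None


print(is_disallowed("/*?", "/articles?page=2"))         # True
print(is_disallowed("/*?", "/articles/"))               # False
print(is_disallowed("/*.asp$", "/old/page.asp"))        # True
print(is_disallowed("/*.asp$", "/old/page.asp?id=1"))   # False
```

Note the last case: with the `$` anchor, a .asp URL that carries a query string is not matched, which is easy to overlook when combining the two rules.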
More
http://tools.seobook.com/robots-txt/
Hope I was of help,
Tom