Robots.txt Allowed
-
Hello all,
We want to block something that has the following at the end:
http://www.domain.com/category/product/some+demo+-text-+example--writing+here
So I was wondering if doing:
/*example--writing+here
would work?
-
Yes, that should work just fine. As Logan mentioned, I recommend you test it in the robots.txt testing tool in Google Search Console.
-
Yes, that would work. I'm sure everyone already knows that if in case you have a product that has the word example at the end of URL, it would block that too. A little off tangent here but blocking in robots.txt does not mean that every single spiders out there is going to honor this rule. The major ones like Google Spiders does honor this. Also, it doesn't mean that the URL won't be indexed. Sorry for the long winded answer but just make sure that if this is truly an example or demo page that you don't want search engines to index to make sure that you include "noindex, nofollow" in the metainfo.
I agree with Logan Ray. In case you want the "Robots TXT" Tester, you can google it "Robots Txt Tester" and the first one should be from support.google.com
-
Hi Thomas,
That should work. You can confirm this by modifying your robots.txt file in Search Console and testing a handful of URLs to ensure they're blocked the way you want.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages blocked by robots
**yazılım sürecinde yapılan bir yanlışlıktı.** Sorunu hızlı bir şekilde nasıl çözebilirim? bana yardım et. ```[XTRjH](https://imgur.com/a/XTRjH)
Intermediate & Advanced SEO | | mihoreis0 -
SSL and robots.txt question - confused by Google guidelines
I noticed "Don’t block your HTTPS site from crawling using robots.txt" here: http://googlewebmastercentral.blogspot.co.uk/2014/08/https-as-ranking-signal.html Does this mean you can't use robots.txt anywhere on the site - even parts of a site you want to noindex, for example?
Intermediate & Advanced SEO | | McTaggart0 -
Using folder blocked by robots.txt before uploaded to indexed folder - is that OK?
I have a folder "testing" within my domain which is a folder added to the robots.txt. My web developers use that folder "testing" when we are creating new content before uploading to an indexed folder. So the content is uploaded to the "testing" folder at first (which is blocked by robots.txt) and later uploaded to an indexed folder, yet permanently keeping the content in the "testing" folder. Actually, my entire website's content is located within the "testing" - so same URL structure for all pages as indexed pages, except it starts with the "testing/" folder. Question: even though the "testing" folder will not be indexed by search engines, is there a chance search engines notice that the content is at first uploaded to the "testing" folder and therefore the indexed folder is not guaranteed to get the content credit, since search engines see the content in the "testing" folder, despite the "testing" folder being blocked by robots.txt? Would it be better that I password protecting this "testing" folder? Thx
Intermediate & Advanced SEO | | khi50 -
Google showing high volume of URLs blocked by robots.txt in in index-should we be concerned?
if we search site:domain.com vs www.domain.com, We see: 130,000 vs 15,000 results. When reviewing the site:domain.com results, we're finding that the majority of the URLs showing are blocked by robots.txt. They are subdomains that we use as production environments (and contain similar content as the rest of our site). And, we also find the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back We were hit by Panda some time back--is this an issue we should address? Should we unblock the subdomains and add noindex, follow?
Intermediate & Advanced SEO | | nicole.healthline0 -
Meta No INDEX and Robots - Optimizing Crawl Budget
Hi, Sometime ago, a few thousand pages got into Google's index - they were "product pop up" pages, exact duplicates of the actual product page but a "quick view". So I deleted them via GWT and also put in a Meta No Index on these pop up overlays to stop them being indexed and causing dupe content issues. They are no longer within the index as far as I can see, i do a site:www.mydomain.com/ajax and nothing appears - So can I block these off now with robots.txt to optimize my crawl budget? Thanks
Intermediate & Advanced SEO | | bjs20100 -
Robots.txt Question
For our company website faithology.com we are attempting to block out any urls that contain a ? mark to keep google from seeing some pages as duplicates. Our robots.txt is as follows: User-Agent: * Disallow: /*? User-agent: rogerbot Disallow: /community/ Is the above correct? We are wanting them to not crawl any url with a "?" inside, however we don't want to harm ourselves in seo. Thanks for your help!
Intermediate & Advanced SEO | | BMPIRE0 -
Robots.txt error message in Google Webmaster from a later date than the page was cached, how is that?
I have error messages in Google Webmaster that state that Googlebot encountered errors while attempting to access the robots.txt. The last date that this was reported was on December 25, 2012 (Merry Christmas), but the last cache date was November 16, 2012 (http://webcache.googleusercontent.com/search?q=cache%3Awww.etundra.com/robots.txt&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a). How could I get this error if the page hasn't been cached since November 16, 2012?
Intermediate & Advanced SEO | | eTundra0 -
Effect duration of robots.txt file.
in my web site there is demo site in that also, index in Google but no need it now.so i have created robots file and upload to server yesterday.in the demo folder there are some html files,and i wanna remove all these in demo file from Google.but still in web master tools it showing User-agent: *
Intermediate & Advanced SEO | | innofidelity
Disallow: /demo/ How long this will take to remove from Google ? And are there any alternative way doing that ?0