Robots.txt question
-
I want to block spiders from specific specific part of website (say abc folder).
In robots.txt, i have to write -
User-agent: *
Disallow: /abc/
Shall i have to insert the last slash. or will this do
User-agent: *
Disallow: /abc
-
I will do so. And hope to get that back.
-
If you contact the help desk, they can probably help you get your old account back.
-
I am the same person with the username seoug, but lost that account. So, had to start afresh ! I was a PR0 member, but accidently deleted that account ( it was not intentional ). And now , when i tried login in, i get a message that seoug name is already taken.
-
Thanks for clearing my doubts.
-
at least our answers agree, so no Atul is doubley sure of how to do it...
-
EGOL does it to me all the time!
-
Hi Atul,
Add the trailing slash.
/abc could be a page url. Where as /abc/ is definitely a folder.
http://www.robotstxt.org/robotstxt.html <-- Everything you ever wanted to know about robots.txt
Regards
Aran
[EDIT: Damn it, Ryan submitted whilst I was answering! Must type faster ]
-
Use the trailing slash.
More about robots.txt can be learned at this site: http://www.robotstxt.org/
The trailing slash indicates you are blocking a folder. Without the slash the object would be considered a file (i.e. page). I am not sure what the result would be if you tried to block a folder without the trailing slash. Even if it worked it would not be the correct code and may lead to various bots treating it differently.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Question re: spammy internal links on site
Hi all, I have a blog (managed via WordPress) that seems to have built spammy internal links that were not created by us on our end. See "site:blog.execu-search.com" in Google search results. It seems to be a pharma-hack that's creating spammy links on our blog to random offers re: viagra, paxil, xenical, etc. When viewing "Security Issues", GSC doesn't state that the site has been infected and it seems like the site is in good health according to Google. Will anyone be able to provide any insight on the best necessary steps to take to remove these links and to run a check on my blog to see if it is in fact infected? Should all spammy internal links by disavowed? Here are a couple of my findings: When looking at "internal links" in GSC, I see a few mentions of these spammy links. When running a site crawl in Moz, I don't see any mention of these spammy links. The spammy links are leading to a 404 page. However, it appears some of the cached version in Google are still displaying the page. Please lmk. Any insight would be much appreciated. Thanks all! Best,
Technical SEO | | hdeg
Sung0 -
Robots txt. in page with 301 redirect
We currently have a a series of help pages that we would like to disallow from our robots txt. The thing is that these help pages are located in our old website, which now has a 301 redirect to current site. Which is the proper way to go around? 1- Add the pages we want to disallow to the robots.txt of the new website? 2- Break the redirect momentarily and add the pages to the robots.txt of the old one? Thanks
Technical SEO | | Kilgray0 -
Subdomain/subfolder question
Hi community, Let's say I have a men's/women's clothing website. Would it be better to do clothing.com/mens and clothing.com/womens OR mens.clothing.com and womens.clothing.com? I understand Moz's stance on blogs that it should be clothing.com/blog, but wanted to ask for this different circumstance. Thanks for your help!
Technical SEO | | IceIcebaby0 -
Block Domain in robots.txt
Hi. We had some URLs that were indexed in Google from a www1-subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot do a redirect from www1 to www) and blocked via robots.txt. But the amount of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain infos by personal message if you want to have a look at it.
Technical SEO | | zeepartner0 -
Indexation question
Hi Guys, i have a small problem with our development website. Our development website is website.dev.website.nl This page shouldn't be indexed bij Google but unfortunately it is. What can i do to deindex it and ask google not to index this website. In the robots.txt or are there better ways to do this? Kind regards Ruud
Technical SEO | | RuudHeijnen0 -
Google place listings and search results- quick question.
Has anybody else noticed that they are ranking better on 'places' yet they have dropped off in the actual search results? We've had no message through webmaster tools. The same seems to have happened to our competitors.
Technical SEO | | onlinechester0 -
Site Architecture Question on Ties.com - Navigation
I'm looking at the navigation structure of Ties.com. They have various categories like color, pattern, length, brand, etc. Once you click one of the main categories you get the option to "Narrow Your Choices". The structure starts like this: (URL 1) ties.com/black-ties Then when you narrow your search you get this: (URL 2) ties.com/animal-print**+**black-ties (notice + sign) My question: how does Google see URL 2? Is it just like any other link?
Technical SEO | | ErikDster0 -
Robots.txt and robots meta
I have an odd situation. I have a CMS that has a global robots.txt which has the generic User-Agent: *
Technical SEO | | Highland
Allow: / I also have one CMS site that needs to not be indexed ever. I've read in various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt indicates spiderability whereas meta can control indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?0