Block or remove pages using a robots.txt
-
I want to use robots.txt to prevent googlebot access the specific folder on the server,
Please tell me if the syntax below is correct
User-Agent: Googlebot Disallow: /folder/
I want to use robots.txt to prevent google image index the images of my website ,
Please tell me if the syntax below is correct
User-agent: Googlebot-Image
Disallow: /
-
Yes, that is correct.
Once you have it set up, you can test what pages are blocked in your GWT account. Health > Blocked URLs > select which bot you want to test at the bottom of the page.
Resources:
Google's bots - http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1061943
WMT Blocked URL testing - https://www.google.com/webmasters/tools/robots-analysis
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Canonical tag use for ecommerce product page detail
Hi, I have a category page I want to rank. This page has 24 different products quite similar but not exactly the same.
Technical SEO | | amastone
I want to use canonical tag in any product to the parent category.
Is this a right use of the canonical?
Category page I'm talking about is : Finger bits If I understand how to use canonical tags I can improve all my category pages. thanks marco0 -
Why Google ranks a page with Meta Robots: NO INDEX, NO FOLLOW?
Hi guys, I was playing with the new OSE when I found out a weird thing: if you Google "performing arts school london" you will see w w w . mountview . org. uk at the 3rd position. The point is that page has "Meta Robots: NO INDEX, NO FOLLOW", why Google indexed it? Here you can see the robots.txt allows Google to index the URL but not the content, in article they also say the meta robots tag will properly avoid Google from indexing the URL either. Apparently, in my case that page is the only one has the tag "NO INDEX, NO FOLLOW", but it's the home page. so I said to myself: OK, perhaps they have just changed that tag therefore Google needs time to re-crawl that page and de-index following the no index tag. How long do you think it will take to don't see that page indexed? Do you think it will effect the whole website, as I suppose if you have that tag on your home page (the root domain) you will lose a lot of links' juice - it's totally unnatural a backlinks profile without links to a root domain? Cheers, Pierpaolo
Technical SEO | | madcow780 -
Googlebot does not obey robots.txt disallow
Hi Mozzers! We are trying to get Googlebot to steer away from our internal search results pages by adding a parameter "nocrawl=1" to facet/filter links and then robots.txt disallow all URLs containing that parameter. We implemented this late august and since that, the GWMT message "Googlebot found an extremely high number of URLs on your site", stopped coming. But today we received yet another. The weird thing is that Google gives many of our nowadays robots.txt disallowed URLs as examples of URLs that may cause us problems. What could be the reason? Best regards, Martin
Technical SEO | | TalkInThePark0 -
How do I 301 redirect a number of pages to one page
I want to redirect all pages in /folder_A /folder_B to /folder_A/index.php. Can I just write one or two lines of code to .htaccess to do that?
Technical SEO | | Heydarian0 -
How to allow one directory in robots.txt
Hello, is there a way to allow a certain child directory in robots.txt but keep all others blocked? For instance, we've got external links pointing to /user/password/, but we're blocking everything under /user/. And there are too many /user/somethings/ to just block every one BUT /user/password/. I hope that makes sense... Thanks!
Technical SEO | | poolguy0 -
How to block google robots from a subdomain
I have a subdomain that lets me preview the changes I put on my site. The live site URL is www.site.com, working preview version is www.site.edit.com The contents on both are almost identical I want to block the preview version (www.site.edit.com) from Google Robots, so that they don't penalize me for duplicated content. Is it the right way to do it: User-Agent: * Disallow: .edit.com/*
Technical SEO | | Alexey_mindvalley0 -
I am trying to block robots from indexing parts of my site..
I have a few websites that I mocked up for clients to check out my work and get a feel for the style I produce but I don't want them indexed as they have lore ipsum place holder text and not really optimized... I am in the process of optimizing them but for the time being I would like to block them. Most of my warnings and errors on my seomoz dashboard are from these sites and I was going to upload the folioing to the robot.txt file but I want to make sure this is correct: User-agent: * Disallow: /salondemo/ Disallow: /salondemo3/ Disallow: /cafedemo/ Disallow: /portfolio1/ Disallow: /portfolio2/ Disallow: /portfolio3/ Disallow: /salondemo2/ is this all i need to do? Thanks Donny
Technical SEO | | Smurkcreative0 -
Does page speed affect what pages are in the index?
We have around 1.3m total pages, Google currently crawls on average 87k a day and our average page load is 1.7 seconds. Out of those 1.3m pages(1.2m being "spun up") google has only indexed around 368k and our SEO person is telling us that if we speed up the pages they will crawl the pages more and thus will index more of them. I personally don't believe this. At 87k pages a day Google has crawled our entire site in 2 weeks so they should have all of our pages in their DB by now and I think they are not index because they are poorly generated pages and it has nothing to do with the speed of the pages. Am I correct? Would speeding up the pages make Google crawl them faster and thus get more pages indexed?
Technical SEO | | upper2bits0