Robots.txt
-
www.mywebsite.com**/details/**home-to-mome-4596
www.mywebsite.com**/details/**home-moving-4599
www.mywebsite.com**/details/**1-bedroom-apartment-4601
www.mywebsite.com**/details/**4-bedroom-apartment-4612 We have so many pages like this, we do not want to Google crawl this pages
So we added the following code to Robots.txt
User-agent: Googlebot
Disallow: /details/
This code is correct?
-
and this disallows everything in the /details/ folder so if there were some exceptions to the rule (some pages or sub folders in that folder) you would need to add some allow directives, or make a more specific disallow(s)
-
Looks correct. Great reccomendation by Chris. The * (wildcard) just means "all".
-
Looks good but you only want to exclude google from crawling those pages? If you want to exclude other search engines, you could use:
User-agent: *
instead of
User-agent: Googlebot
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Do robot.txts permanently affect websites even after they have been removed?
A client has a Wordpress blog to sit alongside their company website. They kept it hidden whilst they were developing what it looked like, keeping it un-searchable by Search Engines. It was still live, but Wordpress put a robots.txt in place. When they were ready they removed the robots.txt by clicking the "allow Search Engines to crawl this site" button. It took a month and a half for their blog to show in Search Engines once the robot.txt was removed. Google is now recognising the site (as a "site:" test has shown) however, it doesn't rank well for anything. This is despite the fact they are targeting keywords with very little organic competition. My question is - could the fact that they developed the site behind a robot.txt (rather than offline) mean the site is permanently affected by the robot.txt in the eyes of the Search Engines, even after that robot.txt has been removed? Thanks in advance for any light you can shed on the situation.
Technical SEO | | Driver720 -
Log in, sign up, user registration and robots
Hi all, We have an accommodation site that asks users only to register when they want to book a room, in the last step. Though this is the ideal situation when you have tons of users, nowadays we are having around 1500 - 2000 per day and making tests we found out that if we ask for a registration (simple, 1 click FB) we mail them all and through a good customer service we are increasing our sales. That is why, we would like to ask users to register right after the home page ie Home/accommodation or and all the rest. I am not sure how can I make to make that content still visible to robots.
Technical SEO | | Eurasmus.com
Will the authentication process block google crawling it? Maybe something we can do? We are not completely sure how to proceed so any tip would be appreciated. Thank you all for answering.3 -
How to use robots.txt to block areas on page?
Hi, Across the categories/product pages on out site there are archives/shipping info section and the texts are always the same. Would this be treated as duplicated content and harmful for seo? How can I alter robots.txt to tell google not to crawl those particular text Thanks for any advice!
Technical SEO | | LauraHT0 -
Robots.txt checker
Google seems to have discontinued their robots.txt checker. Is there another tool that I can use to check my text instead? Thanks!
Technical SEO | | theLotter0 -
How many times robots.txt gets visited by crawlers, especially Google?
Hi, Do you know if there's any way to track how often robots.txt file has been crawled? I know we can check when is the latest downloaded from webmaster tool, but I actually want to know if they download every time crawlers visit any page on the site (e.g. hundreds of thousands of times every day), or less. thanks...
Technical SEO | | linklater0 -
Do you get credit for an external link that points to a page that's being blocked by robots.txt
Hi folks, No one, including me seems to actually know what happens!? To repeat: If site A links to /home.html on site B and site B blocks /home.html in Robots.txt, does site B get credit for that link? Does the link pass PageRank? Will Google still crawl through it? Does the domain get some juice, but not the page? I know there's other ways of doing this properly, but it is interesting no?
Technical SEO | | DaveSottimano0 -
What are your thoughts on security of placing CMS-related folders in a robots.txt file?
So I was just about to add a whole heap of CMS-related folders to my robots.txt file to exclude them from search, and thought "hey, I'm publicly telling people where my admin folders are"...surely that's not right?! Should I leave them out of the robots.txt file, and hope for the best that they never get indexed? Should I use noindex meta data on every page? What are people's thoughts? Thanks, James PS. I know this is similar to lots of other discussions around meta noindex vs. robots.txt, but I'm after specific thoughts around the security aspect of listing your admin folders in a robots.txt file...
Technical SEO | | James-Distinction0 -
Does RogerBot read URL wildcards in robots.txt
I believe that the Google and Bing crawlbots understand wildcards for the "disallow" URL's in robots.txt - does Roger?
Technical SEO | | AspenFasteners0