Robots.txt question
-
Hello,
What does the following command mean -
User-agent: * Allow: /
Does it mean that we are blocking all spiders ? Is Allow supported in robots.txt ?
Thanks
-
It's a good idea to have an xml site map and make sure the search engines know where it is. It's part of the protocol that they will look in the robots.txt file for the location for your sitemap.
-
I was assuming that by including / after allow, we are blocking the spiders and also thought that allow is not supported by search engines.
Thanks for clarifications. A better approach would be
User-Agent: * Allow:
right ?
The best one of course is
**User-agent: * Disallow:**
-
That's not really necessary unless there URLs or directories you're disallowing after the allow in your robots.txt. Allow is a directive supported by major search engines, but search engines assume they're allowed to crawl everything they find unless you disallow it specifically in your robots.txt.
The following is universally accepted by bots and essentially means the same thing as what I think you're trying to say, allowing bots to crawl everything:
User-agent: * Disallow:
There's a sample use of the Allow directive on the wikipedia robots.txt page here.
-
There's more information about robots.txt from SEOmoz at http://www.seomoz.org/learn-seo/robotstxt
SEOmoz and the robots.txt site suggest the following for allowing robots to see everying and list your sitemap:
User-agent: *
Disallow:Sitemap: http://www.example.com/none-standard-location/sitemap.xml
-
Any particular reason for doing so ?
-
That robots txt should be fine.
But you should also add your XML sitemap to the robots.txt file, example:
User-Agent: * Allow: / Sitemap: http://www.website.com/sitemap.xml
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Question on URL wording and structure best practices
We're mapping out some URL structures and trying to figure out what would be best for separating folders for articles and videos regarding wording in the folder say: www.site.com/category/article/name-of-article/id#/ ---- www.site.com/category/video/name-of-video/id#/ vs. www.site.com/category/a/name-of-article/id#/ ---- www.site.com/category/v/name-of-video/id#/ Second option came about the ''shorter is better' way of thinking. Downside I see to it is if the link would be copied and pasted somewhere probably would be best for a user to make it clear they are clicking into an article or a video, don't think just an 'a' or a 'v' would be very telling in that scenario. Would it be better for search engines to make it clearer with the whole word in there? Any other pros and cons to each? Not sure what's the best route here.
Technical SEO | | SBRMarketing0 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
Keyword density question.
For instance, if the keyword I'm targeting on a specific page is "New Orleans", the Keyword is everywhere it's supposed to be, title, meta, content, internal links, etc, .... So when I check my most relative key words with different tools, it always breaks the word up like: new - 12 times 2.3% orleans - 12 times 2.3% Should I try to fix this? or is this normal? and does google view this as 1 keyword when evaluating my site?
Technical SEO | | Nola5040 -
Webmaster Tools Site Map Question
I have TLD that has authority and a number of micro-sites built off of the primary domain. All sites relate to the same topic, as I am promoting a destination. The primary site and each micro-site have their own CMS installation, but the domains are mapped accordingly. www.regionalsite.com/ <- primary
Technical SEO | | VERBInteractive
www.regioanlsite.com/theme1/ <- theme 1
www.regioanlsite.com/theme2/ <- theme 2
www.regionalsite.com/theme3/ <- theme 3 Question: Should my XML site map for Webmaster Tools feed all sites off of the primary domain site map or are there penalties for this? Thanks.0 -
Google Places phone number question
Hi, A hotel/resort has a main phone number of 1-234-567-8901. This phone number is consistent in over 50 directories. However, they have a spa and restaurant with the same phone number. The front-desk answers the phone and routes the call to either the restaurant or spa. The name of the spa and restaurant are also found in the local listing directories under different DBA's with the same phone number as the Hotel/Resort. For example: ABC Resort - 1-234-567-8901 Spa Cuts - 1-234-567-8901 (same address as ABC Resort) The Spa - 1-234-567-8901 ) same address as ABC Resort) Will this phone number that is used by the 3 separate entities penalize the Google listing placements for the actual Hotel/Resort in Google Places? Thanks everyone!
Technical SEO | | hawkvt10 -
Use of Robots.txt file on a job site
We are performing SEO on a large niche Job Board. My question revolves around the thought of no following all the actual job postings from their clients as they only last for 30 to 60 days. Anybody have any idea on the best way to handle this?
Technical SEO | | WebTalent0 -
Home Page Indexing Question/Problem
Hello Everyone, Background: I recently decided to change the preferred domain settings in WM Tools from the non www version of my site to the www version. I did this because there is a redirect from the non www to the www and I've built all of my internal links with the www. Everything I read on SEO Moz seemed to indicate that this was a good move. Traffic has been down/volatile but I think it's attributable mostly to a recent site change/redesign. Having said that the preferred domain change did seem to drop traffic an additional notch. I made the move two weeks ago. Here is the question: When I google my site, the home page shows up as the site title without the custom title tags I've written. The page that displays in the SERP is still the non www version of the site. a site:www.mysite.com search shows an internal page first but doesn't return the home page as a result. All other pages pop up indexed with the www version of the page. a site:mysite.com (notice lack of www) search DOES SHOW my home page and my custom title tags but with a non www version of the page. All other pages pop up indexed with the www version of the page. Any one have thoughts on this? Is this a classic example of waiting on Google to catch up with the changes to my tiny little site?
Technical SEO | | JSOC0 -
Site not being Indexed that fast anymore, Is something wrong with this Robots.txt
My wordpress site's robots.txt used to be this: User-agent: * Disallow: Sitemap: http://www.domainame.com/sitemap.xml.gz I also have all in one SEO installed and other than posts, tags are also index,follow on my site. My new posts used to appear on google in seconds after publishing. I changed the robots.txt to following and now post indexing takes hours. Is there something wrong with this robots.txt? User-agent: * Disallow: /cgi-bin Disallow: /wp-admin Disallow: /wp-includes Disallow: /wp-content/plugins Disallow: /wp-content/cache Disallow: /wp-content/themes Disallow: /wp-login.php Disallow: /wp-login.php Disallow: /trackback Disallow: /feed Disallow: /comments Disallow: /author Disallow: /category Disallow: */trackback Disallow: */feed Disallow: */comments Disallow: /login/ Disallow: /wget/ Disallow: /httpd/ Disallow: /*.php$ Disallow: /? Disallow: /*.js$ Disallow: /*.inc$ Disallow: /*.css$ Disallow: /*.gz$ Disallow: /*.wmv$ Disallow: /*.cgi$ Disallow: /*.xhtml$ Disallow: /? Disallow: /*?Allow: /wp-content/uploads User-agent: TechnoratiBot/8.1 Disallow: ia_archiverUser-agent: ia_archiver Disallow: / disable duggmirror User-agent: duggmirror Disallow: / allow google image bot to search all imagesUser-agent: Googlebot-Image Disallow: /wp-includes/ Allow: /* # allow adsense bot on entire siteUser-agent: Mediapartners-Google* Disallow: Allow: /* Sitemap: http://www.domainname.com/sitemap.xml.gz
Technical SEO | | ideas1230