Robots.txt - Allow and Disallow. Can they be the same?
-
Hi All,
I need some help on the following:
Are the following robots.txt directives the same?
User-agent: *
Disallow:
or
User-agent: *
Allow: /
I'm a bit confused. I take it that the first one allows all the bots but the second one blocks all the bots.
Is that correct?
Many thanks,
Aidan
-
Hi Aidan
I'm getting a similar problem on a site I'm working on. The on-page rank checker says it "can't reach the page". I've checked everything obvious (at least I think I have!).
May I ask how you eventually resolved it?
Thanks, Aidan
-
Hi
You can use this tool to make sure the crawler can see your files:
http://pro.seomoz.org/tools/crawl-test
but you'll have to wait for the report to arrive by email.
When you say "I get the following msg when I try to run On Page Analysis:", is this the tool you mean?
http://pro.seomoz.org/tools/on-page-keyword-optimization/new
To check the website you can also use:
http://www.opensiteexplorer.org
Ciao
Maurizio
-
Hi,
Thanks for the clarification. So the robots.txt isn't blocking anything.
Do you know, then, why I can't use SEOmoz On Page Analysis, and why Xenu and Screaming Frog only return 3 URLs?
I get the following message when I try to run On Page Analysis:
"Oops! We were unable to reach the page you requested for your report. Please try again later."
Would there be something else blocking me? GWMT parameters, maybe?
-
It's a pleasure. But I don't understand the problem.
If the site has this robots.txt:
User-agent: *
Allow: /
then every crawler, SEOmoz included, can see and index every file on the website.
Maybe the problem is something different?
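One quick check is whether the server treats crawler user-agents differently from browsers; a 403 or timeout for bot user-agents would explain "can't reach the page" even with a wide-open robots.txt. A minimal sketch in Python, with example.com as a hypothetical stand-in for the real site (the user-agent strings are illustrative, though "rogerbot" is Moz's crawler):

import urllib.request
import urllib.error

SITE = "http://www.example.com/"  # stand-in: swap in the site being audited

# Compare how the server answers a browser vs. crawler user-agents.
for ua in ("Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
           "rogerbot/1.0",
           "Screaming Frog SEO Spider"):
    req = urllib.request.Request(SITE, headers={"User-Agent": ua})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"{ua!r}: HTTP {resp.status}")
    except urllib.error.HTTPError as e:
        print(f"{ua!r}: HTTP {e.code} (blocked?)")
    except urllib.error.URLError as e:
        print(f"{ua!r}: failed ({e.reason})")

If the browser user-agent gets a 200 and the crawler ones get 403s or errors, the blocking is happening at the server or firewall level, not in robots.txt.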
Ciao
-
Thanks Maurizio,
I need to do some analysis on this site. Is there a way to get my SEO tools (Screaming Frog, SEOmoz) to ignore the robots.txt so I can do a proper site audit?
Thanks again for the answers. Much appreciated
Aidan
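For anyone else wondering: Screaming Frog has a configuration setting to ignore robots.txt, and a crawl you script yourself can simply never consult robots.txt at all. A rough, hypothetical sketch in Python (example.com stands in for the real site; only run this against sites you're authorized to audit):

import re
import urllib.request
from urllib.parse import urljoin, urlparse

START = "http://www.example.com/"  # stand-in: swap in the site being audited

def crawl(start, limit=50):
    # Deliberately robots.txt-blind: unlike a polite crawler, this never
    # fetches or checks /robots.txt before requesting a page.
    seen, queue = set(), [start]
    host = urlparse(start).netloc
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception as e:
            print(f"{url}: FAILED ({e})")
            continue
        print(f"{url}: OK")
        # Naive link extraction; good enough for a rough audit.
        for href in re.findall(r'href="([^"#]+)"', html):
            link = urljoin(url, href)
            if urlparse(link).netloc == host:
                queue.append(link)

crawl(START)

If even a robots.txt-blind crawl like this stalls after a few URLs, the 3-URL result from Xenu and Screaming Frog points at server-side blocking rather than robots.txt.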
-
Hi Aidan
User-agent: *
Disallow:
and
User-agent: *
Allow: /
are the same.
Ciao
Maurizio
-
Hi Maurizio,
The reason I asked is because I'm working on a site whose robots.txt is:
User-agent: *
Allow: /
Why would they have this?
I can't use On-Page Analysis, and Screaming Frog only returns 3 URLs.
Thanks again,
Aidan
-
Hi
1st example:
User-agent: *
Disallow:
Every user-agent can index your files.
2nd example:
User-agent: *
Disallow: /
No user-agent can index your files.
More examples here:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
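You can verify both rules with Python's standard-library robots.txt parser; a minimal sketch (the page URL and "rogerbot" agent are just illustrative):

from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt, url="http://example.com/page.html"):
    # Parse a robots.txt body and ask whether a crawler may fetch the URL.
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("rogerbot", url)

print(can_fetch("User-agent: *\nDisallow:"))    # True: empty Disallow blocks nothing
print(can_fetch("User-agent: *\nDisallow: /"))  # False: "/" blocks the whole site
print(can_fetch("User-agent: *\nAllow: /"))     # True: same effect as the empty Disallow

The empty Disallow and "Allow: /" both come back as fetchable, which is why the two robots.txt files in the original question behave identically.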
Ciao
Maurizio