Confused about robots.txt

Netpace

There is a lot of conflicting and unclear information about robots.txt out there. Even after visiting the official robots website, I can't work out the best way to use it. For example, my robots.txt currently looks like this:

User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode

Allow: /

Sitemap: http://www.example.tv/sitemapindex.xml
Sitemap: http://www.example.tv/sitemapindex-videos.xml
Sitemap: http://www.example.tv/news-sitemap.xml

Is this correct and/or recommended? If so, why do I see a list of 200 or so links blocked by robots.txt when I check Google Webmaster Tools?

Help someone, anyone! Can't seem to understand this robotic business!

Regards,

crvw

Google may still index pages excluded by robots.txt if the pages are backlinked either internally or externally.

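For best results, use meta noindex to tell search engines they're not allowed to show the page in results, and meta nofollow to tell robots not to follow any links on the page. Note that a crawler can only see these meta tags if it is allowed to fetch the page, so a page carrying noindex should not also be disallowed in robots.txt.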

Webmaster Tools Help: Using meta tags to block access to your site

You can also explicitly address googlebot in the meta tag, as opposed to just robots. If you use both a robots.txt and meta robots tags and there are conflicting directives, googlebot will follow the most restrictive one.
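To make the meta tag advice concrete, here is a minimal sketch in Python that reads the robots and googlebot meta tags out of a page. The sample page, the class name RobotsMetaParser and the helper robots_directives are all made up for illustration; only the meta tag names and the noindex/nofollow values come from the advice above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noindex", "nofollow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # The content attribute is a comma-separated list of directives.
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


# Hypothetical page: the first tag addresses all robots, the second only googlebot.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<meta name="googlebot" content="noindex">
</head><body>...</body></html>"""


def robots_directives(html):
    """Return a dict mapping meta name to the set of directives found."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


found = robots_directives(SAMPLE_PAGE)
print("robots:", sorted(found.get("robots", [])))
print("googlebot:", sorted(found.get("googlebot", [])))
```

Running something like this over a handful of your own pages is a quick way to confirm the tags are actually present in the HTML that gets served.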

irvingw

I would also recommend going to the Site configuration > Crawler access page in Google Webmaster Tools and testing many of your site's URLs to ensure that robots can access them. Test every unique URL format on your site, such as the search results page, product pages, category pages, etc. I always use this tool whenever I make any change to robots.txt.
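If you want a rough local check before going to Webmaster Tools, the same kind of test can be sketched with Python's standard urllib.robotparser module. The rules below are copied from the question (Allow and Sitemap lines left out); the sample URLs and the helper name check_urls are made up for illustration. Python's parser does simple prefix matching and is not guaranteed to behave exactly like Googlebot, so treat the output as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the question (Allow and Sitemap lines omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode
"""

# Hypothetical URLs, one per URL format you want to test.
SAMPLE_URLS = [
    "http://www.example.tv/images/logo.png",
    "http://www.example.tv/playEpisode/123",
    "http://www.example.tv/javascript.js",
    "http://www.example.tv/news/",
]


def check_urls(robots_txt, urls, agent="*"):
    """Return a dict mapping each URL to True (crawlable) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


for url, allowed in check_urls(ROBOTS_TXT, SAMPLE_URLS).items():
    print("allowed" if allowed else "blocked", url)
```

One thing worth watching in the output is the javascript.js line: that rule has no leading slash, so a parser that matches rules as prefixes of the URL path may never apply it. Writing it as /javascript.js avoids the ambiguity.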

Entrusteddev

Hi,

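Allow: / isn't needed in a robots.txt file; anything that isn't disallowed is allowed by default. Allow was not part of the original robots.txt standard, although Google and other major crawlers do support it.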

Other than that all looks good. Perhaps the 200 or so links to blocked pages were indexed before the robots.txt was last updated with the disallows?

Regards

Aran
