Confused about robots.txt

Netpace

There is a lot of conflicting and/or unclear information about robots.txt out there. Even after visiting the official robots website, I can't work out the best way to use robots.txt. For example, this is the current format of my robots.txt:

User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode

Allow: /

Sitemap: http://www.example.tv/sitemapindex.xml
Sitemap: http://www.example.tv/sitemapindex-videos.xml
Sitemap: http://www.example.tv/news-sitemap.xml

Is this correct and/or recommended? If so, then how come I see a list of 200 or so links blocked by robots.txt when I check Google Webmaster Tools?

Help someone, anyone! Can't seem to understand this robotic business!

Regards,

crvw

Google may still index pages excluded by robots.txt if the pages are backlinked either internally or externally.

For best results, use meta noindex to tell search engines they're not allowed to show the link in results, and meta nofollow to tell robots not to follow any links on the page.

Webmaster Tools Help: Using meta tags to block access to your site

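You can also explicitly address Googlebot in the meta tag, as opposed to just robots. If you use both a robots.txt and meta robots tags and there are conflicting directives, Googlebot will follow the most restrictive one. Bear in mind that a crawler has to fetch a page to read its meta tags, so a noindex tag on a URL that is disallowed in robots.txt will never be seen.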
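To make the meta tag advice concrete, here is a minimal Python sketch that reads the robots directives out of a page. The sample page, the class name RobotsMetaParser and the helper robots_directives are made up for illustration; the tags themselves take the form <meta name="robots" content="noindex, nofollow">, with name="googlebot" to address Googlebot only.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # crawler name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content", "")
            values = {d.strip().lower() for d in content.split(",") if d.strip()}
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html):
    """Return a dict such as {"robots": {"noindex", "nofollow"}} for the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for this example.
SAMPLE_PAGE = """
<html><head>
<meta name="robots" content="noindex, nofollow">
<meta name="googlebot" content="noindex">
</head><body>Player configuration</body></html>
"""

if __name__ == "__main__":
    print(robots_directives(SAMPLE_PAGE))
```

Running something like this over a few of your own pages is a quick way to confirm the tags actually made it into the served HTML.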

irvingw

I would also recommend going to the Site configuration > Crawler access page in Google Webmaster Tools and testing many of your site's URLs to ensure that robots can access them. Test every unique URL format on your site, such as the search results page, product pages, category pages, etc. I always use this tool whenever I make any change to the robots.txt.
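If you want a rough local check as well, the same kind of test can be sketched with Python's standard urllib.robotparser. The rules below are the ones posted in the question; the test URLs and the helper name check_urls are made up for illustration. This parser does not necessarily match URLs the way Googlebot does, so treat it as a sanity check and not a replacement for the Webmaster Tools tester.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, as posted.
ROBOTS_TXT = """\
User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode

Allow: /
"""

# Hypothetical URLs, one per URL format worth testing.
TEST_URLS = [
    "http://www.example.tv/news/today",
    "http://www.example.tv/images/logo.png",
    "http://www.example.tv/playEpisode/123",
    # The first rule has no leading slash, so check how your parser treats it.
    "http://www.example.tv/javascript.js",
]


def check_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return {url: True/False}, where True means the parser would allow a fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    for url, allowed in check_urls(ROBOTS_TXT, TEST_URLS).items():
        print("allowed" if allowed else "blocked", url)
```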

Entrusteddev

Hi,

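Allow: / isn't needed in a robots.txt file. Anything that isn't disallowed is allowed by default. Allow was not part of the original robots.txt standard, although Google and other major crawlers do support it.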
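The default-allow behaviour is easy to check for yourself. Here is a small sketch, again using Python's urllib.robotparser, that compares a rule set with and without a trailing Allow: / line. The shortened rule sets, the paths and the helper name decisions are illustrative only. Note that this parser applies the first matching rule in file order, whereas Google documents a most-specific-match approach, so the position of Allow: / could matter more in other tools.

```python
from urllib.robotparser import RobotFileParser

WITH_ALLOW = """\
User-agent: *
Disallow: /images/
Disallow: /playEpisode
Allow: /
"""

WITHOUT_ALLOW = """\
User-agent: *
Disallow: /images/
Disallow: /playEpisode
"""

# Hypothetical paths: two that should stay crawlable, two that should be blocked.
PATHS = ["/", "/news/today", "/images/logo.png", "/playEpisode/42"]


def decisions(robots_txt, paths, user_agent="*"):
    """Return a list of booleans, one per path, where True means fetching is allowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [parser.can_fetch(user_agent, path) for path in paths]


if __name__ == "__main__":
    print(decisions(WITH_ALLOW, PATHS))
    print(decisions(WITHOUT_ALLOW, PATHS))
```

Both rule sets should give the same answers for these paths, which is the point: the Allow: / line adds nothing here.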

Other than that all looks good. Perhaps the 200 or so links to blocked pages were indexed before the robots.txt was last updated with the disallows?

Regards

Aran
