Twitter Robots.txt
-
Hello Moz World,
So, I'm trying to wrap my head around all of the different robots.txt files out there. I decided to dive into a site like Twitter and look at their robots.txt, and now I'm super confused. What are they telling the search engines with /hashtag/*?src=? Why don't they just use:
User-agent: *
Disallow:
But instead, they address each search engine individually. Is there any benefit to this?
Thanks for all of the awesome responses!!!
B/R
Will H.
-
Thanks, Martijn. That makes a lot of sense. I'm working with small websites, but hopefully I'll be moving on to bigger fish.
-
Thank you for the awesome response and taking the time to write this all out. It was very helpful!
-
To answer your question about why they would set up different statements for different search engines: as huge sites become more complicated in their structure, you also want a way to see how the different engines deal with certain pages and with crawling some of them. Setting the statements up separately gives you a clearer overview of what is and isn't being crawled for each specific engine.
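For illustration, here's a hypothetical, stripped-down file along those lines. This is just a sketch, not Twitter's actual robots.txt - the engine names and paths are examples only:

```
# Hypothetical sketch, not Twitter's real file
User-agent: Googlebot
Allow: /hashtag/*?src=
Disallow: /search/

User-agent: Bingbot
Allow: /hashtag/*?src=

# Every crawler not named above
User-agent: *
Disallow: /search/
```

Each crawler follows only the group that addresses it most specifically, so you can open up or close off a section for one engine and then watch how that engine's crawl coverage changes on its own.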
-
At a glance, I couldn't tell you what their motivation is, but it seems they're addressing individual search engines to allow or block various things on a per-engine basis.
Being Twitter, I'm sure they have their reasons for doing this, but from the outside it's beyond me what that motivation is!
What are they telling the search engines with /hashtag/*?src=
The full line Allow: /hashtag/*?src= says to allow the respective engine to crawl the hashtag pages.
To better explain exactly what's going on here, let's take a look at a working example. If you click on a #SEO hashtag on Twitter (note, you have to click on one, not just search for one, that's a different string) you'll arrive at this URL:
https://twitter.com/hashtag/SEO?src=hash
A * is known as a wildcard and is essentially a variable: anything can go in that place and the statement still applies. In this particular example, it's /hashtag/SEO?src=hash. The "SEO" part could be replaced by any other hashtag name, like the examples below, and the Allow statement would still apply (there's a small sketch of the matching logic at the end of this answer).
/hashtag/Marketing?src=hash
/hashtag/SEM?src=hash
/hashtag/WebDesign?src=hash
/hashtag/Digital?src=hash

As a general rule, I'd suggest looking at more basic websites for a better example to follow - these big guys have to handle some issues that the rest of us don't, so a normal robots.txt is rarely more than 10 lines if the site is built correctly.
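To make the wildcard behaviour concrete, here's a minimal sketch of the matching logic in Python - my own rough approximation of Google's documented wildcard rules, not code from any search engine:

```python
import re

def google_style_match(pattern: str, path: str) -> bool:
    """Roughly emulate robots.txt wildcard matching as Google documents it:
    '*' matches any run of characters, '$' anchors the end of the URL,
    and everything else is a literal prefix match."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    # re.match anchors at the start of the string, which gives us
    # the "rule is a prefix" behaviour for free
    return re.match(regex, path) is not None

pattern = "/hashtag/*?src="  # the Allow rule discussed above

for path in [
    "/hashtag/SEO?src=hash",        # True  - "SEO" fills the wildcard
    "/hashtag/Marketing?src=hash",  # True  - so does "Marketing"
    "/search?q=%23SEO",             # False - doesn't start with /hashtag/
]:
    print(path, google_style_match(pattern, path))
```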
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of fewer than 200 product categories, my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change and have no ratings or product reviews, so there is little reason for a search engine to revisit a product page. The sales team is afraid blocking a previously indexed product page will result in it being removed from the Google index and would prefer to submit the categories by hand, 10 per day, via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | AspenFasteners
-
Robots.txt wildcards - the devs had a disagreement - which is correct?
Hi – the lead website developer was assuming that this wildcard: Disallow: /shirts/?* would block URLs including a ? anywhere within this directory and all of its subdirectories. The second developer suggested that this wildcard would only block URLs featuring a ? that comes immediately after /shirts/ - for example: /shirts?minprice=10&maxprice=20 - but argued that this robots.txt directive would not block URLs featuring a ? in subdirectories, e.g. /shirts/blue?mprice=100&maxp=20. So which of the developers is correct? Beyond that, I assumed that the ? should feature a * on each side of it - for example, /*?* - to work as intended above? Am I correct in assuming that?
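Applying Google's documented wildcard rules - a rule is a literal prefix match, with * standing for any run of characters - here's a quick trace of that directive against the example URLs (a sketch of the logic, not output from any official parser):

```
User-agent: *
Disallow: /shirts/?*

# Tracing /shirts/?* as the literal prefix "/shirts/?" plus a wildcard:
# /shirts/?color=red              -> blocked (starts with /shirts/?)
# /shirts?minprice=10&maxprice=20 -> not blocked (no / between "shirts" and "?")
# /shirts/blue?mprice=100&maxp=20 -> not blocked (? is not directly after /shirts/)
# Note: the trailing * is redundant, since rules already match as prefixes.
```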
Intermediate & Advanced SEO | McTaggart
-
Robots.txt - Do I block bots from crawling the non-www version if I use www.site.com?
My site is set up at http://www.site.com and I have the non-www version redirected to the www version in the .htaccess file. My question is: what should my robots.txt file look like for the non-www site? Do you block robots from crawling the site like this? Or do you leave it blank? User-agent: * Disallow: / Sitemap: http://www.morganlindsayphotography.com/sitemap.xml Sitemap: http://www.morganlindsayphotography.com/video-sitemap.xml
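One thing worth noting: if the htaccess rule redirects every non-www URL, it covers /robots.txt too, so the non-www host never serves its own file - crawlers that request it just get bounced to the www copy, and Googlebot follows that redirect. A quick way to sanity-check what a crawler actually receives (a minimal sketch; site.com is the placeholder from the question):

```python
import requests

# Ask the non-www host for robots.txt without following redirects,
# to see exactly what a crawler is handed on the first request:
resp = requests.get("http://site.com/robots.txt", allow_redirects=False)
print(resp.status_code)              # expect 301 if the htaccess rule covers it
print(resp.headers.get("Location"))  # expect http://www.site.com/robots.txt
```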
Intermediate & Advanced SEO | morg45454
-
Now that Google will be indexing Twitter, are Twitter backlinks likely to affect website rank in the SERPs?
About a year (or two) ago, Matt Cutts said that Twitter and FB have no effect on website rank, in part because Google can't get to the content. Now that Google will be indexing Twitter (again), do we expect that links in Twitter posts will be useful backlinks for improving SERP rank?
Intermediate & Advanced SEO | Thriveworks-Counseling
-
How to handle a blog subdomain on the main sitemap and robots file?
Hi, I have some confusion about how our blog subdomain is handled in our sitemap. We have our main website, example.com, and our blog, blog.example.com. Should we list the blog subdomain URL in our main sitemap? In other words, is listing a subdomain allowed in the root sitemap? What does the final structure look like in terms of the sitemap and robots file? Specifically: example.com/sitemap.xml - would I include a link to our blog subdomain (blog.example.com)? example.com/robots.txt - would I include a link to BOTH our main sitemap and blog sitemap? blog.example.com/sitemap.xml - would I include a link to our main website URL (even though it's not a subdomain)? blog.example.com/robots.txt - does a subdomain need its own robots file? I'm a technical SEO and understand the mechanics of much of on-page SEO... but for some reason I never found an answer to this specific question and I am wondering how the pros do it. I appreciate your help with this.
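For reference, a sketch of one common arrangement (illustrative URLs only, and not the only valid layout): robots.txt is fetched per hostname, so the subdomain serves its own file, and each file points at that host's own sitemap.

```
# example.com/robots.txt
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml

# blog.example.com/robots.txt - the subdomain needs its own robots file,
# since crawlers request /robots.txt separately from every hostname:
User-agent: *
Disallow:
Sitemap: https://blog.example.com/sitemap.xml
```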
Intermediate & Advanced SEO | seo.owl
-
Should all pages on a site be included in either your sitemap or robots.txt?
I don't have any specific scenario here, but I'm just curious, as I fairly often come across sites that have, for example, 20,000 pages but only 1,000 in their sitemap. If they only want 1,000 of their URLs included in their sitemap and indexed, should the others be excluded using robots.txt or a page-level exclusion? Is there a point to having pages that are included in neither, leaving it up to Google to decide?
Intermediate & Advanced SEO | RossFruin
-
Robots.txt error message in Google Webmaster from a later date than the page was cached - how is that possible?
I have error messages in Google Webmaster that state that Googlebot encountered errors while attempting to access the robots.txt. The last date that this was reported was on December 25, 2012 (Merry Christmas), but the last cache date was November 16, 2012 (http://webcache.googleusercontent.com/search?q=cache%3Awww.etundra.com/robots.txt&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a). How could I get this error if the page hasn't been cached since November 16, 2012?
Intermediate & Advanced SEO | eTundra
-
Robots.txt: Can you put a /* wildcard in the middle of a URL?
We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example: Disallow: /images/ is blocked just fine. However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: can I put a wildcard in the middle of the string, e.g. /en/*/images/, or do I need to list out every single country for every language in the robots file? Anyone know of any workarounds?
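For what it's worth, Google's documented wildcard support does allow * anywhere in the path, and the wildcard matches across slashes, so a single extra rule can cover every language/country prefix. A sketch using the directories from the question:

```
User-agent: *
Disallow: /images/
# * can sit in the middle of the path and matches across slashes,
# so this one line covers /en/uk/images/, /en/us/images/,
# /fr/fr/images/, and so on:
Disallow: /*/images/
```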
Intermediate & Advanced SEO | IHSwebsite