Twitter Robots.txt
-
Hello Moz World,
So, I'm trying to wrap my head around all of the different robots.txt setups out there. I decided to dive into a site like Twitter and look at their robots.txt, and now I'm super confused. What are they telling the search engines with /hashtag/*?src=? Why don't they just use:
User-agent: *
Disallow:
But instead, they address each search engine individually. Is there any benefit to this?
Thanks for all of the awesome responses!!!
B/R
Will H.
-
Thanks Martijn. That makes a lot of sense. I'm working with small websites, but hopefully I will be moving on to bigger fish.
-
Thank you for the awesome response and taking the time to write this all out. It was very helpful!
-
To answer your question about why they would set up different statements for different search engines: as huge sites become more complicated in structure, you also want to be able to see how the different engines deal with crawling certain sets of pages. Setting up the statements separately gives you a clearer overview of what is and isn't being crawled for each specific engine.
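To make that concrete, here is a rough, hypothetical sketch of what per-engine groups look like (illustrative only, not Twitter's actual file):

```
# Hypothetical example - not Twitter's actual robots.txt
User-agent: Googlebot
Allow: /hashtag/*?src=
Disallow: /search

User-agent: Bingbot
Disallow: /search

User-agent: *
Disallow: /i/
```

Each group starts with a User-agent line naming the crawler, and a crawler follows the most specific group that matches it, so you can open up or close off sections of the site for one engine without affecting the others.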
-
At a glance, I couldn't tell you what their motivation is, but it seems they're addressing individual search engines to show/block various things on a per-engine basis.
Being Twitter, I'm sure they have their reasons for doing this, but from the outside it's beyond me what that motivation is!
What are they telling the search engines with /hashtag/*?src=?
The full line `Allow: /hashtag/*?src=` says to allow the respective engine to crawl the hashtag pages.
To better explain exactly what's going on here, let's take a look at a working example. If you click on a #SEO hashtag on Twitter (note, you have to click on one, not just search for one, that's a different string) you'll arrive at this URL:
https://twitter.com/hashtag/SEO?src=hash
A * is known as a wildcard and is essentially a variable: anything can go in that place and the statement still applies. In this particular example, it's /hashtag/SEO?src=hash. The "SEO" part could be replaced by any other hashtag name, like the other examples below, and the Allow statement would still apply:
/hashtag/Marketing?src=hash
/hashtag/SEM?src=hash
/hashtag/WebDesign?src=hash
/hashtag/Digital?src=hash

As a general rule, I'd suggest looking at more basic websites for a better example to follow - these big guys have to handle some issues that the rest of us don't, so a normal robots.txt is rarely more than 10 lines if the site is built correctly.
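If you ever want to sanity-check how one of these wildcard rules matches, here's a minimal sketch of the matching logic in Python (an illustration of how * behaves, not any engine's actual implementation):

```python
# A minimal sketch of how a crawler might evaluate the wildcard rule above.
# This illustrates the "*" matching logic; it is not any engine's real code.
import re

def rule_to_regex(rule: str) -> "re.Pattern":
    # Escape regex metacharacters in the rule, then turn each escaped "*"
    # back into ".*" so it matches any run of characters.
    return re.compile(re.escape(rule).replace(r"\*", ".*"))

pattern = rule_to_regex("/hashtag/*?src=")

for path in [
    "/hashtag/SEO?src=hash",        # matches
    "/hashtag/Marketing?src=hash",  # matches
    "/about",                       # no match: wrong prefix
]:
    print(path, "->", bool(pattern.match(path)))
```

Running it shows both hashtag URLs match the Allow rule while /about does not.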
Related Questions
-
Google Indexing Duplicate URLs: Ignoring Robots & Canonical Tags
Hi Moz Community, We have the following robots.txt command that should prevent URLs with tracking parameters from being indexed:

Disallow: /*?

We have noticed Google has started indexing pages that are using tracking parameters. Example below:

http://www.oakfurnitureland.co.uk/furniture/original-rustic-solid-oak-4-drawer-storage-coffee-table/1149.html
http://www.oakfurnitureland.co.uk/furniture/original-rustic-solid-oak-4-drawer-storage-coffee-table/1149.html?ec=affee77a60fe4867

These pages are identified as duplicate content, yet have the correct canonical tags:

https://www.google.co.uk/search?num=100&site=&source=hp&q=site%3Ahttp%3A%2F%2Fwww.oakfurnitureland.co.uk%2Ffurniture%2Foriginal-rustic-solid-oak-4-drawer-storage-coffee-table%2F1149.html&oq=site%3Ahttp%3A%2F%2Fwww.oakfurnitureland.co.uk%2Ffurniture%2Foriginal-rustic-solid-oak-4-drawer-storage-coffee-table%2F1149.html&gs_l=hp.3..0i10j0l9.4201.5461.0.5879.8.8.0.0.0.0.82.376.7.7.0....0...1c.1.58.hp..3.5.268.0.JTW91YEkjh4

With various affiliate feeds available for our site, we effectively have duplicate versions of every page due to the tracking query string, which Google seems willing to index, ignoring both the robots rules and the canonical tags. Can anyone shed any light onto the situation?

Intermediate & Advanced SEO | JBGlobalSEO
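Worth noting for anyone hitting the same symptom: Disallow prevents crawling, not indexing, so URLs Google discovers through links or feeds can still be indexed, and because a blocked page can't be crawled, its canonical tag is never seen. A sketch of the rule in question:

```
User-agent: *
# "*" matches any run of characters, so this blocks crawling of any URL
# that contains a "?" (i.e., any query string). It does not stop Google
# from indexing a blocked URL it discovers via links, and a blocked
# page's canonical tag can never be read.
Disallow: /*?
```
-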
Baidu Spider appearing on robots.txt
Hi, I'm not too sure what to do about this or what to think of it. This magically appeared in my company's robots.txt file (it literally magically appeared; the text is below):

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /

I know that Baidu is the Google of China, but I'm not sure why this would appear in our robots.txt all of a sudden. Should I be worried about a hack? Also, would I want to disallow Baidu from crawling my company's website? Thanks for your help, -Reed

Intermediate & Advanced SEO | IceIcebaby
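For anyone wanting to check what a block like this actually does, here's a quick test with Python's standard-library parser (a sketch; example.com stands in for your domain):

```python
# Quick check of what the mystery block does, using only the Python
# standard library. The rules are copied verbatim from the question;
# example.com is a stand-in domain.
import urllib.robotparser

robots_txt = """\
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# All three Baidu crawlers share the single Disallow group:
print(rp.can_fetch("Baiduspider", "http://example.com/page"))       # False
print(rp.can_fetch("Baiduspider-image", "http://example.com/img"))  # False

# Crawlers not named in any group are unaffected (there is no "*" group):
print(rp.can_fetch("Googlebot", "http://example.com/page"))         # True
```

The three stacked User-agent lines form a single group, so the lone Disallow: / applies to all three Baidu crawlers while leaving every other crawler unrestricted.
-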
Massive URL blockage by robots.txt
Hello people, In May there was a dramatic increase in URLs blocked by robots.txt, even though we don't have that many URLs or crawl errors. You can view the attachment to see how it went up. The thing is, the company hasn't touched the file since 2012. What might be causing the problem? Can this result in any penalties? Can indexation be lowered because of this?

Intermediate & Advanced SEO | moneywise_test
-
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated, but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to register that the robots.txt file had been updated, and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that the vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now ok. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly, has anyone had any experience with this sort of situation, and is full recovery normal? Thanks in advance!

Intermediate & Advanced SEO | RikkiD22
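For readers who find this thread later, the whole crisis comes down to one character - the same distinction raised in the original question at the top of this page:

```
# Blocks the whole site for all crawlers (what the developer added):
User-agent: *
Disallow: /

# Blocks nothing - an empty Disallow value places no restriction at all:
User-agent: *
Disallow:
```
-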
About using robots.txt to resolve duplicate content
I have a problem with duplicate content and titles. I've tried many ways to resolve them, but because of the web code I'm still stuck, so I've decided to use robots.txt to block the content that is duplicated. The first question: how do I use a command in robots.txt to block all URLs like these:

http://vietnamfoodtour.com/foodcourses/Cooking-School/
http://vietnamfoodtour.com/foodcourses/Cooking-Class/
.......

User-agent: *
Disallow: /foodcourses

(Is that right?)

And the parameter URLs:

http://vietnamfoodtour.com/?mod=vietnamfood&page=2
http://vietnamfoodtour.com/?mod=vietnamfood&page=3
http://vietnamfoodtour.com/?mod=vietnamfood&page=4

User-agent: *
Disallow: /?mod=vietnamfood

(Is that right? I have a folder containing the module - could I use Disallow: /module/* ?)

The 2nd question is: which takes priority, robots.txt or the meta robots tag? Say I use robots.txt to block a URL, but in that URL my meta robots tag is "index, follow"?

Intermediate & Advanced SEO | magician
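As a rough way to test prefix rules like these before deploying them, Python's standard-library parser can help (note it handles plain prefix rules like the ones above, but not Google's * wildcard extension):

```python
# Sanity check of the two proposed rules, using URLs from the question.
import urllib.robotparser

robots_txt = """\
User-agent: *
Disallow: /foodcourses
Disallow: /?mod=vietnamfood
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

for url in [
    "http://vietnamfoodtour.com/foodcourses/Cooking-School/",  # blocked
    "http://vietnamfoodtour.com/?mod=vietnamfood&page=2",      # blocked
    "http://vietnamfoodtour.com/",  # homepage should stay crawlable
]:
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")
```
-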
Robots.txt: Can you put a /* wildcard in the middle of a URL?
We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example:

Disallow: /images/

is blocked just fine. However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: can I put a wildcard in the middle of the string, e.g. /en/*/images/, or do I need to list out every single country for every language in the robots file? Anyone know of any workarounds?

Intermediate & Advanced SEO | IHSwebsite
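For reference, Google and Bing do support * in the middle of a pattern (it's an extension to the original robots.txt standard, so not every crawler honors it). A sketch using the directories named in the question:

```
User-agent: *
Disallow: /images/
# One wildcard rule covering /en/uk/images/, /en/fr/images/, and any other
# language/country prefix, instead of listing every combination:
Disallow: /*/images/
```
-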
Should I robots block site directories with primarily duplicate content?
Our site, CareerBliss.com, primarily offers unique content in the form of company reviews and exclusive salary information. As a means of driving revenue, we also have a lot of job listings in our /jobs/ directory, as well as educational resources in our /career-tools/education/ directory. The bulk of this information comes from feeds, which exist on other websites (duplicate). Does it make sense to go ahead and robots-block these portions of our site? My thinking is that in doing so, it will help reallocate our site authority, helping the /salary/ and /company-reviews/ pages rank higher, and this is where most people are finding our site via search anyways. e.g.:

http://www.careerbliss.com/jobs/cisco-systems-jobs-812156/
http://www.careerbliss.com/jobs/jobs-near-you/?l=irvine%2c+ca&landing=true
http://www.careerbliss.com/career-tools/education/education-teaching-category-5/

Intermediate & Advanced SEO | CareerBliss
-
Block all search results (dynamic) in robots.txt?
I know that Google does not want to index "search result" pages, for a lot of reasons (dup content, dynamic URLs, blah blah). I recently optimized the entire IA of my sites to have search-friendly URLs, which includes search result pages. So, my search result pages changed from:

/search?12345&productblue=true&id789

to

/product/search/blue_widgets/womens/large

As a result, Google started indexing these pages thinking they were static (no opposition from me :)), but I started getting WMT messages saying they are finding a "high number of URLs being indexed" on these sites. Should I just block them altogether, or let it work itself out?

Intermediate & Advanced SEO | rhutchings
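If blocking is the route taken, a sketch of the rule under the new URL structure described above (path taken from the question; adjust to the real site):

```
User-agent: *
# Blocks crawling of every internal search-results URL under the new
# search-friendly structure:
Disallow: /product/search/
```

Keep in mind robots.txt only stops crawling; pages that are already indexed generally need to be crawlable with a noindex tag before they will drop out.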