In 2012 I read Trust Me, I’m Lying: Confessions of a Media Manipulator, a depressing book by a vaguely awful person about his work planting news stories and generating controversy for the brands he represented, while successfully controlling the narrative to get his intended message across. Definitely a relevant book for today. After finishing it I naturally wanted to try my hand at manipulating the media in some way, and thought about how my own computer security skills could help me achieve that.

I settled on creating thousands of fake Twitter accounts and using them to trick a journalist into thinking something was happening somewhere. Thus my Twitter Botnet was born. It’s not a good name for it, as it’s not like a traditional botnet, but it is a network of bots. On a social network. So, Twitter Botnet it is. At least here.

To be able to trick a journalist, the Twitter accounts have to be believable as real humans, which is no easy feat with current AI technology. I didn’t want to create another bad Markov bot spewing out half-English, or to rely on a set of canned responses, as both are ineffective. Instead I settled on using real humans. My botnet concept was simple: it starts with a single patient zero account, which it clones. By clone, I mean it:

  • signs up to Twitter with a name as close as possible to the original account’s
  • copies all of the original’s info, pictures, location and whatever else it can get
  • and tweets whenever the original account tweets

Now we have a replica account which will echo the real person in real time. If the account we cloned tweets at @account, we clone @account, and rewrite our tweet to @ the newly created clone. Now we’re two accounts! And so it grows organically as people naturally talk to each other.

This means we never have to worry about having believable bots, as they’re real people talking about real events in real time. There are of course huge limitations to this approach, such as how easy it is to see that the clones never touch the real Twitter (they only @ each other, and I put a _ at the end of #hashtags so the bots wouldn’t appear next to their real versions), but it would be convincing enough, especially when run for a long time.
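The hashtag trick is just a small text rewrite applied before a copied tweet is posted. The post doesn't show how it was done, so the following is only a minimal sketch, assuming a regular expression over word characters is a good enough definition of a hashtag; the helper name `defuse_hashtags` is hypothetical.

```python
import re

# Assumption: a hashtag is "#" followed by word characters ([A-Za-z0-9_]).
# Real Twitter hashtag rules are more involved; this is only an illustration.
HASHTAG = re.compile(r"#(\w+)")


def defuse_hashtags(text: str) -> str:
    """Hypothetical helper: append "_" to every #hashtag so a copied tweet
    does not show up next to the real hashtag (e.g. #music -> #music_)."""
    return HASHTAG.sub(r"#\1_", text)


print(defuse_hashtags("Loving #music and #summer2013 today"))
# Loving #music_ and #summer2013_ today
```

Because the suffix turns each tag into a different, unused tag, the copied tweets stay out of the real hashtag's feed while still reading naturally.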

A journalist discovering they’ve been had isn’t really much of an issue as corrections/updates to a story don’t have much effect compared to the original.

Design / Technical details

Twitter allows you to access it through an API, but this was something I wanted to avoid because I assumed it would be tightly monitored. You have to request API access specifically for your account; another potential red flag. After toying around with Twitter I discovered that if your browser is very old, such as IE6, Twitter will serve you a nearly JavaScript-free, plain HTML, 1990s-style Twitter, which is amazing if you want to parse it or interact with it programmatically. I decided to do everything from scratch using Python + urllib + BeautifulSoup to craft the basic HTTP requests needed to communicate with this wonderful “Windows IE6 Ready” Twitter. Literally just some POSTs/GETs and BeautifulSoup parsing. Nothing exciting, other than having to pull the CSRF token (Twitter’s authenticity token) out of the page.

To illustrate, here’s the code for tweeting as a user.

def tweetAsUser(cj, text):
    # First we have to get the authenticity token from the compose page
    authtoken = getAuthToken(cj)

    if not authtoken:
        log("Couldn't get the auth token for user %s" % cj)
        return False

    url = "https://mobile.twitter.com/"

    # cut off the front, not the end. This is to stop us accidentally cutting off an extension
    # to a legit username, e.g. @blah2 making it @blah and alerting the person
    text = text[-140:]

    values = {
        'authenticity_token': authtoken,
        'tweet[text]': decode(text),
        'commit': 'Tweet'
    }

    content_type, body = encode_multipart_formdata(values, [("media", "", "")])

    headers = {
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)',
        'Content-Type': content_type
    }

    data = getURL(cj, url, body, headers)

    if not data:
        log("Error sending tweet for user %s" % cj)
        return None

    if "Forgot password?" in data:
        log("Account has been suspended, probably")
        return None

    return data

You can tell it’s robust code by the error handling.

Twitter used to have a CAPTCHA when signing up for a new account, which was trivial to bypass using a CAPTCHA-breaking service like antigate.com. I spent $10 there, which gets you around 10k CAPTCHAs typed out by individuals around the world. The service has its own API (hosted by Atlassian!) which you send the CAPTCHA image(s) to, and you receive the typed-out words in under a minute.

Antigate uses a “lowest bid wins” system to determine how much a worker is paid to break each CAPTCHA, which keeps prices low and made me feel slightly bad about exploiting the workers there. Even if you want to pay the workers more per CAPTCHA you can’t, as your only control is setting the highest amount you’d pay per CAPTCHA. Another exploitative technique Antigate uses is not paying workers when they make a mistake: if a worker makes an error while transcribing a CAPTCHA, you can get your tiny payment automatically refunded through the API.

Interestingly, Twitter has removed the CAPTCHA from sign-up since I wrote my original code. Presumably they now scrutinise newly created accounts more closely instead, and decided the CAPTCHAs weren’t worth the extra sign-up friction.

antigate.png

After signing up, we have to confirm the email address used. Initially I used mailinator.com for every account, and my script would ‘click’ on the confirmation link sent through. But after a while the emails stopped arriving, as Mailinator had blocked Twitter sign-ups! Jerks. To stop relying on a single provider I wrote more dirty BeautifulSoup parsing for various free email services and was able to load balance across 5 of them.

I left monitoring the original accounts for new tweets up to Twitter itself, by simply having an account reserved for following them. Polling each account individually wouldn’t scale very well (do I poll every cloned account every minute, forever? No thanks). I quickly ran into a problem with Twitter not liking that I was following so many accounts so quickly, so I spread the following across a pool of accounts that grew as needed. Each of these “reader” accounts would follow up to a maximum number of the original accounts, and everything was funnelled together into a single stream for my tool to consume. As new tweets came in, they were handed to a worker, which accessed Twitter using the stored cookie of the matching clone account and tweeted the same tweet.

This queue of tweets eventually turned into a pseudo-scheduler of actions for workers, as it allowed me to easily add a bit more humanity to my bots by adding delays to actions. This is useful for avoiding certain bot detection heuristics, such as actions that happen too close together for a human. For example, as a person signing up to Twitter it takes you a bit of time to create your account, upload a picture, fill out all the details, maybe confirm your email later on when you check it, etc. A naive bot would do all of these actions far too quickly, right after the account is created. Instead, we create a separate job for each of them and sprinkle them through the queue, not too close together, and in a random order.

Testing

Once I had my chicken wire Python mostly working, I needed to test it thoroughly on real people, which meant waiting for them to tweet. To keep the downtime spent waiting for someone to say something on Twitter to a minimum, I went straight to a person who I know can’t stop typing on there: Asher Wolf. I don’t really know much about them, but they seem to tweet nearly constantly, and that’s all I really needed for a patient zero. That worked, initially, but I hadn’t realised that by choosing a very left-wing radical Twitter personality I would end up with a cloned army of the mostly left-wing political activists who hung around Asher Wolf.

This led to a hilarious situation: my code had a bug in which the original account was sometimes tweeted at instead of the cloned one. This alerted the original person to the existence of my mirror Twitter world, and they immediately thought it was a government conspiracy designed to attack left-wing activists!

After that run-in I didn’t want any more scrutiny until I had worked out all the bugs. I had to find a new group of people who tweet all the time but, unlike the activists, aren’t technically savvy, so I settled on Beliebers. They were a perfect group because they tweeted constantly and were not technically competent. The biggest issue was reading the Bieber tweets while debugging. I’m not sure if you’ve had the pleasure of spending time in the Justin Bieber fandom, but it isn’t pretty. I started getting some overlap with other groups and soon had many One Directioners cloned too.

With most of the issues worked out, the network slowly grew until it hit about 3k accounts, at which point I switched off its growth engine. I wanted to age the accounts and see whether they stuck around or got discovered. After a couple of months it had been whittled down to about 1k accounts, and it sits at that number today, though nothing has been posted from them since 2014.

Trolling

In case you don’t know, I lecture Computer Security at the University of New South Wales in Sydney, Australia, and every year since 2012 my students have won CySCA, a hacking competition run by .gov.au between the different Australian universities. During the 2013 competition, I was bored at the uni watching the students compete and wanted to have some fun by abusing the #CySCA2013 hashtag. But what to tweet?

I could have been a little bit boring and childish by going with something overtly shocking, like spamming the hashtag with swearwords, pictures of dicks, or jokes at the expense of the other universities. But I thought, why mess with perfection? So I just let the accounts do what they do: tweet their inane tweets about Justin Bieber/One Direction, just with the #CySCA2013 hashtag added.

twitter1.png

Spot the odd one out?

Unfortunately, it didn’t 100% work at the time. It was my first time trying to get tweets into a hashtag, and I found that it’s very location-based. My bots had their location set to the same as the original accounts, so they weren’t clustered together enough to make much of an impact on the local #trending. They mostly didn’t show up in the feed, at least from what my Twitter account could see, except for the occasional loner. But it was here that I found my favourite, so I wasn’t too disappointed :>

twitter2.png

Me too Allena, me too