Tuesday, March 27, 2012

Bayesian Network to Combat Twitter Spam

Ever since I learned about Bayesian networks I see them everywhere.  I'm constantly constructing them in my head, and sometimes I even put them into SamIam so I can see what they look like and play around with some of the values.  Recently I started thinking about applying them to Twitter spam.  I've seen an uptick in the amount of spam I'm getting on Twitter, so this keeps coming up for me.

Combating Spam

The techniques for combating spam are fairly well known: things like the reputation of the sender and the content of the message.  There are, of course, problems with applying them to email.  For example, my MTA usually doesn't know much about the sender of the email message that it is evaluating.  But Twitter is in a unique position in that it knows every tweet that has been sent.  When I send a tweet, Twitter knows how long I have been a member, how many followers I have, when I last reset my password, what IP addresses or phone numbers I use, and what time I typically tweet.  I imagine that if my MTA knew that kind of detail about the sender of every email message it evaluates, spam would have been a solved problem by now.

Whenever you're talking about spam, it's a good idea to reread the classic Slashdot post that shows up in every such discussion.  If you read Slashdot, you've seen it: http://tech.slashdot.org/comments.pl?sid=954433&cid=24892585.  The reason I suspect that Twitter is in a different place is that it is the only source of tweets, so if a solution requires everyone to change at once, Twitter is actually able to make that happen.  Anonymous senders from other countries are less of a problem because it isn't like there are any open Twitter relays out there (are there?).

Bayesian Network

[Figure: diagram of the Bayesian network]

So this is what my Bayesian network looks like.  The network tries to decide whether to allow a message, throttle it, or block the sending account outright and make its owner go through some hoop to restore access.  The decision to allow/throttle/block is based on the reputation of the tweeter, the suspiciousness of the message, and any indicators that the account has been compromised.
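To make the decision step concrete, here is a minimal sketch of how the three inputs could be combined.  It assumes each input has already been boiled down to a single probability, that a hypothetical spam node depends on all three, and that two thresholds pick the action.  The table values, the thresholds, and the function names are all placeholders I made up for illustration; none of them come from real Twitter data.

```python
from itertools import product

# Hypothetical conditional probability table:
# P(spam | good_reputation, suspicious_message, compromised).
# Every number is an illustrative placeholder, not measured data.
P_SPAM = {
    (True,  False, False): 0.01,
    (True,  False, True):  0.30,
    (True,  True,  False): 0.20,
    (True,  True,  True):  0.90,
    (False, False, False): 0.10,
    (False, False, True):  0.50,
    (False, True,  False): 0.70,
    (False, True,  True):  0.98,
}


def bernoulli(p, value):
    """Probability that a binary node with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def spam_probability(p_good_rep, p_suspicious, p_compromised):
    """Marginalize the three parent nodes out by brute-force enumeration."""
    total = 0.0
    for rep, sus, comp in product([True, False], repeat=3):
        weight = (bernoulli(p_good_rep, rep)
                  * bernoulli(p_suspicious, sus)
                  * bernoulli(p_compromised, comp))
        total += weight * P_SPAM[(rep, sus, comp)]
    return total


def decide(p_spam, throttle_at=0.3, block_at=0.8):
    """Map a spam probability to an action using assumed thresholds."""
    if p_spam >= block_at:
        return "block"
    if p_spam >= throttle_at:
        return "throttle"
    return "allow"
```

Enumeration is fine for three binary parents because there are only eight combinations.  A network with many more nodes would want a proper inference library instead.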

The indicators of compromise are a set of signals that Twitter would likely be able to gather and that an MTA would never be able to evaluate reliably.  The network would evaluate whether the user's account had a recent password change and whether the message was being sent from a geographic region or at a time of day that is unusual for the user.  The network would also look for significant upticks in the number of tweets being sent by the user.
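As a rough illustration of how those indicators could feed a "compromised" node, here is a sketch that treats them as conditionally independent given the compromise state (a naive Bayes fragment, which is a simplification of whatever the full network would do).  The indicator names, the likelihoods, and the prior are assumptions invented for the example.

```python
# Hypothetical likelihoods for each indicator:
# (P(indicator | compromised), P(indicator | not compromised)).
LIKELIHOODS = {
    "recent_password_change": (0.40, 0.05),
    "unusual_region":         (0.60, 0.02),
    "unusual_time_of_day":    (0.50, 0.10),
    "tweet_volume_spike":     (0.70, 0.05),
}

PRIOR_COMPROMISED = 0.001  # assumed base rate, purely illustrative


def compromise_posterior(observed, prior=PRIOR_COMPROMISED):
    """Return P(compromised | observed indicators).

    `observed` maps indicator name -> bool.  Indicators that are not
    in `observed` are simply left out of the update.
    """
    p_yes, p_no = prior, 1.0 - prior
    for name, present in observed.items():
        given_yes, given_no = LIKELIHOODS[name]
        p_yes *= given_yes if present else 1.0 - given_yes
        p_no *= given_no if present else 1.0 - given_no
    return p_yes / (p_yes + p_no)
```

With these made-up numbers, a single indicator moves the posterior only a little because the prior is so small, while all four firing together push it most of the way to certainty.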

Of course any spam system needs to evaluate the content of the message, so mine would look for keywords or phrases and the presence of links.  In particular, a message that includes only a link and a handle would be viewed with suspicion.  Also of note is the relationship between the sender and recipient.  If there has been two-way communication, the message will be less suspicious, but if there has been only one-way communication or none at all, the message becomes more suspicious in the presence of the other indicators.

The most complicated part of the network is evaluating the reputation of the sender.  The reputation is made up of how much the sender has participated in Twitter and how well the user has been accepted by the Twitter community.  Users who have many followers and retweets are viewed more favorably than new users with few followers.  Recent spam complaints both lower the user's acceptance and serve as an indicator of compromise.  The network would also try to evaluate how long the sender has been a member of Twitter, how often the user tweets, and how many people the sender is following.

Getting Scientific

One beautiful thing about Twitter being the sole source of tweets is that they likely have excellent data about the whole population of tweets and tweeters.  For example, of the new accounts being created, how many are shut down within one week for spamming?  How many spam complaints come in for a given reporting period?  This is data that we can use to populate the probabilities in our network and measure success down the road.
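Populating a probability from that kind of data is mostly counting.  Here is a small sketch, assuming we have historical records of the form (parent value, was it spam) for a single node, with add-alpha smoothing so that rarely seen values don't end up with probabilities of exactly 0 or 1.  The record format and the function name are my own assumptions.

```python
from collections import Counter


def estimate_cpt(records, alpha=1.0):
    """Estimate P(spam = True | parent_value) from labeled history.

    `records` is an iterable of (parent_value, is_spam) pairs.
    `alpha` is an add-alpha (Laplace) smoothing constant.
    """
    totals = Counter()
    spam = Counter()
    for parent_value, is_spam in records:
        totals[parent_value] += 1
        if is_spam:
            spam[parent_value] += 1
    return {
        value: (spam[value] + alpha) / (totals[value] + 2 * alpha)
        for value in totals
    }


# Synthetic records, for illustration only.
history = ([("new_account", True)] * 3
           + [("new_account", False)] * 1
           + [("old_account", False)] * 8)
cpt = estimate_cpt(history)
```

The same counting would be repeated for each node in the network, using whatever parent combinations that node has.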

One technique I would try is to build this network and then have it select 1000 random tweets that it would have blocked and have those tweets reviewed by a human.  That should give us an idea of our false positive rate.  After the system is implemented we can look for a reduction in the volume of spam complaints.  There are obviously more metrics to look at; the point is that we want to find some evidence to support whether a system like this actually reduces the volume of spam.  If we can validate the system that way, then we have real evidence to back up our best practices and suggest these measures to the next social media site that is going to blow up.  I'm talking to you, Pinterest.
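Here is a sketch of that review step, assuming we can get a list of IDs for tweets the network would have blocked and that a human reviewer reports how many of the sample were legitimate.  Strictly speaking, the share of blocked tweets that turn out to be legitimate is a false discovery rate rather than a false positive rate, but it answers the same practical question.  The function names are hypothetical, and the Wilson score interval is just one reasonable way to put error bars on a proportion.

```python
import math
import random


def sample_for_review(blocked_tweet_ids, k=1000, seed=None):
    """Pick up to k blocked tweets uniformly at random for human review."""
    rng = random.Random(seed)
    ids = list(blocked_tweet_ids)
    return rng.sample(ids, min(k, len(ids)))


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n
                         + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Suppose reviewers judged 30 of the 1000 sampled tweets to be legitimate
# (a made-up count for illustration).
low, high = wilson_interval(30, 1000)
```

The interval tells us how much to trust the estimate from a single sample of 1000 before comparing it against whatever rate users would tolerate.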

Skepticism

I recently finished reading The Drunkard's Walk and How We Decide and picked up a few tips for making better judgements.  One trick that I really liked was to try to list three reasons why you're wrong.  Look for evidence that your hypothesis is wrong rather than evidence that it is correct.  Hypothesis1 is that a Bayesian network like this will measurably reduce Twitter spam.  Hypothesis2 is that the system is practical.  I'm going to cheat a bit and just come up with three objections rather than six.
  1. This is too hard for a computer to do for the high volume of tweets that Twitter deals with. (Hypothesis2)
  2. The false positive rate will be too high for users to accept. (Hypothesis2)
  3. Spammers will be able to easily adjust to avoid the system, as they always have. (Hypothesis1)
I think the first two of these objections can actually be proven or disproven, so I'm not too worried about them.  The last one is difficult to prove or disprove, and there is a lot of evidence that spammers are able to work around spam filters, so it might derail the whole approach.  I would say more thought is required on that one.

Simple Fix

Sometimes it's better to implement something simple that isn't as effective, because you can implement it quickly and at little cost.  I wonder what kind of dent it would put in Twitter spam if Twitter simply throttled messages that go from a sender to a recipient who have never interacted before and that contain only a link and a couple of words.  Then, while the tweets are throttled, block the account after a few spam complaints.
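That rule is simple enough to sketch in a few lines.  This version assumes "a couple of words" means at most three words besides the link and any @handles, and that "a few spam complaints" means three.  Both limits, the URL pattern, and the function names are guesses of mine, not anything Twitter has published.

```python
import re

# Deliberately crude URL matcher; a real system would need something sturdier.
URL_RE = re.compile(r"https?://\S+")


def should_throttle(text, have_interacted, max_other_words=3):
    """Throttle a tweet between strangers that is little more than a link."""
    if have_interacted:
        return False
    if not URL_RE.search(text):
        return False
    leftover = URL_RE.sub(" ", text)
    # Handles such as @someone are not counted as words.
    words = [w for w in leftover.split() if not w.startswith("@")]
    return len(words) <= max_other_words


def should_block(throttled, spam_complaints, complaint_limit=3):
    """Block an already-throttled account once complaints reach the limit."""
    return throttled and spam_complaints >= complaint_limit
```

For example, `should_throttle("@bob check this http://example.com/x", have_interacted=False)` comes back true, while the same text between two accounts that have talked before is left alone.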
