How can machine learning algorithms find drunk Twitter users?
Using Twitter to follow trends is nothing new; the social media platform is known for actively tracking popular topics and highlighting them on its website. But a new algorithm may be able to detect a different type of pattern among its users: drinking habits.
Twitter keeps track of what its users post, when they post, and where they post from, and with that data a team of University of Rochester researchers was able to develop a method for evaluating how and where Twitter users drink alcohol.
“Analysis of Twitter has become a widespread approach for [studying human activities], such as alcohol consumption and exercise, and human latent states, such as sickness and depression,” the researchers wrote in a summary of their study.
“However, nearly all prior work ... does not attempt to distinguish mere mentions of activities or states from self-reports of activity. Moreover, no attempt has been made to distinguish reports about future or past activities and in-the-moment reports that provide finer details when geo-tagged tweets are used to map specific locations of activities,” they added, highlighting what they hoped to address through their investigation.
To track regional drinking habits through Twitter, the team first needed a way to identify relevant tweets. The Rochester analysts settled on a series of three questions to determine whether a tweet originated from a user who was drinking: Does the tweet mention alcoholic beverages, using words such as “drunk,” “beer,” or “alcohol”? Is the tweet about the tweeter consuming such beverages? And is it likely the tweet was sent while the tweeter was drinking?
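The three questions form a cascade: each one is asked only if the previous answer was yes. A minimal Python sketch of that flow follows. The keyword set contains only the three words the article names, and `is_about_self` and `is_in_the_moment` are hypothetical placeholder callables standing in for the second and third questions, which the article does not describe in detail.

```python
import re

# Assumption: only the three example keywords quoted in the article.
ALCOHOL_KEYWORDS = {"drunk", "beer", "alcohol"}


def mentions_alcohol(tweet):
    """Question 1: does the tweet contain an alcohol-related keyword?"""
    words = re.findall(r"[a-z']+", tweet.lower())
    return any(word in ALCOHOL_KEYWORDS for word in words)


def label_tweet(tweet, is_about_self, is_in_the_moment):
    """Ask the three questions in order, stopping at the first 'no'.

    is_about_self and is_in_the_moment are hypothetical callables that
    take the tweet text and return True or False.
    """
    if not mentions_alcohol(tweet):
        return "no mention"
    if not is_about_self(tweet):
        return "mention only"
    if not is_in_the_moment(tweet):
        return "self-report, not in the moment"
    return "drinking now"
```

This sketch matches whole words only, so "beers" would not count as a mention; a real keyword list would need to be far broader.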
The study used workers on Amazon's Mechanical Turk, an online marketplace where “requesters” can post tasks to be completed by human “turkers,” to work out how best to find drinking-related tweets. Using the labels from those human trials, the team trained a support vector machine to follow the same line of inquiry as the humans did in order to accurately find relevant tweets.
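To give a rough sense of how human labels can train such a classifier, here is a small, self-contained Python sketch of a linear support vector machine fitted by subgradient descent on the regularized hinge loss. It is not the researchers' implementation: the article does not say which features, kernel, or settings they used, so the bag-of-words features, the hyperparameters, and the six example tweets below are all assumptions made up for illustration.

```python
import re
from collections import defaultdict


def featurize(tweet):
    """Bag-of-words counts (an assumed, deliberately simple feature set)."""
    counts = defaultdict(float)
    for word in re.findall(r"[a-z']+", tweet.lower()):
        counts[word] += 1.0
    return counts


def train_linear_svm(examples, epochs=50, learning_rate=0.1, reg=0.01):
    """Fit weights by subgradient descent on the L2-regularized hinge loss.

    examples: list of (tweet, label) pairs with label +1 or -1.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for tweet, label in examples:
            feats = featurize(tweet)
            score = sum(weights[w] * v for w, v in feats.items()) + bias
            # Regularization step: shrink every weight slightly.
            for w in list(weights):
                weights[w] *= 1.0 - learning_rate * reg
            # Hinge-loss step: only examples inside the margin push the weights.
            if label * score < 1.0:
                for w, v in feats.items():
                    weights[w] += learning_rate * label * v
                bias += learning_rate * label
    return weights, bias


def predict(model, tweet):
    """Return +1 (tweeter is drinking) or -1 (not) for a tweet."""
    weights, bias = model
    feats = featurize(tweet)
    score = sum(weights.get(w, 0.0) * v for w, v in feats.items()) + bias
    return 1 if score >= 0 else -1


# Made-up labels standing in for the Mechanical Turk annotations.
LABELED = [
    ("i am so drunk right now", 1),
    ("drinking beer with my friends tonight", 1),
    ("having a glass of wine at home", 1),
    ("new study links alcohol to poor sleep", -1),
    ("the beer commercial during the game was funny", -1),
    ("my neighbor was drunk last year at the party", -1),
]

model = train_linear_svm(LABELED)
```

In practice one would reach for an established library (scikit-learn's `LinearSVC`, for example) and thousands of labeled tweets rather than six; the point here is only that the human answers become training labels.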
Using that initial process, along with further machine learning algorithms that predict tweeters’ locations, the team compiled an analysis of Twitter users’ alcohol consumption habits. All tweets in the study came from the New York City metropolitan area, and the results compare drinking in the city with drinking in the suburbs, and drinking at home with drinking away from home.
The Rochester team found that most drinkers, whether they live in the city or the suburbs, stay relatively close to home when imbibing, with suburban drinkers more likely to stray farther away. The researchers also found a positive correlation between the density of “alcohol outlets,” such as liquor stores and bars, and the number of tweets sent out about drinking. While the paper notes that “correlation does not necessarily imply causation,” it cites several previous studies that arrived at similar conclusions regarding alcohol availability and drinking.
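For readers curious what a positive correlation means in practice, the Python sketch below computes a Pearson correlation coefficient between two lists of numbers, such as outlet density and drinking-tweet counts per neighborhood. The article does not say which correlation measure the paper used, so Pearson is an assumption, and the function name is illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

A value near 1 means the two quantities rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship. As the paper itself cautions, a high value says nothing on its own about which quantity, if either, drives the other.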
The final results painted an interesting picture of New York’s drinking habits and also suggested that similar algorithms and research methods could “help to create a tool for improving a community’s health, given social networks can become a resource to spread positive health behaviour,” the researchers wrote. They did, however, note one significant bias in the data: the relatively high share of young and minority users on Twitter. But they said that studies in all fields face similar problems, that results can be weighted to compensate, and that their conclusions were fairly successful in characterizing the New York drinking scene, with high potential for complementary, systematic Twitter-based studies in the future.
“Our results demonstrate that tweets can provide powerful and fine-grained cues of activities going on in cities,” the team said.