Researchers develop formula that reveals home location based on tweets
IBM researchers announced Friday that they had successfully developed an algorithm to track down any Twitter user’s home city based on metadata contained in their last 200 tweets.
The formula, which researchers said could help marketers target advertising or help journalists locate major news events, is accurate almost 70 percent of the time, according to MIT Technology Review. It is the latest research finding to highlight the possible danger to privacy and security presented by metadata collection and analysis.
One of Twitter’s optional features allows for the location tagging of every tweet a user posts. Research head Jalal Mahmud led the IBM team’s effort, which began with a question: Is it possible to predict a Twitter account holder’s location by analyzing tweets and matching the content against their geotagged metadata?
The team started by tracking geotagged tweets from the 100 largest cities in America between July and August 2011, and isolated 100 users out of each location. Researchers then examined the last 200 tweets from each user, discounting private tweets from the mix, and were left with 1.5 million geotagged tweets from almost 10,000 users.
Ten percent of the data was then set aside for later testing, while the remaining 90 percent was analyzed layer by layer to create the location-estimating formula.
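The 90/10 division described above is a standard holdout split. A minimal sketch in Python follows, assuming the split is made per user (the article does not say whether the team split by user or by tweet); the function name, the seed, and the numeric user IDs are illustrative and not taken from the research.

```python
import random

def split_holdout(user_ids, test_fraction=0.1, seed=0):
    """Shuffle users and set aside a fraction of them for later testing.

    Assumption: the split is done per user, not per tweet.
    """
    ids = list(user_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ids)
    n_test = int(round(len(ids) * test_fraction))
    # The first n_test shuffled users are held out; the rest are for training.
    return ids[n_test:], ids[:n_test]

# Roughly the scale reported in the article: almost 10,000 users.
train_users, test_users = split_holdout(range(10_000))
```

Keeping the held-out users completely out of the analysis step is what makes the later accuracy figures meaningful.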
Key to the formula is the additional information users include in their tweets. Of the tweets the team collected, 100,000 were posted by users who had linked their Twitter accounts to the popular Foursquare location-based social networking platform, and in 300,000 other cases, users included the names of cities listed in the U.S. Geological Survey gazetteer.
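One way to picture the city-name signal is a simple lookup of tweet text against a list of place names. The sketch below is not the IBM team's method; it assumes a tiny hand-written gazetteer (`GAZETTEER` is hypothetical, standing in for the much larger USGS list) and guesses the city a user mentions most often.

```python
import re
from collections import Counter

# Hypothetical miniature gazetteer: lowercase place name -> canonical city.
GAZETTEER = {
    "new york": "New York, NY",
    "chicago": "Chicago, IL",
    "san francisco": "San Francisco, CA",
}

def city_mentions(tweets, gazetteer=GAZETTEER):
    """Count how many tweets mention each gazetteer city as a whole phrase."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for name, city in gazetteer.items():
            # \b keeps "chicago" from matching inside a longer word.
            if re.search(r"\b" + re.escape(name) + r"\b", lowered):
                counts[city] += 1
    return counts

def guess_city(tweets):
    """Return the most frequently mentioned city, or None if none appear."""
    counts = city_mentions(tweets)
    return counts.most_common(1)[0][0] if counts else None

sample = [
    "Stuck in traffic in Chicago again",
    "Chicago pizza beats everything",
    "Visiting New York next week",
]
print(guess_city(sample))
```

A mention is only weak evidence, since people also tweet about cities they do not live in, which is presumably why the researchers combined it with other signals.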
The team also found the national distribution of tweets was more or less constant on a daily basis, which allowed them to isolate users’ time zones based on their tweeting patterns. Even the wording of the posts themselves aided tracking, such as when users typed in the name of a sports team.
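The time-zone idea can be illustrated by sliding a user's posting hours against a typical daily activity curve and seeing which offset fits best. The sketch below is only an illustration under stated assumptions: `LOCAL_PROFILE` is an invented activity curve, not data from the study, and the offsets are fixed standard-time values that ignore daylight saving time.

```python
from collections import Counter

# Assumed relative tweet volume for each local hour 0..23 (invented numbers):
# quiet overnight, busier through the day, peaking at 9 p.m. local time.
LOCAL_PROFILE = [1, 1, 1, 1, 1, 1, 3, 3, 3, 5, 5, 5,
                 5, 5, 5, 5, 5, 5, 7, 7, 7, 10, 6, 6]

# Standard-time UTC offsets for the four main U.S. zones (DST ignored).
US_OFFSETS = {"Eastern": -5, "Central": -6, "Mountain": -7, "Pacific": -8}

def guess_time_zone(utc_hours, profile=LOCAL_PROFILE, offsets=US_OFFSETS):
    """Pick the zone whose offset best aligns the user's tweets with the profile."""
    hist = Counter(h % 24 for h in utc_hours)

    def score(offset):
        # Weight each tweet by how typical its local hour would be in this zone.
        return sum(n * profile[(h + offset) % 24] for h, n in hist.items())

    return max(offsets, key=lambda zone: score(offsets[zone]))

# Tweets sent at 05:00 and 04:00 UTC fall at 9 p.m. and 8 p.m. Pacific time.
print(guess_time_zone([5, 5, 5, 4]))
```

With only a handful of tweets the scores for neighboring zones are close, so a real system would need many posts per user before this signal becomes reliable.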
With their algorithm established, the team then used it on the 10 percent of data set aside before analysis, and found that in less than one second for each individual it was able to correctly identify a user’s home city 68 percent of the time, home state 70 percent, and time zone 80 percent.
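Figures like these come from comparing predictions for the held-out users against their known locations. A minimal sketch of that calculation follows; the user IDs and cities are made up for illustration and do not come from the study.

```python
def accuracy(predicted, actual):
    """Fraction of users in `actual` whose predicted label matches the truth."""
    if not actual:
        raise ValueError("no labeled users to evaluate")
    hits = sum(1 for user, truth in actual.items()
               if predicted.get(user) == truth)
    return hits / len(actual)

# Made-up example: four held-out users, one wrong guess, one with no guess.
actual = {"a": "Chicago, IL", "b": "Austin, TX",
          "c": "Boston, MA", "d": "Miami, FL"}
predicted = {"a": "Chicago, IL", "b": "Dallas, TX", "c": "Boston, MA"}
print(accuracy(predicted, actual))
```

The same function works for state or time-zone labels, which is how separate city, state, and time-zone figures can be reported from one test set. Users with no prediction count as misses here, which is a design choice rather than something the article specifies.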
Researchers said the algorithm could be even more accurate in the future by including tweet mentions of specific areas and landmarks, for example.
Follow Giuseppe on Twitter