Strava Map Exposes Weaknesses in Understanding Complexities of Pervasive Data

February 20, 2018

It was recently reported that Strava inadvertently revealed the locations of U.S. military bases when it produced a heat map showing the movement of people around the world who use its exercise-tracking app. In reviewing the map, a college student from Australia realized that he could locate military bases in countries such as Iraq and Syria, where the app was used almost exclusively by American soldiers.

This prompted the U.S. military to review its security practices and renewed public discussion of privacy concerns, but according to Informatics Professor Matthew Bietz, “privacy is probably the wrong framework here.” The issue is much more complex.

Aggregated Data’s Synergistic Effect

In a statement about the incident, Strava explained that users have “the ability to opt out of heat maps altogether.” But as Bietz notes, there was no apparent need to do so. Usually, releasing data in the aggregate helps protect privacy. “I don’t care if Strava knows where I run,” says Bietz, “and they’re pretty good about protecting an individual’s privacy.”

Computer Science Professor Sharad Mehrotra agrees, adding that “the issue at hand is not so much about an individual’s privacy being violated but rather about sensitive information leaking through aggregate data release.”

Both assert that we don’t have a solid understanding of aggregated data or a framework in place to address it. “It turns out that patterns reveal places,” says Bietz. Furthermore, pulling data from a variety of sources can reveal not only places but people. “When data meets other data, it can be a transformative experience,” says Bietz. Combining GPS data with social media posts and fitness data can lead to information that is greater than the sum of its parts — which is when aggregated data can threaten individual privacy.
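
To make the “data meets other data” point concrete, here is a minimal, entirely hypothetical Python sketch; the datasets, usernames, coordinates and thresholds below are invented for illustration and are not drawn from Strava or any real release. Two datasets that look harmless on their own, anonymized activity start points and public social media check-ins, can be joined on place and time to tie an activity back to a person.

```python
# Hypothetical illustration (invented data, not Strava's API or datasets):
# two releases that are individually low-risk can be joined to re-identify someone.

from datetime import datetime

# "Anonymized" fitness activities: no names, just rounded start coordinates and times.
activities = [
    {"activity_id": "a1", "lat": 33.64, "lon": -117.84, "start": datetime(2018, 1, 15, 7, 30)},
    {"activity_id": "a2", "lat": 36.19, "lon": 43.99, "start": datetime(2018, 1, 15, 5, 45)},
]

# Public social media check-ins: names and places, but no exercise data.
checkins = [
    {"user": "runner_42", "lat": 33.64, "lon": -117.84, "time": datetime(2018, 1, 15, 7, 28)},
    {"user": "desert_jog", "lat": 36.19, "lon": 43.99, "time": datetime(2018, 1, 15, 5, 47)},
]

def same_place(a, b, tol=0.01):
    # Crude proximity check on rounded coordinates; a real analysis would use proper distances.
    return abs(a["lat"] - b["lat"]) < tol and abs(a["lon"] - b["lon"]) < tol

# Join on "same place, within ten minutes": the combination links activities to people.
for act in activities:
    for chk in checkins:
        if same_place(act, chk) and abs((act["start"] - chk["time"]).total_seconds()) < 600:
            print(f"{act['activity_id']} is probably {chk['user']}")
```

Neither dataset reveals much by itself; it is the join that does the damage, which is the sense in which combined data becomes greater than the sum of its parts.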

The Need for New Norms

The lack of public understanding and social norms around aggregated data is cause for concern, as is our need for well-defined laws and regulations. Bietz explains, “we’re essentially carrying surveillance devices in our pockets, and we don’t have a good sense of how that data travels and where it goes. The scary part isn’t the intended use of the data. It’s what else the data can be used for.”

Moreover, entire industries now rely on this collected data, advertising being one example. “We’ve built a whole economy on this secondary inference from data,” says Bietz. The business model for many companies is to sell the data they collect from their apps, which isn’t necessarily dangerous. Some of the clients might be cities that want to know where to develop walking paths. “It’s not that all secondary uses are bad or scary,” says Bietz, “but we don’t have a good understanding of what those secondary uses are — let alone an idea of how to control or regulate them.”

As Mehrotra notes, “fixing it for the future does not eliminate the leakage since the cat is already out of the bag.” There is thus a critical need to better understand how users and companies can unknowingly reveal sensitive information.

Pervasive Projects in the Works

Bietz says that this is part of the motivation behind the four-year, $3 million Pervasive Data Ethics for Computational Research (PERVADE) grant that he and a multidisciplinary team of researchers received from the National Science Foundation. The researchers, who lightheartedly refer to themselves as the “Data Justice League,” are studying how people experience the reuse of their personal data for computational research.

In particular, the PERVADE team aims to develop a framework for use by the institutional review boards that approve human subject research. Currently, human data — as opposed to human subjects — constitutes a gray area for IRBs, so Bietz says the team is “looking into this new ethical landscape.” Although the focus is on computational research, the lessons surrounding secondary uses of data should extend beyond research facilities. As Bietz says, “there’s all this data out there and data exhaust. We’re in a space now where the norms and processes we’ve developed don’t work.”

Mehrotra is also working to develop new norms with a $5 million TIPPERS (Testbed for IoT-based Privacy-Preserving Pervasive Spaces) project, which is part of DARPA’s Brandeis program. Mehrotra and his team have built a system that monitors a variety of sensors distributed throughout UCI’s Donald Bren Hall. The team has been testing a group of apps, and widespread deployment is currently underway, allowing students, staff and faculty to use the apps and provide feedback.

“The TIPPERS framework has enabled us to create a live smart-building testbed,” says Mehrotra. The team is using the testbed to explore privacy-versus-utility tradeoffs and to understand secondary uses of sensor data and the leakage that occurs. One goal is to determine how data mining mechanisms can be used to identify secondary inferences and how such knowledge can be exploited to prevent the unintended leakage of sensitive information — hopefully helping us avoid another Strava-type incident in the future.
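
As a rough, hypothetical illustration of that privacy-versus-utility tradeoff (this sketch is not the TIPPERS system; the grid cells, counts and threshold below are invented), one common mitigation is to suppress sparsely populated cells before publishing an aggregate heat map, since lightly used cells, such as a remote area where only a few people exercise, are exactly the ones that leak sensitive locations.

```python
# Hypothetical sketch: suppress low-count cells in an aggregated heat map before release.

from collections import Counter

# Toy aggregate: (grid_x, grid_y) -> number of recorded GPS points in that cell.
heatmap = Counter({(0, 0): 950, (0, 1): 1200, (5, 7): 3, (9, 2): 1})

def suppress_sparse_cells(cells, k=10):
    """Publish only cells with at least k observations (a k-anonymity-style rule)."""
    return {cell: count for cell, count in cells.items() if count >= k}

published = suppress_sparse_cells(heatmap)
utility = len(published) / len(heatmap)  # fraction of cells that survive publication

print(published)                           # {(0, 0): 950, (0, 1): 1200}
print(f"utility retained: {utility:.0%}")  # utility retained: 50%
```

Raising the threshold k leaks less but publishes less, which is the tradeoff in miniature.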

Shani Murray