PCAST, Big Data, and Privacy

Jun 04, 2014 | Labs Blog

By Leslie Francis
reprinted from HealthLawProf Blog

The President’s Council of Advisors on Science and Technology (PCAST) has issued a report intended to be a technological complement to the recent White House report on big data. This PCAST report, however, is far more than a technological analysis—although as a description of technological developments it is wonderfully accessible, clear and informative. It also contains policy recommendations of sweeping significance about how technology should be used and developed. PCAST’s recommendations carry the imprimatur of scientific expertise—and lawyers interested in health policy should be alert to the normative approach of PCAST to big data.

Here, in PCAST’s own words, is the basic approach: “In light of the continuing proliferation of ways to collect and use information about people, PCAST recommends that policy focus primarily on whether specific uses of information about people affect privacy adversely. It also recommends that policy focus on outcomes, on the ‘what’ rather than the ‘how,’ to avoid becoming obsolete as technology advances. The policy framework should accelerate the development and commercialization of technologies that can help to contain adverse impacts on privacy, including research into new technological options. By using technology more effectively, the Nation can lead internationally in making the most of big data’s benefits while limiting the concerns it poses for privacy. Finally, PCAST calls for efforts to assure that there is enough talent available with the expertise needed to develop and use big data in a privacy-sensitive way.” In other words: assume the importance of continuing to collect and analyze big data, identify potential harms and fixes case by case, possibly after the fact, and enlist the help of the commercial sector to develop profitable privacy technologies.

The report begins with an extremely useful (and particularly frightening, if you aren’t familiar with the internet of things) description of big data possibilities, now and in the near-term future. The description emphasizes the distinction between data “born digital” (created in digital form) and data “born analog” (arising from the characteristics of the physical world and then becoming accessible in digital form). Data born analog are highly likely to contain more information than just what is of particular interest; for example, surveillance cameras record everything that is occurring in a particular location, not just the acts that are the target of surveillance. But with analytics that allow data fusion, combining data sources may reveal new meanings, for example by profiling individuals. Big data are high volume, high velocity, and high variety, an intersection that presents serious privacy challenges.

PCAST then attempts to anticipate the privacy harms that might be associated with big data collection and analysis. The harms are mainly presented as byproducts of the benefits of particular technological developments. The list is impressive, but it may miss additional harms associated with the development of a big data world. Here’s a table listing developments, benefits, and harms; I’ve marked with an asterisk the benefits that I’ve reconstructed from what PCAST says but that PCAST does not state explicitly.

Technological development | Benefit | Associated harm
--- | --- | ---
Digital communication | Social networking across geographical boundaries; social and political participation on a far larger scale | Shared pipelines and the possibility of interception
Virtual home | Ability to store, organize, and share personal records, e.g. cloud storage of photographs | The “home as one’s castle” protection should extend to a “castle in the cloud,” but currently does not
Inferred facts about individuals | Delivery of desired or needed services, e.g. targeted marketing | Inferences may be drawn about highly sensitive facts about the individual (e.g. sexual orientation), facts of which the individual may not even be aware (e.g. early dementia)
Locational identification | Services such as navigation or routes, finding people or services nearby, avoiding hazards | Stalking and tracking
Personal profiles | Benefits of statistically valid algorithms | False conclusions about individuals may be drawn
Discovery of special cases that apply to individuals within a population | May allow tailoring of services to special cases, e.g. personalized medicine, instruction linked to learning styles* | Foreclosure of autonomy: individuals may not want to take the predicted path
Identification of individuals | May allow individuals to be warned, protected, or otherwise benefited* | Loss of desired anonymity

PCAST intentionally omitted from this list two desires: that information be used fairly, and that individuals know what others know about them or are doing with their information. In PCAST’s view, neither of these “harms” can be defined sufficiently to support policy recommendations. Also omitted from this list are more overarching concerns such as effects on identity, security, stigmatization of groups, freedom of expression, or political liberty.

PCAST’s discussion of the current technologies of privacy protection is highly informative, and readers with interests in this area would do well to read the report; I won’t summarize it here. The report also debunks several standard methods for privacy protection: notice and choice (a “fantasy”), de-identification (ineffective in light of the development of analytics enabling re-identification), and non-retention or deletion (hopeless given the potential for copying, including the creation of multiple copies at the point where analog data become digital).

Instead, the report suggests several different approaches for protection against data misuse. As a successor to notice/consent, PCAST recommends the development of “privacy preference profiles,” perhaps by third parties such as the ACLU or Consumer Reports; apps or other internet entities could then indicate whether their privacy policies comport with a profile specified by the consumer. Alternatively, the profile developers might offer the service of vetting apps. Ideally, technologies could be developed to perform the vetting automatically. PCAST also recommends developing use controls that govern data from collection through use and any subsequent transmission. Metadata might serve this purpose, but further development is clearly needed. Another suggested strategy is audit capability as a deterrent to misuse. Finally, PCAST suggests implementing the Consumer Privacy Bill of Rights through recognition of potentially harmful uses of data. Emphasis should be placed on developing best practices to prevent inappropriate data use throughout the data life cycle.
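To make the idea of automated vetting a little more concrete, here is a minimal Python sketch. PCAST does not specify any format for profiles or policies; the class names, the fields (permitted uses, third-party sharing, retention period), and the matching rule are all hypothetical, chosen only to illustrate how an app’s stated practices might be checked against a profile a consumer has chosen.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical: a consumer's privacy preference profile, e.g. one published by a
# third party, listing the data practices the consumer is willing to accept.
@dataclass(frozen=True)
class PreferenceProfile:
    name: str
    allowed_uses: frozenset[str]
    allows_third_party_sharing: bool
    max_retention_days: int


# Hypothetical: a machine-readable summary of an app's stated privacy policy.
@dataclass(frozen=True)
class AppPolicy:
    app: str
    declared_uses: frozenset[str]
    shares_with_third_parties: bool
    retention_days: int


def vet(app: AppPolicy, profile: PreferenceProfile) -> list[str]:
    """Return reasons the app's policy conflicts with the profile; empty means it comports."""
    problems: list[str] = []
    # Any declared use the profile does not permit is a conflict.
    extra = sorted(app.declared_uses - profile.allowed_uses)
    if extra:
        problems.append("uses not permitted by profile: " + ", ".join(extra))
    # Sharing with third parties conflicts only if the profile forbids it.
    if app.shares_with_third_parties and not profile.allows_third_party_sharing:
        problems.append("shares data with third parties")
    # Retaining data longer than the profile allows is a conflict.
    if app.retention_days > profile.max_retention_days:
        problems.append(
            f"retains data {app.retention_days} days (limit {profile.max_retention_days})"
        )
    return problems


if __name__ == "__main__":
    strict = PreferenceProfile("strict", frozenset({"service_delivery"}), False, 30)
    weather = AppPolicy(
        "weather", frozenset({"service_delivery", "targeted_marketing"}), True, 365
    )
    for reason in vet(weather, strict):
        print(reason)
```

Reporting the reasons for a mismatch, rather than a bare yes or no, reflects the hope that a vetting service would tell consumers why an app falls short of their chosen profile.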

Five major policy approaches (they are called recommendations, but they are better characterized as general directions than as specific recommendations) conclude the report. They are:

–attention should focus on uses of big data rather than collection and analysis

–policies should not be stated in terms of technical solutions but in terms of intended outcomes

–the US should strengthen privacy-related research, including relevant social science that informs the successful application of technologies

–the US Office of Science and Technology Policy should increase education and training efforts

–the US should take international leadership by adopting policies that stimulate the development of privacy protective technologies.

These recommendations seem remarkably anodyne after the detailed discussion of technologies that precedes them. They are also preceded by some other, less anodyne policy observations (I found these quite troubling, for reasons I only begin to suggest parenthetically below):

–basing policy on data collection is unlikely to succeed, except in very limited contexts (such as health information) where there may be possibilities for meaningful notice and consent. (Why, I ask, is notice/consent the only way to approach collection practices? What about other sorts of restrictions on collection? Or, is the thought that getting the data is both inevitable and desirable, no matter what the context?)

–regulating at the moment individuals are particularized by analytics might be technically possible—but even so, it’s preferable to focus on harms downstream (Doesn’t this expose people to risks of harm, correctable only after the fact? Shouldn’t we consider building ways to detect and deter re-identification that could intervene before the harm occurs?)

–drafting savvy model legislation on cyber-torts might help improve the current patchwork of liability rules for privacy violations (Why not a public law approach to violations rather than placing the onus on individual litigation?)

–barring the government from certain classes of uses might be desirable, even if these uses remain available in the private sector (So is the government the only, or even the primary, problem with big data use???)