海角大神

Privacy, accuracy, and the looming 2020 census

Bebeto Matthews/AP
People walk through New York's Times Square on Aug. 22, 2019. With just a few months left before America starts taking its biggest self-portrait, the Census Bureau is grappling with a host of concerns about the head count, including how to ensure that it is secure and accurate.

Ten seconds pass as Joe Salvo considers how New York City, his longtime home, could prepare an emergency response strategy for a hurricane without using census data.

He draws a blank.

"Where do you go first? Where you send resources that are by definition limited?" says the head of the population division at the city's planning department. "We would try to come up with something to figure out where the resources should go, but it would always involve census data. There's no other source like this."

Why We Wrote This

While accuracy is important in a head count, so is individuals' privacy. The Census Bureau has changed its process as it works to ensure identities are shielded, but some are concerned the data won't be as useful.

But soon that source might be less useful than it once was. Demographers, social scientists, and other data users like Mr. Salvo are concerned that their ability to draw upon 2020 census data for city planning and other routine purposes could be affected by a Census Bureau decision to adopt a more rigorous system to ensure and measure privacy for survey participants.

Census scientists say this framework, known as "differential privacy," is necessary to help ensure hackers can't take census data, mash it up with other public datasets, and identify individuals. But the shift has proved divisive. Critics argue that privileging the data's privacy undermines its accuracy and utility and restricts the general public's ability to access it. They say it could jeopardize everything from the data that cities use to prepare for natural disasters to the redistricting process that draws the districts of United States representatives.

"It's not just the most important source in the social sciences," says Steven Ruggles, a University of Minnesota history and population studies professor. "It's one of the most widely used scientific resources in the world."

The Census Bureau has long used ad hoc tweaks to balance accuracy and privacy in its data. Years of technological innovation and highly public data breaches have complicated that task. Switching to differential privacy allows the bureau to better see just how much privacy protection might be needed. While tightening up could rankle data users, a privacy breach could erode public trust, something in short supply since the Trump administration tried to defy the Supreme Court and add an unpopular citizenship question.

"The census is only as good as the public's willingness to participate, and that participation hinges significantly on perceptions that the Census Bureau will keep personal information confidential," says Terri Ann Lowenthal, a former congressional aide for a House subcommittee that oversees the census. "If public confidence in the confidentiality of data erodes, then participation in the census will likely decline."

The case for differential privacy

By law, the U.S. Census Bureau is required to protect the privacy of individuals and establishments, ensuring that they can't be identified from census-released data.

That used to just mean withholding names. But as computers became more powerful, it became theoretically possible to combine outside databases such as property records, credit reports, and voter rolls with census data tables on age, ethnicity, geography, and so forth to try to get a statistical picture of actual Americans.

To guard against this, the bureau has long injected inaccuracies, or "noise," into the data. It did not release details of those efforts, in order to prevent reverse engineering of the process.

The bureau's old "disclosure avoidance methods" were "more art than science," says John Abowd, the bureau's chief scientist.

Meanwhile, computer science has continued to advance. The bureau's interest in differential privacy has been pushed in part by development of something called the database reconstruction theorem. This theorem holds that given enough information, researchers can take collections of summary tables and reconstruct them into approximate records of individuals.

The bureau has never had a data breach, but internal tests of the 2010 census showed that the bureau had been able to match the race and ethnicity of nearly 20% of the 308,745,538 people counted using publicly available information. Corroborating the data's veracity required access to confidential bureau information, which limits the data's utility and potential harm, but the very fact that the data was vulnerable frightened researchers across the field.

Guarding against this is a big reason the Census Bureau has turned to differential privacy, which offers data scientists a tantalizing prospect: a way to confidently measure the extent of data's confidentiality.

Popularized by tech companies like Apple and Facebook, differential privacy is a system that formally injects noise into data and then produces a numerical value that describes how much privacy loss a person will experience with a given noise amount. The term "epsilon" is used to symbolize this value.

It's a trade-off: More noise means more privacy, but less accuracy. Less noise means less privacy, but information might be more usable to researchers.
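To make that trade-off concrete, here is a minimal sketch in Python of one textbook way to add calibrated noise to a single count, often called the Laplace mechanism. It is an illustration only and is not the Census Bureau's actual system, whose details are not described in this article. The function names, the example count of 100, and the epsilon values are all hypothetical. In this formulation a smaller epsilon means more noise, so more privacy and less accuracy.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise of scale sensitivity / epsilon.

    Assumption: adding or removing one person changes a simple count
    by at most 1, so the sensitivity defaults to 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    """Average absolute error of the noisy count over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Hypothetical block with 100 residents, published at three settings.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: average error {mean_abs_error(100, eps):.2f}")
```

Running the script shows the average error shrinking as epsilon grows. That is the knob at the center of the debate: where to set it is a policy choice, not something the arithmetic decides.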

"The decision to balance accuracy with privacy is not a scientific or technical decision. It's a political, moral, ethical decision," says Indivar Dutta-Gupta, co-executive director of the Georgetown Center on Poverty and Inequality. "People whose lives, livelihoods, and well-being depend on the data will generally share both goals of ensuring that privacy is protected and that there is some accuracy. And we need to sort of think through where we strike that balance."

Most people agree that the bureau needs stronger confidentiality protections, but there is still debate over how much noise to inject. Differential privacy will require the bureau to navigate the trade-off between privacy and accuracy, as one would fiddle with a knob to adjust the temperature of a bath.

"When [the Census Bureau] puts out a public table, it can't ignore the fact that someone could try to match things with outside data," says Mr. Dutta-Gupta. "As far as I can tell, differential privacy is the only way to think this through. I think a lot of the concerns that people have about it could be addressed in part just by how you spend the privacy budget [and] where you set the epsilon."

Why differential privacy may be an overreaction

Where the bureau sets the epsilon matters a lot to Andrew Beveridge, a sociology professor at Queens College and the Graduate Center of the City University of New York. He uses a lot of redistricting data in his research and worries that differential privacy will introduce so much noise as to make the data unusable.

"It would be fine if it would behave like real data; I have no problem with that," he says. "I just ... don't know if it does [behave like that]. There still is some evidence that it doesn't."

Dr. Beveridge even suggests the bureau could be sued if the data is too fuzzy. Dr. Abowd downplays that concern and says redistricting data will be as robust as in previous censuses. What's changed is that the noise injection is public.

Mr. Salvo of the New York planning department has accepted the inevitability of noise, but he wants the bureau to "empirically demonstrate that [the noise] will not damage what is the essence or the mission of the bureau: to give us data."

Other experts call the shift a "radical reinterpretation" that too greatly privileges confidentiality.

"I think the Census Bureau chief decision-makers are underestimating the possibility that there could be a real crisis in confidence if at some point in the future, it's discovered that differential privacy caused the Census Bureau data to be relied upon when it was in fact not accurate," says Jane Bambauer, a law professor at the University of Arizona.

In late September, the bureau will release test products based on the 2010 census to show how different levels of noise might affect the data. The plan has some data users cautiously optimistic, but the bureau won't provide any answers for how differential privacy will apply to the more granular American Community Survey, which is more widely used by data users and the public.

"It would have been helpful for the Census Bureau to convene its key stakeholders who are data users and other experts in the technology and data fields before it publicly rolled out its plan for differential privacy," says Ms. Lowenthal, the former congressional aide. "It would have been better to have more buy-in before announcing a plan and I think would have gone a long way towards ensuring public confidence in whatever final method the bureau settles on."

To his credit, Dr. Abowd largely agrees with that assessment.

"It took awhile for all of us at the Census Bureau to understand how to message this to very diverse interest groups," he says. "I think our willingness to continuously improve the way we're doing that should be taken as evidence that we understand we haven't always effectively communicated the message."

What is decided here will have far-reaching consequences, says Mr. Dutta-Gupta.

"This is not just about the census, it's about other surveys, it's about future censuses. And this is more broadly about trust in government. The most important thing for ensuring a fair and accurate count in 2020 is trust."
