
Frank Pasquale unravels the new machine age of algorithms and bots

In his book "The Black Box Society," Pasquale exposes secret algorithms behind the scenes of corporate America.

A total of 100 Robi prefabricated humanoid robots perform during a promotional event for the launch of the third edition of the DIY weekly magazine that comes with the robot kits. (AP Photo/Eugene Hoshiko)


January 28, 2015

Slate recently said Frank Pasquale's new book, "The Black Box Society: The Secret Algorithms That Control Money and Information," attempts to "come to grips with the dangers of 'runaway data' and 'black box algorithms' more comprehensively than any other book to date."

I recently spoke with Pasquale about his new book and about how algorithms play a major role in our everyday lives – from what we see and don't see on the Web, to how companies and banks classify consumers, to influencing the risky deals made by investors. Edited excerpts follow.

Selinger: What's a black box society?


Pasquale: The term "black box" can refer to a recording device, like the data-monitoring systems in planes, trains, and cars. Or it can mean a system whose workings are mysterious. We can observe its inputs and outputs, but can't tell how one becomes the other. Every day, we confront these two meanings. We're tracked ever more closely by firms and the government. We often don't have a clear idea of just how far this information can travel, how it's used, or its consequences.

Selinger: Why are secret algorithms so important to the story you're telling?

Pasquale: Sometimes there are runaway algorithms, which, by themselves, take on very important decisions. They may become even more important in the future. For example, autonomous weapon systems could accidentally trigger skirmishes or even wars, based on misinterpreted signals. Presently, algorithms themselves cause problems or snafus that are not nearly as serious but still foreshadow much more troubling developments. Think of the uncontrolled algorithmic trading that led to a major stock market disruption in 2010, or nearly destroyed the firm Knight Capital. Similar technology is now used by small businesses, with occasionally devastating results, as the CBC program "The Spark" has documented. Credit scores can also have a direct, negative impact on individuals, without them knowing the basis for sharp changes in their scores.

But one thing I emphasize in the book is that it's not only – and often not primarily – the algorithms, or even the programmers of algorithms, who are to blame. The algos also serve as a way of hiding or rationalizing what top management is doing. That's what worries me most – when "data-driven" algorithms that are supposedly objective and serving customers and users are in fact biased and working only to boost the fortunes of an elite.

Selinger: Are you talking about people diffusing power and masking shady agendas through algorithms and hoping they won't get caught? Or are you suggesting that delegating decisions to algorithms is a strategy that actually immunizes folks from blame?


Pasquale: I think both things are happening, actually. There are people at the top of organizations who want to take risks without taking responsibility. CEOs, managers, and others can give winks and nudges that suggest the results they want risk analysts and data scientists to create. Algorithmic methods of scoring and predictive analytics are flexible, and can accommodate many goals. Let's talk about an example that's currently being litigated. It concerns a ratings agency that exaggerated the creditworthiness of mortgage-backed securities.

One key part of the case comes down to whether 600,000 mortgages should have been added to a sample of 150,000 that clearly served the interests of the firm's main clients, but was increasingly less representative of the housing market. It turns out that once the sample increases, the loans at the heart of the housing crisis start to look more risky. And so here's the problem. By avoiding the data, you can give AAA ratings to many more clients who pay top dollar for the ratings than you'd be able to if you used more accurate information.
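
To see why the sample composition matters so much, here is a minimal sketch in Python. The default probabilities are invented purely for illustration; only the 150,000/600,000 pool sizes come from the case described above:

```python
import random

random.seed(0)

# Hypothetical default probabilities, chosen only for illustration:
# the original sample skews toward safer loans, while the 600,000
# newer mortgages are far riskier.
original_sample = [random.random() < 0.02 for _ in range(150_000)]
newer_mortgages = [random.random() < 0.10 for _ in range(600_000)]

def default_rate(loans):
    """Fraction of loans in the pool that default."""
    return sum(loans) / len(loans)

print(f"Unrepresentative sample: {default_rate(original_sample):.1%}")
print(f"Full, updated sample:    {default_rate(original_sample + newer_mortgages):.1%}")
# A model calibrated on the first pool supports far more generous
# ratings than one calibrated on the representative pool.
```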

'At present, algorithms are ripe for manipulation and corruption.' – Pasquale

It's not as if there's something wrong with math or models. And there's nothing wrong with computational algorithms in themselves. The problem is that, at present, algorithms are ripe for manipulation and corruption. New disclosure requirements could help. But complexity can defeat those, too. For example, the rating agencies now, after Dodd-Frank, must disclose certain aspects of their models and use of data. But firms that want to model the value of mortgage-backed securities can deploy proprietary software, sometimes costing millions of dollars, to do so. That once again obfuscates matters for people who can't access the software – a persistent worry that creates more financial uncertainty and instability.

Selinger: There have always been strategies for getting nefarious things done while hiding the dirt on your hands. Managers can create perverse incentive structures and then blame employees for the inevitable malfeasance. Just demonize "bad apples" who broke the rules, and deny that the tilted system had anything to do with their behavior. So, what's new here?

Pasquale: Purportedly scientific and data-driven business practices are now being billed as ways of making our world more secure and predictable. For good faith actors, those aspirations are laudable. But less scrupulous managers have found ways of taking risks on the basis of contestable models containing massaged data. And this, in turn, creates more instability.

There's a paradox at the heart of the black box society. We're constantly driven to justify things scientifically, even when that's not possible. So in far too many contexts, there's pressure to find some data, any kind of data, in order to meet an arbitrary or self-serving "quality" standard. All too often, in finance, dubious data can enter manipulable models created for opportunistic traders answering to clueless CEOs.

Selinger: Do any other examples spring to mind that illustrate problems with algorithmic accuracy or transparency?

Pasquale: It's also a major issue in credit scoring for individuals. Exposés have shown how careless the big credit bureaus are with requests for correction. Even more frighteningly, in a report called "The Scoring of America," Pam Dixon and Bob Gellman have shown that there are hundreds of credit scores that people don't even know about, which can affect the opportunities they get.

In terms of processing the data, there are some worries about major Internet firms. But because of the black box nature of the algorithms, it's often hard to definitively prove untoward behavior. For example, various companies have complained that Google suddenly, and unfairly, dropped them in search engine results. Foundem, a British firm, was the most noted example. It has argued that Google reduced its ranking (and those of other shopping sites) in order to promote its own shopping alternatives. Yelp has also complained about unfair practices.

Critical questions came up. Can we trust that Google is putting user interests first in its rankings? Or are its commercial interests distorting what comes up in results? Foundem said it was being disappeared to help make room for Google Shopping. And when you consider how vital Google now considers shopping to be to its business, it makes some sense that the firm would do more to shade its results to favor its own properties in subtle ways that are barely detectable to the average consumer.

Then, there's the use of data. Who knows exactly how Google is using all the data they collect on us? There are documented examples of secondary uses of data, in other sectors, that are very troubling. Many people were surprised to learn, back in 2008, that their prescription records were being used by insurers to determine whether they should get coverage. Basically, to apply for insurance, they had to waive their HIPAA protections, and allow the insurer to consult data brokers who produced health scores predicting how sick they were likely to get. And the insurers had their own "red flags": for example, anyone who'd been on Prozac was assumed to be too risky.
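
None of the details of those scores are public, so the following is a purely hypothetical sketch of the kind of red-flag screen being described; the drug names, weights, and threshold are all invented:

```python
# Hypothetical prescription-based underwriting screen. None of these
# rules or weights come from a real insurer; they only illustrate
# the mechanism of red flags plus an aggregate health score.
RED_FLAG_DRUGS = {"prozac"}  # automatic decline in this toy model
RISK_WEIGHTS = {"insulin": 3.0, "statin": 1.5, "albuterol": 1.0}

def screen_applicant(prescriptions: list[str]) -> str:
    meds = {p.lower() for p in prescriptions}
    if meds & RED_FLAG_DRUGS:
        return "declined (red flag)"
    score = sum(RISK_WEIGHTS.get(m, 0.0) for m in meds)
    return "declined (high score)" if score > 3.0 else "offered coverage"

print(screen_applicant(["Prozac"]))             # declined (red flag)
print(screen_applicant(["Insulin", "Statin"]))  # declined (high score)
print(screen_applicant(["Albuterol"]))          # offered coverage
```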

'Presently, lots of people consider being on top of Twitter's trending topics, or Google or Amazon search results, an important bragging right. But if these results are relatively easy to manipulate, or are really dictated by the corporate interests of the big Internet firms, they should be seen less as the "voice of the people" than as a new form of marketing.' – Pasquale

So we've covered bad data, bad processing of data, and bad uses of data. Sometimes, all three concerns come together. Think of the controversy about Twitter during the time of Occupy Wall Street. In 2011 tons of people were tweeting #OccupyWallStreet, but it never came up as a trending topic – that is, one of the topics that appears on all users' home pages. In response to accusations of censorship, Twitter stated that its algorithms don't focus on popularity, but rather the velocity of accelerating trends. According to Twitter, the Occupy trend may have been popular, but it was too slow to gain popularity to be recognized as a trending topic.

That may be right. And the folks at Twitter are perfectly entitled to make that decision. But I have some nagging concerns. First, is bot behavior counted? There are plenty of bots on Twitter. And it's easy to imagine some savvy group of programmers manipulating algorithms to get their favored topics pride of place. That's a data integrity problem. The data processing problem is related. Clearly popularity has to be some part of the "trending" algo. No hashtag is getting to the top just by suddenly having 10 mentions instead of one. But how much does it matter? No one outside the company knows. Or if they do, they may well be jealously guarding that secret, to gain some commercial advantage.
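
Twitter has never published its trending algorithm, so here is only a toy sketch of the velocity-versus-popularity tradeoff being described. The scoring formula and the popularity_weight parameter are assumptions, not Twitter's method:

```python
import math

def trend_score(counts, popularity_weight=0.2):
    """Score a hashtag from its mention counts in consecutive time windows.

    The acceleration term rewards sudden growth; the popularity term
    mixes in raw volume. How much volume matters in the real algorithm
    is exactly what outsiders don't know.
    """
    velocity = counts[-1] - counts[-2]                    # growth in the last window
    acceleration = max(velocity, 0) / max(counts[-2], 1)  # relative spike
    volume = math.log10(counts[-1] + 1)                   # dampened raw popularity
    return (1 - popularity_weight) * acceleration + popularity_weight * volume

# A huge, steadily popular tag vs. a small tag that suddenly spikes:
occupy = [9000, 9500, 10000]  # consistently large, slow growth
meme = [10, 50, 400]          # tiny, but accelerating fast

print(trend_score(occupy))  # ~0.84: big volume, low acceleration
print(trend_score(meme))    # ~6.12: low volume, huge acceleration wins
```

Under this toy weighting, a consistently popular tag like the Occupy example loses to a smaller but rapidly accelerating one, which is the behavior Twitter's explanation implies.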

This brings us to use. Presently, lots of people consider being on top of Twitter's trending topics, or Google or Amazon search results, an important bragging right. But if these results are relatively easy to manipulate, or are really dictated by the corporate interests of the big Internet firms, they should be seen less as the "voice of the people" than as a new form of marketing. Or, to use Rob Walker's term, "murketing." For example, has anyone audited the data behind Twitter's trending topics? I don't think so. There's no reliable way to access the data and algos you'd need to do a scientifically valid, convincing audit. It may seem trivial now, but the more these kinds of methods are used in the mass media, the more important they'll be.

Selinger: Was the public response to how Twitter characterized Occupy Wall Street an indication of how little most of us understand about how algorithms are constructed? Or was it a justified indictment of Twitter for not making relevant information publicly available?

Pasquale: Media literacy is important, but in an era of score-driven education, it's exactly the type of humanities education that's on the chopping block. So new media should shoulder an obligation here. They could have something relatively unobtrusive but standard, like an icon, that links to a page explaining how information is being presented and what decisions go into it. Even right now, if I log onto Twitter, I won't see anything that explains what the trends are.

Of course, total transparency provides an incentive to game the system. But at least some broad outline of the standards used for, and purposes of, categories like "Trends" would offer a baseline for understanding what's going on.

Let's also recall that when the activists heard about the trending problem they developed a new service called Thunderclap that lets a whole bunch of tweets come to users at once. Twitter responded by cutting off Thunderclap's access to the platform. Maybe that decision reflects users' interests. It might be annoying to get 100 tweets on the same topic at once in one's timeline. But there are other ways of managing such issues. It just might take more time to implement them. But when a firm needs to demonstrate a potential for rapidly scaling revenue growth to Wall Street, it can't afford to experiment. It's left making rapid, blunt decisions.

Lilly Irani recently explained why that might be the case, referencing the heuristics, and perhaps biases, of venture capitalists and other investors. If a firm is classified as a tech firm, as opposed to a more traditional one, it's often just assumed that it can scale to serve much larger numbers of people without proportionally increasing its own labor costs. So the investors' algorithms favor firms that use algorithms to deal with controversies, problems, and information overload.

'Unsurprisingly, they found that even educated, power users don't have a good idea of how Facebook's algorithmic curation works.' – Pasquale

That might be fine if major Internet firms were only market actors seeking to maximize profits. But they also have major social, cultural, and political impact. And they brag about that impact. Think, for instance, about US tech firms' self-proclaimed role in the Arab Spring protests.

My big point, here, is that people need to have a better understanding of when algorithmic standards are actually promoting users' interests, as opposed to their monetization.

Selinger: Can you draw any general lessons from reflecting on what went wrong in some of the cases where missteps occurred?

Pasquale: With respect to the dominant media platforms discussed in my book, ranging from Twitter to Apple, Facebook, and Google, I think they're obligated to enhance new media literacy. To go further, I would say that Facebook – which is much worse than Twitter, because it's so algorithmically filtered, and so often misleading – has a duty to allow its users to understand how its filtering works. For example, it should permit them to see everything their friends post, if they want to. An API to do just that has been released to researchers. Unsurprisingly, they found that even educated, power users don't have a good idea of how Facebook's algorithmic curation works.

Selinger: Let's get into the "Right to be Forgotten." Do you disagree with the typical American response to the issue? Are we missing out on anything by insisting that First Amendment protection absolves Google of needing to remove links to information people deem irrelevant or outdated? Bottom line: Can we do better?

Pasquale: We can. There are two really clear examples of the U.S. embracing the erasure of information, or, at the very least, the non-actionability of information. These are the Fair Credit Reporting Act and the expungement of criminal records. Thanks to the Fair Credit Reporting Act, bankruptcies need to be removed after a certain amount of time elapses, so that the information doesn't dog people forever. Human Resources is going to look for the worst thing on reports that come their way, and in many cases, not hire someone who declared bankruptcy. Expunging various types of criminal records is a very important right, especially since right now many employment practices are driven algorithmically and will knock out a candidate just for having been arrested, without human review on a case-by-case basis.

Over time we've realized that it's important to give people second chances. Now that we have so many algorithmically driven decisions, and now that one piece of information can wreck someone's life, it's incredibly important for people to be able to get some information out of the system altogether. But even if one believes that no information should be "deleted" – that every slip and mistake anyone makes should be on their permanent record forever – we could still try to influence the processing of the data to reduce the importance of misdeeds from long ago.
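
One way a scoring system could reduce that importance, sketched below under assumed numbers (the three-year half-life is arbitrary), is to decay the weight of a negative record exponentially with its age:

```python
def record_weight(age_years: float, half_life_years: float = 3.0) -> float:
    """Weight of a negative record, halving every `half_life_years`.

    A record from today counts fully; one from long ago barely
    registers, which is one way a scoring system could build in
    the "second chances" idea described above.
    """
    return 0.5 ** (age_years / half_life_years)

for age in (0, 3, 7, 10):
    print(f"{age:>2} years old -> weight {record_weight(age):.2f}")
```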

Selinger: What chance, if any, do we have of seeing such ideals implemented here, when the current American response to the Right to be Forgotten seems to be: that's the "European way" of looking at things?

Pasquale: Well, if there are First Amendment absolutists in the Senate willing to filibuster, all bets are off. But there's still hope. Think about the medical records that were part of the Sony hacks, and the iCloud nude photo hacks. It's very troubling to think that, even if the original hackers were punished, later "re-publishers" of the same material could just keep it available perpetually in thousands of "mirror sites."

Of course, the First Amendment does permit republication of material that's of public concern. But not everything is really of "public concern." I've talked to Congressional staffers about this. People are starting to realize that you can't have a regime where everything is deemed a matter of public concern and you have persistent, uncontrolled, runaway publication of images and sensitive data that serve no public purpose. That epiphany will provide a foothold for something like the Right to be Forgotten to emerge in the United States.

Selinger: Does this way of looking at things cast aspersions on the freedom available to data brokers? They've got access to lots of sensitive information, including medical data, and little stands in their way of profiting from selling it over and over again.

Pasquale: Data brokers have sprung up out of relatively humble soil, like direct marketing and shoppers' lists, and become a multibillion-dollar industry that (in the aggregate) has the goal of psychologically, medically, and politically profiling everyone in the world. People don't fully appreciate the extent to which data brokers can trade information amongst themselves to create ever more detailed profiles of individuals.

'People are starting to realize that you can't have a regime where everything is deemed a matter of public concern and you have persistent, uncontrolled, runaway publication of images and sensitive data that serve no public purpose.' – Pasquale

In a 2011 case, Sorrell v. IMS Health, the Supreme Court did recognize a First Amendment right for a data broker to gather information in order to help its clients target marketing. But that case made clear that it did not affect privacy laws like HIPAA. So to come back to the point about public concern, let's say that, for example, a law eventually passed that subjected data brokers to HIPAA-like regulations and limited the information they could use about people's health status. Right now they literally have lists of millions of people said to be diabetic, to have cancer or AIDS, to be depressed, to be impotent, et cetera.

I believe that individuals have the right to stand in relation to the list-creators, pursuant to a future data protection law, as they now do in relation to their health care providers, pursuant to HIPAA. Thanks to HIPAA, I can review my medical records at a HIPAA-covered entity, object to many uses of them, and even see who looked at them. In other words, I can demand an "accounting of disclosures." If data brokers really want to extend IMS Health to stop similar rules applying to them – well, then, they might just create precedent to destroy HIPAA itself. And in that case, your doctor could claim a "First Amendment right" to tell everyone he knows about your medical conditions. Or nurses with access to the records could do the same. It's an absurd result.

Selinger: I'd like to end our conversation by switching gears slightly so that we can talk about what philosophers call the epistemology of big data. Boosters see big data as capable of revealing everything, from who we really are to how law enforcement can do predictive policing. Is this exaggeration? If so, is this an issue for the black box society?

Pasquale: I'd like to write a piece called "The impoverished epistemology of big data" because it's driving a lot of decisions in crudely behavioristic ways. For example, we constantly hear the story about the credit card company that found out that people who buy felt pads [to put on furniture legs so that chairs and couches and desks won't scratch the floor] are more reliable – for paying back their debts – than their credit scores would lead you to believe. This becomes a story about big data unlocking secret signals that can help us arrange the world in a more just and equitable manner, because now the people who buy felt pads finally will be rewarded. I'm skeptical, and for several reasons.

'People don't fully appreciate the extent to which data brokers can trade information amongst themselves to create ever more detailed profiles of individuals.' – Pasquale

First, when this gets out there, you'll have a certain class of people who will buy the felt pads just to get better credit scores. If I predict the weather better, clouds aren't going to conspire to outwit me. When a retailer predicts consumer behavior, he may well create a gameable classifier. This means a once-reliable indicator becomes less reliable as it's better known. Now maybe that is a good thing. Maybe the world would be better and more sustainable if people who bought felt furniture pads were systematically rewarded, and floors were systematically saved thanks to their efforts.

But, second, is it fair that the credit card company gets to analyze us this way, all too often opaquely? In the same article where the felt pads were examined, a credit card company also admitted to using evidence of marriage counseling as a determinant of rates and credit limits. I don't think what's written in the terms of service should give a company the right to sort me with such a category. We're talking about health privacy norms here. And, quite possibly, if this factor is widely known, it will discourage people who need marriage counseling from getting it.
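
A toy simulation can make the first point concrete: a gameable classifier degrades once it becomes known. All the probabilities here are invented for illustration:

```python
import random

random.seed(1)

def observed_signal(gaming_rate: float, n: int = 100_000) -> float:
    """Average repayment rate among felt-pad buyers.

    Hypothetical setup: reliable borrowers (repay 95%) buy pads 60% of
    the time; unreliable borrowers (repay 50%) normally buy them 10% of
    the time, except that a `gaming_rate` fraction of them buy pads
    purely to improve their score.
    """
    repaid_by_buyers = []
    for _ in range(n):
        reliable = random.random() < 0.5
        buys = random.random() < (0.6 if reliable else max(0.1, gaming_rate))
        if buys:
            repaid_by_buyers.append(random.random() < (0.95 if reliable else 0.50))
    return sum(repaid_by_buyers) / len(repaid_by_buyers)

print(f"Before the signal is public: {observed_signal(0.0):.1%}")  # ~89%
print(f"After widespread gaming:     {observed_signal(0.9):.1%}")  # ~68%
```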

Third, I want to know why felt pads indicate reliability. Big data mavens could come up with some just-so story. For example, maybe the pad users are more likely to get their full security deposit back, or will have a higher resale value for their house, with its lovely, pristine, unscratched floors. But who knows if that's the case? Maybe they're just more uptight. Maybe they're richer on average, and that is the key variable driving all the rest. In that case, what began as a moralistic way of rewarding the virtuous floor-protectors among us ends up just being one more way to reward the rich for already being rich.
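
The confounding worry is also easy to simulate. In the invented setup below, wealth drives both felt-pad buying and repayment, pads have no causal effect at all, and yet pad buyers still look more creditworthy:

```python
import random

random.seed(2)

# Hypothetical confounder: wealth raises both the chance of buying
# felt pads and the chance of repaying, with no direct causal link
# between pads and repayment.
def simulate(n: int = 100_000) -> None:
    pads, repaid = [], []
    for _ in range(n):
        wealthy = random.random() < 0.3
        pads.append(random.random() < (0.7 if wealthy else 0.2))
        repaid.append(random.random() < (0.95 if wealthy else 0.6))
    both = sum(p and r for p, r in zip(pads, repaid))
    print(f"Repayment among pad buyers: {both / sum(pads):.1%}")   # ~81%
    print(f"Repayment overall:          {sum(repaid) / n:.1%}")    # ~70%

simulate()
```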

There is no guarantee that big data will be used to help us better understand society. It could just as easily be a vehicle for existing prejudices that uses the veneer of science to unscientifically reinforce them. That was a worry in the White House report on big data, and it's a leitmotif of my book.

I see lots of cases like this where people want to tell moralistic stories about big data. But I doubt that there's enough social science research to support the claims being made. The Wall Street Journal recently published a story about determining who is obese by seeing who has a minivan, no children, and a premium cable subscription. Some said, in response, "Oh, that shows once again the damaging effects of television viewing." But again, who knows what's really causal? Maybe the really key variable is lack of opportunity, which could be driving overconsumption of food and TV. It's just too stereotypically convenient to assume that the TV viewing drives the obesity, or that the obesity drives the TV viewing. We need much richer accounts of society to capture the complexity of the problem here.