When the humanities meet big data
A "close reading" of canonical texts has long been a staple of the humanities. Now, technology is enabling a "distant reading" of everything else.
Being a voracious reader is a prerequisite for academics in the humanities, but even the most dedicated bookworm needs to eat, sleep, and socialize.
Not so for computers, which are known for being tireless, thorough, and very fast. And, when asked the right kinds of questions, these electronic speed-readers can grasp patterns that would otherwise lie beyond the reach of human scholars.
That’s exactly what happened when a team of researchers used machine-learning techniques to plow through transcripts of 40,000 speeches in a parliamentary assembly during the first two years of the French Revolution, according to a paper published in the Proceedings of the National Academy of Sciences last month. By quantifying the novelty of speech patterns and the extent to which those patterns were copied by subsequent speakers, the researchers illustrated how much of the important intellectual work of the revolution was initially carried out in committees, rather than in the whole assembly.
“We’re really getting a quantitative sense of large-scale patterns,” says co-author Simon DeDeo, a professor at Carnegie Mellon University and the Santa Fe Institute, a research center in New Mexico that specializes in complexity science. “There’s a lot of data here. You couldn’t have run this on a machine from 2000 or 2005.... Now you can do this on a desktop.”
Professor DeDeo received his doctorate from Princeton University in 2005, not in European history but in astrophysics. That was the tail of an inflationary period in DeDeo’s chosen field, and opportunities to tackle cosmology’s big questions were dwindling. “It was the end of the golden age,” he says. “I went off [and] I spent some time at the Santa Fe Institute, and that’s where I kind of converted into whatever I am now.”
The academy still hasn’t quite settled on a name for what DeDeo does, but the leading contender is “digital humanities,” a term that captures the field’s deeply interdisciplinary approach. Other digital humanities projects have brought together historians, librarians, literary critics, mathematicians, and computer scientists to analyze the complete works of Shakespeare, Time magazine covers, the ancient graffiti of Pompeii, and one million pages of Japanese manga.
“One of the exciting things is, can the humanities and the sciences team up?” DeDeo asks. “There’s a huge amount of knowledge and wisdom that the humanists have that the scientists don’t.”
Digital humanities can be traced to beginnings that are as diverse as the disciplines of its practitioners. One influential figure was Roberto Busa, an Italian Jesuit priest who, beginning in the 1940s, rendered the works of St. Thomas Aquinas into a machine-readable format. Another is Franco Moretti, a Marxist-trained Italian literary critic who argues that understanding literature comes not from a close reading of the literary canon (literature’s equivalent to the one percent) but from a “distant reading” of the entire corpus.
Whether inspired by Thomistic completism, Marxist inclusivity, or something else entirely, digital humanities holds the potential to shift the way we look at history. “There’s no way that a single academic could have read all 10,000 bad pulpy novels published in the 19th century,” says Indiana University historian Rebecca Spang, a co-author on the French Revolution paper. “So you could ask different kinds of questions because you get different kinds of information.”
In the case of the French parliamentary assembly analysis, researchers found that, unlike Democrats and Republicans today, the bourgeoisie and the aristocrats tended to use the same language patterns. “There isn’t a sort of discursive spectrum that we can identify,” Professor Spang says, “where you’ve got speakers on the right who use one vocabulary and the speakers on the left using another.”
Distant reading also results in a different understanding of the subject matter, one that is more holistic but also stands at a greater remove.
From the point of view of the computer, says Professor Spang, “it doesn’t matter what ‘ghijk’ means or says, just that it’s not ‘abcdef.’”
“This kind of work is not going to give us a kind of emotionally or narratively satisfying historical explanation,” says David Andress, a historian at the University of Portsmouth in Britain and an expert on the French Revolution, “but it’s certainly going to show us things that we then have to explain, that we then have to explore why we’ve got that result.”
This explanatory gap is why Dr. Andress doesn’t see digital humanities as a threat to traditional scholarship. “The readers of history and the general public are always going to want to have the story told to them in terms of people,” he says.
[Editor's note: An earlier version misstated the year DeDeo was awarded his doctorate.]