In an age where ‘privacy is dead, and social media holds the smoking gun,’ Christian Rudder’s new book Dataclysm, published today, provides an irreverent, provocative, and visually fascinating look at what our online lives reveal about who we really are – and how this deluge of data will transform the science of human behaviour. Big Data is used to spy on us, hire and fire us, and sell us things we don’t need. In Dataclysm, Christian Rudder, co-founder of OkCupid, one of the world’s biggest dating websites, puts this flood of information to an entirely different use: understanding human nature. Drawing on terabytes of data from Twitter, Facebook, Reddit, OkCupid, and many other sites, he examines the terrain of human experience to answer a range of questions: Does it matter where you went to school? How racist are we? How do political views alter relationships?
Philosophers, psychologists, gene hunters and neuroscientists have tried to explain our flaws and foibles. Rudder shows that in today’s era of social media, a powerful new approach is possible, one that reveals how we actually behave when we think no one’s looking.
You write that ‘if Big Data’s two running stories have been surveillance and money, for the last three years I’ve been working on a third: the human story.’
What led you to pursue this objective, and how, ultimately, do you ‘humanise’ reams of numbers?
Running a dating site, you can’t help but see the people behind the numbers. OkCupid’s business is love and relationships, and those are just about the most personal things a site can offer to the world, and frankly we don’t have the scale to make data mining efficient for advertising. So the human side of the data was what we focused on from the beginning. In the book, to make numbers and statistics more approachable I use humour and bring myself into it a bit.
What makes this moment in time—and this set of data—different from the massive data surveys of the past, such as Pew, Gallup, or the Kinsey Institute?
The data in my book is almost all passively observed — there’s no questionnaire, no contrived experiment to simulate ‘real life.’ This data is real life. Online you have friends, lovers, enemies, and intense moments of truth without a thought for who’s watching, because ostensibly no one is — except of course the computers recording it all. This is how digital data circumvents that old research obstacle: people’s inability to be honest when the truth makes them look bad. You could never ask people these days if they like racist jokes and get a real answer. Yet lo and behold the country’s most notorious slur for black people is incredibly popular as a Google Search term; it still appears in a half-million searches a month in the United States. As I say in the book, the epithet is more American than ‘apple pie’—we search for it about 30 percent more often. Digital data’s ability to get at the private mind like this is unprecedented and very powerful.
Why hasn’t this kind of approach been taken before?
Well, there are people doing a lot of great work, some of which I weave into the book. Facebook has built a world-class data team to look at things like migration patterns and how ideas spread. Google, via search data, has looked at the effect of race on American politics, among many other important topics. Unfortunately, this work is under-publicized; it’s the dollars and scandal that get the ink. Dataclysm tries to show that data is useful to everyone who cares about humanity, not just accountants and the police.
I loved your line that ‘unlike clay tablets, unlike papyrus, unlike paper, newsprint, celluloid, or photo stock, disk space is cheap and nearly inexhaustible. On a hard drive, there’s room for more than just the heroes.’ I really embraced the anti-elitist, ‘every person counts’ philosophy behind your drive for data. Do you think the internet subverts or exacerbates the inequalities of real life?
The Internet, like most inventions, has made the world both better and worse. As for its effect specifically on the inequalities of real life, there can only be one answer: we don’t yet know. As pervasive as it is, the Internet has been a social medium for less than a decade. Even something as ‘basic’ as Wikipedia has only really been a meaningful site for seven or eight years. I hope that the democratic potential of the Internet, its connectivity and easy exchange of information, will win out in the end.
Are you uneasy about the amount of personal data that people willingly submit online?
Well, I think the key word there is ‘willingly’. As long as everyone realizes that whatever they give is stored and analysed, then it’s hard to be uncomfortable with the exchange. I have more misgivings about the security cameras that blanket most of the developed world (and especially the U.K.) and the coming plague of quad-copter camera drones. In five years, cameras will be flitting everywhere, and the images will inevitably hit some facial-recognition API.
I particularly loved the statistic that feeling the same way about horror movies is the biggest indicator of whether or not a couple will ‘make it’ long-term. What’s your favourite statistic from the book and why?
That’s the biggest indicator among the ‘lighter’ questions we have. Heavier questions like ‘Do you want kids?’ and ‘Do you believe in God?’ are of course more powerful predictors. But, still, poll your married friends: they’ll be very likely to have the same opinion, love them or hate them, on horror movies. I’m not sure I have a favourite statistic. Numbers always need context, so pulling them out like that isn’t something I’m used to doing. If I had to choose, I think my favourite part of the book is the part where I pull the ‘most typical’ text from people’s self-descriptions: whites, Asians, men, women. Look at what people are saying about themselves and you get clichés right out of a Seinfeld routine.
While reading the book, I was struck, and by turns moved, horrified and amused, by the ways in which people actually behave when they think no one is watching. It’s like peeling back a thick coat of lacquer to reveal the murky whorls and grains of the wood below. What is the most surprising piece of data you uncovered in the course of writing the book?
The more time you spend with social data, the more it just seems to confirm what you already know. Men prefer younger women, being beautiful is helpful, American blacks get short shrift: none of this is that surprising. But the specificity and transparency that data gives to these social phenomena are unprecedented. To know exactly what’s going on gives us power over the trends; in the case of racism, simply knowing can change things for the better. The data on race was surprising only in its stubborn predictability: for all the glitzy technology, the results could’ve been from the 1950s. I grew up in Little Rock and graduated from Central High, the first school in the South to be integrated: Eisenhower, the National Guard, mobs of white people screaming at nine black children, that’s Central. The school embraces its history and is now over half black. I’m no brave crusader, but race (and racism) were part of my education. So when, in researching the book, I unpacked three separate databases and found that in every one white people gave black people short shrift, I wasn’t shocked, you know? Asians and Latinos apply the same penalty to African Americans that white folks do, which says something about how even (relatively) recent additions to the ‘American experience’ have acquired its biases.
Are you worried about any of this?
I have mixed feelings about the implications. I myself almost never tweet, post, or share anything about my personal life. At the same time, I’ve just spent three years writing about how interesting all this data is, and I cofounded OkCupid. My hope is that this ambivalence makes me a trustworthy guide through the thickets of technology and data. I admire the knowledge that social data can bring us; I also fear the consequences.
Finally, what is your favourite book, and what genre do you most gravitate towards?
My favourite book is Shelby Foote’s three-volume The Civil War. I’ve read it through three times. For something more pertinent to a Briton, I also enjoyed William Manchester’s biographies of Winston Churchill, The Last Lion series, particularly the third and final book, which I thought addressed the many flaws of its hero head-on. I read a lot of history. For fiction, A Confederacy of Dunces and The Longships are two recent favourites.
Dataclysm is out now.
Interview by Tara Al Azzawi.
Post by Georgia Bird.