Tuesday, 19 September 2017

What We Reveal Online

first published in The Star, 19 September 2017

"Everybody lies" is a favourite maxim of Hugh Laurie's character, Dr Gregory House, in the medical drama series House. Despite this, he often gets to the bottom of what ails his patients.

Opinion writer for The New York Times and former Google data scientist Seth Stephens-Davidowitz also believes that everybody lies. He says people lie to their friends, bosses, kids, parents, doctors, husbands, wives, and even to themselves.

"And they damn sure lie in surveys."

The images of perfect lives on Facebook and Instagram aren't the whole picture, either.

According to Stephens-Davidowitz, the one thing people lie to less is search engines.

"The everyday act of typing a word or phrase into a compact, rectangular white box leaves a small trace of truth that, when multiplied by millions, eventually reveals profound realities."

This is why, in his book Everybody Lies, he posits that these small traces of truth make Google searches a gargantuan pool of "honest" data that holds insights into our true nature. Yet rather than dwelling on volume, he focuses on the quality of the information and analysis: "You don't always need a ton of data to find important insights. You need the right data."

Stephens-Davidowitz explains why big data – a catch-all term for all the data out there, including searches, blog posts and everything else we put online – is powerful. It is so huge that even small slices of it can yield meaningful results, and its scale is what allows companies such as Google and Facebook to run randomised, controlled experiments online to find out what works and what doesn't.

Big data also offers new types of information and ways to look at things from other angles. Who knew that the brightness of a place at night can indicate its economic situation?

There are limits, of course. The author tells us what can't and shouldn't be done with data, highlighting instances where it can be misused. The low-down on customers' buying patterns can help companies sell more products, for instance, but shouldn't be used to keep customers hooked.

At fewer than 290 pages, not including the acknowledgements, notes and the index, the book is small and digestible for its genre. It covers just enough about big data to make the case for its potential and leave one wanting to know more. The language is pretty straightforward and the tone is conversational.

Occasional displays of wit can be found in the text and the footnotes, particularly in the observations about sex and porn, of which there are quite a few, perhaps unavoidably when discussing what's on the World Wide Web.

But several of these footnotes feel uncomfortably confessional. For instance, the author hints that he might be an unreliable narrator, particularly in relation to how hard he worked on the book. In one footnote, he says, "Since everybody lies, you should question much of this story," because, as that footnote concludes, "Everybody lies. Every narrator is unreliable."

Even big data, it seems, may be an unreliable narrator, though that depends on how one interprets its multiple facets. And how much do fake news, bots, and hackers affect its "honesty"? Can this pool of Google searches be rigged to skew certain findings? The book does not appear to address any of this.

Nor does he trust many of us to finish reading the book: "No matter how hard I work on polishing my prose, most people are going to read the first fifty pages, get a few points, and move on with their lives." Maybe that's why, compared with the wealth of data-driven material that precedes it, the conclusion looks hastily scribbled, almost like an afterthought.

One can easily be swamped by all the revelations that support his argument: what we Google mirrors our true selves and can help us understand people better. But do we really want to? We reconsider our relationships with people, places, and the world at large from time to time anyway. Some might feel they are being told what they already know (eg, that people can be horrible, and why they lie), apart from the scope and intricacy of that knowledge.

Stephens-Davidowitz may not consider himself a focused author, but we can probably trust his work on big data, given his experience and reputation in the field, and how convincing (if perhaps a little biased) his case for it is.

However, one should also bear in mind his advice to question everything one reads, online or offline. Mountains of information do not make a source, be it a database or a person, infallible. What we require is the wisdom to sift through all that data without letting it overwhelm us.

When we begin re-evaluating what we read, look for and wish to share online, the ever-growing mound of digital bread crumbs we leave in cyberspace will, hopefully, become a more authentic reflection ... of our better selves.

Everybody Lies
Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Seth Stephens-Davidowitz
Dey St.
338 pages
ISBN: 978-0-06-239085-1
