Creative people know that any field where intellectual property is the primary source of value is going to be judged subjectively by the customer. It’s one thing to hire a plumber, who either can or can’t fix a leaky pipe properly and on time. But whether a book, a movie, a music video, or a card game is any good is totally subjective. These fields tend to have the very top 1% making a lot of money from their IP, maybe another 3% earning solid money, and everyone else doing it for the passion.
For years, though, people have wondered why some creators make so much more money than others. How come one idea takes off while 20 similar ideas don’t? Two researchers tried to crack that code this week with their new book, The Bestseller Code. I’ve added my reactions to help explain what they mean:
Back in the spring of 2010, Stieg Larsson’s agent was having a good day. On June 13, The Girl Who Kicked the Hornets’ Nest—third in the series from a previously unknown author—debuted at number one in hardback in the New York Times…
The following month Amazon would announce Larsson was the first author ever to sell a million copies on the Kindle, and over the next two years sales in all editions would top seventy-five million. Not bad for an unknown political activist—turned-novelist from a little Scandinavian country, especially one who had chosen a rather uncharming title in Swedish and had written some brutal scenes of rape and torture. Men Who Hate Women—or The Girl with the Dragon Tattoo as it was renamed in English—was the sensation book of the year in more than thirty countries.
The press didn’t understand the success. Major newspapers commissioned opinion pieces on what on earth was going on in the book world. Why this book? Why the frenzy? What was the secret? Who could have known?
Answers were lackluster. Reviewers scratched their heads about it. They found fault with the novel’s structure, style, plotting, and character. They groaned over the translations. They complained about the stupidity of the reading public. But still copies sold as fast as they were printed—whether you were in the UK, the U.S., in Japan, or in Germany; whether you were male, female, old, young, black, white, straight, or gay. Whoever you were, practically anywhere, you knew people who were reading those books.
That doesn’t happen very often in the book world…The level of sales his trilogy achieved without even the backing of its author [Larsson died in 2004, before the trilogy was published] was supposedly just unfathomable. Freakish. Unpredictable.
Let’s consider some numbers. A company in Delaware called Bowker is the global leader in bibliographic information and the exclusive provider for unique identification numbers (ISBN) for books in the U.S. Their annual report states that approximately fifty to fifty-five thousand new works of fiction are published every year. Given the increasing number of self-published ebooks that carry no ISBN, this is a conservative number. In the U.S., about 200-220 novels make the New York Times bestseller lists every year. Of that…even fewer hit the bestseller lists and stay there week after week to become what the industry calls a “double-digit” book. Only handfuls of authors manage those ten or more weeks on the list, and of those maybe just three or four will sell a million copies of a single title in the U.S. in one year. Why those books?
Traditionally, it is believed that there are certain skills a novelist needs to master in order to win readers: a sense of plot, compelling characters, more than basic competence with grammar. Writers with big fan bases have mastered more: an eye for the human condition, the twists and turns of plausibility, that rare but appropriate use of the semicolon…But when it comes to the kind of success involved in hundreds of thousands of people reading the same book at the same time—well, unless Oprah is involved, that signals the presence of a fine stardust that’s apparently just too difficult to detect. The sudden and seemingly blessed success of books like the Dragon Tattoo Trilogy, Fifty Shades of Grey, The Help, Gone Girl, and The Da Vinci Code is considered very lucky, but as random as winning the lottery.
So these guys are essentially admitting that publishers have no idea how to identify a bestseller, right? And that a lot of random chance decides why one book is “it” while 5,000 similar books just don’t have “it.”
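To put numbers on those odds, here’s a quick back-of-the-envelope sketch in Python using only the Bowker and NYT ranges from the excerpt above (50,000 to 55,000 new novels a year, 200 to 220 NYT bestsellers a year). It’s plain division, not anything from the authors’ model:

```python
# Back-of-the-envelope odds of a new novel reaching the NYT lists,
# using only the ranges quoted in the excerpt (Bowker + NYT counts).
new_fiction_per_year = (50_000, 55_000)   # new works of fiction per year
nyt_bestsellers_per_year = (200, 220)     # novels making the NYT lists per year

# Most generous case: most bestsellers over the fewest new titles.
highest_rate = nyt_bestsellers_per_year[1] / new_fiction_per_year[0]
# Least generous case: fewest bestsellers over the most new titles.
lowest_rate = nyt_bestsellers_per_year[0] / new_fiction_per_year[1]

print(f"{lowest_rate:.2%} to {highest_rate:.2%}")  # 0.36% to 0.44%
```

That’s well under one-half of one percent, and since Bowker’s count skips self-published ebooks without an ISBN, the real rate is probably lower still.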
The bold claim of this book is that the novels that hit the New York Times bestseller lists are not random, and the market is not in fact as unknowable as others suggest. Regardless of genre, bestsellers share an uncanny number of latent features that give us new insights into what we read and why. What’s more, algorithms allow us to discover new and even as yet unpublished books with similar hallmarks of bestselling DNA.
There is a commonly repeated “truth” in publishing that success is all about an established name, marketing dollars, or expensive publicity campaigns. Sure, these things have an impact, but our research challenges the idea it’s all about hype in a way that should appeal to those writers who toil over their craft. Five years of study suggests that bestselling is largely dependent upon having just the right words in just the right order, and the most interesting story about the NYT list is about nothing more or less than the author’s manuscript, black ink on white paper, unadorned.
Using a computer model that can read, recognize, and sift through thousands of features in thousands of books, we discovered that there are fascinating patterns inherent to the books that are most likely to succeed in the market, and they have their own story to tell about readers and reading. In this book we will describe how and why we built such a model and how it discovered that eighty to ninety percent of the time the bestsellers in our research corpus were easy to spot. Eighty percent of New York Times bestsellers of the past thirty years were identified by our machines as likely to chart. What’s more, every book was treated as if it were a fresh, unseen manuscript and then marked not just with a binary classification of “likely to chart” or “likely not to,” but also with a score indicating its likelihood of being a bestseller. These scores are fascinating in their own right, but as we show how they are made we will also share our explanation for why that book on your bedside table is so hard to put down.
Consider some of these percentages. The computer model’s certainty about the success of Dan Brown’s latest novel, Inferno, was 95.7 percent. For Michael Connelly’s The Lincoln Lawyer it was 99.2 percent. Both were number one in hardback on the NYT list, which for a long time has been one of the most prestigious positions to occupy in the book world. These are veteran authors, of course, already established. But the model is unaware of an author’s name and reputation and can just as confidently score an unknown writer. The score for The Friday Night Knitting Club, the first novel by Kate Jacobs, was 98.9 percent. Luckiest Girl Alive, a very different debut novel by Jessica Knoll, had a bestselling success score of 99.9 percent based purely on the text of the manuscript. Both Jacobs and Knoll stayed on the list for many weeks. The Martian (before Matt Damon’s interest in playing the protagonist) got 93.4 percent. There are examples from all genres: The First Phone Call from Heaven, a spiritual tale by Mitch Albom, 99.2 percent; The Art of Fielding, a literary debut by Chad Harbach, 93.3 percent; and Bared to You, an erotic romance by Sylvia Day, 91.2 percent.
These figures, which provide a measure of bestselling potential, have made some people excited, others angry, and more than a few suspicious. In some ways that is fair enough: the scores are disruptive, mind-bending. To some industry veterans, they are absurd. But they also could just change publishing, and they will most certainly change the way that you think about what’s inside the next bestseller you read.
We should make it clear that none of the books we reference were acquired based on our model’s figures, and figures, beyond the ones you’ll read about here, have never been formally shared with any agent or publishing house. We should also be clear that these figures are specific to the closed world of our research corpus, a corpus we designed to look like what you’d see if you walked into a Barnes & Noble with a wide selection to choose from. Agents and editors do a good job of putting books in front of consumers—it’s not as though we are short of things to read. And some individuals in publishing have a particular reputation for the Midas touch. But remember that the bestseller rate in the industry as it stands is less than one-half of one percent. That’s a lot of gambling before a big win. Note, too, that year after year, the lists comprise the names of the same long-standing mega-authors. Stephen King is sixty-eight. James Patterson is sixty-eight. Danielle Steel is sixty-eight. As much as fans are still thrilled by another new novel from one of these veteran writers, it is telling that the publishing world has not discovered the next generation of authors who will similarly enjoy thirty to forty years of constant bestselling. Nor did the industry find, despite the thousands of manuscripts both rejected and published annually, a runaway bestseller for 2014 (Dragon Tattoo, Fifty Shades, and Gone Girl had been the standout hits of previous years), and neither did it publish a manuscript to impress the Pulitzer Prize committee in 2012. Why?
Well, it is a universal wisdom that bestsellers are freaks. They are the happy outliers. The anomalies of the market. Black swans. If that is the truth, then once you find a bestselling writer, why put your money anywhere else? Why put your millions on a new twenty-year-old writer instead of Stephen King? How could you possibly know if a new literary author is worth the sort of investment worthy of a future big-prize winner?
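The authors say their model gives every manuscript both a yes/no call (“likely to chart” or not) and a likelihood score. Here’s a minimal Python sketch of how those two outputs could relate. The 0.5 cut-off, the function names, and the idea that “likely to chart” simply means clearing that cut-off are my assumptions, not anything the book spells out; the scores are the ones quoted in the excerpt.

```python
# Minimal sketch: turning a bestseller-likelihood score into a yes/no label,
# and measuring what share of real bestsellers get flagged (recall).
# ASSUMPTION: a hypothetical 0.5 cut-off; the book does not state its threshold.
THRESHOLD = 0.5

def label(score: float, threshold: float = THRESHOLD) -> str:
    """Map a score in [0, 1] to the model's binary call."""
    return "likely to chart" if score >= threshold else "likely not to"

def recall(bestseller_scores: list[float], threshold: float = THRESHOLD) -> float:
    """Share of books that actually charted which the model flagged as likely to chart."""
    if not bestseller_scores:
        raise ValueError("need at least one score")
    flagged = sum(1 for s in bestseller_scores if s >= threshold)
    return flagged / len(bestseller_scores)

# Scores quoted in the excerpt, written as fractions instead of percentages.
quoted = {
    "Inferno": 0.957,
    "The Lincoln Lawyer": 0.992,
    "The Friday Night Knitting Club": 0.989,
    "Luckiest Girl Alive": 0.999,
    "The Martian": 0.934,
    "The First Phone Call from Heaven": 0.992,
    "The Art of Fielding": 0.933,
    "Bared to You": 0.912,
}

for title, score in quoted.items():
    print(f"{title}: {score:.1%} -> {label(score)}")

print(recall(list(quoted.values())))  # 1.0
```

Because every title they chose to quote actually charted and scored above the cut-off, recall on this handful comes out to a perfect 1.0. A claim like “eighty percent of bestsellers identified” only means something when it’s measured across a whole corpus, including the books that never charted.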
If you look at their computer model, you find they only compare bestselling books against each other. They don’t compare the bestsellers to every book ever published, or even to a random collection of 10,000 indie books or 10,000 traditionally published books NOT on the bestseller lists, so their algorithm can’t really tell us why bestsellers do better than non-bestsellers.
The conventional wisdom has always been that it’s random chance, so publishers and agents make total guesses about which books will become bestsellers. Because of this, they look for authors who fit the ‘profile’ they’ve built up: someone aged 25 to 39 with a killer manuscript and either a) a big existing following (celebrities) or b) the chance to gain one super fast. The manuscript fits the common themes of its genre but has a twist that makes it seem fresh and exciting, so it can be promoted 18 to 24 months later when the book comes out (assuming it does). The hope is that the public at large, which may prefer different novels, will love it as much as librarians do.
I personally have a theory that it’s the AUTHORS, not the books, that drive sales. Get an author willing to take risks and become a lightning rod for attention (and controversy), and I think their sales will beat those of authors who just have good stories. I would love to prove the conventional wisdom wrong: quit looking at the manuscripts, agents! But then again, I’m a lowly writer with 43k Wattpad views and no fiction sales ever, so what do I know, according to them?
Well, at least we can agree on something: no one knows exactly how to build a bestseller. The difference is, I’m willing to bet that I can create them without the need for silly algorithms or guesswork.