Another day, another computer program is waiting to take our jobs. This time, it's a bookish algorithm that claims it can predict the commercial success of fiction with 84% accuracy — something acquisition departments at book publishers do daily with, we're guessing, a much lower success rate.
So, how does this budding media mogul do it? Using a technique called stylometry, which mathematically analyzes words and grammar, it has pinpointed the characteristics of Amazon darlings. The top-line qualities are obvious: “interestingness,” novelty, style of writing, and an engaging plot. But the specifics are surprisingly grammatical: Books that sell tend to load up on conjunctions, nouns, and adjectives and hold back on verbs and adverbs. Adverbs we get. After all, the road to hell is paved with them. But verbs are sort of vital, like water, air, and the Internet. That said, if you have to use one, verbs that describe thought processes (recognized, remembered) supposedly beat those of action and emotion (wanted, promised). We're thinking unpleasant feelings about this.
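For the curious, the kind of part-of-speech tallying described above can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' actual method: the coarse tag names, the `pos_shares` helper, and the hand-tagged sample sentence are all our own assumptions, and a real pipeline would get its tags from a proper part-of-speech tagger.

```python
from collections import Counter

# Hypothetical coarse tag set, chosen for this sketch only.
TAGS = ("NOUN", "VERB", "ADJ", "ADV", "CONJ", "OTHER")


def pos_shares(tagged_tokens):
    """Return the fraction of tokens carrying each coarse part-of-speech tag.

    tagged_tokens is a list of (word, tag) pairs; unknown tags count as OTHER.
    """
    counts = Counter(tag if tag in TAGS else "OTHER" for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        # Empty text: report zero for every category instead of dividing by zero.
        return {tag: 0.0 for tag in TAGS}
    return {tag: counts[tag] / total for tag in TAGS}


# Hand-tagged sample (an assumption; no tagger is run here).
sample = [
    ("the", "OTHER"), ("old", "ADJ"), ("house", "NOUN"), ("and", "CONJ"),
    ("the", "OTHER"), ("quiet", "ADJ"), ("garden", "NOUN"),
    ("slowly", "ADV"), ("faded", "VERB"), ("away", "ADV"),
]

shares = pos_shares(sample)
for tag in TAGS:
    print(f"{tag:5s} {shares[tag]:.0%}")
```

Shares like these, computed over a whole novel, are the sort of feature a stylometric model could weigh when guessing whether a book sold. How the features are combined into a prediction is a separate step that this sketch does not attempt.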
Behind all this are computer scientists at SUNY Stony Brook, who arrived at their 84% success rate using Project Gutenberg, a digital library of free e-books. There, they ran their algorithm on 800 texts, all fiction but ranging from classic lit to recent sci-fi. They then compared the findings to historical info on book sales, taking the least successful (and, therefore, harder-to-find) novels from the very bottom of Amazon's rankings. Also lumped in there was Dan Brown's The Lost Symbol because, despite its commercial success, it, well, inspired a lot of vitriol.
While this isn't the first attempt to apply science to the art of book-selling, it is supposedly the largest and most applicable. So, we'll just have to wait and see if publishers use it. If it's good for their struggling bottom line, great. But with a newfound ability to mold prose based on what sells, it's the rest of the lines we're worried about. (The Telegraph)