An Imaginary Sabermetrics for Publishing
by Verónica Gerber Bicecci, translated from the Spanish by Christina MacSweeney (Coffee House)
Although five books is most definitely a small sample size of throwaway proportions, out of the books that I’ve written about for this weekly “column,” Empty Set by Verónica Gerber Bicecci, translated from the Spanish by Christina MacSweeney, is my favorite. I don’t know where it will stack up by the end of the year (there are a number of titles coming out this summer that I’m looking forward to, and as a gesture toward impartiality, I should really leave Fox, The Bottom of the Sky, The Endless Summer, and other Open Letter titles out of these evaluations), but for now I’d put it ahead of The Perfect Nanny, In Black and White, Frankenstein in Baghdad, and Theory of Shadows. (And that is how I would rank them, one to five.)
As you can probably predict, I’m not going to write a full, well-thought-out review for this book. If that’s what you want, I’d highly recommend checking out Lisa Fetchko’s review. She breaks the book down really well, and even gets into a particular translation issue about the use of _ in place of _Yo(Y), which is also discussed in an afterword that will be of particular interest to translators or those interested in the process of translating, or of editing translations.
I’m going to use this book as an opportunity to write about something entirely different, but before I do that, I have two or three quick points.
1) I like the use of the charts in this book. I’ll come back to this in a few different ways down below, but drawings such as this one (which is preceded by “Here’s where this story ends,” a statement that means more once you have reached the end) are what make this book unique.
And obviously, all the Venn Diagram charts are why I initially chose to read this book. Who doesn’t like a Venn Diagram?! This is one statement about math and statistics that everyone can agree on.
2) In a way, this is The Perfect Nanny for an entirely different set of readers. Written to be a blockbuster, The Perfect Nanny includes a lot of techniques and tropes and literary moments designed to make a certain set of readers feel comfortably stimulated: the set of readers (R-1) who prefer linear plots, heavy character development, detailed settings, and psychological tension.
Empty Set generates an equal amount of reading comfort in a different set of readers (R-2) who feel more at ease in a text of evocative fragments, acrostics, plots like puzzles, and characters whom you don’t feel obligated to relate to.
For both R-1 and R-2 these books are equally successful in their approaches. And R-1 probably doesn’t care for Empty Set (“too confusing!” “I couldn’t relate to anyone!”), and vice-versa (“I’d rather see the movie”).
You could, I don’t know, draw a Venn Diagram of these two subsets of readers . . .
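For the set-minded, the two reader groups can be sketched with Python's built-in sets. This is only a toy illustration: the reader names are hypothetical, and which readers land in both R-1 and R-2 is an assumption made up for the example, not something measured.

```python
# Hypothetical readers; the names and memberships are invented for illustration.
r1 = {"Ana", "Ben", "Cleo", "Dev"}  # R-1: linear plots, heavy character development
r2 = {"Cleo", "Dev", "Eli"}         # R-2: evocative fragments, plots like puzzles

both = r1 & r2       # the overlapping region of the Venn diagram
only_r1 = r1 - r2    # readers who would "rather see the movie"
only_r2 = r2 - r1    # readers who find blockbusters too comfortable
everyone = r1 | r2   # the union of both circles

print(sorted(both), sorted(only_r1), sorted(only_r2), len(everyone))
```

If the two sets shared no readers at all, `r1 & r2` would be `set()`, which is to say an empty set.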
3) Not to take anything away from this novel, but wow have January and February been slow months for international literature. There doesn’t seem to have been anything buzzing on Book Twitter or Book Marks or in the blogosphere (does anyone say that anymore?) or at Winter Institute. I’ve written about the drop in translations in both of the past two months, but that was just focused on pure numbers, not quality or sales or impact or anything else. But looking back at what I have read, and forward to what’s on my docket, it feels like a pretty quiet year so far.
Although I’m personally hoping this review of Madame Nielsen’s The Endless Summer changes that, this still feels a lot like the current situation in Major League Baseball (the slowest offseason in all of history), in which no free agents are being signed and nothing at all is happening. There are so many interesting explanations for this situation in which several of the game’s best players are currently unemployed: it could be collusion, it could be that clubs have a more advanced understanding of the value available in the free agent market, it could be due to the fact that 1/3 of the teams are tanking in 2018 and another 1/2 aren’t really in a position to do anything but tread water, it could be because of the new collective bargaining agreement and traditional big spenders (LA Dodgers, NY Yankees) trying to reset their competitive balance assessments by getting under the spending threshold for one year, or it could have God bless Scott Boras!1
Anyway, this combination of thinking about baseball (how to best build a team, player valuations, etc.) + reading a novel centered around set theory2 + a stray comment I made in an earlier post → an idea to try and create some core concepts for a sabermetric approach to the book industry.
Sales(S) is an obvious building block. People usually value books based on how many copies they sold. “We sold 10,000 copies!” Or, “It was a best-seller in Mexico!”
(Not to be confused with “Print Run(PR),” which is a number based in hope that signifies nothing more than the publisher’s wish to sneakily manipulate the bookseller market. Print Run(PR) is equivalent to Scott Boras’s bullshit stats packages for players like Eric Hosmer who are hoping to receive contracts that are far larger than the value they’ll generate for their team. Print Runs(PR) are generally lies.)
Are sales really all that useful of a statistic though?
First off, the latter statement up there (repeated way too frequently in meetings with foreign agents) is crap. It’s descriptive, not objective, and lacks any and all context. How many books did this title beat out to become a best-seller? For how long was it a best-seller? How predictive is the Mexican best-seller list for a book entering other markets? Are the coefficients mapping it onto the French and U.S. markets radically different?
Another criticism: Sales in a vacuum takes into account none of the expenses involved with generating those sales. A book with a million dollar marketing budget that sells 100,000 copies is vastly different from a book that sells 100,000 based on a viral video that cost $.49 to make.
It also doesn’t take into account the list price of the book itself. It’s obviously way easier to sell 10,000 ebooks at $.99 than 10,000 hardcovers of a scholarly investigation into the sexual life of mollusks that lists for $149.
Sales is like batting average. A nice metric the average citizen can understand, but really not all that valuable.
Actually, that’s kind of a lie. Batting Average has values that most people can recognize as “good” (.280), “amazing” (.320), and “hall of fame” (.340+). What are the equivalents for books? If I tell the people sitting next to me at the bar that we sold 3,000 copies of a book, will they think that’s great? Or pathetic? Without a commonly accepted baseline (among the larger audience, not just book nerds) this doesn’t mean a whole lot.
And it doesn’t take into account the idea that a book is more than its purchases. Thought experiment: Which is better? A book that sells 10,000 copies, 2,000 of which are read, with 10 readers capable of recalling the book one year later, or a book that sells 1,500 copies, 1,000 of which are read, with 200 readers taking this to the grave? (A: If you’re Big Five it’s the former, if you’re nonprofit the latter. There is no unified theory of sales.)
(Sales(S) x List Price(P)) x Readership(R) - Fixed Operating Expenses(FOE) - Printing(PR) - Author Payment(AP) - Translator Payment(TP) - Marketing Costs(MC) = True Profit(RP)
OK, so this is two steps in one: I’ve added in all the variables mentioned above (costs, list price), but then thrown in the idea of “Readership(R)” to try and point at the fact that the overall impact of a single printed book isn’t a one-to-one ratio with copies sold. On the most basic level, there are used copies. How many students a year buy used copies of The Great Gatsby for class? Or check it out from a library? A book’s true value, or “Profit” (capitalist term, I know), is always and forever greater than the number of printed copies.
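To make the True Profit arithmetic concrete, here is a minimal sketch in Python. It follows the formula exactly as written, with Readership treated as a multiplier on revenue. That reading is an assumption, since the piece never says what unit Readership is in. The `Title` class, its field names, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Title:
    """Inputs to the True Profit formula; all field names are illustrative."""
    sales: int                 # Sales(S): copies sold
    list_price: float          # List Price(P)
    readership: float          # Readership(R): assumed multiplier (readers per copy)
    fixed_operating: float     # Fixed Operating Expenses(FOE)
    printing: float            # Printing(PR)
    author_payment: float      # Author Payment(AP)
    translator_payment: float  # Translator Payment(TP)
    marketing: float           # Marketing Costs(MC)


def true_profit(t: Title) -> float:
    """(S x P) x R minus the five cost terms, as in the formula above."""
    gross = (t.sales * t.list_price) * t.readership
    costs = (t.fixed_operating + t.printing + t.author_payment
             + t.translator_payment + t.marketing)
    return gross - costs


# Made-up numbers, chosen only so the arithmetic is easy to follow.
example = Title(sales=3000, list_price=16.0, readership=1.5,
                fixed_operating=10000.0, printing=6000.0,
                author_payment=1000.0, translator_payment=1000.0,
                marketing=2000.0)
print(true_profit(example))  # 72000.0 gross - 20000.0 costs = 52000.0
```

Setting `readership` to 1.0 collapses this back to plain revenue minus costs, which is a quick way to see how much of the "profit" comes from the readers who never bought a copy.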
We’re still missing a few things though: What about people who know about a book, yet don’t buy it? And what about the longevity of readership? It’s one thing to read Gone Girl and then keep on living, another to read Ulysses and have your life perspective changed. That Cultural Value(CV) isn’t captured here, and I’m not sure it ever can be quantified in this way. So let’s change tactics a bit.
((Expected Sales(ES) x List Price(P)) - ((Publishing Interest(PI) + Agent Status(AS)) - Total Expenses(TE))) = Cash Profit(CP) + Cultural Capital(CC)
If we really want to create a sabermetric approach to books, we have to look for exploitable inefficiencies in the marketplace. And my first inclination is that these inefficiencies come in two flavors: leveraging reputations against author advances and finding a way to decrease artist payments.
That’s not quite right though. Let me back up a bit and math this out.
In the early 2000s, there were no translations3 and there was a major gap between the best/most expensive translators (Margaret Jull Costa, Edith Grossman, Richard Howard, Gregory Rabassa) and everyone else. Without a middle class, and without competition, certain publishers saw an exploitable inefficiency. How much can you make when you pay $1,000 as an author advance, $1,000 to a grad student translator (“Hey, yo, we’re gonna like, launch your career!”), and can get $3,000+ from foreign agencies desperate for American publishers to acknowledge that their literature even existed? In that situation, you can flip 2,500 sales into a decent amount of money. That is the dirty truth of translation publishing in the early part of this century.
Then things changed! International lit got more popular. Translators got organized. Now, the idea of going overseas to find the best books that no one knows or cares about is complicated by the two dozen new presses trying to beat you there, and the combination of ethical obligations in relation to translator payments and agent involvement in raising author advances (good in the short term, maybe, and probably not in the long term, but that’s its own metric) has raised Total Expenses(TE) in an astronomical fashion. It has also altered the Agent Status(AS) (“I have the next Ferrante on my list . . . ”) and the Publishing Interest(PI) (“We’re starting a new press and want in on the hot trends, so which book is the one that’s going to get us critical attention AND be most readable by the (R-1) readers of The Perfect Nanny?”). Increase the second half of the equation above while not changing the overall sales, and you’re going to kill your margins.
That doesn’t mean that publishers will stop pursuing books that are unlikely to earn back expenses. Look at Penguin paying a million dollars for a Knausgaard novel. There’s basically no way that he’ll earn that back in straight sales. Same with Knopf and Javier Marías. PRH can definitely expand the audiences for these authors, but there’s a ceiling. Even knowing that, they’re willing to go ahead because there’s a value just to having these names on your list. Reputation, cultural capital, whatever you want to call it, it’s part of this equation as well.
Expected Sales(ES) = Author Fans(AF) x Purchasing Coefficient(PC)
If someone were able to come up with an algorithm that was even 90% accurate in predicting sales, they would be in a position to basically print money. Long time readers (or anyone involved in the book world) know that publishers don’t really do any market research. Unlike movies, there are no pre-release tracking figures for blockbuster titles. Sure, you can “have a pretty good sense” about how well a book is or isn’t going to sell, but outside of Harry Potter, James Patterson, and a handful of other brands, the error bars on predicted sales are really wide.
Past performance by the author and publisher are major indicators of how a particular title will sell, so maybe this is something that could be calculated . . . Throw in a few sensible metrics about the author, such as Twitter Followers(TF), Reviewing Connections(RC), etc., along with some sort of figures about the publisher, such as Sales Reps(REP), Average Reach(REA), Influencer Access(IA), etc., and maybe you can come up with some sort of prediction.
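As a rough sketch of the Expected Sales(ES) idea, the snippet below multiplies Author Fans(AF) by a Purchasing Coefficient(PC) and then wraps the result in the kind of wide error bars described above. Treating PC as the fraction of fans who buy, and expressing the error bars as a simple relative error, are both assumptions made for illustration. The function names and the numbers are hypothetical.

```python
from typing import Tuple


def expected_sales(author_fans: int, purchasing_coefficient: float) -> float:
    """ES = AF x PC, with PC assumed to be the share of fans who actually buy."""
    return author_fans * purchasing_coefficient


def sales_range(point_estimate: float, relative_error: float) -> Tuple[float, float]:
    """Return (low, high) bounds around an estimate for a given relative error."""
    if relative_error < 0:
        raise ValueError("relative_error must be non-negative")
    return (point_estimate * (1 - relative_error),
            point_estimate * (1 + relative_error))


# Made-up inputs: 20,000 fans, 5% of whom buy, with +/-50% error bars.
estimate = expected_sales(20_000, 0.05)
low, high = sales_range(estimate, 0.5)
print(round(estimate), round(low), round(high))
```

Metrics like Twitter Followers(TF) or Sales Reps(REP) would presumably feed into AF and PC rather than replace them, but how to weight them is exactly the part nobody has figured out.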
(Pace of Reading(PAC) x Length(LEN)) x (Character Connections(CC) x Plot Points(PP)) x Buzz(BUZZ) = Reading Desirability(DES)
Amazon’s metrics about how fast people read various books, where they tend to stop, which titles are most/least likely to be read in their entirety, etc., totally freak literary people out. There are a ton of Silicon Valley people who would love to create a program that would use some complex algorithm to churn out best-selling book after best-selling book without any author’s involvement whatsoever. They would flood the market with exactly what most people want, all more or less for free, and utilizing some sort of textual analysis that combines all the typical plot elements of popular books (hero’s quest, typical plot structure of rising action, climax, denouement) with other quantifiable elements (language level, sentence and chapter length, number of chapters) that have been found to keep readers engaged and flipping pages.
Take all that, mix in some BUZZ (readers want to feel like they have to read a book so as to not be left out) and you can figure out how likely a book is to appeal to a wide audience.
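Here is one way the Reading Desirability(DES) product could be scored and used to rank titles. The formula gives no units or scales for any of its factors, so the scores below are arbitrary made-up values, and the book names, the `Book` class, and its field names are all hypothetical.

```python
from typing import NamedTuple


class Book(NamedTuple):
    """Factors of the DES formula; the scales are arbitrary assumptions."""
    name: str
    pace: float                   # Pace of Reading(PAC)
    length: float                 # Length(LEN)
    character_connections: float  # Character Connections(CC)
    plot_points: float            # Plot Points(PP)
    buzz: float                   # Buzz(BUZZ)


def desirability(b: Book) -> float:
    """(PAC x LEN) x (CC x PP) x BUZZ = DES."""
    return (b.pace * b.length) * (b.character_connections * b.plot_points) * b.buzz


# Invented titles with invented scores.
books = [
    Book("Hypothetical Fragment Novel", 1.0, 0.5, 1.0, 2.0, 1.0),
    Book("Hypothetical Thriller", 2.0, 1.0, 3.0, 4.0, 2.0),
]
ranked = sorted(books, key=desirability, reverse=True)
for b in ranked:
    print(b.name, desirability(b))
```

Because every term is multiplied, a zero anywhere (no buzz, no plot points) zeroes out the whole score, which may or may not be what the formula intends.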
Turnover(TO) x Cash Profit(CP) x Hipster Quotient(HQ) = Indie Stock(IND)
Bookstores actually have the ability to come up with a ton of different measurements, depending on what they want to track or evaluate. Sales per linear foot in given sections. How fast different subjects turn over. Average amount spent by a customer. Frequency of returning customers. There’s tons of data sitting right there that could be analyzed in a totally straightforward fashion.
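Two of the measurements just listed, sales per linear foot and average amount spent by a customer, might be computed along these lines. The transaction records, section names, and shelf lengths are invented for the example, and a real store's point-of-sale export would certainly look different.

```python
from collections import defaultdict

# Made-up records: (customer_id, section, amount_spent_in_dollars)
transactions = [
    ("c1", "fiction", 20.0),
    ("c1", "poetry", 10.0),
    ("c2", "fiction", 30.0),
    ("c3", "translation", 16.0),
]
# Made-up shelf space per section, in linear feet.
shelf_feet = {"fiction": 10.0, "poetry": 4.0, "translation": 2.0}


def sales_per_linear_foot(records, feet_by_section):
    """Total dollars sold in each section divided by its shelf length."""
    totals = defaultdict(float)
    for _customer, section, amount in records:
        totals[section] += amount
    return {section: totals[section] / feet
            for section, feet in feet_by_section.items()}


def average_spend_per_customer(records):
    """Total dollars spent divided by the number of distinct customers."""
    per_customer = defaultdict(float)
    for customer, _section, amount in records:
        per_customer[customer] += amount
    if not per_customer:
        return 0.0
    return sum(per_customer.values()) / len(per_customer)


per_foot = sales_per_linear_foot(transactions, shelf_feet)
avg_spend = average_spend_per_customer(transactions)
print(per_foot, round(avg_spend, 2))
```

In this toy data the small translation shelf earns the most per foot, which is the sort of thing a store would never notice from total sales alone.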
But indie stores aren’t necessarily about efficiency in the way Barnes & Noble or Amazon would like to be. Part of their reason for being is tied to having the books that you don’t always find at the big box stores, to pushing a sort of aesthetic agenda that sets them apart. If, as a store owner, you could always know which books will both increase your coolness factor with your clientele and sell with the necessary velocity to keep you paying your rent, you’d be in the best spot possible. This might seem intuitive, but I think it can be a bit more complicated depending on how you value your reputation. For example, you may not want to carry Fifty Shades of Grey because you have standards, but that means you’re leaving a lot of money on the table. And carrying too many different titles that sell one time a year, yet make you seem like the smartest bookstore around, is a recipe for closure. Figuring out that balance, and which books maximize Cash Profit(CP) and Reputation(REP), would be ideal.
Besides, a lot of this calculus is already done on a daily basis by most everyone. Even though it’s not quantified in a sortable, sharable way, people are constantly making these sorts of decisions. They may not think about them quite as honestly as they should though, and maybe something like a set of publishing sabermetric ideas could help publishers and stores be all that they could be. It’s fun to come up with various calculations, mostly because it makes you think about what you’re actually trying to measure, and why the measurements you might already have fall short. It can help define your mission, and by working in various intangible benefits, you can better justify various investments or decisions.
1 For anyone not willing to click through (and good on you!), here’s the amazing quote from super-agent Scott Boras:
The off-season is like the America’s Cup. We have 30 boats in the water. They take off and eventually they get to the free-agent docks. Normally, there are trade winds, and there are economic investments in the capacity of the boat, which allow those boats to get to the appropriate free-agent docks.
This year, there was a detour to Japan, where there was a $250 million asset available for $3 million (Ohtani). All boats went to Japan. Then they sailed back a good distance. They came to Florida and found a sinking ship and all of its cargo was in the water (Dee Gordon, Giancarlo Stanton, Marcell Ozuna, Christian Yelich). All teams tried to load it on their boats.
That took additional time. Then, as they moved forward to the free-agent docks, they found other ships dumping cargo (Pittsburgh and Tampa Bay and a few others), which then slowed their arrivals to the free-agent docks. So, trade winds, Japan, shipwreck in Florida, more cargo-spewing, all those things artificially delayed the arrivals to the free-agent docks.
Sorry, I have no idea, but I love it! More literary agents need to go off the rails when making random comments about the books they’re trying to auction. That would liven up book journalism!
2 Representative bit from Bicecci and MacSweeney’s Empty Set:
There isn’t much documented evidence of this, but during the military dictatorship in Argentina, teaching basic set theory was prohibited in schools. We know, for example, that a tomato belongs to the tomato(TO) set and not to onion(ON) or chilies(CH) or coriander(CO). Where’s the threat in reasoning like that? In set theory, tomatoes, onions, and chilies might realize they are different foodstuffs, but also that they have things in common, like the fact that they can all belong to the fresh hot salsa(FHS) set and, at the same time, to the Universe(U) of cultivated plants(CP), and might perhaps unite against some other set or Universe(U); for example, that of canned hot salsa(CAHS). In short, a community of vegetables. Venn diagrams are tools of the logic of sets. And from the perspective of sets, dictatorship makes no sense, because its aim is, for the most part, dispersal: separation, scattering, disunity, disappearance.
3 My sabermetric principles apply to BOOKS in general, not just translations, but I want to focus on exploiting this market since it might explain what’s going on in 2018 with the weird decrease in translation publications.
Although! Let me promise the four of you reading this that next month I’ll run some three- and five-year rolling average stats to avoid comparing 2018 to the Best Year Ever. I’ve been statistically irresponsible and I know it. Sorry.
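For the curious, a three- or five-year rolling average is just the mean of each run of consecutive years. A minimal sketch follows, with placeholder numbers that are emphatically not real translation counts. The function name is hypothetical.

```python
from typing import List, Sequence


def rolling_average(values: Sequence[float], window: int) -> List[float]:
    """Mean of each run of `window` consecutive values, oldest first."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Placeholder yearly counts, invented purely to show the mechanics.
yearly_counts = [10, 20, 30, 40, 50]
three_year = rolling_average(yearly_counts, 3)
five_year = rolling_average(yearly_counts, 5)
print(three_year, five_year)
```

Comparing a single year against these smoothed values, rather than against the single best year on record, is what keeps one outlier from making everything after it look like a collapse.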

[…] Chad W. Post, “An Imaginary Sabermetrics for Publishing” […]