(Subtitle: What would Google do?)
(Series title: A spam-filled, junk-rich, keyword-dense post of suspense, intrigue, mystery and spam, spam, spam, spam, spam.)
You know the issue. You enter a perfectly ordinary search term into Amazon – a term that seeks a quality answer to a logical question – and what do you get? Do you get a carefully built list of results that offers a strong answer to the question you arrived with? Or do you find the results a bafflingly hard-to-read collection of titles, only some of which look like books of real quality?
If you don’t know what I mean, try it. Enter “crime thriller” or “suspense” or any other broadly thematic search term into Amazon’s search bar. (I’ve mostly used examples from Amazon.co.uk in this piece, but the issue is the same on either side of the Atlantic.)
What you’ll find is a set of search results that seems to be dominated by weirdly titled books, overadorned with so many subtitles and series titles that the text becomes almost impossible to read. And while I’m totally fine with indie authors being well represented in Amazon search, you can sometimes struggle to find any traditionally published title amidst the results – and it’s just not plausible that excluding all trad books is the best way of being of service to the curious browser.
What’s more, I suspect this problem is getting worse not just because Amazon permits the junk but – much worse – because it encourages it.
Now I know that, by law, every second post on the Internet these days has to bash Amazon in some way, and plenty of those Amazon-bashers would criticise the firm even if it secured world peace, a cure for malaria and an answer to The Donald all in one fell swoop. So let me be clear. I like Amazon. I think it does good things for authors and I think it’s done some terrific things for readers too. How you can criticise a firm that makes all the books in the world available to all the humans in the world at what are perfectly competitive prices – well, I just don’t understand that. And how authors can attack a firm that allows them (a) to publish their work to all the readers of the world, (b) to do so for free, (c) without tying them into some life + 75 years type of contract, and (d) to pay royalties of 70% throughout – well, I’m just not smart enough to understand why that’s terrible either. Maybe I’ll figure it out one day.
But till that day comes, I’m going to go on liking Amazon and I proudly and happily use the firm to self-publish my work. (Though, by the way, I’m also working my way through my sixth multi-book deal with a Big 5 firm, so I don’t hate the traditional route either. Those firms give me a good living too.)
So, all that said, let’s get stuck in.
The problem we’re dealing with is simple. There’s too much junk on Amazon. WAY too much. And it’s not just one issue; it’s a whole constellation of them. But let’s see if we can at least pick out some of the major sub-types.
Problem 1: Books miscategorised in error
The graphic above shows a screen capture that advertises The Holy Roman Empire, a masterpiece of historical writing produced by the Chichele Professor of History at Oxford University. It’s a fine book, by any standards, and Allen Lane have (typically) done extraordinarily well to get such a beefy book right up high on the overall bestseller lists despite its weighing in at a challenging 1000 pages and a £30 [$45] cover price.
But take a closer look at those bestseller lists. The historical one I get, but “Fitness & Exercise > Yoga”? The HRE book has, I’m sure, a thousand virtues, but telling you about yoga is surely not one of them. The screengrab immediately below gives you another such example of mis-categorisation – almost equally comic in its inappropriateness.
Now, to be clear, these errors aren’t Amazon’s. They come from the publishers and, in both cases, the publishers are Big 5 firms who should know better.
But that’s to pass the buck. This is Amazon’s store, and these are Amazon’s webpages and bestseller lists. It’s fine to make use of what is effectively user-content, but user-content will inevitably have errors in – and it’s down to Amazon to eliminate, or at least vastly reduce, those things.
Nor would it even be hard to do. Instead of just accepting whatever BISAC codes (basically library categorisations) the user supplies, Amazon could run them through an automated checker that would query any combination of terms that seemed implausible. Medieval History and Yoga? Are you sure? Bonkbusting romance and literary fiction? Are you really sure? Can you double-check this?
In most cases, simply asking the user to verify those oddities would be enough. But Amazon doesn’t ask, doesn’t check. The mistakes go through. The bestseller lists get muddled with nonsense. That’s problem one.
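To show how little machinery that kind of query needs, here is a minimal sketch of the checker described above. Everything in it is an assumption for illustration: the category strings follow the "Fitness & Exercise > Yoga" style of Amazon's browse paths rather than real BISAC codes, and the table of plausible pairings is invented. A real system would presumably learn those pairings from catalogue data.

```python
from itertools import combinations

# Hypothetical table of top-level subjects that plausibly belong together.
# Illustrative entries only; not taken from BISAC or from Amazon.
PLAUSIBLE_PAIRS = {
    frozenset({"History", "Politics"}),
    frozenset({"History", "Religion"}),
    frozenset({"Fitness & Exercise", "Health"}),
}


def top_level(category):
    """Return the first segment of a path such as 'History > Medieval'."""
    return category.split(">")[0].strip()


def implausible_pairs(categories):
    """List the category pairs whose top-level subjects are not known to co-occur."""
    flagged = []
    for first, second in combinations(categories, 2):
        tops = frozenset({top_level(first), top_level(second)})
        # Two categories under the same top-level subject are always accepted.
        if len(tops) == 1 or tops in PLAUSIBLE_PAIRS:
            continue
        flagged.append((first, second))
    return flagged


def verification_prompts(categories):
    """Turn each flagged pair into the 'Are you sure?' question put to the uploader."""
    return [
        f"You chose both '{a}' and '{b}'. Are you sure? Please double-check."
        for a, b in implausible_pairs(categories)
    ]
```

The point of the design is that nothing gets rejected outright. An odd pairing simply produces a question, and the uploader can confirm it if the book really is about medieval yoga.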
Problem 2: Books miscategorised on purpose
In the examples just above, I presume that the publisher simply made a mistake, but there are occasions where I suspect something more deliberate is going on. Take a look at the screengrab below: Amazon’s current best sellers in the “Crime, Thriller & Mystery Series” category.
And note, this is not a general Crime, Thriller and Mystery selection, this is a category reserved for series novels in that broad area. A series is a sequence of interconnected novels, usually featuring the same protagonist. That’s not hard, is it?
But look at that first result. The very top result on the list. Behind Closed Doors describes itself as a “gripping debut.” No doubt the author will go on to write more novels, and perhaps those novels will form a series, but an isolated debut novel should NOT feature in a bestseller list devoted to series work. It’s not part of a series.
Similarly Kill Me Again is not a series novel. The Girl You Lost is not a series novel. No Coming Back is not a series novel. Emma Donoghue’s Room is absolutely not a series novel, nor ever will be. In fact, not one of the top ten books in the “Crime, Thriller and Mystery Series” list forms part of a crime, thriller or mystery series. That’s a shocking fact.
And it’s not just the bestseller list itself. The “most wished for” list includes Clare Mackintosh’s fine debut (and definitely-not-a-series) novel, I Let You Go. The “most gifted” section includes the equally-definitely-not-in-a-series book, The Lie.
Oh, and since we’re at it, number nine on the actual bestseller list is the same as number one. Number seventeen is the same as number six. Yes, the entries relate to different formats but – duh! – readers know that Amazon offers books in every format. They don’t need to clog up a bestseller list to tell us that.
In short, we have a whole torrent of errors and inaccuracies on a single page, ALL of which could be rejected by a simple automated test, “Is this book part of a multi-book series, Yes or No?”
That’s hopeless. And note this: although one or two of the books on this page could be there as a result of publisher error, you have to believe that the errors are so widespread for another, more concerning, reason, namely that the publishers chose to miscategorise their books, presumably in the belief that it was easier to get to the top of this bestseller list than some others (eg: thrillers, mystery, police procedurals.)
Now that’s speculation but, if I’m right, it implies two things. First, Amazon is allowing itself to be gamed (by big publishers and indies alike; both sides are active here). Second, the way Amazon structures its search results actively encourages that gaming, by pushing sales towards those who would cheat. And this, remember, takes place in a category where a simple, automated, yes/no test would eliminate such cheating.
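For the avoidance of doubt about how simple that yes/no test is, here is a rough sketch. It assumes a hypothetical `Listing` record that carries the number of published books in a title's series (1 for a standalone novel). That field is my invention; I have no idea how Amazon stores series data. The same pass also folds duplicate formats of one book into a single chart entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    title: str
    fmt: str            # e.g. "Kindle" or "Paperback"
    series_length: int  # published books in the series; 1 means standalone


def clean_series_chart(ranked):
    """Keep series titles only, one entry per title, preserving rank order."""
    seen_titles = set()
    chart = []
    for item in ranked:
        # "Is this book part of a multi-book series?" If not, drop it.
        if item.series_length < 2:
            continue
        key = item.title.casefold()
        # The same book in another format is already on the chart.
        if key in seen_titles:
            continue
        seen_titles.add(key)
        chart.append(item)
    return chart
```

Run over a ranked list, a standalone debut vanishes from the series chart and a book listed in both Kindle and paperback editions appears once, at the position of its better-ranked format.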
Problem 3: Spammification of subtitles
The issues we’ve looked at so far are relatively mild compared with what’s about to come.
And you already know the kind of thing I have in mind here. You come across it all the time: ever-growing thickets of keywords jammed into subtitles and series titles.
Allow me to illustrate with this fine book, depicted left. What we have here is a book with the one-word title, “Ruthless”, a perfectly fine title for a thriller. I don’t really love the cover, but maybe some people do. I don’t have to like everything. The cover, left, is completely acceptable.
And note this. The book has no subtitle, which is hardly surprising because, in the real world, thrillers don’t actually have subtitles. Academic books often do. Plenty of serious non-fiction does. Thrillers essentially never do. (In the US, publishers often like slapping the phrase “a novel” onto what is already, obviously, a novel. I don’t know why they do that – perhaps they worry that buyers might mistake the product in question for a can of fish paste or a crate of apples – but in any case the phrase “a novel” hardly constitutes a subtitle.)
But just take a look at how Amazon describes this book:
That’s right. If you dig around amidst the junk verbiage there, you can find the one word – Ruthless – which we dummies foolishly took to be the title of the book. But in Amazon world, anything goes. If you want a title/subtitle combo that is nothing more than a mish-mash of keywords bundled together without regard for sense, punctuation, or any desire to convey meaning, then Amazon ain’t gonna stop you.
Indeed, it turns out that if you want ludicrous spelling mistakes in your title, Amazon ain’t gonna stop you. If you want random repetitions, capitalisations, parentheses – well, hell, Amazon won’t stop that either.
The weird thing – I mean, the even weirder thing – is that this kind of game is totally against Amazon’s own rules. Those rules tell you: (1) the title has to appear on the cover, (2) the subtitle has to appear on the cover. This book utterly, clearly, brazenly breaks those rules. Of the nineteen words that constitute the ‘title’ and ‘subtitle’ combo, only one appears on the cover at all.
And again, to be clear, Amazon could easily check that its rules were being met. Computers can now scan images for text with some ease. And remember, titles and subtitles are meant to be legible. They should not be able to evade machine-capture. So if Amazon’s computers come across an image that does not appear to contain the text included in the title / subtitle, they should simply reject the submission – or at least force the question over to human adjudication. (I’d suggest too that if there was a penalty for abuse – no KDP upload for six months, let’s say – then those abuses would almost instantly vanish.)
Yet Amazon chooses not to act. Why?
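The cover check proposed above is not rocket science either. The sketch below assumes that some OCR step (not shown, and not any particular library) has already turned the cover image into plain text. It then asks whether every word of the listed title, and a decent share of the title-plus-subtitle words, actually appears on the cover. The function name, the 80% threshold and the sample strings in the usage note are all mine, invented for illustration.

```python
import re


def words(text):
    """Lower-case word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cover_check(title, subtitle, cover_text, min_share=0.8):
    """Compare listing metadata with text read off the cover image.

    cover_text is assumed to come from an OCR pass over the cover.
    min_share is a hypothetical policy threshold, not an Amazon rule.
    """
    cover_words = set(words(cover_text))
    listed = words(title) + words(subtitle)
    missing = [w for w in listed if w not in cover_words]
    share = 1.0 if not listed else 1 - len(missing) / len(listed)
    title_on_cover = all(w in cover_words for w in words(title))
    return {
        "share_on_cover": share,
        "missing": missing,
        "accept": title_on_cover and share >= min_share,
    }
```

Feed it a one-word title with a made-up six-word keyword subtitle and a cover that shows only the title and the author's name, and it reports that six of the seven listed words are missing and declines to accept. A submission that fails could be rejected or, more gently, passed to a human for adjudication.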
Problem 4: Spammification of titles
Ruthless was bad, but it’s not the worst. At least the darn book did have its main title word prominently on the cover. But in Amazon world, even that doesn’t have to be the case. Take a look at this cover:
What, dear reader, is the title of this book? I don’t know about you, but to my mind the title of that book is The Grave Man. There isn’t, in fact, any subtitle that I can find – this is a thriller and in the real world thrillers don’t have subtitles – but if we allow that any text on the cover (aside from title and author name) can constitute a subtitle, then perhaps this book is something like “The Grave Man: the author of the Sam Prichard novels”. That is woefully clunky, I grant you, but at least it’s an allowable kind of clunky within these strange new rules of ours.
Only that’s not how Amazon renders this book. According to Amazon, this book is actually:
In other words, not only does the subtitle not appear on the cover, the title doesn’t appear on the cover either. The book’s title is “Mystery” and it doesn’t even appear.
Now, in a way, fair play to the author, David Archer. He saw that he could game Amazon’s feeble algorithms by using a common search term (“Mystery”) as his title, but he didn’t actually want to title his book that way. So he calls his book “The Grave Man”, develops the cover he wants, then just plugs his dummy title into Amazon’s KDP system and he’s off and away.
I assume that Archer’s technique works. I only found his book because I entered “Mystery” as a search term on Amazon.co.uk and Archer’s book was the first result to appear. So he’s getting sales and – though the technique feels spammy to me; it isn’t one I would use myself – sales are sales and readers are readers. In a venomously competitive battle for readers, I don’t really blame Archer for pulling whatever stunts he can.
But I do blame Amazon. What the heck is it thinking? It’s now OK, is it, for people to purloin obviously popular search terms as titles and then display their total contempt for Amazon and its users by not even putting that title on their book? I think it’s obviously not OK. It damages and belittles the reader-experience, and it damages and belittles Amazon’s awesome brand. The firm is – or aims to be – a lot, lot better than that.
Problem 5: Let’s all be bestsellers
In an online world, it makes a ton of sense to have multiple bestseller lists, categorised by theme and subject. Why not? It’s not often that a history book will top a regular bestseller list, or a genre romance novel, or a travel book. But readers might well want to browse recent and successful titles in any of those areas, so it makes good sense to supply them. And of course, though “travel” is a relatively niche genre, there are still lots of good travel books being written, being sold and being read. A travel-only bestseller list makes perfect sense. And good for Amazon for creating and maintaining such a list. It was a great and welcome innovation.
Except that its lists are so very fine-grained that plenty of them make no sense at all. Brent Underwood recently proved the point by uploading a photo of his foot as a book, then getting three friends to buy it (at $0.99) . . . and triumphantly found himself at the top of not one but two bestseller lists (Psychology / Transpersonal and Social Sciences / Freemasonry).
Now there are multiple problems here. There’s the first issue we mentioned: when two wholly disparate book categorisations are chosen, the chances are it’s an error (as with The Holy Roman Empire) or a ploy to game the system. Either way, Amazon should at least query the selection, but it doesn’t.
A second issue is that of thin content. Why did Amazon not challenge the almost total absence of content in Underwood’s book? Why did he not get a message saying, “Sorry, we’ve checked your content and it appears thin and of little value to readers. Please reconsider your material and resubmit when ready.”? Clearly, he should have done. Readers would have benefitted.
But the biggest issue is the one Underwood centres on. With bestseller lists as absurdly specific as “Social Sciences / Freemasonry”, any damn book can become a bestseller. Pretty clearly, Social Sciences does deserve a bestseller list of its own. Equally clearly the section of that list devoted to Freemasonry (and in fairness to other secret societies too) is too abstruse for any bestseller list to be meaningful or helpful.
Why have those lists? Why allow people to claim, with perfect truth, that they are #1 bestsellers on the world’s largest and most popular bookstore, when you know perfectly well that the number of sales needed to get them there is pitifully small?
Have multiple lists, by all means, but keep them broad enough that only books that are actually bestselling get to call themselves bestsellers. The current system just devalues what should be a hard-to-achieve accolade.
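One way to keep the lists broad enough, sketched under loud assumptions: if a sub-list sells too few copies to mean anything, roll its books up into the parent list. The category paths use the "Social Sciences / Freemasonry" style quoted above, while the sales figures in the usage note and the 500-copies-a-week floor are numbers I have made up purely to show the mechanism.

```python
def effective_list(category_path, weekly_sales, min_sales=500):
    """Walk up the category path until the list is busy enough to be meaningful.

    weekly_sales maps a category path to the total copies sold on that list.
    min_sales is a hypothetical floor, not a figure from Amazon.
    """
    parts = [p.strip() for p in category_path.split("/")]
    # Never go above the top-level subject, however quiet it is.
    while len(parts) > 1 and weekly_sales.get(" / ".join(parts), 0) < min_sales:
        parts.pop()
    return " / ".join(parts)
```

With invented figures of three sales a week for "Social Sciences / Freemasonry" and four thousand for "Social Sciences", a book filed under the former would compete on the latter, where three purchases by friends would not earn anyone a #1 badge.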
Problem 6: Even the good turn spammy
This final problem is almost the worst of the lot. Because of the abuses proliferating in so many parts of Amazon’s search system, even genuinely good outfits feel compelled to play the I-can-spam-more-than-you game.
Take this book (left) published by Bookouture, an excellent digital-only British publisher.
The book has been as high as #2 in the overall Amazon.co.uk charts and may even have been to #1.
The book cover is classy. I don’t like that subtitle-that-isn’t-a-subtitle thing, but I’m a bit old-fashioned that way. Certainly, there’s no reason not to describe the nature of the book directly on the cover and “A gripping serial killer thriller” is a concise, useful guide to the content within. So thus far, fair enough.
But how does Amazon describe this book? Including title, subtitle, and series title, the book becomes:
And already, that feels like more verbiage than the eye can easily take in. A bundle of keywords mashed repetitiously together. If it were just title + subtitle, that would deliver a clear and possibly helpful message. If it were just title + series title, that could also be useful. (Except, hold on, there isn’t a series yet. Yes, the author concerned is, as I understand it, contracted to write a further two books in this series, but as yet those books are not available for sale, so this is not a series, and it should not have a series title.)
But let’s leave aside the issue of series titles for books that don’t constitute series. Let’s also leave aside the little question of why this serial killer novel is flagged as a #1 seller in Sociology:
Let’s leave all this aside, and just focus on this one fact:
Bookouture is a really good publisher, with strong editorial standards, beautiful cover design, proper copyediting skills, excellent commercial success, a strong following on social media and on email. They’re not spammers. They’re not fly-by-nights. They’re not isolated individuals trying hard to make a buck.
Yet despite this, Bookouture feels forced into playing the spam-you-more game. And for that, I blame Amazon. If that firm sets the rules so that piling keywords into title/subtitle/series title works, then staying out of that game will cost a lot of money. For a digital only firm like Bookouture, it could be the dividing line between success and failure. The result of Amazon’s broken rules and poor enforcement of the rules it has is that even decent firms turn spammy.
* * * *
So much for the problems. (I’m not suggesting this is a comprehensive list, by the way, just that it does indicate the scale of the issues.)
I want to turn now to two related questions. First, what would Amazon do? Next, what would Google do?
What would Amazon do?
That sounds like a strange question, doesn’t it? Haven’t we just looked at what Amazon does?
Yet it’s worth noting that Amazon Publishing itself behaves as primly as any fustily traditional print publisher. Here, for example, is a typical Apub book listing:
No series title.
The book description is terse and accurate. Beyond the (accurate, helpful) use of the term “thriller” in the first line, there’s no attempt to pack the book description with search terms.
In short, the listing is completely clean. Completely spam free. Entirely helpful to the reader. The listing actually rejects the tools that countless publishers – indies, digital only, and big traditional firms alike – use to game the system.
And that, I think, tells you more about the core values of the firm than the profusion of spam does. The fact is that Amazon Publishing loathes spam. It maintains a set of values similar to those of print publishers and similar to those that you and I respect as readers. By not playing the spam-you-more game, it shows it would rather make less money than sacrifice its principles. Good for it. I respect that.
What would Google do?
So much for Amazon. Let’s shift focus a moment and turn to Google. If that switch seems jarring, just bear in mind that at the heart of Amazon’s store is a search-engine – one whose job it is to respond to your query with a page full of relevant search results. Google is quite good at that search-engine game, so it’s worth considering how that firm might approach the same problem, if it were placed in charge.
Google’s approach to search has three critical foundations:
- It seeks quality and authority above all
- It is relentlessly innovative
- It is human-led
That last point may raise eyebrows, but it’s true all the same. Yes, Google’s search algorithms are automated, of course, but Google constantly monitors the quality of its search results against careful human evaluation of the authority and quality of the webpages that are turned up.
So human evaluators – carefully trained in their role – are asked to evaluate a whole set of webpages / sites in a particular niche for such things as authority, clarity, easy availability of information, quality of user experience, and much else. If Google’s automated search results are misranking the websites in response to a given search, Google will try to improve its algorithms to improve the match. The automated algorithms are constantly striving to meet goals set by intelligent, trained, well-resourced human judgement.
Because technology changes, because user-requirements change, and probably because Google’s own understanding of site quality evolves, the search algorithm is never still. The firm is unrivalled in web search and – Bing me no Bings – is still streets ahead of the competition.
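To make the idea of checking an algorithm against human judgement concrete, here is one simple yardstick: take every pair of results and count how often the automated ranking puts them in the same order as the human raters did. I am not claiming this is the measure Google uses. It is a generic pairwise-agreement score, and the function and argument names are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(algorithm_order, human_scores):
    """Share of result pairs that the algorithm orders the same way as the raters.

    algorithm_order lists results from best to worst as ranked by the machine.
    human_scores maps each result to a rater score, where higher is better.
    """
    agree = 0
    total = 0
    for higher, lower in combinations(algorithm_order, 2):
        if human_scores[higher] == human_scores[lower]:
            continue  # raters saw no difference, so this pair tells us nothing
        total += 1
        if human_scores[higher] > human_scores[lower]:
            agree += 1
    return agree / total if total else 1.0
```

A score near 1.0 says the machine and the humans broadly agree. A score drifting towards 0.5 or below says the algorithm needs work, which is exactly the feedback loop described above.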
Now supposing you applied the same kind of thinking to Amazon’s search results, what would you get? I’d suggest that you’d get a search engine that would think hard about three factors.
- Relevance. Of course it matters that search terms are ‘relevant’ to a particular query. So if you ask for ‘women’s fiction’, you want to find women’s fiction. But since subtitles and the rest are so easy to fill with junk, you will have to capture your signals of relevance from elsewhere. Hard-to-game relevance signals would include (a) formal reviews of the book, (b) user reviews of the book, (c) BISAC categorisation of the book, and (d) the ‘Customers Also Bought’ metric. An intelligent combination of those signals would give you an impossible-to-game method of determining relevance.
- Quality. Even though Amazon’s default ranking for search is ‘relevance’, that doesn’t – and shouldn’t – mean quite what it says. Supposing that you wanted ‘Cold War Spy Fiction’, there might be, say, five thousand titles that offered you precisely that. Given that relevance signals might, for those five thousand titles, be so tightly bunched that minor differences meant essentially nothing, you’d want some other way to push the ‘right’ solutions to the top. Since the ‘right’ solution here would certainly include John le Carré, you’d have to find some way to generate signals of quality. Those signals might include sales success, formal reviews, informal user-reviews, quality of publisher, prizes won or shortlists achieved, and so forth. Again, you can’t game those things or not really, not easily.
- Sales. I’ve not forgotten that Amazon is a retailer, not a library service. It does and should think about sales, and so it’ll want to pop newer titles and more strongly selling titles up to the top of the lists too. And that’s fine. If books rank similarly for relevance and quality, then Amazon should certainly serve its own interests – and, indeed, the reader’s interests – by promoting the books that readers are more likely to buy.
And that’s not hard, is it? None of that is particularly hard to achieve.
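Here is a toy version of that three-factor ordering, to show the shape of the thing rather than anything Amazon or Google actually runs. It assumes each book already has a relevance score and a quality score from 0 to 100, blended from the hard-to-game signals listed above (how you blend them is the real work, and is not shown). Scores are grouped into bands so that near-ties on relevance fall through to quality, and near-ties on quality fall through to sales. The `Book` fields and the band width of 10 are my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    relevance: int  # 0-100, assumed blend of reviews, categories, also-bought data
    quality: int    # 0-100, assumed blend of reviews, prizes, publisher record
    sales: int      # recent unit sales


def rank(books, band=10):
    """Order by relevance band, then quality band, then sales (all descending)."""
    def key(book):
        return (-(book.relevance // band), -(book.quality // band), -book.sales)
    return sorted(books, key=key)
```

The banding is the important design choice. Without it, a book scoring 95 for relevance would always beat one scoring 93, however feeble the first and however good the second, and the tightly bunched relevance scores would do all the deciding.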
What baffles me is why Amazon has allowed its search results to clog up with as many problems as have been listed in this post. Some possible answers are:
A) It knows about the issue and is working on it.
B) It’s just boobed. It’s taken its eye off the ball.
C) It’s found that its sales benefit from its current approach and that’s all it cares about.
D) It knows that indie-publishers are more ready to game the whole title/subtitle thing than trad-publishers and Amazon actually likes the way its current system disadvantages the latter.
Because I think the firm is hellishly smart, I tend to discount the first two of those explanations.
The third explanation – the sales issue – I also question. I mean, yes, maybe somewhere in Amazon HQ there’s a stat which says that books like Ruthless or Mystery: The Grave Man outsell their traditionally published equivalents . . . but even then, there’s a question about Amazon’s longer term profile and success. Does the firm want to be known as a place that promotes junk over quality? I seriously doubt it. I just can’t see that a reputation of that sort could be to the firm’s long term advantage, and no firm has longer term horizons than Amazon itself. (And of course, though Amazon’s market share implies monopoly, its most important ebook competitors are Apple and Google, who are currently battling it out for the title of world’s most valuable firm. I’m certain Amazon is not complacent, but if it is, it shouldn’t be.)
So that leaves the fourth explanation. I’m not sure it’s the correct answer – but what if it were? If what we’re seeing is a deliberate attempt to enfeeble the traditional industry?
Well, I think that would be depressing. It would feel like a teenager kicking out at parental authority. And yes, Amazon, we know you’re innovative. We know you’re disruptive. We know about your huge list of achievements that go far beyond remaking old-fashioned bookselling and include such things as AWS cloud computing, the e-reader, the invention and regularisation of the ebook market, and so much else.
And, dammit, if fighting with trad publishers is your thing, then I don’t even really have a problem with that. Giants hitting each other with sticks: that kind of game is no real concern of mine. But don’t clog your search results with shite, Amazon. Respect the product. Respect the reader. Let’s de-spam Amazon search. Please.
That’s what I think, but what do you think? Let me know in the comments below. I want to hear from you . . .