Surprise, surprise! The marketing value of positive on-line user-reviews has created an industry of liars-for-hire, ready to pimp any product with phoney “user” endorsements. Hotels, cafés, publishers, music labels and (horror!) even bloggers can readily find cut-price on-line testimonialists to plug their products on sites like fiverr.com.
Unfortunately, there’s a good chance we’ll fail to identify the opinion spam lurking among the legitimate reviews. But the discovery of a highly effective (89% accurate) method for weeding-out the junk, using statistical text analysis, provides some clues to bear in mind while scanning the review sites.
Whether I’m looking for a hotel or a piece of software or buying a book or some music on-line, I habitually scan both the top and the bottom ratings on App stores, Amazon, Trip Advisor … to get an idea of what people loved and hated. I find myself weighing the number of ‘stars’ and the distribution of good and bad reviews. In principle, it’s always been a good strategy to rely on the opinion of other consumers. One of the real advantages of shopping in a market rather than a store, for instance, is that you get clues about quality or value just by following the crowds. It might be an even better strategy for on-line shopping given the Web-driven splintering of brand as a guide to quality and satisfaction.
Although I consider myself pretty canny (doesn’t everyone?) about the weight to give these, often anonymous, on-line reviews, I don’t doubt they influence my choices. So the dilution of their information value by opinion spam is a serious problem, made worse by the difficulty of distinguishing junk from honest opinion. There are two problems: a lot of the spam looks just like a legitimate review and we’re probably not on our guard against it.
We’re unlikely to recognise the disguised porkies because we all have a “truth bias”. We’re inclined to take what we read or hear from others at face value because, at the limit, human societies just wouldn’t work if our scepticism about the motivation of others were unbounded. As potential consumers, this bias is abetted by the context: we’re ready to be sceptical of the quality of the product but, almost as a counter-weight, we’re less wary about the good faith of reports from other “consumers”.
The “marketing-savvy” consumer, by definition, has narrower bounds to her credulity. But the clues that might flag a phoney review are far from obvious. It turns out, in fact, that most are completely obscure unless seen through the lens of statistical and lexical analysis of large texts. The markers have been identified in a fascinating paper in computational linguistics by Ott, Choi, Cardie and Hancock.
The researchers carefully built a library of opinion spam written by workers from Amazon Mechanical Turk that they mixed into a database of known “good” opinion to create a research tool that could be used in double-blinded trials. They then tested the capacity of human judges and statistical methods to identify the spam. Humans achieved barely-better-than-chance success in identifying the spam despite the wealth of contextual clues that humans bring to their understanding of language. But the best of the statistical methods—not unlike those used by email-clients to identify spam emails—worked a treat: nearly 90% accurate.
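For the curious, here is roughly what an email-style statistical filter looks like in code. This is a minimal sketch of a word-count (Naive Bayes) classifier, not the researchers' actual system: the class name `NaiveBayes`, the `tokenize` helper and the two toy training reviews are all my own inventions for illustration, and a real filter would need hundreds of labelled reviews to be worth anything.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Toy multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen under that label
        self.doc_counts = Counter()  # label -> number of training reviews
        self.vocab = set()           # every word seen in training

    def train(self, text, label):
        self.doc_counts[label] += 1
        counts = self.word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Start from the log prior, then add the log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy reviews, purely to show the mechanics.
clf = NaiveBayes()
clf.train("my husband and I loved it, amazing vacation", "spam")
clf.train("the bathroom was small and the room faced the street", "truthful")
print(clf.predict("amazing vacation with my husband"))
```

The filter never "understands" a review. It only asks which pile of training text the words in a new review most resemble, which is exactly why it can pick up patterns that a human reader skims straight past.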
You should read the paper: it’s short and not too gummed-up with jargon. But, for the impatient, here are some of the chief clues that distinguished spam in the case of hotel reviews:
- Like most informative writing, truthful opinion typically contains more nouns, adjectives, prepositions, determiners, and coordinating conjunctions, while deceptive opinion, like imaginative writing, consists of more verbs, adverbs, pronouns, and pre-determiners (in the phrase “both my visits…” the determiner of the noun “visits” is “my”; the pre-determiner is “both”). Note that superlative adjectives are more characteristic of deception than of truthful opinion.
- Truthful opinions tend to include more sensorial and concrete language than deceptive opinions. For example, truthful opinions contain more spatial detail (location of the hotel, size of the bathroom etc). Deceptive opinions contain more detail on external factors (“husband”, “business”, “vacation”).
- Deceptive reviews have more positive and fewer negative emotion terms (of course, since this is the goal of the deception).
- Although deception is usually associated with less use of the pronoun “I”—perhaps revealing a psychological distance from the deceptive conduct—the researchers found that the increased use of the pronoun was one of the most successful markers of deceptive reviews as the writer attempted to emphasise their presence in the (phoney) account.
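If you want to play with these clues yourself, a few of them can be counted with a handful of lines of code. What follows is only a rough sketch: the function name `marker_rates` and the word lists are hypothetical, hand-picked by me from the examples above rather than taken from the paper, and the counts it returns are hints, not a verdict.

```python
import re

# Hypothetical, hand-picked word lists for illustration; not the paper's lexicons.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
POSITIVE = {"great", "fantastic", "wonderful", "beautiful", "perfect", "love"}
EXTERNAL = {"husband", "wife", "business", "vacation", "anniversary"}
SPATIAL = {"location", "bathroom", "floor", "small", "large"}


def marker_rates(review):
    """Return, for each marker family, the share of words in the review that belong to it."""
    words = re.findall(r"[a-z']+", review.lower())
    total = len(words) or 1  # avoid dividing by zero on an empty review

    def rate(vocab):
        return sum(word in vocab for word in words) / total

    return {
        "first_person": rate(FIRST_PERSON),
        "positive": rate(POSITIVE),
        "external": rate(EXTERNAL),
        "spatial": rate(SPATIAL),
    }
```

Calling `marker_rates` on a review gives four fractions between 0 and 1. Going by the clues above, a high share of first-person, positive and external words alongside little spatial detail would lean towards spam, but on a single short review the numbers are far too noisy to trust on their own.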
OK: now for the test. Armed with your new knowledge, decide which of these two opinions (from the paper mentioned above, with the mis-spellings) is real and which is spam:
1. I have stayed at many hotels traveling for both business and pleasure and I can honestly stay that The James is tops. The service at the hotel is first class. The rooms are modern and very comfortable. The location is perfect within walking distance to all of the great sights and restaurants. Highly recommend to both business travellers and couples.
2. My husband and I stayed at the James Chicago Hotel for our anniversary. This place is fantastic! We knew as soon as we arrived we made the right choice! The rooms are BEAUTIFUL and the staff very attentive and wonderful!! The area of the hotel is great, since I love to shop I couldnt ask for more!! We will definatly be back to Chicago and we will for sure be back to the James Chicago.
Click here for the answer.