Recognising phoney on-line reviews

Surprise, surprise! The marketing value of positive on-line user-reviews has created an industry of liars-for-hire, ready to pimp any product with phoney “user” endorsements. Hotels, cafés, publishers, music labels and (horror!) even bloggers can readily find cut-price on-line testimonialists to pimp their product on sites like

Unfortunately, there’s a good chance we’ll fail to identify the opinion spam lurking among the legitimate reviews. But the discovery of a highly effective (89% accurate) method for weeding out the junk, using statistical text analysis, provides some clues to bear in mind while scanning the review sites.

Whether I’m look­ing for a hotel or a piece of soft­ware or buy­ing a book or some music on-line, I habit­u­al­ly scan both the top and the bot­tom rat­ings on App stores, Ama­zon, Trip Advi­sor … to get an idea of what peo­ple loved and hat­ed. I find myself weigh­ing the num­ber of ‘stars’ and the dis­tri­b­u­tion of good and bad reviews. In prin­ci­ple, it’s always been a good strat­e­gy to rely on the opin­ion of oth­er con­sumers. One of the real advan­tages of shop­ping in a mar­ket rather than a store, for instance, is that you get clues about qual­i­ty or val­ue just by fol­low­ing the crowds. It might be an even bet­ter strat­e­gy for on-line shop­ping giv­en the Web-dri­ven splin­ter­ing of brand as a guide to qual­i­ty and satisfaction.

Although I consider myself pretty canny (doesn’t everyone?) about the weight to give these often anonymous on-line reviews, I don’t doubt they influence my choices. So the dilution of their information value by opinion spam is a serious problem, made worse by the difficulty of distinguishing junk from honest opinion. There are two problems: a lot of the spam looks just like a legitimate review, and we’re probably not on our guard against it.

We’re unlikely to recognise the disguised porkies because we all have a “truth bias”. We’re inclined to take what we read or hear from others at face value because, at the limit, human societies just wouldn’t work if our scepticism about the motivation of others were unbounded. For us as potential consumers, this bias is abetted by the context: we’re ready to be sceptical of the quality of the product but, almost as a counter-weight, we’re less wary about the good faith of reports from other “consumers”.

The “marketing-savvy” consumer, by definition, has narrower bounds to her credulity. But the clues that might flag a phoney review are far from obvious. It turns out, in fact, that most are completely obscure unless seen through the lens of statistical and lexical analysis of large texts. The markers have been identified in a fascinating paper in computational linguistics by Ott, Choi, Cardie and Hancock.

The researchers care­ful­ly built a library of opin­ion spam writ­ten by work­ers from Ama­zon Mechan­i­cal Turk that they mixed into a data­base of known “good” opin­ion to cre­ate a research tool that could be used in dou­ble-blind­ed tri­als. They then test­ed the capac­i­ty of human judges and sta­tis­ti­cal meth­ods to iden­ti­fy the spam. Humans achieved bare­ly-bet­ter-than-chance suc­cess in iden­ti­fy­ing the spam despite the wealth of con­tex­tu­al clues that humans bring to their under­stand­ing of lan­guage. But the best of the sta­tis­ti­cal methods—not unlike those used by email-clients to iden­ti­fy spam emails—worked a treat: near­ly 90% accurate. 
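For the curious, here is a minimal sketch of the general kind of statistical method that spam filters are usually described as using: a naive Bayes classifier over word counts. It is not the researchers’ classifier and won’t reproduce their accuracy. The four training sentences are invented toy data, and the names `TOY_REVIEWS`, `train` and `classify` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

# Invented toy data, NOT taken from the paper's corpus.
TOY_REVIEWS = [
    ("truthful", "small bathroom on the third floor near the station"),
    ("truthful", "the room on the corner was quiet and the lobby was small"),
    ("deceptive", "my husband and i loved it we will be back"),
    ("deceptive", "i loved my stay and i will be back soon"),
]


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def train(labelled_docs):
    """Count words per label, documents per label, and the shared vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for label, text in labelled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Start from the prior: how common this label is in the training data.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TOY_REVIEWS)
print(classify("i will be back", model))
```

With four sentences of training data this is a toy, of course. The point is only that the method looks at nothing but word frequencies, with none of the contextual knowledge a human reader brings.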

You should read the paper: it’s short and not too gummed-up with jar­gon. But, for the impa­tient, here are some of the chief clues that dis­tin­guished spam in the case of hotel reviews:

  • Like most infor­ma­tive writ­ing, truth­ful opin­ion typ­i­cal­ly con­tains more nouns, adjec­tives, prepo­si­tions, deter­min­ers, and coor­di­nat­ing con­junc­tions, while decep­tive opin­ion, like imag­i­na­tive writ­ing, con­sists of more verbs, adverbs, pro­nouns, and pre-deter­min­ers (in the phrase “both my vis­its…” the deter­min­er of the noun “vis­its” is “my”; the pre-deter­min­er is “both”). Note that superla­tive adjec­tives are more char­ac­ter­is­tic of decep­tion than of truth­ful opinion.
  • Truth­ful opin­ions tend to include more sen­so­r­i­al and con­crete lan­guage than decep­tive opin­ions. For exam­ple, truth­ful opin­ions con­tain more spa­tial detail (loca­tion of the hotel, size of the bath­room etc). Decep­tive opin­ions con­tain more detail on exter­nal fac­tors (“hus­band”, “busi­ness”, “vaca­tion”).
  • Deceptive reviews have more positive and fewer negative emotion terms (of course, since this is the goal of the deception).
  • Although decep­tion is usu­al­ly asso­ci­at­ed with less use of the pro­noun “I”—perhaps reveal­ing a psy­cho­log­i­cal dis­tance from the decep­tive conduct—the researchers found that the increased use of the pro­noun was one of the most suc­cess­ful mark­ers of decep­tive reviews as the writer attempt­ed to empha­sise their pres­ence in the (phoney) account. 
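Some of the clues above can be counted mechanically. The sketch below is a rough illustration, not the paper’s feature set: it measures how often a review uses first-person singular pronouns, positive emotion words and “external” words. The `EXTERNAL` list comes from the examples quoted above; the `FIRST_PERSON` and `POSITIVE` lists and the function name `cue_rates` are assumptions made up for this example. Part-of-speech clues such as superlatives would need a proper tagger and are left out.

```python
import re

# Hypothetical mini-lexicons; real studies use far larger word lists.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
POSITIVE = {"great", "fantastic", "wonderful", "beautiful", "perfect", "love", "loved"}
EXTERNAL = {"husband", "business", "vacation"}  # examples quoted in the list above


def cue_rates(text):
    """Return the share of words in `text` that fall in each cue lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on empty input

    def rate(lexicon):
        return sum(word in lexicon for word in words) / total

    return {
        "first_person": rate(FIRST_PERSON),
        "positive": rate(POSITIVE),
        "external": rate(EXTERNAL),
    }


print(cue_rates("I love my husband"))
```

Higher rates are only weak hints, not proof: plenty of honest reviewers write “I” and “love” too, which is why the researchers relied on statistics over many reviews rather than on any single marker.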

OK: now for the test. Armed with your new knowledge, decide which of these two opinions (from the paper mentioned above, misspellings included) is real and which is spam:

1. I have stayed at many hotels trav­el­ing for both busi­ness and plea­sure and I can hon­est­ly stay that The James is tops. The ser­vice at the hotel is first class. The rooms are mod­ern and very com­fort­able. The loca­tion is per­fect with­in walk­ing dis­tance to all of the great sights and restau­rants. High­ly rec­om­mend to both busi­ness trav­ellers and couples.

2. My hus­band and I stayed at the James Chica­go Hotel for our anniver­sary. This place is fan­tas­tic! We knew as soon as we arrived we made the right choice! The rooms are BEAUTIFUL and the staff very atten­tive and won­der­ful!! The area of the hotel is great, since I love to shop I could­nt ask for more!! We will defi­nat­ly be back to Chica­go and we will for sure be back to the James Chicago.

Click here for the answer.
