Why automatic sentiment analysis is not working, and why that’s a good thing.
Let’s start with an example: Imagine you work for Dell and you have to rate the following conversations:
- HP is good
- Dell is good, but I still prefer HP
- The new Dell PC is good but the previous one worked better.
- The new Dell PC is good but I miss the look of the previous one.
- HP works great but I hate them
- Dell works great but I hate them
- Dell is as good as HP
- HP, Dell today’s PC’s are the same crap, maybe Dell a little bit less
- HP and Dell are nice entry level products
- Dell is good if you can afford it.
- Dell is exclusive.
- I would only recommend Dell’s PC to small businesses.
- HP is the Dom Perignon of Netbooks.
- Apple is to Dell what Saint Amour is to Beaujolais Nouveau
- I worked too much on my Dell last night and I got sick looking at the screen.
- Dell is only good for gaming
- HP is like it was in the Packard’s time
- HP is like what it was in Carly’s time
- No wonder why the Dell stock is going South
- I love HP (from HP’s PR agency or Director)
- I hate HP (from an employee recently fired)
- A tweet repurposing the one above without any additional comment
- eCairn is delivering automatic sentiment in 50 languages with 105% accuracy.
Hard to rate, isn’t it? Maybe you should do it again?
These are fairly standard sentences, not corner cases. There is no irony and no borderline use of language (except the last one), but they show that:
- Sentiment is subjective
- Sentiment is role-based: depending on whether you’re the brand manager for Dell overall, in charge of the new Dell notebook, responsible for the small-business vertical, a QA manager, in investor relations, or doing competitive analysis, you may have different views on what “positive” means.
- Language to express sentiment is context dependent and cultural (wine analogy)
- Language to interpret sentiment is context dependent and cultural (see the HP history analogy)
- Sentiment analysis has to factor in the “who”, and account for real and pseudo duplicates: the same conversation from different persons, or similar conversations from the same person. Everybody is repurposing nowadays, retweeting, using ping.fm or the like, not counting the huge number of “robot” sites or blogs that just sit there to attract traffic and ads.
- “Social media” is not a valid sample of any user base, even when the user base is defined as internet users. As an example, the open-source/Linux community is far more vocal than the Windows one, so just taking data from the river of news is a statistical heresy, unless you want to study Linux fans. Factoring in the repurposing point above, Twitter is not even a representative sample of the “Twitter” population…
That’s why solutions with so-called automatic sentiment end up with 60%+ rated neutral, 70% precision, and manual options to override the machine-generated sentiment.
It just does not make sense. Marking everything neutral may well be a better bet from a recall and precision standpoint.
I’m not saying it can’t be done. It just can’t be done “generically”. Now, if you build a solution for sentiment analysis specializing in the stock exchange/investor community, that is another story, and I can give you solid pointers for it; just email, and have your checkbook ready.
You would have to build up dictionaries, invest in a learning algorithm, train it… and yes, that sounds doable, but it would be very expensive to set up, and its applicability would be limited to the “stock exchange”, maybe even to the stock exchange of 2010.
So forget this Holy Grail, stop wasting $$$ on producing low-quality results, and come back to the brand’s initial objective:
Get sentiment on the brand from a specific audience with a reasonable investment.
There are alternatives for reaching this objective, and the good news is that math comes to the rescue.
1. Why not sample? Marketers have always used focus groups and samples, so why not extend that to the social web?
2. Why not rate manually? When you zoom in on a specific community, manual starts to make sense. We looked at the top mommy bloggers we’ve mapped (top 3,500) and narrowed down to 2,000 discussions about Pampers in the last 6 months. One person can do a good job rating 3 conversations per minute, so it’s roughly a 12-hour job; at $20/hour, that’s $240 over 6 months.
3. Moreover, focusing on a specific community brings more consistency to the conversations you rate, so the quality of the rating will be higher (Apple means Apple and Orange means Orange :-), in other words, fruits in the food community and brands in the wireless/telco community). It’s easier, for example, to establish a standard for rating blog conversations from moms about Pampers than one for any conversation about Pampers (which would range from analyst reports on sustainability, to conversations among experts, to comments on the new ad campaign)! With a focus, conversations are more consistent, more alike, and easier to rate. You also get specific results for your key target communities, and far more actionable ones.
4. While rating, you also spot insights, key conversations to share, and ideas for content marketing.
5. Done this way, you will actually find your promoters and detractors. By connecting conversations to people, you will see who is moving into positive territory and whether clusters of influencers are moving in the right direction. This is key for targeted outreach campaigns.
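The back-of-envelope arithmetic in point 2 can be checked with a quick script. The conversation count, rating speed, and hourly wage are the figures quoted above; rounding up to whole hours reproduces the “roughly 12h, $240” estimate:

```python
import math

# Cost of manually rating a focused community's conversations
# (figures from the Pampers / mommy-blogger example above).
conversations = 2000     # discussions about Pampers over 6 months
rate_per_minute = 3      # conversations one rater can score per minute
hourly_wage = 20         # USD per hour

minutes = conversations / rate_per_minute      # ~667 minutes of rating
hours = math.ceil(minutes / 60)                # round up to whole hours -> 12
cost = hours * hourly_wage                     # 12 * $20 = $240

print(f"{hours} hours, ${cost} over 6 months")
```

The point of the sketch is simply that, once you scope the data to one community, manual rating costs hundreds of dollars, not the thousands that “generic” automatic sentiment tooling commands.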
Last but not least, I’m still wondering what kind of actionable plan a brand can make when its “sentiment” drops by 3% with a claimed accuracy of 70%… The Motrin case went wild over a weekend; Domino’s Pizza within hours. So for crisis management, investing in proactive ORM and building a solid base of fans within the target community is a much better option (Ford’s approach).
The moral: when 93% of consumers say they want brands to engage in social media, I doubt they mean engaging with algorithms; I bet they are expecting real and empowered persons. But that’s another story.