The 9th of November, 2016, is likely to remain etched in the memory of anyone involved in research and analytics. Not only did Donald Trump, contrary to all predictions, win the US presidential election; to make things worse (from an analyst’s perspective), the result felt like déjà vu, coming only a few months after surveys had failed to predict the Brexit vote.
Both of these apparent failures of data carry messages of value for data-led marketing efforts.
In both cases, most pollsters had foreseen a victory for the political establishment. According to Nate Silver’s FiveThirtyEight website, for example, Donald Trump’s probability of winning was estimated at only 30 percent prior to the vote (a number already much higher than that of other observers). The misses were particularly severe in the states Trump won, where the predictions were off by 7.4 points on average.
These failures led a number of critics to question the usefulness of analytics in general. Republican strategist Mike Murphy, for instance, lamented live on MSNBC: “Tonight, data died”. Other experts wondered why we were still “relying on polls to predict election results at all” or, transposing the issue into the business realm, asked whether Trump’s victory signaled “the end of data-driven decision making”.
Indeed, in the face of such fiascoes, it is only legitimate to challenge the reliability of data science. It may work for physics or chemistry, but that does not mean that it can be trusted in or applied to areas such as politics, marketing, advertising or sports. With the advent of so-called big data, we were inclined to believe that human behaviour was predictable, a notion that is now likely to come under closer scrutiny than ever before.
As we try to understand the reasons for the disconnect between forecasts and actual outcomes, one of the first objections that comes to mind is that the methods employed are perceived as unscientific. While purists may argue that “social science” does not qualify as a science in the first place, the one element that makes political or business forecasting questionable from an academic perspective is its lack of transparency and openness. Whereas in the traditional sciences the work of each member of the community is peer reviewed, the models used to produce such predictions are rarely examined by other experts before being published or used in decision making.
Furthermore, as these models are often proprietary or “closed” (think of a black box), it is difficult to build on one another’s contributions, as is the case in other disciplines.
Regardless of the scientific domain, however, the results yielded by each method can only ever be as good as the data they are based on. If the quality of the data is bad, it would be unfair to blame everything on the technique. Given the involvement of people in the collection and processing of the data, it is possible that the problems were due to human error (during the transcription of subjects’ responses into a spreadsheet, the integration of various tables and files, etc.).
Setting these factors aside, though, another cause of the incorrect predictions could be flawed sample selection. Indeed, it seems that the polls may not have reached all the likely voters, thus leading to distributions that did not match those of the entire population. When samples do not reflect the true makeup of the electorate in terms of demographics (gender, age, race, education, income, etc.), the models based on them are hardly applicable to the rest of that population. Similarly, problems can emerge when samples are not large enough to represent the actual citizenry, for example in the case of small cities, which, under certain circumstances, can still make a difference in the outcome of a vote.
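To make the sampling problem concrete, here is a minimal sketch of post-stratification weighting, one common way to correct a sample whose makeup differs from that of the electorate. It is an illustration rather than the method of any pollster mentioned here: the group names, the figures and the function names are all hypothetical, and real polls weight on several demographic variables at once.

```python
from collections import Counter


def raw_share(responses):
    """Unweighted share of respondents who support the candidate."""
    return sum(1 for _, supports in responses if supports) / len(responses)


def poststratified_share(responses, population_shares):
    """Support share after reweighting each group to its population share.

    responses: list of (group, supports_candidate) tuples.
    population_shares: dict mapping group -> share of the electorate.
    Groups missing from the sample cannot be corrected by weighting.
    """
    n = len(responses)
    sample_counts = Counter(group for group, _ in responses)
    # Weight = share in the population / share in the sample.
    weights = {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }
    weighted_yes = sum(weights[g] for g, supports in responses if supports)
    total_weight = sum(weights[g] for g, _ in responses)
    return weighted_yes / total_weight


# Hypothetical poll: graduates make up 70% of the sample
# but only 40% of the electorate.
responses = (
    [("graduate", True)] * 14 + [("graduate", False)] * 56
    + [("non_graduate", True)] * 18 + [("non_graduate", False)] * 12
)
population_shares = {"graduate": 0.4, "non_graduate": 0.6}

print(round(raw_share(responses), 2))                                # 0.32
print(round(poststratified_share(responses, population_shares), 2))  # 0.44
```

In this invented example the unweighted poll understates the candidate's support by twelve points, simply because the group that favours the candidate is under-represented in the sample.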
Yet even representative samples can prove insufficient when respondents lie, or are too embarrassed or too shy to express their true intentions in front of a pollster. This can happen when the answer is deemed “unacceptable” by the general public, such as preferring an antiestablishment candidate, or a man, or a white man. While people may choose to give a “socially desirable” reply in a survey, things can change in the voting booth, where privacy is guaranteed. This leads analysts to create erroneous models based on biased data that represent an embellished, politically correct world, while the reality is quite different.
That said, bias does not necessarily have to originate from the data. In the present case, it appears that the researchers themselves may have been the culprits. The Huffington Post, for instance, had forecast a 98 percent chance of victory for Hillary Clinton. Given their fervent support for her campaign, it would not be surprising if they had (inadvertently) failed to question the results of their models, simply because they wanted to see Clinton win. Such confirmation bias, i.e., the “tendency to search for, interpret, favor, and recall information in a way that confirms one’s preexisting beliefs or hypotheses, while giving disproportionately less consideration to alternative possibilities”, may indeed have led analysts to ignore some of the signals in the available data, which could have been correct after all.
'Black-swan events' in marketing
By now, some readers will have asked themselves how this is relevant to media and advertising. Black-swan events like the ones mentioned above cannot be excluded in the media and advertising landscape, as forecasting errors happen here as well. Think of all the products that were launched with great fanfare and later turned out to be flops, or of celebrities whose popularity wanes much faster than expected. Wrong predictions here can be very costly for brands and companies. More often than not, they have their origins in errors or biases similar to the ones previously discussed.
Transposing these into the context of advertising, here are some examples of things that can go wrong:
- Error: The respondent misunderstands the question and consequently gives a nonsensical answer (example: the interviewer asks “are you happy” (“你幸福吗?”), the respondent replies “no, my name is Zeng” (“我姓曾”));
- Selection bias: A survey is carried out on the web (thus ignoring the part of the population that does not have access to the internet);
- Socially desirable responding 1: A respondent claims to have used a certain product X, when in reality he uses the cheaper product Y (in the same category);
- Socially desirable responding 2: The respondent of a TV rating panel indicates that she watched a documentary, when in reality she watched a variety show;
- Confirmation bias: The analyst of an online TV platform overestimates the viewership (or success) of an upcoming drama, ignoring the fading popularity of its starring actress.
Notice that the list does not even include a number of other factors that are currently inherent to the advertising environment, such as ad fraud, the lack of viewability statistics, or the latent conflicts of interest plaguing end-to-end solution providers (i.e., vendors that enable both the delivery and the tracking of impressions). Furthermore, the oligopolistic nature of the publishing market, especially in China, can also be seen as a cause of siloed data, which in turn impede the digital connection of customer journeys. In this regard, the challenges in advertising are just as material as they are in politics.
Thus, one of the key aspects in data-driven decision making is measurement. As we say in the world of analytics, “garbage in, garbage out”, i.e., predictions or recommendations generated by means of an algorithm or a predictive model are only as good as the data the model is trained on. If the quality of these data is not high enough, the results are likely to be inaccurate, and certainly not replicable over time. Fortunately, there are ways to avoid such mistakes or, at least, to improve the situation:
1) Design or employment of more robust polling techniques: Rather than asking direct questions, such as “whom do you plan to vote for”, the focus could lie on citizens’ actual concerns, operationalised through specific questions about issues such as the economy, security, immigration, environment, healthcare, etc. Analytical techniques and models could then be used to determine the propensity of individual voters to prefer one candidate or the other. For marketers and researchers, this would imply the…
2) Collection and analysis of data as related to consumers’ emotions: Here, survey questions have respondents rate their preferences on a broad scale (e.g., from 0 to 100), rather than answer a binary question (yes/no). Such an approach allows voters or consumers to contribute relevant data, even when they remain undecided or are not clear about how they intend to behave. This is particularly important in the context of branding, since emotions are the main driver of people’s behaviours and actions when they are making purchases. One way to achieve this consists in the…
3) Mining of open conversations in blogs or social media: Just as political analysts are interested in gauging the population’s emotional affinity (i.e., the “degree of liking or disliking for someone or something”) for certain candidates, marketers should do the same with a target audience’s affinity for specific brands. Adding this kind of information into one’s brand-building can facilitate the creation of an emotional bond with the audience, thus strengthening the relationship between brand and consumer. Significant progress can be made through the wise…
4) Combination of “big data”, predictive analytics and machine learning: With the right partner or technology in place, it is possible to track the market’s sentiment or the emotional pulse of consumers in real time, thereby boosting the precision of the measurement. Such capabilities, in turn, let brands or decision makers gain in agility and respond much faster to potential consumer shifts or mood swings. However, since people can also lie about their emotions, one option in the medium or long term could be a…
5) Reconsideration of the use of surveys altogether: Some companies, for example Mars-Wrigley [Editor's note: Mars-Wrigley is a client of Publicis], have already made a decision in this direction, declaring that they prefer to “focus on understanding how people behave, rather than on what people say”. As mentioned above, there can be inconsistencies between what people tell an interviewer and how they actually behave. Furthermore, claimed statements should be treated with caution, as these can be affected by a number of external factors, for example the data collection method employed (as well as the poll taker herself), the respondent’s memory, the nature of the question, etc. Cross-channel consumer panels that track both people’s media consumption behaviours and their shopping baskets constitute a judicious step forward. For companies that already own and manage first-party data (e.g., through their CRM system or eCommerce site), one viable opportunity may lie in the…
6) Enrichment of the available data with additional variables: Although it is possible to achieve reasonable levels of model accuracy with one or two sources of data (for example, demographic and behavioural), it can always help to add more variables into the mix. By including consumers’ current sentiment or latest dealings on the web, predictive models can be trained to capture those ongoing trends as well. This not only improves the precision of the outcome, but can also increase the relevance of the business recommendations. Predictions can be generated more frequently and more quickly, and can adapt to a given context, thus becoming much more agile. Since the gathering and combination of offline and online data takes time and is far from trivial, the last piece of advice concerns the…
7) Development of a data management platform (DMP): In simple terms, a DMP is a data warehouse that ingests, stores, processes and delivers consumer data in the form of cookie IDs. It is used by marketers, agencies, publishers or other businesses to segment audiences, determine look-alikes of existing customers, or generate insights about a given population. Although their ultimate purpose is typically to optimise media buying and ad creative, DMPs constitute a formidable tool to better understand consumer information. As such, they are instrumental in helping brands win, serve, and retain their customers.
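As an illustration of the look-alike idea mentioned in the last point, the following sketch scores prospects by how closely their behaviour resembles that of existing customers. It is a toy example under stated assumptions, not a description of how any actual DMP works: the cookie IDs, the three behavioural features, the similarity threshold and the function names are all hypothetical, and cosine similarity to the customers' average profile is just one simple choice among many.

```python
import math


def centroid(vectors):
    """Average profile of a list of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookalikes(customers, prospects, threshold=0.9):
    """Return (cookie_id, score) for prospects resembling the customer base,
    most similar first. The threshold is an arbitrary illustrative cut-off."""
    seed = centroid(list(customers.values()))
    scored = [(cid, cosine(vec, seed)) for cid, vec in prospects.items()]
    matches = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical features per cookie ID:
# [sports page visits, beauty page visits, purchases]
customers = {"c1": [5, 0, 2], "c2": [4, 1, 3]}
prospects = {"p1": [8, 1, 5], "p2": [0, 6, 1], "p3": [3, 0, 1]}

for cookie_id, score in lookalikes(customers, prospects):
    print(cookie_id, round(score, 3))
```

With these invented figures, prospects p1 and p3 resemble the existing customers and would be selected, while p2, whose interests lie elsewhere, would not.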
Although a number of concepts and technological elements (such as big data, machine learning, DMP, etc.) were mentioned here, the key message pertains to the importance of measurement. Data hygiene as well as sound modelling practices should be on any analyst’s, pollster’s and marketer's agenda.
Not only should they avoid relying on analytics without understanding its limitations, they also need to be aware that no model can ever guarantee perfect accuracy. Only then can the general public gain trust in the effectiveness and value of analytics.
Most conveniently, major elections are coming up in France (this weekend) and Germany (September 2017). These will give the marketing community further occasions to apply the lessons of last November and to show how alive data really is.
 Piatetsky, Gregory (2016): Trump, Failure of Prediction, and Lessons for Data Scientists, available at: http://www.kdnuggets.com/2016/11/trump-shows-limits-prediction.html
 Guis, Isabelle (2016): How a Data Scientist Would Autopsy the 2016 Election Polls, available at: http://www.information-management.com/news/big-data-analytics/how-a-data-scientist-would-autopsy-the-2016-election-polls-10030506-1.html
 Metz, Cade (2016): Trump's Win Isn't the Death of Data—It Was Flawed All Along, available at: https://www.wired.com/2016/11/trumps-win-isnt-death-data-flawed-along
 Cave, Andrew (2016): President Trump - How The Pollsters Got The U.S. Election So Wrong, available at: http://www.forbes.com/sites/andrewcave/2016/11/09/president-trump-how-the-pollsters-got-the-us-election-so-wrong/
 Pattek, Sheryl (2016): Does Trump’s Victory Signal The End Of Data-Driven Decision-Making?, available at: http://www.information-management.com/blogs/big-data-analytics/does-trumps-victory-signal-the-end-of-data-driven-decision-making-10030330-1.html
 Flam, Faye (2016): Why Science Couldn’t Predict a Trump Presidency, available at: https://www.bloomberg.com/view/articles/2016-11-11/why-science-couldn-t-predict-a-trump-presidency
 Piatetsky (2016)
 Guis (2016)
 Murray, Peter Noel (2013): How Emotions Influence What We Buy, available at: https://www.psychologytoday.com/blog/inside-the-consumer-mind/201302/how-emotions-influence-what-we-buy
 Harry, Keith (2016): Trump, Trust, and Why the Polls Got It Wrong, available at: https://www.linkedin.com/pulse/trump-trust-why-polls-got-wrong-keith-harry
 Ni Shimin, quoted on the 7th April 2017
Olivier Maugain is general manager of analytics for Greater China at Publicis Media