So, this blog
takes a few tangents, and here is the first: I have noticed, increasingly, that
I think in tangents. Two practical manifestations of this: first, my tendency
to use em dashes in my writing, which – according to Grammar Book (case
in point) – are used to indicate "an
interruption, or an abrupt change of thought" (aka, a tangent!). And,
second, instead of the usual A4 or A5 notepad – or, increasingly, a tablet of
some description – that accompanies virtually everyone in the world
of work, I have started using an A3 clipboard and paper, because it enables me
to better visualise all of the tangents I take in any given meeting!
[Incidentally,
this very blog is a tangent: I told myself I was going to do all sorts of other
things tonight, but – having spent my morning bike ride thinking about this
(another one, I’ll try to do one per paragraph) – couldn't resist spending my
evening writing it.]
Anyway, I do
some guest lecturing and was recently working with some first-year Marketing
students via a lecture and a series of tutorials on 'The Principles of
Marketing Research'. This was one small element within a broad marketing
syllabus, and I wanted to get across the basic principles, of course, but also
draw their attention to some other points that I think will stand them in good
stead for their future careers.
Foremost among
those was my view that there is a fundamental conflict at the very heart of
that term "Marketing Research". Namely, that each constituent part –
'marketing' and 'research' – is a specialist discipline, wholly different from
the other, and each often lends itself best to people who think predominantly
with the creative right side of their brain (the former) or the calculated left
side of their brain (the latter).
I have seen
this dichotomy innumerable times in my career, as I mostly work as the sole
analyst (left-brain) in teams of marketers (right-brain). For the most part,
this is highly productive because it ensures that a bit of Devil's Advocate is
played, and that an issue is considered from varying perspectives, leading to a
well-tested outcome. But I won't talk about any specific work examples, because
I think we can have much more fun than that.
Instead, I'm
going to talk – as I did to the students – about The Donald (sorry, President Trump). First, a boast; second, a
question. When very few others did, I forecast a win for The Donald; and that leads
me to ask: why didn't more people see this coming? It was right there in the
data.
Newsweek infamously did a print run of its “Madam President” headline, while the Huffington Post – like many media
outlets – declared that Clinton had a 98% chance of victory. Paddy Power paid
out on a Clinton win, to the tune of £800,000; then, when Trump won, it cost
them a further £3.5m, and was their biggest ever political payout.
I shook my head
through all of this, for reasons that boil down to two (well, three) things: 1. confidence levels;
2. the margin of error (MoE); (3. triangulation).
Most of the
assessments that we all made were based on polls; the difference, therefore, was
in how we interpreted those polls. The first chart below shows the extent to
which Trump was ahead (above the line, red) or behind (below the line, blue) in
national polls (160-odd of them) in the six months leading up to the
election. In only 16 cases – roughly 10% – was he ahead. So, most journalists
and marketers – drawing heavily on the right side of their brain – looked at
that kind of chart and thought: it's clearly in Clinton's favour. “Madam President”, £800k payout, thank
you very much, and goodnight.
But history shows
that they were wrong; and we need to understand a bit about polls (and surveys
in general) before we continue. Firstly, they can never be 100% accurate
because it is never possible to speak to every single voter in the population
(c250m); but, statistically, if we ask a representative sample of the
population and weight it accordingly, we can be confident to a certain degree.
Polls are expressed with a confidence level, usually 95%, which says, in effect:
if we ran this poll many times, 95% of the resulting ranges would contain the
true value. The size and structure of our sample, relative to the
population of interest, determines our margin of error. The results of a survey
will never be bang on so, taken together, we quote a range within which we are 95%
confident that the true value lies; typically, on a sample size of 1,000 the
MoE is +/-3%, and on a sample of 2,000 it is +/-2% (there is more to this,
still, but this blog is not about that extent of statistical detail – I save
those for weekends!)
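For anyone curious where the +/-3% and +/-2% rules of thumb come from, they fall out of the standard formula for the margin of error on a proportion. Here is a minimal sketch in Python, assuming simple random sampling, a 95% confidence level (z = 1.96) and the worst-case 50/50 split – with none of the weighting or design effects a real pollster has to worry about:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# The worst case (p = 0.5) reproduces the rules of thumb above:
print(f"n=1,000: +/-{margin_of_error(1000):.1%}")  # +/-3.1%
print(f"n=2,000: +/-{margin_of_error(2000):.1%}")  # +/-2.2%
```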
So, if a poll
of 1,000 people says that Trump will get 49% of the vote and Clinton will get
51%, what it is actually saying is: there's a 95% chance that Trump will get
between 46% and 52%, and a 95% chance that Clinton will get between 48% and
54%. The key point? Those ranges cross-over one another. It is, effectively,
saying that the outcome is too close to call, statistically.
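To make the cross-over point concrete, here is a small Python sketch of that hypothetical 49/51 poll – purely illustrative numbers, as above:

```python
def implied_range(share, moe):
    """The 95% range implied by a poll's point estimate and MoE."""
    return (share - moe, share + moe)

def too_close_to_call(share_a, share_b, moe):
    """True if the two candidates' ranges cross over, i.e. the gap
    between them is less than twice the margin of error."""
    return abs(share_a - share_b) < 2 * moe

print(implied_range(49, 3))          # (46, 52) -- Trump
print(implied_range(51, 3))          # (48, 54) -- Clinton
print(too_close_to_call(49, 51, 3))  # True: statistically too close to call
```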
So what happens
when we superimpose the full range of the margin of error on top of the
previous chart? Firstly, below, we can see that the margin by which either
candidate was leading was rarely outside the MoE – fewer than one in four cases;
and, in particular, when we magnify the polls closest to election day, that is
even less likely: only once was Clinton ahead and outside the MoE and, for the
most part, it wasn't all that close to being outside of it, either.
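The aggregation behind that chart is straightforward to sketch. The data below is purely illustrative (I am not reproducing the real polls here), and I am using the same criterion as the chart: a lead is 'outside the MoE' when it is bigger than the margin of error:

```python
# Each poll as (Trump %, Clinton %, MoE in %pts) -- illustrative values only.
polls = [(44, 46, 3), (41, 45, 3), (46, 44, 2.5), (43, 46, 3), (45, 46, 3)]

outside = [p for p in polls if abs(p[0] - p[1]) > p[2]]
share = len(outside) / len(polls)
print(f"{len(outside)} of {len(polls)} polls ({share:.0%}) had a lead outside the MoE")
```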
But, guess what?
Firstly, these polls were broadly correct when we bear in mind that Clinton did
win the popular vote. The thing is that US elections are not decided by the
popular vote (nor are UK ones), but by the Electoral College system, which renders
national polling almost irrelevant, and means that only state-level polling
bears any direct relevance to the outcome.
Some states are
always red (Alaska, Idaho, Kansas, Utah, Wyoming), and some states are always
blue (DC, Minnesota). And there are always a handful of key battleground states
with, in 2016, Florida being chief among those (29 Electoral College votes),
along with various others including Ohio (18), North Carolina (15), and
Michigan (16).
State-level
polling, in the same format as the previous charts, and covering the
couple of months leading up to the election, is shown for Florida, Ohio and North
Carolina below. The findings (with the 'distance from the MoE' calculation sketched in code after the list):
- Florida: Neither candidate was outside of the margin of error at any point. While Clinton was ahead in more polls (11 vs. 8), this was only the case in one of the five polls closest to the election. What’s more, Trump consistently came closer to being outside the MoE: his average distance from the edge of the MoE was 2.8%pts, against 5.1%pts for Clinton.
- Ohio: Only once was a candidate outside the MoE, and that was Trump in the poll closest to the election. Trump was ahead more often, 16 occasions vs. 6. Again, Trump consistently came closer to being outside of the MoE: 3.6%pts, against 4.8%pts for Clinton (or 5.6% pts if we exclude the outlier that was two months from the election).
- North Carolina: Only once was a candidate outside of the MoE, and that was Clinton about a month before the election. And, again, she was ahead more often but not so in the polls closest to the election. Significantly, as the election drew closer, it was Trump who came closer to being outside of the MoE. Overall, on average, Trump was 3.1%pts away from the MoE (or 1.8%pts if you only consider those closer to election day), against 4.7%pts for Clinton.
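Those 'distance from the edge of the MoE' figures are simply the gap between a candidate's lead (or deficit) and the margin of error, averaged across polls. A minimal sketch with made-up numbers – the real state-level figures quoted above came from the actual polls:

```python
def distance_from_moe_edge(lead, moe):
    """How many %pts short of the MoE boundary a lead falls;
    a negative value would mean the lead was outside the MoE."""
    return moe - lead

def average_distance(polls):
    """Mean distance from the edge of the MoE across (lead, MoE) pairs."""
    return sum(distance_from_moe_edge(lead, moe) for lead, moe in polls) / len(polls)

trump   = [(1.0, 3.0), (2.0, 3.5), (0.5, 3.0)]     # leads of +1, +2, +0.5 %pts
clinton = [(-1.0, 3.0), (-2.0, 3.5), (-0.5, 3.0)]  # the mirror-image deficits
print(f"Trump:   {average_distance(trump):.1f}%pts from the edge")    # 2.0
print(f"Clinton: {average_distance(clinton):.1f}%pts from the edge")  # 4.3
```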
Trump won all
of these states; and it was looking at this Florida and Ohio polling ahead of
election day that convinced me that he was going to win. The clues were there.
Now, margin of
error doesn’t explain all of this, and nor was it all the fault of
right-brainers (journalists, Paddy Power’s marketing team) misinterpreting the
data. The left-brainers made plenty of mistakes too and, indeed, it is the
polling industry that has come in for most of the flak over the last eighteen
months (UK General Election: ‘wrong’; EU Referendum: ‘wrong’; US Presidential
Election: ‘wrong’). There were methodological flaws in the polling, from
sampling to weighting (e.g. under-representation of historical non-voters), and
some of these missed the ‘Trump Effect’. Further, states like Michigan and
Wisconsin consistently had Clinton ahead, often outside of the MoE, but Trump
ended up winning both.
And further to
that, my charts above are a little simplified, because the point estimate is
still the most likely result, while the closer to the extremes of the margin of
error we get, the less likely those results are – this is why the finding that
Trump was often closer to being outside of the margin of error is important, as
the point estimates were more in his favour.
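Put another way: under the usual normal approximation, a poll's uncertainty is not spread evenly across the MoE range – values near the point estimate are far more probable than values at the edges. A quick sketch of the relative likelihoods, assuming a normal sampling distribution (the standard approximation):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# A 95% MoE of +/-3%pts implies sigma of roughly 3 / 1.96 ~ 1.53%pts.
mu, sigma = 49.0, 3.0 / 1.96
for x in (49.0, 50.5, 52.0):  # point estimate, halfway out, edge of the MoE
    rel = normal_pdf(x, mu, sigma) / normal_pdf(mu, mu, sigma)
    print(f"true value {x}%: {rel:.2f}x as likely as the point estimate")
```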
Then, of course, comes the confidence level. A 95% confidence level means that there is a 1/20 chance that a poll is entirely wrong – point estimate, margin of error, everything. With an election every four years, a pollster surveying each one is likely to call entirely the wrong outcome once every 80 years – perhaps this was a once-in-80-years event, one of Taleb's 'Black Swans'. This doesn't really apply here, because there were so many polls, but it is important for those conducting one-off polls/surveys to remember; and the likes of the Economist/YouGov published 22 polls in the campaign, NBC/WSJ published 25, and Reuters/Ipsos 21 – each of those is likely to have been entirely wrong once.
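(To spell out the arithmetic: at a 5% failure rate, a series of 22 polls would be expected to produce roughly 22 × 0.05 ≈ 1.1 entirely wrong results.)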
Beyond this (my
third point) – which I won’t elaborate on too much because this blog is already
far too long – is the need for triangulation. Don't just take one source
(polls) and be satisfied with that, but look to test your findings via other
means. In this case, I found evidence that I deemed fairly compelling on social
media: the left (wing, not brain) usually dominate Twitter, but Trump was doing
so here; and, in the example of Brexit – which I forecast to within a decimal
point and where most of the above was also true – I did so by speaking to
people 'on the ground'.
All of this
highlights where the right- ("marketing") and left-
("research") brain come into conflict. The creative right brain is
looking for the headline, the one key point that makes a compelling story –
we're always told that we need to be succinct, we need an 'elevator pitch' (I
am incapable of making an elevator pitch). But, in this case, the left-brain
would tell you that there literally was no easy headline, no compelling point.
The election was statistically too close to call, but there was plenty of
evidence that Trump would win. The difference can be summed up by considering
two different headlines, one less catchy than the other:
[Two images: the left-brain headline and the right-brain headline]
Hardly 'breaking news', is it?
This doesn't
explain the phenomenon in its entirety; but the point is that the polling was,
categorically, not telling us anything conclusive – and, at the more granular
level, was strongly hinting at what would come – yet the media outlets, and others,
were acting as if it were conclusive. Whoever told Paddy Power to pay out
clearly had no grasp of the data, and cost them millions (unless, of course, it
was a pure marketing ploy, which is possible).
To finish by bringing
it back to my first-year undergraduate marketers: the point that I was trying
to get across to them was to make sure that you aren't doing a 'Madam President' for your company or
clients; make sure you aren't telling them something categorically that the
data isn't actually saying – be clear about what the data is telling you, the
confidence level, the margin of error, the assumptions you've made…even if the
right side of your brain is telling you otherwise.