Introducing the volatility ranges: another way to look at the uncertainty of polls

I just posted the latest federal projections, using the most recent numbers from Abacus. So head over there if you want to read my comments and see the details. This post is about the uncertainty of polls and how to represent it.

When dealing with polls, uncertainty is always present. Not only is there natural statistical variation (due to sampling), but many people aren't fully decided on or committed to one party. In the latest Abacus poll, for instance, as many as 22% of the respondents are labeled as undecided. When polls are wrong, as was the case in the recent Alberta, Quebec and BC elections, pollsters often cite a "late swing" among undecided voters to explain their mistakes.

Also, when polls are reported in the media, usually only the voting intentions are reported, with the margins of error merely mentioned (usually at the end of the article). Moreover, these margins of error only account for the normal statistical variation. [Note: I know online polls don't use a so-called probability sample, so in theory the usual margins of error don't apply. For the sake of argument, let's set aside this issue, which hasn't been well studied or solved yet.]

All in all, we don't have a good and clear picture of the uncertainty that exists in every poll.

Abacus now asks a different question to measure voting intentions. Instead of simply asking "Who would you vote for?", they ask respondents to rate how likely they are to vote for each party, on a scale of 0 to 10 (from least likely to most likely). Using this unique information (as far as I know, no other pollster in Canada does this), I calculated what we could call volatility ranges. These ranges, or intervals if you prefer, can be seen as confidence intervals that account for the normal statistical variation as well as for the additional uncertainty that arises because most respondents aren't fully decided on one party. Put differently, measuring voting intentions isn't the same as measuring, for instance, the average age of a population. In that case, the only uncertainty really comes from statistical variation (i.e., maybe you got a bad sample). But voting intentions are different: people can be undecided between two or more parties, people can change their minds, etc. The data from Abacus allow us to take a different look at this issue.

A big thank you to David Coletto from Abacus for giving me access to his data and collaborating with me.

Here are the steps to calculate these volatility ranges:

1. Conversion of the likelihood scores into probabilities. Respondents are asked to report their likelihood from 0 to 10. But what exactly does that mean? Does someone who reports a likelihood of 5 of voting Liberal have a 50% chance of voting Liberal? Is someone with a likelihood of 8 twice as likely to vote for this party as someone at 4? We can't compare the likelihood to the actual vote, but we can compare it to the answers to the traditional voting intentions question. By doing so, we can observe, for instance, how many people with a likelihood of 5 for the Liberals actually choose this party when asked who they'd vote for. Doing so revealed some clear non-linearities. For instance, among people at 5 for the Liberals in the latest Abacus poll, only 14% would support this party when asked their voting intentions. But this number climbs to 95% for respondents with a likelihood of 10. Overall, it seems people become serious about a party when reporting a likelihood of 7 or more.

What I also observed is that some parties are better than others at converting likelihood into actual support. In the last Ontario election, for instance, there were almost as many respondents with a likelihood of 5 or more of voting Conservative as there were for the NDP. But when asked their actual voting intentions, respondents put the PCs well ahead of the NDP (as also happened on election night). In other words, there were a lot of people who could vote NDP, but only a fraction of them ultimately chose this party.

2. The second step is to randomize the conversion rates. The idea is to even out the differences between the parties' conversion rates. In other words, we're trying to see what could have happened if, for instance, the NDP had the same conversion rates as the Conservatives. In each randomization, some parties are assigned high conversion rates while others are assigned low ones. The possible range is determined by the observed minimum and maximum. Specifically, in the latest Abacus poll, among people with a likelihood of 6, the conversion rates varied between 20% (for the Conservatives) and 36% (for the Liberals). So we randomize within this range, and proceed this way for every likelihood score. This way, we don't simply make up probabilities: every rate stays within what was actually observed.

3. Once this is done, we apply these conversion rates to each respondent, based on his or her likelihood scores. To continue our example, someone with a likelihood of 6 of voting Conservative would have this score converted into a probability between 20% and 36%. During this step, some normalization is usually required. For instance, a respondent could have a likelihood of 6 for the Liberals and 7 for the Conservatives, and the converted probabilities across all parties won't generally add up to 100%. We need to rescale them so they sum to 100%.

4. The final step consists of sampling at the respondent level. Specifically, each individual is randomly assigned to one party, based on his or her probabilities (which are themselves based on his or her likelihood of voting for each party). So if someone has a 40% chance of voting Conservative and a 60% chance of voting Liberal, this individual will be assigned to one of the two parties in each simulation. Over the 5000 simulations, this individual will of course be assigned to the Liberals about 60% of the time. By doing this for every respondent and then summing over all of them, we get a simulated outcome (e.g., the Conservatives would receive 400 votes out of 1000 respondents while the Liberals would get 380). By repeating the process 5000 times, we get the range of possible outcomes. What is reported below are the 95% confidence intervals for these outcomes.
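The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for the ranges reported below: the respondent format, the party labels and the toy `fake_poll` data are all hypothetical. Since the post doesn't say how rates are drawn within the observed range, the sketch assumes an independent uniform draw for each party and each score, and it does the step 3 normalization by dividing each probability by their sum.

```python
import random
from collections import Counter

# Hypothetical party labels and respondent format (assumptions):
# {"likelihood": {party: 0-10 score}, "vote": party named on the usual question}
PARTIES = ["CPC", "LPC", "NDP"]


def conversion_rates(respondents):
    """Step 1: per party and score, share of respondents at that score
    who name that party on the traditional voting intentions question."""
    hits = {p: Counter() for p in PARTIES}
    totals = {p: Counter() for p in PARTIES}
    for r in respondents:
        for p in PARTIES:
            score = r["likelihood"][p]
            totals[p][score] += 1
            if r["vote"] == p:
                hits[p][score] += 1
    return {p: {s: hits[p][s] / totals[p][s] for s in totals[p]} for p in PARTIES}


def observed_ranges(rates):
    """Observed min and max conversion rate across parties, per score."""
    ranges = {}
    for party_rates in rates.values():
        for s, rate in party_rates.items():
            lo, hi = ranges.get(s, (rate, rate))
            ranges[s] = (min(lo, rate), max(hi, rate))
    return ranges


def simulate_once(respondents, ranges, rng):
    """Steps 2-4 for one simulation; returns each party's simulated share."""
    # Step 2 (assumption): independent uniform draw per party and per score.
    rates = {p: {s: rng.uniform(lo, hi) for s, (lo, hi) in ranges.items()}
             for p in PARTIES}
    counts = Counter()
    for r in respondents:
        # Step 3: convert scores to probabilities, then rescale to sum to 1.
        weights = [rates[p][r["likelihood"][p]] for p in PARTIES]
        total = sum(weights)
        if total == 0:
            continue  # assumption: nothing to sample from, so skip them
        probs = [w / total for w in weights]
        # Step 4: assign this respondent to exactly one party.
        counts[rng.choices(PARTIES, weights=probs)[0]] += 1
    assigned = sum(counts.values()) or 1
    return {p: counts[p] / assigned for p in PARTIES}


def volatility_ranges(respondents, n_sims=5000, level=0.95, seed=0):
    """Repeat the simulation and keep the central `level` share of outcomes."""
    rng = random.Random(seed)
    ranges = observed_ranges(conversion_rates(respondents))
    sims = [simulate_once(respondents, ranges, rng) for _ in range(n_sims)]
    tail = (1 - level) / 2
    result = {}
    for p in PARTIES:
        shares = sorted(sim[p] for sim in sims)
        result[p] = (shares[int(tail * (n_sims - 1))],
                     shares[int((1 - tail) * (n_sims - 1))])
    return result


def fake_poll(n, rng):
    """Toy respondents, NOT real Abacus data: random 0-10 scores and a
    stated vote for one of each respondent's top-scored parties."""
    people = []
    for _ in range(n):
        scores = {p: rng.randint(0, 10) for p in PARTIES}
        top = max(scores.values())
        vote = rng.choice([p for p in PARTIES if scores[p] == top])
        people.append({"likelihood": scores, "vote": vote})
    return people


poll = fake_poll(300, random.Random(1))
for party, (lo, hi) in volatility_ranges(poll, n_sims=500).items():
    print(f"{party}: {lo:.1%} to {hi:.1%}")
```

With real respondent data, `volatility_ranges(respondents)` would run the 5000 simulations described above; the smaller `n_sims` here only keeps the toy run quick, and because the toy data are random, the printed ranges say nothing about any real party.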

It's important to understand that we aren't making up support. The support exists in the form of the likelihood to vote. We simply run simulations to see the possible outcomes. If you prefer, instead of simply saying that there are people undecided between the Liberals and the NDP, we see what could happen if these undecided were to vote for one party or the other. But we do it in a systematic and robust way that accounts for all the parties and all the various likelihoods.

Results using the latest federal data from the Abacus poll:

[Chart: Voting intentions]
[Chart: Volatility ranges]

Within these ranges, the three parties are essentially tied. The NDP even has a higher potential than the Liberals, despite polling below them on the traditional voting intentions question. This means that, in this poll, the Liberals were better at converting their support than the NDP. It's another way to look at the importance of getting out the vote. Parties must not only convince people to consider voting for them; they must also make sure those people end up doing so!

If we compare our ranges to the normal confidence intervals, we can see that ours are wider, which makes sense since they account for more uncertainty. Specifically, the Abacus poll, with 1600 respondents, would have margins of error of + or - 2.25 points for the three parties (since they are all around 30%). That gives intervals about 4.5 points wide. Our ranges, on the other hand, are around 15 points wide, or + or - 7.5 if you prefer. Therefore, it seems that the uncertainty due to the fact that many respondents aren't fully decided on or committed to one party can be represented (in this case) by roughly an additional + or - 5 points.
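The margin-of-error arithmetic above can be checked with the textbook formula for a proportion, z·sqrt(p(1-p)/n), with z = 1.96 for 95% confidence. This short sketch assumes simple random sampling (which, as noted earlier, doesn't strictly hold for online polls), uses a hypothetical helper name, and reproduces the post's rough subtraction of the normal margin from the + or - 7.5-point volatility half-width.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the usual 95% interval for a proportion p
    estimated from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(0.30, 1600)   # about 0.0225, i.e. +/- 2.25 points
width = 2 * moe                      # full normal interval, about 4.5 points
extra = 0.075 - moe                  # volatility half-width minus normal margin

print(f"normal margin: +/- {moe * 100:.2f} points")
print(f"normal interval width: {width * 100:.1f} points")
print(f"extra volatility: about +/- {extra * 100:.1f} points")
```

The extra comes out at about 5.3 points, in line with the roughly 5 points quoted above.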