Adding uncertainty to the model

Just as polls come with margins of error, it would make sense to give seat projections some measure of uncertainty. But how should we proceed? Here I explain how I do it and why.

1. Using the MOE in each province.

One logical solution would be to make projections using the MOE (margin of error) of the polls, or of the average of polls, since that is what I use for the "official" projections. Note that projecting a federal election is more difficult than projecting a provincial one, because we need to estimate and apply the model in many provinces. And because pollsters usually have a sample size of around 1,000, a poll is much more accurate when all 1,000 observations come from one province (like Quebec) than when they are spread across all of Canada. Therefore, even when using an average of polls, we are still left with very large MOEs in some regions, such as the Atlantic provinces or the Prairies. Here is an example.

Let's focus on BC, a province with 36 ridings. The Nanos poll has around 150 respondents in this province, which means an MOE of 8 points. Let's see how much we can shrink this MOE by averaging the most recent polls. The last Ekos poll had 302 respondents in BC, for an MOE of 5.6 points. We also had a recent Leger poll with 349 respondents and an MOE of 5.2 points. Finally, a Harris-Decima poll had an MOE of 6.2 points for BC. These are the four most recent polls (note: I count Nanos as a single poll, since each daily update only adds around 400 new observations; Nanos therefore produces one new full poll every 4 days).

Averaging these four polls of course yields a much smaller MOE. This relies on a big assumption: that the only difference between these polls is their sample size. In reality, other factors matter as well, such as the methodology (online or by phone, for instance), but I'm not taking them into account. The MOE of the "meta poll" is only 3.2 points (to do this calculation yourself, square each MOE, sum the squares, take the square root of the sum, and then divide by the number of polls). So at a 95% confidence level, the BC numbers look like this (note: technically the MOE is NOT the same for every party, since it depends on the level of support, but pollsters and the media usually use this approximation and give every party the same MOE):

CPC: 40.2% ±3.2%
LPC: 22.3% ±3.2%
NDP: 24.1% ±3.2%
Green: 11.3% ±3.2%
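The BC arithmetic above can be reproduced with a short Python sketch. It assumes every MOE comes from the usual simple-random-sample formula at a 95% confidence level with the worst case p = 0.5, and that the four polls are independent; the function names are illustrative. Harris-Decima's BC sample size isn't given, so its reported 6.2-point MOE is entered directly. The last part illustrates the note about the MOE varying with a party's level of support, under the further simplifying assumption (not part of the method described here) that the meta poll behaves like a single random sample.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level

def moe(n, p=0.5):
    """MOE in percentage points for a simple random sample of size n.
    p=0.5 gives the widest (worst-case) MOE, the figure pollsters usually quote."""
    return 100 * Z_95 * math.sqrt(p * (1 - p) / n)

def meta_moe(moes):
    """MOE of a simple average of independent polls: square each MOE,
    sum the squares, take the square root, divide by the number of polls."""
    return math.sqrt(sum(m ** 2 for m in moes)) / len(moes)

def party_moe_at(worst_case_moe, p):
    """Rescale a worst-case (p=0.5) MOE to a party polling at share p,
    since the MOE is proportional to sqrt(p * (1 - p))."""
    return worst_case_moe * math.sqrt(p * (1 - p)) / 0.5

# A full national sample of 1,000, for comparison with provincial subsamples
national = moe(1000)

# BC subsamples from the text: Nanos (~150), Ekos (302), Leger (349);
# Harris-Decima's 6.2-point MOE is used as reported.
bc_moes = [moe(150), moe(302), moe(349), 6.2]
bc_meta = meta_moe(bc_moes)

# Assumption: the meta poll scales like a single random sample.
bc_shares = {"CPC": 0.402, "LPC": 0.223, "NDP": 0.241, "Green": 0.113}
party_moe = {party: round(party_moe_at(bc_meta, p), 1)
             for party, p in bc_shares.items()}

print(round(national, 1), [round(m, 1) for m in bc_moes], round(bc_meta, 1))
print(party_moe)
```

Under these assumptions, a national sample of 1,000 has an MOE of about 3.1 points, the averaged BC figure lands at 3.2 points, and the party-specific MOEs range from about 3.1 points for the Conservatives down to about 2.0 points for the Greens.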

Doing the same calculations gives a meta MOE of 3.8 points in the Atlantic, 2.3 in Quebec, 1.9 in Ontario, 3.6 in Alberta and 3.4 in the Prairies. Therefore, to calculate the upper bound for the Conservatives, we should input the upper bound of their confidence interval in every province. What to do with the other parties is less clear. Should I also input the Liberals at their lower bound? That would make sense, since if one party is doing better, the others must be doing worse (in each province, the four main parties, five in Quebec, usually take 96% of the vote or more, leaving very little room for other parties). As a first step, let's simply input the Tories at their maximum and leave the other parties at their averages.
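The scenario-building step can be sketched in Python as follows. Only the BC shares and the provincial meta MOEs come from the text; the function name `shifted_scenario` and its `rebalance` option are hypothetical, and the seat model that would turn each scenario into a seat count is not shown. With `rebalance=False`, the other parties stay at their averages, as in the first attempt described above; `rebalance=True` is one possible answer to the open question, taking the extra points from the other parties in proportion to their support.

```python
# Meta MOEs (in points) quoted in the text
META_MOE = {"Atlantic": 3.8, "Quebec": 2.3, "Ontario": 1.9,
            "BC": 3.2, "Alberta": 3.6, "Prairies": 3.4}

def shifted_scenario(averages, moes, party, direction, rebalance=False):
    """Return provincial vote shares (in %) with `party` moved to the upper
    (direction=+1) or lower (direction=-1) bound of its CI in every province.

    rebalance=False leaves the other parties at their averages.
    rebalance=True (hypothetical) offsets the change across the other parties
    in proportion to their support, so each province's total is unchanged."""
    scenario = {}
    for province, shares in averages.items():
        delta = direction * moes[province]
        new = dict(shares)
        new[party] = shares[party] + delta
        if rebalance:
            others_total = sum(v for k, v in shares.items() if k != party)
            for k, v in shares.items():
                if k != party:
                    # each other party absorbs part of delta, proportional to its share
                    new[k] = v - delta * v / others_total
        scenario[province] = new
    return scenario

# Only BC is filled in here; a real run would include every province.
averages = {"BC": {"CPC": 40.2, "LPC": 22.3, "NDP": 24.1, "Green": 11.3}}

upper = shifted_scenario(averages, META_MOE, "CPC", +1)
upper_rb = shifted_scenario(averages, META_MOE, "CPC", +1, rebalance=True)
print({k: round(v, 1) for k, v in upper["BC"].items()})
print({k: round(v, 1) for k, v in upper_rb["BC"].items()})
```

Each scenario would then be passed to the seat model in place of the average shares; running it with `direction=-1` gives the lower-bound scenario.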

Using the four polls mentioned above, the Conservative projection would rise from 148 to 157 seats. Inputting their lower bounds instead, they would get 136 seats. This means that the 95% confidence interval for the Conservatives is 136 to 157 seats.

The best-case scenario for the Liberals would increase their seats from 75 to 82. The worst-case scenario would be 71 seats, meaning the 95% CI is 71-82.

So overall, it makes sense, although it of course requires a lot of calculation. But is there a simpler way to find these confidence intervals? Let's look at solution number 2.


2. Using the "close races".

What I call a close race is what others might call a marginal seat: a riding won by a margin of less than 5 points. Using the four polls mentioned, the Tories are projected to win 131 safe seats and to be involved in 38 close races, of which the projections have them winning 17. To me, a 100% confidence interval would therefore be 131-169, much wider than the 95% CI calculated above. So what can we do? Well, there is no way a party would lose ALL the ridings where it is in a close race. Nor is there any chance it would win them all. Looking at past elections, parties usually win between 30% and 70% of close races. Let's apply this here. If the Conservatives were to win only 30%, they would get 131 + 30% × 38 = 142.4, or 143 seats with the fractional seat rounded up. If they were to win 70%, they would be at 158 (157.6 rounded up). That gives a CI of 143-158, not far from the calculations above. For the Liberals, the same interval would be 70 (58 + 12) to 83 seats. Here we are even closer.
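The close-race rule is simple enough to sketch in a few lines of Python. The function name is illustrative, and rounding fractional seats up is an assumption: it is one convention that reproduces the 143 and 158 above. The 30% and 70% defaults are the historical range quoted in the text.

```python
import math

def close_race_interval(safe, close, low_share=0.3, high_share=0.7):
    """Seat range for a party that wins between low_share and high_share of
    its close races (ridings decided by less than 5 points).

    Fractional seats are rounded up (assumed convention); rounding to 9
    decimals first keeps floating-point noise from adding a phantom seat."""
    low = safe + math.ceil(round(low_share * close, 9))
    high = safe + math.ceil(round(high_share * close, 9))
    return low, high

# Conservatives: 131 safe seats and 38 close races
print(close_race_interval(131, 38))            # 30%-70% rule → (143, 158)
print(close_race_interval(131, 38, 0.0, 1.0))  # win none or all → (131, 169)
```

Setting the shares to 0 and 1 recovers the 131-169 range, which shows why a narrower band of shares is needed to get a usable interval.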

In the end, this method of simply attributing more or fewer close races is of course less scientific. You can't use the MOE to back it up. But it seems to provide a good approximation of the uncertainty in the projections, and it requires far fewer calculations. On top of that, one could argue it is more grounded in reality (as opposed to being based on a theoretical statistical concept).

For the final projections, I will probably do both calculations. But for now, I'll simply report the number of safe seats and close races.