Is there some vote splitting in Ontario?

A few weeks ago I wrote this article showing that, at that time, vote splitting between the Liberals and NDP was not the reason for a possible Tory victory. This was back when the Conservatives of Ford were close to 40% and had a comfortable lead.

The situation has of course changed a lot. The New Democrats of Horwath are now ahead in the popular vote and the seat lead of the Tories has shrunk (they are still the favourites though). So it's time to redo the analysis.

As I did the first time, I used the second choices as given by Mainstreet and Innovative. They are both very similar and haven't changed much from last time. Basically, OLP and NDP voters have each other's party as their main second choice (by a good margin) while a majority of PC voters don't have a second choice. Those who do prefer the NDP over the Liberals.

So, what are the results? The table below shows you what would likely happen in two possible scenarios. In one the OLP wouldn't exist and I redistributed the Liberals vote according to their second choices. In the other I did the same with the NDP.



It does appear that there is now enough vote splitting going on that the Conservatives of Doug Ford wouldn't win if the OLP didn't exist. It's not surprising. With the NDP being so close behind the Tories, imagine if they were to receive a boost of almost half of the Liberal votes while the PC would recover barely 13%.
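To make the redistribution concrete, here is a minimal sketch in Python. The riding shares are hypothetical, and the transfer rates are only rough stand-ins for the second choices described above (about half of Liberal voters to the NDP, about 13% to the PC; the Green rate is purely an assumption). Voters without a second choice are treated as staying home, which is also an assumption.

```python
def redistribute(shares, removed, transfers):
    """Remove one party and move its voters according to second-choice rates.

    Voters not covered by `transfers` are assumed to stay home, so the
    remaining shares are rescaled to sum to 100.
    """
    freed = shares[removed]
    new = {party: value for party, value in shares.items() if party != removed}
    for party, rate in transfers.items():
        new[party] += freed * rate
    total = sum(new.values())
    return {party: round(100 * value / total, 1) for party, value in new.items()}


# Hypothetical riding, not one of the actual projections.
riding = {"PC": 38.0, "NDP": 34.0, "OLP": 22.0, "GRN": 6.0}

# Assumed transfer rates for Liberal voters.
olp_transfers = {"NDP": 0.48, "PC": 0.13, "GRN": 0.10}

result = redistribute(riding, "OLP", olp_transfers)
print(result)
```

In this made-up riding the PC leads by 4 points before the redistribution and trails the NDP afterwards, which is the kind of flip the scenario without the OLP produces.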

Interestingly, the Liberals wouldn't be able to prevent a PC majority if the NDP didn't exist. This is partially due to the fact that there would be a slightly lower transfer of votes from the NDP to the OLP than in the previous scenario. In general though, the lead of the PC over the OLP in many ridings is just too big and the seat wouldn't be flipped by redistributing the NDP votes.

This is obviously due to the current electoral system. If that bugs you, you can try to vote strategically. There are many resources out there. Just know that this is a very uncertain exercise and unlikely to work (as it requires many other people to do the same). I'd also like to mention that the OLP dropping has helped the PC more in the last week since many seats in Toronto are now narrowly won by the PC. So it's not sure what would happen if some OLP voters were to switch to the NDP. For instance, one of the last 3 seats projected to go OLP is Vaughan-Woodbridge. But the PC is projected second there. If 5% of the Lib vote were to switch to the NDP in that riding, that would flip the riding from red to blue.

Projections update of May 31st 2018

Not a long article today as I already wrote a long text about how provincial and riding polls don't agree with each other. Just a quick projections update.

We got a ton of polls! Like really, it's like the election is tomorrow. The average moved quite a bit, with the PC now less than 2 points behind while it was over 3 yesterday. The chances of winning are now further from the quasi toss-up of yesterday as well. I'm starting to believe that the NDP will need a late campaign surge if Andrea Horwath wants to win next week. Make no mistake though, there is still considerable uncertainty.

Voting intentions; Seat projections with confidence intervals; Chances of winning the most seats

Possible outcomes:




Riding by riding projections


The riding and province-wide polls don't match up

My projections are based on what I believe to be the easiest numbers to be able to predict: the province-wide numbers for each party. I base my model on the province-wide numbers because these will be the most accurate numbers provided by the polls, much more than regional or riding level data. With that said, I do make further adjustments if my regional averages are too far away from the polling ones. And when we have riding polls, even though they are less accurate, I do an average between them and my projections when the latter are 5 points or more off.
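As a rough illustration of that last adjustment, the sketch below averages a projection with a riding poll only when the two differ by 5 points or more. Applying the rule party by party, and not rescaling afterwards, are my simplifying assumptions here; the numbers are hypothetical.

```python
def blend(projection, poll, threshold=5.0):
    """Average projection and riding poll for parties that are off by >= threshold."""
    adjusted = {}
    for party, projected in projection.items():
        polled = poll.get(party)
        if polled is not None and abs(projected - polled) >= threshold:
            adjusted[party] = (projected + polled) / 2
        else:
            # Small gaps are left alone: the riding poll is too noisy to matter.
            adjusted[party] = projected
    return adjusted


# Hypothetical riding: model projection vs. a riding poll.
projection = {"PC": 36.0, "NDP": 40.0, "OLP": 18.0}
riding_poll = {"PC": 42.0, "NDP": 33.0, "OLP": 19.0}

adjusted = blend(projection, riding_poll)
print(adjusted)
```

Here the PC and NDP numbers move halfway toward the poll while the OLP number, only 1 point off, is left untouched.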

The problem right now is that the riding polls, all conducted by Mainstreet as part of their daily tracker (you need to pay $30 to have access) absolutely do not match up with my projections. This could obviously mean my model is wrong and this is indeed possible. But the riding polls also don't match with the current province-wide numbers. And this is problematic.

How different are the two types of polls? The province-wide ones show a close race with the NDP slightly ahead (in the popular vote) while the riding polls are showing a landslide Conservative victory.

Since the riding polls are behind a paywall, I can't reveal the numbers. But I can talk about general trends and differences.

There have been 38 riding polls published so far. We don't have the exact dates (Mainstreet said they'd add this info) but they were all published over the last week (and the discrepancy between polls and projections has been more pronounced in recent days). The ridings are pretty much all over the province, from the North (Kenora-Rainy River, Sault Ste. Marie, etc), the East (Ottawa South and Ottawa West-Nepean, Glengarry-Prescott-Russell, etc), the Southwest (Sarnia-Lambton, etc), the Golden Horseshoe (Guelph, Kitchener-Conestoga, etc) and the GTA (Toronto Centre, multiple Brampton ridings, etc). It's a fairly varied "sample".

If I compare my projections to these polls (when they were published of course), I currently seem to be overestimating the Liberals by 2.2 points and the NDP by a whopping 5.2 points while I underestimate the PC by 3.4 points. This is significant, it means the average PC-NDP gap is almost 9 points off. This leads to multiple ridings where I see the NDP ahead but the riding polls have the PC first. In some ridings, my model is projecting the NDP significantly up compared to 2014 while the polls have them barely higher.

So again, at first, it seems more likely that my projections are wrong. And for a while I thought this was the case and I made the necessary adjustments. But the more polls we get, the more I think the issue is bigger than that.

If we average the 38 polls, we get the table below. I also added the average deviation with respect to my projections.



If I only use the riding polls published during the last two days, then the projections are underestimating the PC by 4.6pt and overestimating the NDP by 7.4! So the problem got worse.

By the way, I realize some of you might object that these 38 ridings can't be a good "sample" of the entire province (think for instance if you only polled ridings in the 416, your results wouldn't be representative). I did a simple average of the results of 2014 and found 42%, 30% and 21% for the OLP, PC and NDP respectively. This compares fairly nicely to the official province wide results of 39%, 31% and 24%. So while not a perfect sample, it's also not overly skewed. And anyway, when I compare them to my projections, the bias, if there is one, should be the same.

If riding polls were simply providing extra information, I shouldn't have a systematic over or underestimation with my projections. I should sometimes overestimate the NDP and sometimes underestimate it. But that is clearly not happening.

Notice also how the average difference between polls and projections matches up with the overall difference. Take the 38.8% and subtract the 3.4 points and you get 35.4%, almost exactly the current polling average. Do the same for the NDP and you get 31.7% + 5.2 = 36.9%, there as well almost exactly the average.
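The arithmetic above can be checked in a few lines of Python. The two averages and the two deviations are the ones quoted in the text; the variable names are just for illustration.

```python
# Average of the 38 riding polls, in percent.
riding_poll_avg = {"PC": 38.8, "NDP": 31.7}

# Average deviation of my projections relative to the riding polls, in points
# (negative means the projections are lower than the polls).
deviation = {"PC": -3.4, "NDP": 5.2}

# Average of the projections in those same ridings.
implied = {p: round(riding_poll_avg[p] + deviation[p], 1) for p in riding_poll_avg}

# How far off the PC-NDP gap is between the two sources.
gap_error = round(deviation["NDP"] - deviation["PC"], 1)

print(implied)    # {'PC': 35.4, 'NDP': 36.9}
print(gap_error)  # 8.6
```

The 8.6 is the "almost 9 points" mentioned earlier for the PC-NDP gap.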

What this shows is that my projections are doing a good job at transposing the province-wide percentages into riding ones. Therefore the average of my riding projections will give you the NDP ahead by around 2 points, as it should. The riding polls on the other hand are showing a situation where Doug Ford and the Tories would still be largely ahead and en route to a landslide majority.

Landslide majority based on the riding polls isn't an exaggeration. If the Mainstreet numbers are right, the PC is about to win the entire 905, all of Central Ontario and Eastern Ontario (minus 1-2 seats in Ottawa). It would also win most of the Southwest, including ridings where we'd think the NDP would be ahead such as Sarnia-Lambton. The PC could even be competitive in the North. So really, the entire province minus some seats in the core urban environments. We are talking about a 75-80 seat majority here possibly (what the projections were showing 2 weeks ago basically).

The province-wide polls (along with regional polling averages) show a very different situation, one where the NDP is crushing it in Toronto, competitive in some of the 905, ahead in the Southwest and the North. This would be a close race with a slight edge for the PC.

I don't know why the riding polls are so different. Maybe they were conducted a while ago? It's not even a Mainstreet bias as their provincial polls are in line with others. I'll ask Mainstreet about this. But for now, I can't use the riding polls. I believe the provincial ones more and therefore my projections more. I'll use the data of some of these polls such as Guelph (showing the Green competitive) or the ridings in the North. For the rest? I'll remove the adjustments for now.

Riding polls cost me two correct calls last year in BC and I think that this article should convince you that they aren't telling the same story. So I have to make a choice as to which polls I trust. And the choice between multiple province-wide ones or some riding ones is easy. After, I may be wrong and will regret it next week, but at least you know where I stand.

Projections update for May 30th: Almost a toss-up but momentum is for the NDP

As we got more polls yesterday (Mainstreet tracker, Pollara and Innovative), a trend is emerging: the NDP is rising again. Let's just look at the swings between the last polls for each firm:

Firm          PC     NDP    OLP
Pollara       -5     +5     -1
Innovative    -2     +5     -4
Mainstreet*   -1.2   +3.4   -1.6
Ipsos         +1     -3     -4
Abacus        -2     +3     -1
*Comparing the last two full trackers

There is an obvious outlier to the general trend (Ipsos) but otherwise everybody seems to agree: the NDP is rising while PC and OLP are dropping.
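For a quick sense of the size of the trend, here is a sketch that takes a simple unweighted mean of the swings listed above, with and without Ipsos. This is only an illustration: it is not how the polling average itself is built.

```python
# Swings between the last two polls of each firm, in points (from the table above).
swings = {
    "Pollara":    {"PC": -5.0, "NDP": 5.0, "OLP": -1.0},
    "Innovative": {"PC": -2.0, "NDP": 5.0, "OLP": -4.0},
    "Mainstreet": {"PC": -1.2, "NDP": 3.4, "OLP": -1.6},
    "Ipsos":      {"PC": 1.0,  "NDP": -3.0, "OLP": -4.0},
    "Abacus":     {"PC": -2.0, "NDP": 3.0, "OLP": -1.0},
}


def mean_swing(data, exclude=()):
    """Unweighted mean swing per party, optionally leaving some firms out."""
    firms = [f for f in data if f not in exclude]
    return {
        party: round(sum(data[f][party] for f in firms) / len(firms), 2)
        for party in ("PC", "NDP", "OLP")
    }


all_firms = mean_swing(swings)
without_ipsos = mean_swing(swings, exclude=("Ipsos",))
print(all_firms)
print(without_ipsos)
```

Even with Ipsos included, the mean NDP swing is positive and the PC and OLP swings are negative.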

This is why the polling average now has the NDP with a fairly comfortable lead. Yes some of it is due to the Pollara poll that has the NDP at 43% (!) but the trend is there.

The most up to date projections are:



(Note: I Tweeted yesterday that my projections were showing 49% chances for both the NDP and PC. That was true but I then made other adjustments, mostly based on riding polls).

Despite the trend and a fairly large lead for the NDP in the popular vote, the projections still have the Tories ahead! How come, right? Well, I was just as puzzled as you. So here are some explanations.

1. The riding polls (all from Mainstreet) are absolutely terrible for the NDP. To the point where I can't explain some of the results. According to Mainstreet, the Tories would actually be competitive in the North and win ridings I thought were fully safe for the NDP. In other ridings like Hastings-Lennox and Addington, the PC would currently be crushing the NDP by 34 points! The NDP would actually barely have increased there since 2014 despite being up almost 14 points province wide. This makes no sense to me whatsoever. This is happening in many other ridings. Based on these polls, my projections are overestimating the NDP in the East, GTA, North and the Southwest. And by double digits sometimes.

So I included some of the polls but I excluded others for now. But it remains that those adjustments are giving the PC a couple more wins and making its vote even more efficient. I'll have to decide how much weight I want to give to those riding polls...

2. The model is currently projecting the PC to win a large majority of the close races (margins of 5% or less) it is involved in. Specifically, the PC would win 21 out of 29 races. That seems high. This would require a really good GOTV and/or some luck. On the other hand the NDP only wins 6 out of 19. Imagine a more likely "success rate" of 50%, the PC would then be at 62 seats and the NDP at 55, a much closer race already. I'll have to take a better look at these ridings in the next week.

3. Ultimately we are back to the topic of vote efficiency. I had previously written that my calculations were showing a 50-50 race if the NDP had a 2 points lead over the PC. My new estimations show that the NDP might actually need a 3.5 points lead! So why is that? And no I'm not actively trying to make the NDP lose. Here is what I found. The main difference is that the OLP is now at around 20% instead of 25%. And as unintuitive as it sounds, the Liberals dropping that low is actually helping the Tories!

You can use the simulator to see how it works. Leave the PC and NDP where they are and increase the Liberals to 25% (yes it doesn't add up to 100% anymore but forget it for now). Did you see what happened? The PC dropped from 68 to 59 seats while the NDP only lost one seat! What this means is that the vast majority of seats the Liberals have lost by dropping from 25% to 20% have gone to the Conservatives. We are talking about ridings like Don Valley West, Eglinton-Lawrence, Glengarry-Prescott-Russell, Orléans, half of Mississauga or Ottawa South.

Most of those are part of the close races mentioned in point 2.
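The "50% success rate" calculation from point 2 can be sketched as follows, using the 68-52 seat projection and the close-race counts quoted above. It treats each party's close races separately, which is a simplification since many of them are PC versus NDP contests.

```python
def seats_at_success_rate(total_seats, close_wins, close_races, rate=0.5):
    """Seats if the party won `rate` of its close races instead of `close_wins`."""
    safe_seats = total_seats - close_wins
    return safe_seats + rate * close_races


pc_seats = seats_at_success_rate(68, close_wins=21, close_races=29)
ndp_seats = seats_at_success_rate(52, close_wins=6, close_races=19)

print(pc_seats)   # 61.5
print(ndp_seats)  # 55.5
```

That is in line with the roughly 62 and 55 seats mentioned in point 2.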

It really is a weird situation right now. The PC is dropping, the NDP is rising but vote efficiency is increasing for the PC and keeping this party ahead. But make no mistake, this is a much more precarious lead than what they had before. Yes the PC is still projected at 65+ seats but this party is now in a situation where it could easily lose. A small overestimation by the polls, a more resilient Liberal vote in some key ridings or an NDP surge thanks to young voters (in Toronto for instance). Any of those (plus others) and Andrea Horwath wins the most seats.

Below are the distributions for the three main parties. As you can see, the ones for the PC and NDP overlap quite a lot. And the NDP's upper tail actually goes further. Notice also that the Liberals, at 4 seats, would clearly be scoring on the low end of their distribution and this party has the potential for way more than four seats.



So don't focus too much on the 68-52, focus on the 55% vs 43%. This is almost a toss up. Factor in the trend mentioned at the beginning and the NDP isn't in a bad position. Also, it is important to realize that the NDP is experiencing a massive swing upward compared to 2014 while the PC is barely up. So my projections might be quite off for the New Democrats. Models like mine are much better suited to forecast riding level results when the swing is small.

With that said, imagine if the NDP were to actually win the popular vote by 3 points but the PC were to get a majority! That could be the best argument for electoral reform in a very long time.

Can you get a quality poll for $200?

Important notice: please do NOT use the numbers of this poll/experiment and tweet them. It was an experiment only. Plus, the poll was actually done last week, so the numbers aren't relevant anymore.

This is the question I asked myself when I saw that we could order individualized Google Surveys.

Some of you probably know what I'm talking about. You can install the Google Opinion Rewards app on your phone and once in a while you'll receive surveys. Usually this is to ask you what you thought of a store you went to (or Google thinks you went to through tracking). You answer and you get a magical 10-20 cents that you can use to buy apps, music and movies on the Play Store.

Ok, I know what some of you will say: this isn't a valid poll! This won't work! This is the same as those non-scientific polls on websites.

Not quite. Google has a robust database (not shocking) and can provide a good sample. In the 2012 US election, the second best poll was actually from Google Surveys! There is a page about using it for elections. There is a template that is recommended for best results (using a two question system where you first ask people how likely they are to vote).

So look, I'm not trying to argue that a Google Survey is equivalent to a well designed poll by another firm. But if you think this is equivalent to those Sun Media polls on their websites for instance, you are dead wrong.

Ok so as an experiment I decided, last week, to order a poll. I only asked one question because if you ask two, the cost increases significantly (instead of 20 cents per answer, it was over $1!). I asked the following question: "If the Ontario election were held today. Which party would you vote for or are you currently leaning towards?"

And I offered a randomized choice of the parties with the leaders' names as well as the "undecided" and "will not vote" options. The poll took 3 days, from May 22nd to 25th, to collect the requested 1000 observations. It was quite slow at first, which was disappointing, but picked up quickly at the end. I asked Google to ask the question in English to residents of Ontario aged 18 and over.

Ok, enough already, what were the results? Here they are below, in raw, unadjusted form:



Remember that this poll was done between May 22nd and May 25th. Unfortunately for me, I picked the few days where the NDP surged. I could actually see it by looking at the questionnaires coming in. As reference, the polling average around that time was (excluding the Forum poll as I think it's an outlier and I don't have enough polls conducted between the 22nd and 25th to compensate for its crazy results) 21% OLP, 37% PC and 35% NDP (and around 5% Green).

It says 1000 respondents but Google actually indicates that I only have 631 respondents once weighted. Once the undecided and wouldn't vote are removed, we have roughly 400 respondents. A survey of that size would have a margin of error of about 4.8 points, 95% of the time. So the polling average actually falls within the results of this poll once we account for the margin of error. Except for the "another party" option. There is an obvious overestimation here. This makes no sense and I can't explain it right now.
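For reference, the usual margin of error formula can be computed as below. It assumes a simple random sample and the worst case of a 50% share, which an opt-in panel like this one doesn't strictly satisfy, so take it as a rough guide only.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error in percentage points at the 95% level (z = 1.96)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


moe_decided = margin_of_error(400)   # roughly 400 decided respondents
moe_weighted = margin_of_error(631)  # 631 weighted respondents

print(round(moe_decided, 1))   # 4.9
print(round(moe_weighted, 1))  # 3.9
```

With exactly 400 respondents the formula gives 4.9 points, so the figure quoted above is in the right range for "roughly 400".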

My results are quite similar to the tracker from Mainstreet published on the 23rd (it was around 21-38-34 at that date)!

At 28%, this survey also got way more undecided than most polls (between 5% and 15% depending on the firm). To be fair, I only asked one question and even though it said "leaning towards", it might explain partially why I got so many undecided.

So, was the experiment a success? Kinda. I haven't played with the raw file, trying to reweigh the data (which is possible since I have age and gender data from most respondents; I, however, do not have the region or city) to see if I could "correct" the weights to get better results.

I also lost a lot of observations to the wouldn't vote or undecided, so a more serious experiment would require a higher budget to increase the sample size.

Still, I find it pretty interesting that I was able to go online and order a poll for only $200 and this poll's results can actually be reconciled with the polling average. Plus, as I said before, my poll was done during a period of rapid growth for the NDP. The PC had a much bigger lead for the first day and the NDP slowly but surely caught up.

Would it replace a true poll from a known Canadian pollster? Of course not! But there is some potential there. I wish Google would allow regional breakdowns though.

I'll gladly share the raw data file with whoever wants it. Just reach me on Twitter or by email and tell me why you'd want the data. I only ask that you link to my site and mention my name if you use the data for a blog post or anything.

Did Kathleen Wynne win the final debate? Probably

We are a few days after the third and final debate and we have our first poll from Innovative (Mainstreet said they won't ask the question). And the data is remarkably similar to my fun, non-scientific Twitter poll! Judge for yourself:



My Twitter poll:


Note: at the moment of writing this article, my poll was showing 27-16-34. Just in case the Russian bots are trying to change the results!

Ok, so past the fun coincidence, the Innovative numbers suggest Wynne most likely won the debate. She did especially well among people who (said/pretended) to have watched the entire debate while Horwath does better among those who watched some of it.

As usual, the problem with simply asking people who they think won is that people are biased. Conservative voters are much more likely to say Ford won for instance.

That's why I always do the debate index where I simply calculate the ratio of the percentage of people saying one leader won over the percentage of people saying they want to vote for that party. An index of 1 means you convinced your base and you likely won't win or lose votes post-debate. An index over 1 is usually associated with a rise in the polls in the next few days.

Without further ado, here's the debate index for the third and final Ontario debate:



So Wynne clearly won. To be fair, it's easier to do well in this index when your party is barely at 20%. Still, an index of 1.81 is quite high and similar to what Horwath got for the first debate.

So, can we expect a late rebound from the Liberals? Maybe. The Mainstreet tracker today kinda showed that but other polls haven't. Also, if the index is right and predictive, the Liberals gains should come at the expense of Ford and the PC. We'll see.

Personally I don't think this debate will have a dramatic effect on voting intentions. Ford's performance most likely means a last minute rebound to 40% for the PC is unlikely (or wouldn't have been caused by the debate) while Wynne might have prevented a last minute collapse.

With 8 days to go, the Tories still have a small edge over the NDP

There is only one week left in this 2018 Ontario election. What once was a safe lead for the Progressive Conservatives of Doug Ford has turned into one of the most competitive elections I've covered. By the end of the campaign, my projections usually give chances of winning of 80% or more to one party. This time around however, it is quite possible that the race will literally be too close to call!

That's not completely the case right now though. This isn't a perfect 50-50 race. As you can see below, the chances of winning (the most seats; I'm not even trying to discuss scenarios where two parties would make a deal) are roughly 65-35. The only real certainty is that the Liberals of Kathleen Wynne won't finish first. Beyond this, not much has changed in the last few days.

Voting intentions; Seat projections with 95% confidence intervals; Chances of winning the most seats.

If you want the detailed projections, you can find them at the bottom of this article. If you care more about outcomes, here they are:



Notice that the "majority" scenarios are more likely. This is because the OLP isn't currently winning enough seats to make a minority the most likely scenario. Doesn't mean it won't happen (see BC last year), just that it's not as likely.

If you are wondering what "others" is, those are mostly scenarios where there is a tie. If the PC and NDP were to win the same number of seats, this would create an interesting but potentially messy situation. It could go weeks before we'd know who would become Premier. So I prefer not to speculate and just leave that as "others". As a BC resident, I wish Ontarians to have just as much post-election fun as we did last year!

You can also see these projections on the map. This is the first time I did that and I must thank Rhea Donsman (and some YouTube videos) for showing me how. It's most likely not perfect yet (no numbers for now, just colors) but it's fun, isn't it?



The main question some of you might have is really: how can the NDP be higher on average in the polls yet be given only a 34% chance? The answer is really vote efficiency. Our electoral system being what it is, how the votes are regionally distributed can make a big difference. The NDP vote is simply less efficient at winning seats than the PC's. At least based on my analysis. I had estimated in a previous post that the NDP would need to win the popular vote by 2 points if Andrea Horwath wanted to win more seats than Ford. I stand by this analysis. It's not an exact science but 2 points seems to be the lead where this race would be 50-50. As you can see, the polling average doesn't place the NDP 2 points ahead.

While the New Democrats had been rising in pretty much every poll over the last week (including some crazy numbers from Forum...), this trend stopped yesterday. Ipsos published a new poll that showed a rebound from the Tories. Mainstreet published a full poll to the public (as opposed to the daily tracker that is behind a paywall) where the NDP finally took the lead (remember that for a while, phone polls were not agreeing with online polls and continued to show the PC ahead, so Mainstreet having the NDP first was kind of a big deal). But on Twitter, Quito Maggi (Mainstreet's CEO) said that the lead was back to the PC in their new daily numbers. I think "rebound" will be the buzz word for the day. Maybe.

At the end of the day, it's not really possible to determine right now who would receive the most votes. Polls aren't accurate enough to give us such a precise estimate. The best we can tell right now is that the NDP and PC are in a close race, around 35-37%. After, turnout and GOTV could swing the balance one way or the other.

On top of regular polls, I also account for riding polls from Mainstreet. Their accuracy is way lower but they still provide information. Some of it isn't very good for the NDP. In most ridings in the GTA, I have found that my model was overestimating the NDP and/or underestimating the PC. This was the case in 3 riding polls in Brampton. So I made some adjustments. With that said, NDP supporters should know that Mainstreet's riding polls during the BC election last year failed to capture the NDP wave in the Lower Mainland. Still, I can't completely ignore this information.

So what does the NDP need? They either need to sweep Toronto proper, increase significantly in the GTA or somehow manage to target key ridings here and there (1 or 2 in Ottawa, etc). The rest of the province isn't that interesting. The North is already all NDP, Central Ontario is heavily PC, so is the rural South West. The NDP also already wins the urban centers (Hamilton, London, Niagara, etc). So really, there aren't that many possibilities left. Of course, all of this is assuming the race remains close. It's completely possible the NDP will surge in the last few days.

We don't have polling data regarding the debate of Sunday (well I have my own Twitter poll but this is just for fun). Innovative said they'd have such data in their poll while Mainstreet said they won't. Having watched the debate, my subjective opinion is that Wynne did relatively well (after the horrible "Sorry not sorry") while Ford probably didn't lose any votes or convince anyone. So I don't think this debate will cause the NDP surge, but what do I know? Talking about an NDP surge most likely means talking about a Liberal collapse. Is the OLP at its floor or can it go even lower than 20%? It already did in some polls, just not in the average.

The OLP being projected so low means a minority is unlikely. But it doesn't mean the Liberals won't offer a fight in many ridings. Depending on how resilient they can be (and where), it could create quite a lot of close races and uncertainty. I'm not trying to protect myself in advance, but people maybe need to lower their expectations regarding projections. With such wide swings and wild cards, I don't anticipate my model (or any other for that matter) to perform as well as in BC last year for instance. As always, focus on the probabilities more than on the raw, top line numbers.

That's all for now. I'll try to post an analysis regarding strategic voting during the afternoon or tomorrow. Also, remember that advance voting has started. As soon as we get data about the turnout, I'll make some riding adjustments.

Riding by riding projections:




Final debate is tonight, but who won the first one?

Tonight is the last televised debate between the leaders of the major parties in Ontario. Given the state of the race (a close contest between the Tories and the NDP) and the fact debates are usually major campaign events, it goes without saying that the stakes are high for Doug Ford and Andrea Horwath. They could literally win or lose the job of next Premier of Ontario tonight.

For the Liberals of Kathleen Wynne, the objective should be to avoid a complete collapse.

Speaking of which, the Sunday update to the projections is here below. No poll was published yesterday (well except the updated Mainstreet tracker) but a couple of riding polls have generated adjustments large enough to change a couple of seats. Toronto Centre for instance is now projected NDP.

Voting intentions; Seat projections with confidence intervals; Chances of winning the most seats

As you can see, the Liberals are at only 4 seats! The chances they'll win zero are only 1.2% though. Still, think about it: there is a non-zero chance that the current majority party in Ontario (the party that has been in power since 2003! and has won 4 elections in a row) could be wiped out completely!

The riding by riding projections are available at the bottom of this article.


The debate

Ok, so tonight is the debate at 6:30pm EST. For Ford, this is the opportunity for him to finally show a good campaign (this isn't my subjective opinion talking here, polls clearly show that voters don't think Ford has done a good campaign or that he showed them reasons to vote for him) and rebound after losing the lead (in the popular vote). For Horwath, this is most likely the best opportunity for her to get the extra votes she needs to become Premier. It'll be interesting to see if she decides to go after the PC or the Liberals more.

This isn't the first debate however. There was one at the very beginning of this election, on May 7th in Toronto. Who won this one? I haven't actually tried to answer this question yet because I was still on vacation when this debate aired.

We mostly have two polls with data about this specific question. The first one, from Mainstreet, showed that 35% of Toronto residents thought Ford won, while 24% thought it was Horwath and only 19.3% for Wynne.

The second poll, from Innovative and conducted province-wide, had Horwath as the winner for 45% of respondents. Ford was 2nd with 25% while Wynne was 3rd again at 19%.

The two polls aren't directly comparable since one surveyed Toronto only while the other one was province wide. Still, they both agree Kathleen Wynne finished 3rd.

I always create a debate index to determine the possible impact of the debates. The idea is to remove (some of) the bias of the voters. Namely, if you are currently a Liberal voter, chances are you'll be a lot more likely to think Wynne did the best job. The Innovative poll actually proved that it was indeed the case on page 45 of the pdf.

To create the index, I do the ratio of the % of people who thought a candidate won over the % of people voting for that party. A ratio of 1 means you convinced the people who already liked you. A ratio above 1 means there are voters from the other parties who think you did the best job. By experience, this index has worked pretty well in predicting future changes in voting intentions (it worked in Quebec in 2014 and BC in 2017). A ratio above 1 usually means your party will improve in the polls in the next week.
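As a minimal sketch of that ratio in Python: the percentages below are hypothetical placeholders, not the Mainstreet or Innovative numbers, and the function name is just for illustration.

```python
def debate_index(won_debate, vote_intention):
    """Share saying a leader won, divided by that party's vote intentions."""
    return {
        party: round(won_debate[party] / vote_intention[party], 2)
        for party in won_debate
    }


# Hypothetical numbers, in percent, with undecided respondents already removed.
won_debate = {"OLP": 20.0, "PC": 38.0, "NDP": 42.0}
vote_intention = {"OLP": 25.0, "PC": 40.0, "NDP": 28.0}

index = debate_index(won_debate, vote_intention)
print(index)  # {'OLP': 0.8, 'PC': 0.95, 'NDP': 1.5}
```

In this made-up case the NDP leader convinces well beyond her own base (index above 1), while the other two fall a bit short of theirs.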

If we do the calculations with these two polls (after some manipulations to remove the "undecided" as well as making the data comparable), we get the following index (remember that back then, the NDP wasn't polling remotely as high as now):

Party - Leader    Index
OLP - Wynne       0.780
PC - Ford         0.983
NDP - Horwath     1.581

Hindsight is always 20/20 but in this case, it seems the first debate would clearly have predicted a decrease of the Liberals and an increase for the NDP. Exactly what happened.

We'll have to wait and see for the post-debate polls tomorrow to be able to recalculate this index for the second debate. Hopefully pollsters do ask this question.


Projections update Saturday May 26th 2018

Not a lot to write about today. While Forum did come out yesterday with the NDP at a crazy 47%, other pollsters seemed to agree the race was much closer. Ekos, after leaking some partial results the day before showing the NDP up by 10, ultimately published a full poll with PC and NDP statistically tied. Abacus said on Twitter that they weren't observing a crazy NDP break-out beyond what they had already observed (interesting because Abacus was one of the first to catch the rise of the NDP a couple of days ago) and Mainstreet's tracker is slowly but surely converging to a close race. Finally, Innovative went kinda against the trend by publishing a poll with the PC still relatively comfortably ahead. It should be noted however that the Innovative poll was conducted from the 18th to the 23rd, so it's possible it didn't capture some of the NDP's jump that happened recently.

So, turned out that my long analysis of yesterday where I was showing the edge the PC has with vote efficiency is still valid and relevant! Yeah! By the way, you should definitely read this analysis!

Here below are the most up to date projections.

Voting intentions; Seat projections with confidence intervals; Chances of winning the most seats

Mainstreet also started publishing some riding polls, namely in Ajax and Guelph. Riding polls are far less accurate but I can't ignore them. After all, I need to make projections because we don't get 124 riding polls. So when there is one, I include it in my forecast.

I can't reveal the exact numbers in Guelph but let's just say that the poll confirmed the Green leader Mike Schreiner is in the race. It's always very difficult to predict such a race where one party is putting everything it's got into this one riding. Past results of Mike Schreiner didn't seem to indicate a crazy personal effect (as opposed to Elizabeth May or Andrew Weaver for instance) but it seems this is working better this time around. To be fair, Guelph is most likely a better riding and the vote is so split that it takes a low percentage of votes to win this year.

So can he win? Yes, absolutely. But it remains a 3-way race with the PC and NDP (even the Liberal candidate isn't fully out).

So the big change of the day is the confidence interval for the Green now being 0 to 1 instead of 0 to 0. Beyond this, I don't have anything to add for now. Enjoy your Saturday!

The Ontario election is now a competitive race between the Conservatives and the NDP

Aaaaaand we got a race! That's right, after weeks (months?) of the Progressive Conservative Party of Doug Ford clearly leading in the polls (and the seat projections), we now have a close race between this party and the NDP of Andrea Horwath. This is honestly quite impressive given how big the PC lead was just a few weeks ago. If the Tories end up losing this election, I'm sure people will compare them to the Maple Leafs and the famous blown lead to Boston.

After a few days where online polls were showing a tight race (with the NDP sometimes ahead) while phone polls (Ekos, Mainstreet) kept showing the PC way ahead, things changed yesterday. Frank Graves, Ekos' CEO, has tweeted that they now see big shifts and the NDP is first. They should publish the poll today (the preliminary results leaked on Twitter were literally showing the NDP with a 10-point lead!). As for Mainstreet, its CEO Quito Maggi has also tweeted something similar and we should start seeing the changes in their daily trackers (although, since it's a 3-day tracker, it'll take some time). We also know Forum will publish a poll on Thursday and rumours are that they have the NDP very, very high. Edit: Yup, the Forum poll is up and the NDP is at freaking 47%! If you want to see what the projections would look like with these numbers, just use the simulator.

All that to say that the projections below will most likely be outdated very, very soon. But even if that's the case, you can see a very different race with the NDP within striking distance of the Tories.
Edit: I updated the projections with the new poll from Forum. For the first time, the polling average has the NDP ahead.

Voting intentions; Seat projections with confidence intervals; Chances of winning the most seats

The probabilities for a majority are only 43.2% for the PC and 19.1% for the NDP. You can find all the details at the bottom of this post.


Since the projections won't be valid for very long, I want to focus instead on what a close PC-NDP race, with both parties around 36%, would look like. Of course, I'm assuming here that the race will indeed remain tight for a while and the NDP won't take a large lead. A large NDP lead is far from impossible, but let's ignore it for now. If the rumored Ekos and Forum numbers are true, we are talking about an orange wave and this whole article is for nothing. Oh well.

The model seems to show that in this case (the close race), the Conservatives would have the edge with vote efficiency. In order to fully represent this, I used my simulations and estimated the function below. This is showing you the chances of the PC winning more seats (majority or not) than the NDP as a function of the PC lead in terms of votes (so it's negative if the NDP gets a higher percentage of vote). This is valid for the NDP and PC around 36% and the Liberals around 20%.



If no party has an edge, then the chances of winning should be exactly 50% if the vote spread is 0. In other words, if two parties were to get (say) 36%, then they both should have a 50% chance of winning. So one way to capture the edge of one party is to see where the 50% chance is.

As you can see, the 50% mark is when the PC is around -2 points, which means the NDP needs to win the popular vote by 2 percentage points (e.g. 38% versus 36%) in order to have the same chances as the PC. Another way to say this is that if the two parties are actually tied, the PC has around a 90% chance of winning more seats.
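To make the "where is the 50% mark" idea concrete, here is a small Python sketch that reads the break-even lead off a curve by linear interpolation. The curve points are illustrative values loosely shaped like the graph described here, not the actual output of the model, and `break_even_lead` is a hypothetical helper name.

```python
def break_even_lead(curve):
    """Return the vote lead at which the win probability crosses 50%.

    `curve` is a list of (lead_in_points, probability) pairs sorted by lead.
    Returns None if the curve never crosses 50%.
    """
    for (x0, p0), (x1, p1) in zip(curve, curve[1:]):
        if p0 <= 0.5 <= p1:
            if p1 == p0:
                return x0
            # Linear interpolation between the two bracketing points.
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None

# Illustrative points only: (PC lead over NDP, chance PC wins more seats).
curve = [(-4, 0.10), (-3, 0.28), (-2, 0.50), (-1, 0.74), (0, 0.90), (2, 0.96)]

edge = break_even_lead(curve)
print("PC break-even lead:", edge)
```

With no structural edge for either party, the function would return 0. A negative value means the PC can trail in the popular vote and still be a coin flip to win more seats.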

Just to be clear here, the projections above show the PC with around a 2-point lead and "only" an 80% chance of winning. So you might be confused because this graph here shows that if the PC wins by 2 points (+2 on the x-axis), then the PC has over a 90% chance of winning more seats. The probabilities here and in the projections above aren't representing the same uncertainty. The projections account for the possibility that the polls are wrong. And we know they can be. On the other hand the graph here shows the uncertainty due to the distribution of the vote (or the electoral system). So at PC +2, this is really looking at the possible seat distribution if the Tories were to actually receive 2 points more of the vote. See the difference?

2 points is a fairly big advantage, although this is less than the usually accepted 5-point lead the Liberals need to have over the PQ in Quebec since the Liberal vote is heavily concentrated in non-francophone ridings (note: I don't believe this 5-point rule is remotely true nowadays, but that's another story). This is just another example of the flaws of our electoral system where one party could get fewer votes but more seats.

Ok so why is the PC so much more efficient? One explanation is of course that my model is just wrong. This is completely possible but not super interesting (because I won't know until the election actually happens).

Another explanation is because the NDP is wasting votes in some regions and too far behind in others. The PC vote is really evenly spread while the NDP vote is more concentrated in Hamilton/Niagara and the North. The NDP remains low in central Ontario and the east and, more importantly, is still behind the PC significantly in the GTA (in Toronto proper, it seems to be a 3-way race and you can just roll a die to make your prediction). The GTA is really the key. It's the source of many seats. In the projections above, the Tories are winning 21 seats in the 905 while the NDP is only at 8.

So what the NDP needs right now is to increase its share of votes in the GTA. And possibly in a non-uniform way but instead by increasing more in some key ridings. Of course, this is easier said than done. The vote inefficiency of the NDP in the GTA seems particularly severe.

Look at the map below, from Wikipedia. The GTA was the life source of the Liberals in 2014. The Tories actually lost seats there between 2011 and 2014 (as well as in the greater 905 in general). As for the NDP, it finished below 20% in the 905 and only won 2 seats. The best measure of the inefficiency of the NDP vote is to calculate the standard deviation within the 905. The higher the standard deviation, the more volatile the NDP vote is, which means it's high in a few ridings but also very low in others. For 2014, the standard deviation was around 11 points for the NDP, almost twice as much as for the PC (interestingly, the NDP is close to the Liberals but the Grits were also ahead there, so it's less of an issue for them). On average the NDP was almost 30 points away from winning ridings while the PC was only around 15 points away. And there as well the NDP's deficit varied quite a lot between ridings.
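Here is a rough Python sketch of those two measures: the standard deviation of a party's share across ridings, and its average gap to the winner. The riding numbers are made up for illustration (they are not the 2014 results), and measuring the deficit only in ridings the party lost is an assumption on my part.

```python
from statistics import mean, pstdev

# Hypothetical riding results in percent, NOT the actual 2014 numbers.
ridings = [
    {"PC": 35, "NDP": 10, "OLP": 50},
    {"PC": 30, "NDP": 45, "OLP": 20},
    {"PC": 38, "NDP": 15, "OLP": 42},
    {"PC": 33, "NDP": 14, "OLP": 48},
]

def spread_and_deficit(ridings, party):
    """Return (std dev of the party's share, mean gap to the winner where it lost)."""
    shares = [r[party] for r in ridings]
    # Gap to the winning party, only in ridings the party did not win.
    deficits = [max(r.values()) - r[party]
                for r in ridings if r[party] < max(r.values())]
    avg_deficit = mean(deficits) if deficits else 0.0
    return pstdev(shares), avg_deficit

ndp_sd, ndp_deficit = spread_and_deficit(ridings, "NDP")
pc_sd, pc_deficit = spread_and_deficit(ridings, "PC")
print(f"NDP: sd={ndp_sd:.1f}, average deficit={ndp_deficit:.1f}")
print(f"PC:  sd={pc_sd:.1f}, average deficit={pc_deficit:.1f}")
```

In this toy data the NDP has one strong riding and three weak ones, so its standard deviation is high and its average deficit is large, which is the pattern described above.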




So the big question this year is really: who will win the lottery since the Liberals are about to lose most of their seats? Based on what we just saw, the PC is in a much better position to win these seats. The NDP really has only two ways here. First, it could simply take a huge lead overall in this region, thus compensating for its inefficient distribution. The second method is to hope that its gains in votes will be optimally distributed in the right ridings. In the east 905, we are talking of ridings in Vaughan for instance that could be the first to go NDP (among the ones not already going orange) or Ajax. On the other hand, the NDP would likely be wasting votes if it was to increase mostly in Thornhill or Markham. So either on the left or right side of the east 905, but not in the middle. In the west 905, the NDP should hope its gains aren't in the ridings closer to Hamilton (Oakville, Burlington) and more in ridings like Mississauga (and Brampton, but it's already projected to go NDP). In general, the west of the 905 (so the Peel Halton region) is better for the NDP than the east (the York region).

There are 4-5 ridings where the NDP is within striking distance. But the others? The others are harder. In some of them the NDP is likely still a good 20 points away. Still, taking 4 ridings away from the PC would go a long way to restore a more balanced race between the two. Good riding targeting by the NDP and this race is a toss-up.

So what the NDP needs right now if it actually wants to win on June 7th is a good transfer of Liberal votes in the GTA. Andrea Horwath needs to inform and convince these voters. Inform them that the OLP has no chance of stopping Doug Ford (note: the latest Leger poll showed just how uninformed most voters were. 50% of Liberal voters thought their party had the best chance of winning the election!) and then convince them to actually make the jump. I'm sure she'll spend a lot of time repeating this during the last two weeks (and the debate).

The other region where the NDP seems to be currently inefficient is the east, including Ottawa. But this is less important than the GTA in the overall vote efficiency of each party. With that said, an OLP-to-NDP transfer in Ottawa (along with the one in the GTA) would give Horwath a majority.

So here you have it. Unless polls start showing a PC rebound or a large NDP lead, this race will become incredibly competitive. But the Tories have the vote efficiency going for them. They can lose the popular vote by 2 points and still win more seats. They would likely not get a majority however. But that's for another article.

I'll try to update the projections quickly when we get the new polls today.

The detailed projections from above:

Quick morning projections update, May 24th 2018

I already posted a long article about the general accuracy of Canadian polls here; I strongly suggest you give it a read. Since I know other polls will come today (Léger for instance), I just updated my projections using yesterday's Mainstreet tracker as well as the new Pollara poll (for which we have very little information).

So here it is folks, just the numbers, not much blah blah.

Voting intentions; Seat projections with confidence intervals; Chances of winning the most seats

Notice that the Conservatives of Ford are now less than 90% sure to win the most seats. Maybe more significant is the fact the chances of a majority are now barely above 60% (see below for details). If the trend goes on (and I suspect it will with the new Leger numbers, since Leger does online polls, which have shown a tighter race than phone ones), then a Tory majority will soon be a toss-up...

Possible outcomes are here:


Finally the detailed projections. Have a nice day everyone!

Can we trust Canadian polls?

With the Ontario election entering its final two weeks and the polls diverging from each other (mostly depending on whether they are done online or by phone), one could wonder: how accurate are Canadian polls in general? Should we trust them?

The short answer is yes, but they tend to be off once in a while. And even when they aren't completely off -the majority of the time- there is still considerable uncertainty even if you average many polls.

Let's look at how I reached this conclusion.

I collected the polls during the last week (or so) of the campaign for 10 elections. They are Alberta 2012 and 2015, BC 2013 and 2017, Quebec 2012 and 2014, Ontario 2014 as well as the last three federal elections. This is, I believe, a fairly good sample. It includes many different elections over many years. And it does include the two big misses (Alberta 2012 and BC 2013), so whatever results I find can't be called biased towards pollsters.

For each election, I averaged the polls. A simple, straight up average. No fancy adjustments based on sample size or whether the poll was done 12 hours after another one. No adjustment for incumbency or anything. I only made sure to include only one poll for each firm. I believe my personal average will beat this simple one most of the time, but at least it's clear and straightforward.
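As a minimal Python sketch of that kind of straight average: the firm names and numbers below are made up, and keeping each firm's most recent poll is an assumption (the rule above only says one poll per firm).

```python
from statistics import mean

# Hypothetical polls for illustration; firms and numbers are made up.
polls = [
    {"firm": "Firm A", "end_date": "2018-05-18", "PC": 38.0, "NDP": 33.0},
    {"firm": "Firm A", "end_date": "2018-05-22", "PC": 36.0, "NDP": 36.0},
    {"firm": "Firm B", "end_date": "2018-05-21", "PC": 37.0, "NDP": 35.0},
    {"firm": "Firm C", "end_date": "2018-05-23", "PC": 35.0, "NDP": 37.0},
]
parties = ["PC", "NDP"]

# Keep one poll per firm. Sorting by ISO date means later polls overwrite earlier ones.
latest = {}
for poll in sorted(polls, key=lambda p: p["end_date"]):
    latest[poll["firm"]] = poll

# Plain unweighted average: no adjustment for sample size, recency or incumbency.
average = {party: mean(p[party] for p in latest.values()) for party in parties}
print(average)
```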

I then compared the average for the main parties (the exact number will vary. It's 4 in Ontario, 5 for federal election or Alberta, etc) to the actual results.

I then calculated what is called the MSE, the mean square error. It's simply the average of the squares of the differences. Why the square? So that it doesn't matter whether the polls were over or underestimating a party. A deviation of 2 points above or below will result in the same "penalty" (2 squared, so 4).

If you know your stats, for unbiased estimators (and we believe that polls are of this kind, meaning they don't systematically under or overestimate voting intentions), the MSE is equal to the variance.

Once I have the MSE, I can take the square root and multiply by 1.96 in order to get the margins of error. This is really basic stats but don't worry if you don't get it. I'll interpret the results in a non (or less) technical way.
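For those who prefer code to words, here is a short Python sketch of that calculation for a single election. The polling averages and results are hypothetical, and the real exercise pools the errors over all 10 elections rather than just one.

```python
import math

# Hypothetical polling averages and actual results (percent) for four parties.
poll_average = [36.0, 34.0, 22.0, 6.0]
actual_result = [38.0, 33.0, 20.0, 7.0]

errors = [p - a for p, a in zip(poll_average, actual_result)]

# Mean square error: over- and underestimates are penalised the same way.
mse = sum(e ** 2 for e in errors) / len(errors)

# Average absolute error, the "polls were off by X points on average" number.
mae = sum(abs(e) for e in errors) / len(errors)

# Effective margin of error, 19 times out of 20 (treats the MSE as a variance).
moe = 1.96 * math.sqrt(mse)

print(f"MSE = {mse:.2f}, average absolute error = {mae:.2f}, MoE = {moe:.2f} points")
```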

The table below presents the results for these 10 elections.




So on average for the main parties, polls will be within 2 points of the actual results. Interpret this number as saying that on average, if polls estimate a party at (say) 35%, then they were 2 points off from the actual result (so the party ultimately got 33% or 37%). Of course that's an average. Sometimes polls miss by more (they were off by 11 points for the PC in Alberta 2012 but only off by 1 point for the CPC of Harper in 2015).

Notice that the 1.92% is for the poll average. If you look at individual polls, it'll be worse in general (there are a few exceptions where a single poll would do better than the average, such as the Nanos and Forum polls for the federal election in 2015).

It's good but not exceptional. Again this is after averaging usually 5-10 polls for each election. Being off by around 2 points on average can mean the outcome is very different from the one you were expecting. Imagine the poll average is showing a race tied at 36%-36%. Applying the average absolute error, it means the actual results could be 38% to 34%. I can guarantee you that in most cases, this is an error big enough to change the winner. Every percentage point can turn into multiple seats at that level. 2 points off was enough to give Trudeau a majority compared to the projections (they had other problems as well, but let's move on).

A simple observation: out of the 10 elections, the incumbent party was underestimated by the polls in 9 of those. The only exception is Quebec 2014. But in Quebec, it is expected that the Liberals will be underestimated. So we could almost say 10 out of 10.

On average, the underestimation of the incumbent is a crazy 3.6 points! If we exclude the two big misses of Alberta 2012 and BC 2013, it gives us 2.1 points. This is why in my average I allocate more undecided to the incumbent so as to take care of this systematic underestimation. And yes I'm currently doing it for the Liberals in Ontario. This boosts them by 1.5 points usually (it depends on recent polls, number of undecided, etc). I feel strongly about doing it because it is based on data and evidence. For all the mistakes I made in 2015 (not the best election for me), getting the CPC right is one of the few things I got!

Okay, let's go back to the general accuracy of polls. If we translate the MSE above into margins of error, it means that Canadian polls have actual, effective margins of error of 5.68% 19 times out of 20. This is equivalent to a poll with only 271 respondents (for a party at 35% as MoE are a function of the level of support)!
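The "271 respondents" equivalence can be checked by inverting the usual margin of error formula for a proportion. A quick Python sketch, assuming simple random sampling and a 95% level (`equivalent_sample_size` is just an illustrative helper name):

```python
import math

def equivalent_sample_size(moe, p, z=1.96):
    """Sample size whose theoretical margin of error equals `moe` for a share `p`."""
    # Inverts moe = z * sqrt(p * (1 - p) / n).
    return z ** 2 * p * (1 - p) / moe ** 2

n = equivalent_sample_size(moe=0.0568, p=0.35)
print(round(n))
```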

You might be confused here. Most polls out there are reporting MoE of around 3% to 3.5% for a sample size of 1,000 respondents. So how can the average have a larger MoE? Averaging multiple polls, independent from each other, should in theory significantly reduce this margin of error. That's true, but only for the theoretical margins of error that are there to take care of sampling variations (in other words: the fact we only randomly select 1,000 people to answer). But see, in the real world, sampling variation isn't the main issue. Far from it. In the real world, when we actually look at polling accuracy, we need to remember that other factors are at play: turnout, people changing their mind or making up their mind at the last minute, people lying to the pollsters, people refusing to answer but voting anyway, incorrect weighting by the pollsters, etc.

This is why I couldn't care less when we see online polls with non-random samples to which traditional margins of error don't apply. So many people make a big deal out of it. I really don't because I know that the actual margins of error are much bigger and not a function of sampling. If sampling variation were the single greatest source of uncertainty, then averaging 5-10 polls of 1,000 respondents would give spot-on estimates of the election. But it doesn't.
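To see what pure sampling theory would promise, here is a small Python sketch of the theoretical margin of error for one poll of 1,000 respondents and for a simple average of 5 or 10 such polls. It assumes independent simple random samples and uses the conservative p = 0.5 convention.

```python
import math

def sampling_moe(p, n, z=1.96):
    """Theoretical margin of error for a share `p` from a random sample of size `n`."""
    return z * math.sqrt(p * (1 - p) / n)

single = sampling_moe(0.5, 1000)

# Averaging k independent polls of the same size divides the MoE by sqrt(k).
pooled = {k: single / math.sqrt(k) for k in (5, 10)}

print(f"one poll: {100 * single:.1f} points")
for k, moe in pooled.items():
    print(f"average of {k} polls: {100 * moe:.1f} points")
```

On paper the average of 5-10 polls should land within roughly a point to a point and a half, far tighter than the effective 5.68% found above. That gap is the part that sampling variation cannot explain.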

How does this compare to other countries? Well, it's worse than for French presidential elections, where the effective margins of error are 3.81% (and that's for 2002, 2007 and 2012; I haven't included 2017 where the polls were super accurate). Funny enough, French pollsters do NOT use probabilistic random samples. And yet they perform quite well. As for the US, this article mentions actual margins of 7 points, although I believe it'd be lower if we only look at presidential elections.

Out of the 10 elections I included in my analysis, two were clear misses (Alberta 2012 and BC 2013). Some were partial misses (Quebec 2012 for instance). If we remove the two really bad elections (let's just assume they were one-off, or well, two-off -possibly caused by online polls not being as well developed as they are now), you get the second column.

With margins of error of 3.2%, we see that Canadian polls are fairly accurate. Still, this means that at the end of this Ontario election, even after averaging 5-10 polls (that will be published during the last week), you should add and subtract around 3 points to the various averages. That gives you fairly wide intervals (6 points is bigger than the current gap between the PC and the NDP!). With how votes are translated into seats, it could mean the PC for instance could win 50 or 80 seats! And remember that this is assuming the polls will be "right". If you want to factor in the possibility of a bigger miss like in Alberta, then you need intervals of almost 12 points!

Don't think however that the chances that the polls are 12 points off are the same as the chances they are 2 points off. That's not how you should interpret those margins of error. The upper and lower bounds should be seen as possible but unlikely results for the election based on the polling average. In order to illustrate this, I generated the graph below for a generic party polling at 35% (usually the score of the top party in a Canadian election). The graph gives you a visual representation of how likely each electoral outcome is (technical note: this part works better if you accept a Bayesian point of view where the polling average represents the uncertainty we have for our knowledge of the correct voting intentions). This distribution is the result of simulating 10,000 samples with a sample size of 270.
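A simulation of this kind can be sketched in a few lines of Python. This is one plausible reading of it (binomial draws of 270 respondents around a 35% share), not necessarily the exact procedure behind the graph. The seed is arbitrary and only there to make the run repeatable.

```python
import random
from collections import Counter

random.seed(2018)  # arbitrary seed, only for reproducibility

N_SIMS = 10_000
SAMPLE_SIZE = 270   # the "effective" sample size of the polling average
TRUE_SHARE = 0.35   # generic party polling at 35%

def simulate_share():
    """Draw one sample and return the observed share in percent."""
    hits = sum(random.random() < TRUE_SHARE for _ in range(SAMPLE_SIZE))
    return 100 * hits / SAMPLE_SIZE

shares = [simulate_share() for _ in range(N_SIMS)]

# Histogram by rounded percentage point (the bars of the graph).
histogram = Counter(round(s) for s in shares)

within_2 = sum(abs(s - 35) <= 2 for s in shares) / N_SIMS
at_most_30 = sum(s <= 30 for s in shares) / N_SIMS

for pct in sorted(histogram):
    print(f"{pct:>3}% {'#' * (histogram[pct] // 50)}")
print(f"within 2 points of 35%: {within_2:.1%}")
print(f"30% or lower: {at_most_30:.1%}")
```

The outcomes pile up near 35% and thin out quickly on both sides, which is the whole point: a 5-point miss is possible, but it sits in the tail.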


See, if a party is polled (on average) at 35%, then this party could actually receive only 30% of the vote. But while this is possible, it's highly unlikely (which is why the bars are smaller). Please do not interpret this article or this graph as a validation of saying "anything can happen". This gives the wrong impression that every outcome is equally likely when it isn't. Similarly, if your party is polling at 30%, your party can indeed be underestimated and win. But it's not the most likely scenario. On average you want your party to be polling high, there is no way around this. People pretending otherwise are either lying or stupid.

What this exercise shows is that despite my best efforts (or anyone doing this job), there will always be considerable uncertainty. This is why I believe people should focus more on my probabilities (where I account for this uncertainty with a well calibrated model) rather than the top line numbers. What it also shows is that people who simply say "polls are always wrong" are actually wrong. This is simply not true. Yes Canadian polls aren't perfect and they sometimes miss, but in general they do a fairly good job.