Can we trust the polls?

Isn't that the question? People in BC most likely remember the 2013 election, where literally every poll had the BC NDP ahead (sometimes by a large margin) and the BC Liberals ultimately won by almost 5 points. Along with the Alberta election of 2012, this remains one of the biggest polling mistakes in Canadian history. More recently, we also remember Brexit (polls were mostly predicting "Remain" to win) or, of course, the election of Trump (again, with an average error much smaller than for Alberta or BC). With that said, the polls in France last Sunday were spot on, which resulted in my projections also being spot on (yay!).

But how accurate are polls on average? In this post, I'm looking at exactly that. The idea is to find what I call the real margins of error, the ones accounting not only for sampling variation (the plus or minus 3%, 19 times out of 20, as reported by the polls) but also for the estimation bias and the other sources of uncertainty.

Before we start, let's notice that the margins of error reported by the polls are pretty much useless. First of all, with a lot of polls being conducted online, the samples aren't truly random and the basic theory of statistics doesn't really apply (note: it doesn't mean online polls don't work. The French polls were all done online, for instance). Secondly, the margins reported are the theoretical ones for a party at a 50% level of support. Yes, that's right, the margins actually vary with the level of support, so a party at 40% will have larger margins than a party at 5%. But literally no pollster in this country would ever tell you that. Finally, as mentioned above, they only measure the sampling error. But I'll argue that we don't really care about that, not when we have many polls that we can aggregate. Fact is, if the only source of error was indeed the sampling one, poll aggregators like me should be able to nail every single election. But we clearly don't (sorry...). Why? Because measuring voting intentions has other sources of variation. People can change their mind, they can lie, they can refuse to answer, etc. All of these create a potential bias.
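To make the point about support levels concrete, here is a minimal sketch of the textbook sampling margin of error. It assumes a simple random sample and a hypothetical sample size of 1,000 respondents; neither assumption comes from a specific poll.

```python
import math

def sampling_margin(share, n, z=1.96):
    """95% sampling margin of error, in percentage points,
    for a party at `share` (between 0 and 1) in a simple random sample of size n."""
    return 100 * z * math.sqrt(share * (1 - share) / n)

n = 1000  # hypothetical sample size, for illustration only
for share in (0.50, 0.40, 0.05):
    print(f"{share:.0%} support: +/- {sampling_margin(share, n):.1f} pts")
```

The margin is widest at 50% and shrinks as a party moves away from it, which is why the single "plus or minus 3%" figure overstates the sampling error for small parties.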

So what I did was take the latest polls from a few recent elections in this country (the last 3 federal elections, Alberta 2012, BC 2013, Ontario 2014, Quebec 2014, Alberta 2015). It's not a complete sample - I could have added the 2012 Quebec election where the polls were also off; I'm also missing some elections in smaller provinces. But hey, it's already a good source of data. It should be enough to give us a good idea of the average accuracy of Canadian polls.

For each election, I calculated the poll average for polls conducted during the last week of the campaign (without any adjustment from me) and I compared it to the actual results. Then I calculated the Mean Square Error. This is a statistical measure of the average error (it's technically the variance for an unbiased estimator). Taking the square root and multiplying it by 1.96 gives us our effective margins of error at 95%. Note that I only looked at the error for the top parties (usually the top 4 or 5 parties in a province; in other words, the ones included in the polls). I also calculated the average absolute error (if the polls had a party at 40% but it got 42%, the error is 2 points).
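The calculation described above can be sketched as follows. The (poll average, actual result) pairs are made up for illustration; they are not the data behind the table.

```python
import math

# Hypothetical (poll average, actual result) pairs in percentage points.
# These are NOT the real numbers used in this post.
observations = [
    (40.0, 42.0),
    (35.0, 33.5),
    (15.0, 14.0),
    (8.0, 9.0),
]

errors = [poll - actual for poll, actual in observations]

# Mean Square Error: the average of the squared errors.
mse = sum(e ** 2 for e in errors) / len(errors)

# Effective 95% margin of error: 1.96 times the square root of the MSE.
effective_margin = 1.96 * math.sqrt(mse)

# Average absolute error, for comparison.
mean_abs_error = sum(abs(e) for e in errors) / len(errors)

print(f"MSE: {mse:.2f}")
print(f"Effective margin of error: {effective_margin:.2f} pts")
print(f"Average absolute error: {mean_abs_error:.2f} pts")
```

Because the errors are squared before averaging, one big miss weighs much more on the effective margin than on the average absolute error.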

Results are in the table below.

Source: own calculations based on the polls and elections results in Alberta 2012, BC 2013, Ontario and Quebec 2014 as well as the last three federal elections.


As you can see, the actual margins of error of a typical Canadian poll are relatively large, much larger than what the standard margins of error would predict (again, if there was only sampling error, taking the average of 6-7 polls should give us almost perfect estimates). Even if we exclude the two obvious mistakes (Alberta 2012, BC 2013), polls aren't that accurate, although they appear more accurate than French polls (which aren't done through random sampling).

What the margins above mean is that even after aggregating the polls, your estimates are still likely off. Another way to put it is that, on average, polls are off by 2.09 points for each party. Make no mistake, that's a good level of accuracy. But when we transpose these polls into seats, if the gap between two parties is over- or underestimated by 4 points (2 x 2 pts), this can make a huge difference. Each point can represent as many as 5-6 seats if a party is in the "paying zone" (above 25% usually).

I had written an article a couple of years ago using data from Abacus (whose CEO, David Coletto, was kind enough to give me access to their raw sampling data). I showed that if we accounted for the fact that people could change their mind, the actual margins of error should be closer to 7.5%. This seems to line up relatively well with my findings here. And this is consistent with the findings in the US.

So why are polls sometimes off? Well, if only we knew for sure! However, David Coletto and I looked into it and found that polls had a tendency to be more wrong when there was a big change in turnout between elections. Neither the polling method nor the sample sizes had a significant impact on the overall polling accuracy, just the change in turnout.

So will it happen again this year in BC? We can't tell. I personally don't think this campaign will generate an increase in turnout the same way the last federal election did. I'm not even sure the turnout will go up at all; there just isn't the same enthusiasm. But we'll know better once we get the turnout for the advance voting that begins this weekend. In any case, this article should show why using simulations is so important and why uncertainty will always be present while forecasting an election. Not only for the shares of vote, but even more so for the seat projections.

By the way, for the projections and simulations, I go with margins of error of 4.8 pts. It's slightly less than the estimates here. I do it mostly for two reasons. The first one is that my own polling average is typically better than the raw, pure polling average used here. The second reason is that the numbers in this post should be seen as the upper bound. In particular, I calculated the Mean Square Error across all the years pooled together and then took the square root of this average. But I could have taken the square root for each year and then averaged those. As you may know, the average of the square roots is less than the square root of the average. Doing so would yield margins of error of 4.96% (or 2.98% without Alberta and BC).
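Here is a small sketch of why the order of operations matters. The per-year Mean Square Errors below are hypothetical values chosen for illustration, not my actual estimates.

```python
import math

# Hypothetical Mean Square Errors (in squared points), one per year.
# Illustration only; these are not the real estimates from this post.
mse_by_year = [2.0, 4.0, 16.0, 3.0]

# Approach 1: average the MSEs first, then take the square root.
pooled_margin = 1.96 * math.sqrt(sum(mse_by_year) / len(mse_by_year))

# Approach 2: take the square root for each year, then average.
averaged_margin = sum(1.96 * math.sqrt(m) for m in mse_by_year) / len(mse_by_year)

print(f"Square root of the average: {pooled_margin:.2f} pts")
print(f"Average of the square roots: {averaged_margin:.2f} pts")
```

The second number can never exceed the first because the square root is a concave function, and the gap grows when one year (think BC 2013) has a much bigger error than the others.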

Who won the BC debate? Probably Andrew Weaver

We now have the numbers from the Mainstreet post-debate poll, courtesy of the Vancouver Sun.



The newspaper is saying that "Horgan won by a slim margin". I disagree. Yes, the raw numbers show that, but the raw numbers are not what is really interesting. Simply asking people "hey, who won the debate?" is not the best question because people's own political bias will play a role. In particular, Liberal voters will tend to say that their leader, Christy Clark, won. And the same goes for NDP voters, etc. It's always difficult to be objective when making these judgments. After all, if you already like John Horgan and want to vote for him, you are likely to find that he did well during the debate.

So when you see that only 33% of people think Horgan won, this is bad news for the BC NDP, which was polling above 40%. On the other hand, when 29% say Andrew Weaver won, this is much higher than the 19-20% his party gets in the polls.

This is why I always do my debate leader index (trademarked by me!). It worked fine during the last federal election to predict a rise of Trudeau. It also worked very well in 2014 in Quebec to predict that Pauline Marois, the PQ leader, had actually had a bad debate and would see her party go down. I find this index a much more useful measure than the raw numbers. See it as a more predictive measure of how voting intentions will or could be influenced by the debates.

So for the televised BC debate of 2017, here is the index. I have two measures: one uses the pre-debate polling average while the other uses the post-debate voting intentions among the people who watched the debate. Somebody on Reddit made the comment that the population who watches the debate is different (which is likely true; proof of that being that the NDP is behind the Liberals in the voting intentions of this poll only), so I should use the second measure. I think there are good arguments for both approaches, so I'm using both. Results are very similar anyway. I dropped the undecided for both the voting and the "who won the debate" questions in order to make the two fully comparable.

Leader           Debate index*   Debate index with debate poll only
Christy Clark    0.83            0.8
John Horgan      0.89            1.0
Andrew Weaver    1.65            1.4

*Percentage of people who said this candidate won the debate divided by the voting intentions for the party of the candidate.

Remember, an index greater than 1 means you managed to get people voting for another party to say that you won. A score below 1 means that you didn't even convince all the people already voting for you.
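For readers who want to reproduce the index, here is a minimal sketch of the calculation. The leaders and percentages are hypothetical placeholders, not the Mainstreet figures, and the helper names are mine.

```python
def drop_undecided(shares):
    """Rescale the shares so that the decided answers sum to 100."""
    decided = {k: v for k, v in shares.items() if k != "Undecided"}
    total = sum(decided.values())
    return {k: 100 * v / total for k, v in decided.items()}

def debate_index(won_debate, vote_intentions):
    """Share saying a leader won divided by the voting intentions of that leader's party."""
    won = drop_undecided(won_debate)
    votes = drop_undecided(vote_intentions)
    return {leader: won[leader] / votes[leader] for leader in won}

# Hypothetical raw poll numbers (in %), for illustration only.
won_debate = {"Leader A": 27, "Leader B": 36, "Leader C": 27, "Undecided": 10}
vote_intentions = {"Leader A": 36, "Leader B": 36, "Leader C": 18, "Undecided": 10}

index = debate_index(won_debate, vote_intentions)
for leader, value in index.items():
    print(f"{leader}: {value:.2f}")
```

In this made-up example, Leader A and Leader C get the same raw "who won" share, but Leader C's index is twice as high because that party starts from a much smaller base.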

There is no question that Andrew Weaver won. We already knew that he generated a lot of searches and interest on Google during and after the debate, but we now have another confirmation (or evidence if you prefer) that he "won" the debate.

You have people who don't vote Green (at least not yet?) who think Weaver won the debate, while you have a good share of NDP voters who even admit Horgan didn't win (using the first approach). Notice however that the measure here might have a bias, as I have observed that people tend to be nicer to the leaders of small parties (think Elizabeth May at the federal level). But Weaver and the Green are polling at 20% and are at times ahead on the Island, so they aren't "small" anymore, or at least not currently. Still, the bias might exist.

As for Christy Clark, her index isn't good but the same poll indicates that her favourability increased (the numbers here are terrible for Horgan, who ends up being barely more popular than Clark). With that said, the numbers from Mainstreet are hardly an indication that she did enough to climb back in the polls (of course, these polls could be wrong as they were in 2013, but I'm assuming they aren't).

Winning a debate is often half the battle. The other half is winning the post-debate spin. We'll see how the Green and Andrew Weaver fare in the next few days. But for now, I'd confidently say that Andrew Weaver won the debate. That said, there is no guarantee that he (or his party) will indeed increase in the polls in the coming days. But he's the most likely to benefit from his performance.

Update: it seems I'm not the only one with this conclusion since the UBC Prediction Market (where people bet actual money) has seen a sharp increase in the number of seats for the BC Green (up to 9 as I'm writing this line).

After the BC debate...

Just one short blog post while waiting for the Mainstreet post-debate poll.

I went to Google Trends to see which leader generated more searches. This is the result:

 




(Note: I did an exact term search instead of a topic search because this method doesn't work for John Horgan. Results were similar when I tried to do a topic search for Clark and Weaver).

Update: the graph above is directly updated by Google, so the numbers by the time you look at them might be different from when I posted them. So here is a screenshot of what it looked like after the debate:


Weaver has been behind the other two ever since the election started, so it's pretty significant for him to be ahead of Horgan - he was even tied with Clark right after the debate.


We'll wait and see if this matches with what Mainstreet will find. But based on Google Trends and keeping in mind the relative levels of support for each party (and the fact that Christy Clark just always generates more searches), it looks pretty good for Andrew Weaver.

And you, what did you think? Leave a comment and/or answer the non-scientific poll below:


Who do you think won the BC debate?

Christy Clark from the BC Liberals
John Horgan from the BC NDP
Andrew Weaver from the BC Green
Nobody won

Who votes for the BC Green Party?

This post intends to take a deep look into who is voting for the Green party and the implications for the projections. It'll mostly be a technical, long boring post. You have been warned!

1. The Green party can attract Liberal voters

Let's look at 2013. The Green technically got a smaller share of votes than in 2009 (8.13% vs 8.21%), but this is highly misleading because they were running fewer candidates - only 61 out of 85, while they had a full slate in 2009. This is important because it means the Green swing was actually positive in many ridings.

In the 24 ridings without a Green candidate in 2013, the Green had received 31,358 votes in 2009. That's about 23% of all the votes received by the Green party that year. You should see the importance of taking that into account while estimating coefficients or looking at the results. If the Green managed to stay around 8% province-wide while losing 23% of their votes in some ridings, it must mean this party went up in the other electoral districts, significantly so in many places.

Without the votes in these 24 ridings, the Green would have been at only 6.3% province-wide in 2009. So when they got 8.13% in 2013, it actually means an average swing of almost 2 points!
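The arithmetic behind these numbers can be checked with a few lines. This is a back-of-the-envelope sketch that uses the rounded "about 23%" figure from above, so the result is approximate.

```python
share_2009 = 8.21     # Green province-wide share in 2009 (%)
share_2013 = 8.13     # Green province-wide share in 2013 (%)
lost_fraction = 0.23  # rounded share of the 2009 Green votes cast in the 24 ridings

# 2009 share if we only count the ridings where the Green ran again in 2013.
comparable_2009 = share_2009 * (1 - lost_fraction)

# Swing between 2009 and 2013 on a comparable basis.
swing = share_2013 - comparable_2009

print(f"Comparable 2009 share: {comparable_2009:.1f}%")
print(f"Comparable swing: {swing:+.1f} pts")
```

So the apparent small drop province-wide hides a gain of close to 2 points once the 24 ridings are removed from the 2009 baseline.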

The big question here is obviously: what happened to these votes in 2013? Did the people who voted Green in 2009 in these 24 ridings simply not vote? Did they vote for another party? And this is where I become useful and where statistics and regressions can be used. Finding the answer to this question can give us some indications as to who these Green voters are and which other party they could support.

I tried to look at it with different approaches. First of all, I looked at the results of 2013 in each riding and tried to explain the share of votes of the Liberals and NDP with the share of votes of each party in 2009. Doing so revealed that the share of a Liberal candidate in 2013 was highly correlated with the share in this riding 4 years earlier, which is logical. It's the same for the NDP. But for the share of votes of the Green, I found significant effects on the shares for the Liberals but not the NDP. Specifically, my estimations indicate that about half the votes for the Green in these 24 ridings went to the Liberals in 2013 while almost none went to the NDP. There are a lot of potential issues with my method here, and I'm fully aware of it (I've done enough econometrics in my life to know that), but it's still interesting.

Then I tried to use variations instead (so the swing in each riding between 2009 and 2013). There as well, the results showed me that the BC Liberals got about half the votes while the NDP didn't get anything. This is very significant because it explains, at least partially, why the BC Liberals increased in some ridings/regions while dropping overall province-wide. One way to make sense of this is to realize that the BC Liberals may have received some votes from Green voters in ridings where the Green didn't have a candidate anymore (in the Okanagan valley, for instance), while the Liberals dropped more in other ridings/regions. In ridings where there was a Green candidate both in 2009 and 2013, the Liberals dropped much more than in the ridings without a Green in 2013 (where they actually increased on average!).

Again, the econometrics is very limited but the fact that I got the same results with both methods is encouraging.

On average, it appears that the missing Green candidates gave a bonus to the Liberals of 3.7 points (again, in the 24 ridings).

This can seem very counter-intuitive. Most people usually assume that Green voters have the NDP as second choice. But polls have shown Green voters to be more divided than that. The final Mustel poll in 2009 was showing exactly that: Green voters were split regarding their second choices.

Obviously the current polls are showing a very different situation. Ipsos Reid shows that 42% of Green voters have the NDP as their second choice while only 14% have the Liberals. The latest Mainstreet poll even has the NDP as the second choice of 74%(!) of Green voters, a number much higher than in previous polls from the same firm though. On the other hand, many Mainstreet polls have shown Liberal voters with the Green as their main second choice.

What this should at least show is that Green voters don't simply come from or go to the NDP. The flows of votes between parties are more complex than that. The simple fact that current polls show the NDP at its 2013 level while the Liberals are down should be further evidence of that.

Another example of this is Oak Bay-Gordon Head, the riding of the Green leader, Andrew Weaver. He won his seat in 2013 thanks to an incredible personal effect. My estimations show that his result was 27 points above what a "generic" Green candidate would have been expected to receive. It was obviously thanks to him and the incredible campaign the party ran for him there. Still, as far as I can tell, this is the biggest personal effect I've seen, bigger than the one for Elizabeth May.

Yet, the same estimations show this effect was taken equally from the Liberals and the NDP, both losing between 13 and 15 points. And remember this effect is estimated while also including regional effects. In Victoria-Beacon Hill, where former Green leader Jane Sterk was running, I also find that her personal effect (around 12 points) was taken much more from the Liberals than the NDP.

Of course, there can be many explanations. For instance, it's possible that BC Liberal voters from the Island are more likely to vote Green than, say, Liberal voters in the Interior. Or, in the case of Victoria-Beacon Hill, some Liberals might have tried to make the NDP lose this safe seat by voting Green. But again, it at least shows that the Green Party seems capable of attracting votes from both parties.


2. Implications for the projections this year

There are mostly two elements to this part. The first one is the inverse of the first part of this post: what will happen in the ridings where the Green are now running a candidate and weren't 4 years ago? Surely the swing there will be different from the swing elsewhere. Will we observe a "catching up" effect where the results in 2017 would be more along the lines of "what should have happened in 2013 + swing between 2013 and 2017"? If that were the case (and looking at what happened to the BC Conservatives in 2009 in ridings with new candidates, we have reasons to believe we'll indeed observe that), then it means the provincial swing for the Green party will be misleading again.

The Green Party is running 83 candidates this year (out of 87 ridings). That's a significant increase in the number of candidates. So when we see the Green being polled 10-12 points above their score of 2013, some of this increase will likely only come from the ridings with a new candidate. This could be fairly significant for the chances of the Green to get MLAs this year. If one is making projections only looking at the provincial swing, this will likely overestimate the swing in ridings where there was already a candidate, and those ridings are the ones the Green can win. I admit I haven't fully accounted for this yet and I'll do so by the end of the week. Make no mistake, this is a lot of work and it requires some assumptions.

The second part: in the ridings with a new Green candidate, where will the vote come from? Could we see the opposite of what we saw between 2009 and 2013, meaning the Liberals could be the party most affected by these new candidates? There as well, I'll try to adjust the model by the end of the week.


Conclusions

There are really two things to take away from this blog post. First of all, the Green Party is likely attracting voters from both parties. It is indeed taking seats from the BC NDP on the Island, but the voters aren't all coming from the NDP.

Second of all, the fact the Green party is running more candidates than last time likely means projections could overestimate the Green swing in ridings where they already had a candidate. Given that every single seat the Green are aiming for is in this situation, you can see the potential for making mistakes.

BC NDP takes a commanding lead (well, maybe)

After a good 10 days without polls, here are two new ones. First of all is yesterday's Justason one. And today we have a new one from Mainstreet. Both polls show the NDP ahead, although its lead is significantly bigger according to Mainstreet.

I updated the projections and you can find more details on the BC Election 2017 page (have you bookmarked it already?!).

Voting intentions; Seat projections with confidence intervals; Chances of winning the most seats

The title used the term "maybe". This is because there are a couple of things to keep in mind before calling for a guaranteed NDP majority. First of all, the Justason poll. Sample size is technically 1128 respondents but a whopping 37.5% either don't know, are not eligible or wouldn't vote. That's almost 40% of the sample! So the numbers they published are among decided voters and there are only 712 of them. The firm then goes on to break down the results by 6 sub-regions. That's completely ridiculous. Victoria, for instance, has a grand total of 60 respondents! Why would a firm even bother publishing numbers based on such ridiculously small sample sizes? That reminds me of a poll during the last federal election that literally had 10 respondents for the CPC in all of Atlantic Canada. Here's my advice: if your sample is small, don't publish regional breakdowns, they're pretty useless.
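To give an idea of what these sample sizes imply, here is a rough sketch of the textbook 95% sampling margin at each size quoted above. It assumes a simple random sample and a party at 50% support; both are simplifying assumptions of mine, and the function name is hypothetical.

```python
import math

def margin_at_50(n, z=1.96):
    """95% sampling margin of error (in points) for a party at 50%
    in a simple random sample of n respondents."""
    return 100 * z * math.sqrt(0.25 / n)

# Sample sizes quoted in the paragraph above.
samples = {"Full sample": 1128, "Decided voters": 712, "Victoria": 60}

for label, n in samples.items():
    print(f"{label} (n={n}): +/- {margin_at_50(n):.1f} pts")
```

With 60 respondents, the sampling margin alone is in the double digits, before even considering the other sources of error.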

As for the Mainstreet one, while it doesn't suffer from a tiny sample, there are a couple of things that you should know about. First of all, they dropped the BC Conservatives, which makes sense. But because they don't even offer an "other" option, it means that Lib+NDP+Green sum to 100%. Elections in BC usually have a higher share of others than in other provinces. We can expect at least 2-3% in this category (including the 10 BC Conservative candidates). I really wish Mainstreet would offer "other party" as a choice. Secondly, Mainstreet is showing a really important swing in favour of the BC NDP (+5 points) and against the Liberals (-3). Is that really what is happening? Is John Horgan's strategy of pretty much staying in Greater Vancouver paying off? Mainstreet sure indicates so since the NDP is now ahead by 16 points in this region. But is the swing real or is it simply the result of random sampling? Justason (with tiny samples) has the NDP barely ahead in the same region. Also, the number of undecided actually increased since the last poll (especially in the rest of BC). That doesn't make any sense; we shouldn't see more undecided after 2 weeks of campaign. Finally, the second choices of Liberal voters have massively shifted from having the Green as the main second choice to the NDP. I personally think this might be the result of a small sub-sample there (it's only for undecided but leaning Liberal voters).

Anyway, both polls have been added to the average. Thank god for the Justason one, otherwise my poll average would now simply be based on Mainstreet (since all the other polls are getting old). (Note: I have nothing against Mainstreet, on the contrary, but I don't like it when my average is driven too much by only one firm or poll.)

My projections have changed quite a bit. I personally don't like it when they change drastically overnight but there isn't much I can do with so few polls. What was a close race between the NDP and the Liberals is now turning into a comfortable lead for the former. "Comfortable" may be an overstatement. With a 31% chance of winning the most seats, Christy Clark is still in this race but her odds are decreasing.

The Green party (that revealed its full platform yesterday) is definitely sticking around. We are at the halfway mark of this campaign and their support appears to be solid. Mainstreet even shows a big increase in the number of "strong supporters" of this party (at 67% of strong supporters, this party is barely behind the other two). But the electoral system is still hurting this party quite hard. If we had a PR system, Andrew Weaver would be on his way to get around 17 MLAs instead of the 3 projected here. To be fair, the simulations indicate that the Green could elect as many as 11 people but this is obviously an unlikely crazy perfect case scenario. They do have, however, around a 25% chance of getting 5 MLAs or more. But to do so, they will either need to beat their poll numbers or, more likely actually, rely on strong local campaigns and candidates.

As a final note, people naturally haven't forgotten how wrong the polls were in 2013. And it's too early to look at these polls (especially the Mainstreet one) and declare an NDP victory. But the big difference so far is that the BC Liberals don't appear to be climbing back during the campaign as they did four years ago. To be fair, they also didn't start as low as they did back then. Still, I'm not saying we are immune to another polling error, but the dynamic is at least different. Now, the next big event is naturally the debate on Wednesday. Hopefully we'll get some polls asking who won.

What if BC had the French electoral system?

Yesterday was a good day for the polls in France and, by extension, for my projections and myself. The polls for the presidential election in France were spot on and therefore so was I. To be fair, it's much easier to simply aggregate the polls than to convert them into seats anyway.

Let's not forget the BC election though. We haven't had many polls (yet?) but I made some substantial updates to the model. First of all, I took into account where the Green and especially the BC Conservatives were running candidates. The Green have 83 candidates (out of 87) while the BC Cons only have 10. Honestly, I should just drop the Conservatives from my projections but they were already included, so it's actually less work to leave them there.

I also made some adjustments to some ridings. Some adjustments come from the lack of a Conservative candidate (in ridings where this party got over 10% last time around), others come because I have reasons to believe my projections were wrong. One such example is Victoria-Beacon Hill, which I previously had leaning Green. But that was with the built-in bonus of having the Green leader running there. It's not the case anymore and my estimations show Jane Sterk did benefit from a significant bonus. On the other hand, I now have the Green ahead in Cowichan Valley as it appears fairly obvious the Green are running a better campaign than the BC NDP (whose base was kinda split during the nomination process, with the former campaign manager now running as an independent!).

You can find the updated projections on the BC Election 2017 page and you can use the updated model in the simulator.

I wanted to try a little exercise related to the French election. France uses a run-off system where, if no candidate gets more than 50% of the vote in the first round, the top 2 then go to a run-off 2 weeks later. I wanted to simulate the effects on the BC election.

To do so, I used my projections as the baseline results and I redistributed the votes of the other parties in every riding where no candidate was getting a majority. I used the second choices provided by Mainstreet and Ipsos.
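The redistribution step can be sketched as below. This is a simplified illustration, not my actual model: the riding shares are hypothetical, the function name is mine, and the second-choice rates reuse the Ipsos Reid figures for Green voters quoted elsewhere on this blog (42% NDP, 14% Liberals), with the remaining voters assumed to stay home.

```python
def runoff_winner(first_round, second_choices):
    """Return the winner of a riding under a two-round system.

    first_round: dict of party -> first-round share (%).
    second_choices: dict of eliminated party -> {finalist: fraction transferred}.
    """
    leader = max(first_round, key=first_round.get)
    if first_round[leader] > 50:
        return leader  # outright majority, no run-off needed

    # The top 2 candidates qualify for the run-off.
    finalists = sorted(first_round, key=first_round.get, reverse=True)[:2]
    totals = {party: first_round[party] for party in finalists}

    # Redistribute the votes of the eliminated parties using second choices.
    for party, share in first_round.items():
        if party in finalists:
            continue
        for finalist in finalists:
            rate = second_choices.get(party, {}).get(finalist, 0.0)
            totals[finalist] += share * rate

    return max(totals, key=totals.get)

# Second choices of Green voters (assumed transfer rates, see the note above).
second_choices = {"Green": {"NDP": 0.42, "Liberal": 0.14}}

# Hypothetical riding where the Liberals lead after the first round.
riding = {"Liberal": 44.0, "NDP": 41.0, "Green": 15.0}
print(runoff_winner(riding, second_choices))
```

In this made-up riding, the Liberals lead the first round by 3 points but lose the run-off once the Green votes are redistributed, which is the kind of flip discussed in this post.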

The results? See below:



The BC NDP would go from having a small lead to being pretty much guaranteed a majority. For the Green it wouldn't change anything. It's because they are winning their three ridings "comfortably" and they are the main second choice of the main two parties.

Only 39 ridings are currently projected to have a candidate over 50%. As for being in the top 2 (i.e. the run-off), the Liberals qualify 79 times, the BC NDP 85 and the Green 10. These numbers might seem crazy but remember there are only really 3 parties, so it really limits the possibilities.

An example of a riding that would flip from Liberals to NDP with the French system is Boundary-Similkameen. There are 5 ridings affected in total: the one mentioned plus Courtenay-Comox, Maple Ridge-Mission, Surrey-Panorama and Vancouver-Langara.

I think that beyond the fun exercise, this blog post should illustrate how close this election is. You flip 5 ridings and you have an NDP majority. If it really remains like this until May 9th, my final projections will be quite uncertain.

D-Day in France: the very latest trends

Here we are: the first round of the French presidential election. French friends, I hope you will go and vote today. Your expatriates in Montreal and Vancouver (and elsewhere) voted yesterday and often had to wait for hours.

If you want to see my final projections along with a detailed analysis, I refer you to my previous post.

We will get the first estimates officially (and normally) at 8 p.m., Paris time. By following the Swiss or Belgian media, you can often get the trends earlier.

What do we know so far? Turnout in the overseas territories is down compared to 2012. High abstention should normally help Marine Le Pen and François Fillon.

There was also a poll published yesterday in Belgium (publication is banned in France) showing Le Pen ahead (26%) and Fillon second (22% versus 21% for Macron). An effect of last week's shooting? Or just a bogus poll? After all, Twitter was discussing the possibility that this poll came from the Fillon camp. La Tribune de Genève also had a last-minute poll, but that one showed the same numbers as the other polls.

On Google Trends, the trend over the last 48 hours favours Macron and Mélenchon.


Buzzpol/Filteris continues to see a second round between Marine Le Pen and François Fillon, based on the online "buzz".

Finally, the betting markets have a second round between Macron and Le Pen at about a 65% chance, while Le Pen vs Fillon is at 18%, which roughly matches the average of the models (see my previous post; my model is one of those with the lowest uncertainty).

That's it. I'll see whether I post an update during the day. On Twitter for sure, but not necessarily here.