With one week to go, the BC Election is still too close to call

Seven days: that's how long the three leaders in BC have to convince people to vote for their party (or their candidates, if you prefer). And with one week to go, we can't really make a call. Not because I'm chicken and don't want to be wrong, but because there is enough uncertainty (polls, electoral map, local effects, etc.) that multiple scenarios are possible.

To be fair, we've had so few polls that the projections have been fairly stable. Yesterday saw the publication of the first poll of this campaign showing the BC Liberals ahead, maybe an important moment in this election. The Justason poll, conducted online with 1127 respondents (but only 768 decided voters), has Christy Clark's party slightly ahead of John Horgan's. Compared to the previous poll from this firm, this is an increase of almost 2 points for the incumbent party and a corresponding drop for the NDP. The Green also increased by two points, our first post-debate confirmation of what we were saying: Andrew Weaver likely won the first debate.

The lack of polls has created a somewhat puzzling narrative in the media. Many people (and pundits, etc.) seem to think that polls have all shown a large lead for the NDP. This isn't really true. What actually happened is that one Mainstreet poll had the NDP enjoying a 10-point lead. It was published around the same time as the previous Justason poll, which showed a much smaller lead for the challenger party but got far less media coverage. I have always been of the opinion that this was a very close race and I feel confident in saying that the next couple of polls will prove me right.

I have updated the BC Election 2017 page with all the details, but here are the latest projections.

Don't be fooled by the BC NDP's 8-seat lead. This isn't a safe lead at all. First of all, any small polling error (for instance, underestimating the Liberals or the Green by a couple of points) could sharply cut the NDP's seat count. There are also currently 17 ridings decided by fewer than 5 points. Among them, the BC Liberals are projected to win 4 and the BC NDP 13. The model currently seems a little unbalanced in these ridings. My job during this last week will be to focus on them and see what information I can find. But the 4-13 split should show you that the BC NDP will need to get the vote out in some key ridings.
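To make that sensitivity concrete, here is a minimal uniform-swing sketch. It is a simplified illustration, not the projection model itself: the riding margins are hypothetical placeholders (not my actual projections), and applying the same swing everywhere is an assumption that the real riding-by-riding model does not make.

```python
# Hypothetical NDP-minus-Liberal margins (in points) for a handful of close ridings.
# Positive = NDP ahead, negative = Liberals ahead. These are NOT real projections.
close_margins = [0.8, 1.5, 2.2, 3.0, 3.4, 4.1, 4.9, -1.2, -2.7, -4.4]


def seats_after_swing(margins, swing_to_liberals):
    """Return (ndp_seats, liberal_seats) after a uniform swing.

    A swing of s points from the NDP to the Liberals shrinks every
    NDP-minus-Liberal margin by 2*s (one party loses s, the other gains s).
    A margin of exactly 0 is counted for the Liberals (arbitrary choice).
    """
    shifted = [m - 2 * swing_to_liberals for m in margins]
    ndp = sum(1 for m in shifted if m > 0)
    return ndp, len(shifted) - ndp


for swing in (0, 0.5, 1, 2):
    ndp, lib = seats_after_swing(close_margins, swing)
    print(f"swing {swing:>3} pts -> NDP {ndp}, Liberals {lib}")
```

With these made-up margins, a 1-point swing (a 2-point change in the gap) is enough to turn a 7-3 split into 5-5, which is the kind of fragility a lead built on close ridings has.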

The Green Party could well hold the key to this election. The electoral system seems incredibly unfriendly to it right now. Indeed, getting 3 MLAs (out of 87) with 20% of the vote is a really, really bad conversion rate. At least these three seats seem relatively safe (probabilities of winning range from 76% in Cowichan Valley to >99% in Oak Bay-Gordon Head and Saanich North and the Islands). Next come the possible surprises in Saanich South, the two Victoria ridings and Esquimalt-Metchosin. There are also a couple of ridings with non-zero chances. The four possible-but-not-currently-likely Green ridings are all currently projected NDP. This adds a significant level of uncertainty to the projections for John Horgan's party. All it takes for the BC NDP to go from 46 seats (a majority) to 42 (a minority or a defeat) is a concentrated Green surge on the Island.

Advance voting started this weekend (I voted on Saturday in False Creek), and early turnout data from Elections BC isn't super indicative of any trend. Turnout in Esquimalt-Metchosin seems to be up, which would be good news for the Green, but Saanich North and the Islands seemed quieter after the first day. I'll keep an eye on it with the updated data tomorrow. If there is indeed a Green surge on the Island, I believe it should at least be visible as increased turnout.

That's pretty much it for now. I'll work hard this week to provide the best possible projections next Sunday or Monday. Hopefully we'll get a couple of polls. But I strongly suspect that this election will remain uncertain until the very end.

The uncertainty of the BC election 2017

It seems pollsters have just decided to ignore British Columbia this time around, except of course Mainstreet Research. To add insult to injury, there were polls done in other provinces (Ontario) and at the federal level.

Also, I'd like to rant quickly about some of the stuff we see labelled as "polls". First of all, there is the "News1130/PlaceSpeak Poll", where people can simply go and voluntarily declare who they intend to vote for. Look, it's a self-selected online poll. They can try to make it look better by writing stuff such as "PlaceSpeak's unique location-based technology is pioneering a new model for real-time polling within electoral districts (ridings)", but that doesn't make it an actual poll with proper sampling and weighting. I mean, come on, we already have enough people criticizing well-done online polls; let's not start doing what 1130 is trying to do. The worst part is when you read things such as "... require more [responses] for a representative sample". No, no and no. It doesn't matter how many responses you get, it's not gonna be a representative sample. I'm baffled by what I see here and News1130 should quite frankly be ashamed of itself.

Then we got an actual poll in the riding of Kamloops-North Thompson, conducted by Justason. Except this survey only has 239 observations. I know riding polls tend to have smaller sample sizes, but 239 is just too small. And as they did with the already small sample of their latest provincial poll, Justason then goes on to break down the results by age, gender, etc. Again, come on! You can't publish results based on literally 10-20 observations per group. You are losing credibility. Oh, by the way, there seems to be a Communist surge in Kamloops, with this party's candidate polling at 8% (actually 13% among men!). I'm not even sure what to do with this poll. The numbers don't match the projections, but the sample size is so small that the margins of error are huge. Let's just file it under "the Liberals might be higher (at least in the Interior) than what the polls are showing?" I guess. Good enough for me.
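To put numbers on "too small", here is the textbook 95% margin of error as a function of sample size. It assumes simple random sampling and the worst case of a party at 50%, so it only captures sampling error; the real uncertainty is larger.

```python
import math


def moe_by_sample_size(n, p=0.5, z=1.96):
    """Textbook 95% margin of error, in points, for n respondents.

    Assumes a simple random sample and support level p (0.5 is the worst case).
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


# A typical provincial poll, this riding poll, and sub-groups of 10-20 people.
for n in (1000, 239, 20, 10):
    print(f"n = {n:>4}: ±{moe_by_sample_size(n):.1f} pts")
```

The 239-respondent riding poll already carries roughly ±6 points, and a 10-20 person breakdown lands somewhere between ±20 and ±30 points, which is why those sub-group numbers tell us essentially nothing.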

Ok, let's go back to the projections. I decided to take a look at the uncertainty that exists and the possible outcomes for each party.

The first graph shows the possible seat distributions for all three parties (I put the Green on their own graph because the x-axis is very different for them).

As you can see, the BC NDP is the favourite at this point, but the BC Liberals are still in this race. How could they still win? The polls could underestimate them (as they did last time; remember that polls aren't perfect). Or the Liberals could have a more efficient vote. Both possibilities are included in my simulations. Still, make no mistake: even though polls can be wrong, it's still better to be ahead in them than trailing. Keep in mind, though, that we literally only had two polls in the last 10 days, so the poll average is likely not as accurate as I'd like it to be.

Because British Columbia still mostly has only two major parties (sorry Green voters, you might change that this election), the distributions show that both a Liberal and an NDP majority are possible. I know it sounds like I'm trying to cover all the bases so as not to be wrong, but this is really just what it is. Given the usual accuracy of polls and the fact that transposing voting intentions into seats is not an exact science, there is more than enough uncertainty for both outcomes to be possible, along with some minorities and even ties. Anybody claiming to know for sure what will happen is either lying or delusional.

As for the Green, their distribution is heavily concentrated around 3-4 seats. There are best-case scenarios of 9 or 10 seats (even 11, actually), but they are highly unlikely. This is why I'm a little bit puzzled by the UBC prediction market, which was giving them as many as 10 seats yesterday. As I'm writing this, it's 7 seats and 17.3% of the vote. Unless the Green Party manages to concentrate its increase in just the right 7 ridings, I'm a little bit skeptical. Yes, Weaver did well at the debate, but winning seats under our electoral system is still a hard task for smaller parties.

Finally, let's look at how many ridings are safe for each party.

There are very few ridings with no uncertainty. The BC Liberals currently have 5 ridings where my projections give them a 100% chance of winning, while the BC NDP has 6 seats in this situation. I'd like to remind readers that the model has never yet been wrong when projecting a candidate at 100%.

The Green Party has absolutely no chance in a large number of ridings (61 to be exact). There as well, I have never made a mistake when projecting someone at 0%. It came close (some candidates with chances of around 0.5% ultimately won), but technically the current model hasn't made a mistake yet.

The last graph shows what kind of lead the NDP currently has. Yes, they would likely win more seats, but many of them are not guaranteed. They simply have an edge in the number of seats projected with 60-80% chances. This is good news for John Horgan, but these are the type of seats you can easily lose if the polls are slightly wrong. If you look at the projections, you'll see that a 75% chance of winning usually means a 5-6 point lead. Don't assume that is ultra comfortable.
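One rough way to see how a 5-6 point lead turns into roughly 75% is to treat the true riding-level gap as normally distributed around the projected lead. This is a sketch under assumptions: the normal shape and the 8-point standard deviation are illustrative choices (picked so a 5-6 point lead lands near 75%), not parameters of the simulation model.

```python
import math


def win_probability(lead, sd_of_lead=8.0):
    """P(leading candidate wins) if the true gap ~ Normal(lead, sd_of_lead).

    lead and sd_of_lead are in points; sd_of_lead = 8 is a hypothetical
    riding-level uncertainty chosen for illustration only.
    """
    z = lead / sd_of_lead
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


for lead in (2, 5, 6, 10, 15):
    print(f"lead {lead:>2} pts -> {win_probability(lead):.0%}")
```

Under these assumptions a 5-6 point lead comes out in the mid-70s, while it takes something like a 15-point lead to get above 95%, which is why seats in the 60-80% range are far from locked in.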

That's all for now. Remember that advance voting starts today. So if you already know who you'll vote for, you can do your civic duty now and relax on May 9th.

Can we trust the polls?

Isn't that the question? People in BC most likely remember the 2013 election, where literally every poll had the BC NDP ahead (sometimes by a large margin) and the BC Liberals ultimately won by almost 5 points. Along with the 2012 Alberta election, this remains one of the biggest polling misses in Canadian history. More recently, we also remember Brexit (polls were mostly predicting "Remain" to win) or, of course, the election of Trump (again, with an average error much smaller than in Alberta or BC). With that said, the polls in France last Sunday were spot on, which resulted in my projections also being spot on (yay!).

But how accurate are polls on average? That's exactly what I look at in this post. The idea is to find what I call the real margins of error, the ones accounting not only for sampling variation (the "plus or minus 3%, 19 times out of 20" reported by the polls) but also for estimation bias and other sources of uncertainty.

Before we start, let's note that the margins of error reported by the polls are pretty much useless. First of all, with a lot of polls being conducted online, the samples aren't truly random and the basic theory of statistics doesn't really apply (note: this doesn't mean online polls don't work; the French polls were all done online, for instance). Secondly, the margins reported are the theoretical ones for a party at a 50% level of support. Yes, that's right, the margins actually vary with the level of support, so a party at 40% will have larger margins than a party at 5%. But literally no pollster in this country will ever tell you that. Finally, as mentioned above, these margins only measure sampling error. But I'll argue that we don't really care about that, not when we have many polls that we can aggregate. Fact is, if the only source of error were sampling, poll aggregators like me should be able to nail every single election. But we clearly don't (sorry...). Why? Because measuring voting intentions has other sources of variation. People can change their mind, they can lie, they can refuse to answer, etc. All of these create a potential bias.
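The dependence on the level of support follows directly from the sampling formula z·sqrt(p(1-p)/n), which peaks at p = 50%. A short sketch, assuming a simple random sample of 1,000 respondents (an illustrative size, and again sampling error only):

```python
import math


def moe_by_support(p, n=1000, z=1.96):
    """95% sampling margin of error, in points, for a party at support p.

    Assumes a simple random sample of n respondents (n = 1000 is illustrative).
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


for support in (0.50, 0.40, 0.20, 0.05):
    print(f"party at {support:.0%}: ±{moe_by_support(support):.1f} pts")
```

For the same sample, the headline ±3.1 applies only to a party at 50%; a party at 5% has a sampling margin well under half of that.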

So what I did was take the latest polls from a few recent elections in this country (the last 3 federal elections, Alberta 2012, BC 2013, Ontario 2014, Quebec 2014, Alberta 2015). It's not a complete sample: I could have added the 2012 Quebec election, where the polls were also off, and I'm also missing some elections in smaller provinces. But hey, it's already a good source of data. It should be enough to give us a good idea of the average accuracy of Canadian polls.

For each election, I calculated the average of the polls conducted during the last week of the campaign (without any adjustment from me) and compared it to the actual results. Then I calculated the Mean Square Error. This is a statistical measure of the average error (for an unbiased estimator, it's technically the variance). Taking the square root and multiplying it by 1.96 gives us our effective margins of error at 95%. Note that I only looked at the error for the top parties (usually the top 4 or 5 parties in a province, in other words, the ones included in the polls). I also calculated the average absolute error (if the polls had a party at 40% but it got 42%, the error is 2 points).
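The calculation described above can be sketched in a few lines. The (poll average, result) pairs here are hypothetical placeholders, not the data behind the table; the errors are pooled across all pairs before taking the square root.

```python
import math


def poll_accuracy(pairs, z=1.96):
    """Given (poll average, actual result) pairs in points, return
    (mean square error, effective margin of error, mean absolute error)."""
    errors = [actual - poll for poll, actual in pairs]
    mse = sum(e ** 2 for e in errors) / len(errors)   # Mean Square Error
    effective_moe = z * math.sqrt(mse)                # "real" 95% margin
    mae = sum(abs(e) for e in errors) / len(errors)   # average absolute error
    return mse, effective_moe, mae


# Hypothetical last-week poll averages vs. results for the top parties in a
# few elections (placeholder numbers, NOT the real data).
pairs = [(40.0, 42.0), (35.0, 33.5), (15.0, 16.0), (8.0, 6.5),
         (44.0, 39.5), (38.0, 41.0), (10.0, 11.0)]

mse, moe, mae = poll_accuracy(pairs)
print(f"MSE = {mse:.2f}, effective MoE = ±{moe:.1f} pts, MAE = {mae:.2f} pts")
```

Because the errors are squared, a single big miss (the 4.5-point error in this toy data) pulls the effective margin up much more than it moves the average absolute error.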

Results are in the table below.

Source: own calculations based on the polls and election results in Alberta 2012, BC 2013, Ontario and Quebec 2014 as well as the last three federal elections.

As you can see, the actual margins of error of a typical Canadian poll are relatively large, much larger than what the standard margins of error would predict (again, if there were only sampling error, taking the average of 6-7 polls should give us almost perfect estimates). Even if we exclude the two obvious misses (Alberta 2012, BC 2013), polls aren't that accurate, although they appear more accurate than French polls (which aren't done through random sampling).

What the margins above mean is that even after aggregating the polls, your estimates are still likely off. Another way to put it is that, on average, polls are off by 2.09 points for each party. Make no mistake, that's a good level of accuracy. But when we transpose these polls into seats, if the gap between two parties is over- or underestimated by 4 points (one party off by 2 points in one direction and the other off by 2 points in the other), this can make huge differences. Each point can represent as many as 5-6 seats if a party is in the "paying zone" (usually above 25%).

I had written an article a couple of years ago using data from Abacus (whose CEO, David Coletto, was kind enough to give me access to their raw sampling data). I showed that if we accounted for the fact that people could change their mind, the actual margins of error should be closer to 7.5%. This seems to line up relatively well with my findings here. And it's consistent with the findings in the US.

So why are polls sometimes off? Well, if only we knew for sure! However, David Coletto and I looked into it and found that polls tended to be more wrong when there was a big change in turnout between elections. Neither the polling method nor the sample sizes had a significant impact on overall polling accuracy, just the change in turnout.

So will it happen again this year in BC? We can't tell. I personally don't think this campaign will generate an increase in turnout the way the last federal election did (I'm not even sure turnout will go up at all; there just isn't the same enthusiasm). But we'll know better once we get the turnout for the advance voting that begins this weekend. In any case, this article should show why using simulations is so important and why uncertainty will always be present when forecasting an election, not only for the vote shares but even more so for the seat projections.

By the way, for the projections and simulations, I use margins of error of 4.8 points. That's slightly less than the estimates here, mostly for two reasons. The first one is that my own polling average is typically better than the raw, pure polling average used here. The second reason is that the numbers in this post should be seen as an upper bound. In particular, I calculated the Mean Square Error pooled over all the years and then took the square root of this average. But I could have taken the square root for each year and then averaged those. Since the square root is a concave function, the average of the square roots is never greater than the square root of the average. Doing so would yield margins of error of 4.96% (or 2.98% without Alberta and BC).
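The difference between the two orders of operations is easy to check numerically. The per-election MSEs below are hypothetical placeholders (two large ones standing in for big misses), not the values behind this post; each election is weighted equally for simplicity.

```python
import math

# Hypothetical per-election Mean Square Errors (squared points); NOT real data.
mse_by_election = [1.5, 2.0, 2.5, 3.0, 14.0, 20.0]
k = len(mse_by_election)

# Option 1: average the MSEs first, then take the square root.
pooled = 1.96 * math.sqrt(sum(mse_by_election) / k)

# Option 2: take the square root per election, then average.
averaged = 1.96 * sum(math.sqrt(m) for m in mse_by_election) / k

print(f"sqrt of average: ±{pooled:.2f} pts, average of sqrt: ±{averaged:.2f} pts")
```

The second number is always smaller or equal (equal only if every election had the same MSE), and the gap grows when a couple of elections have much bigger errors than the rest.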

Who won the BC debate? Probably Andrew Weaver

We now have the numbers from Mainstreet's post-debate poll, courtesy of the Vancouver Sun.

The newspaper is saying that "Horgan won by a slim margin". I disagree. Yes, the raw numbers show that, but the raw numbers are not what is really interesting. Simply asking people "hey, who won the debate?" is not the best question, because people's own political bias will play a role. In particular, Liberal voters will tend to say that their leader, Christy Clark, won. The same goes for NDP voters, etc. It's always difficult to be objective when making these judgments. After all, if you already like John Horgan and want to vote for him, you are likely to find that he did well during the debate.

So when you see that only 33% of people think Horgan won, this is bad news for the BC NDP that was polling above 40%. On the other hand, when 29% say Andrew Weaver won, this is much higher than the 19-20% of this party in the polls.

This is why I always do my debate leader index (trademarked by me!). It worked fine during the last federal election to predict a rise of Trudeau. It also worked very well in Quebec in 2014 to predict that Pauline Marois, the PQ leader, had had a bad debate and would see her party go down. I find this index a much more useful measure than the raw numbers. Think of it as a more predictive measure of how voting intentions will or could be influenced by the debates.

So for the televised BC debate of 2017, here is the index. I have two measures: one uses the pre-debate polling average, while the other uses the post-debate voting intentions among the people who watched the debate. Somebody on Reddit commented that the population who watches the debate is different (which is likely true; proof of that is that the NDP is behind the Liberals in the voting intentions of this poll only), so I should use the second measure. I think there are good arguments for both approaches, so I'm using both. Results are very similar anyway. I dropped the undecided for both the voting and the debate-winner questions so that the two are fully comparable.

Debate index*
Debate index with debate poll only
Christy Clark
John Horgan
Andrew Weaver

*Percentage of people who said this candidate won the debate divided by the voting intentions for the candidate's party

Remember, an index greater than 1 means you managed to get people voting for another party to say that you won. A score below 1 means that you didn't even convince all the people already voting for you.
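The index in the footnote is simple to compute once the undecided are dropped from both questions. The percentages below are hypothetical placeholders, not the Mainstreet numbers; the sketch only shows the mechanics.

```python
def drop_undecided(shares, undecided_key="Undecided"):
    """Renormalize raw percentages to shares of decided respondents."""
    decided = {k: v for k, v in shares.items() if k != undecided_key}
    total = sum(decided.values())
    return {k: v / total for k, v in decided.items()}


def debate_index(won_debate, vote_intention):
    """Share saying each leader won, divided by their party's vote share."""
    won = drop_undecided(won_debate)
    vote = drop_undecided(vote_intention)
    return {leader: won[leader] / vote[leader] for leader in won}


# Hypothetical raw percentages (NOT the actual poll), keyed by leader.
won_debate = {"Clark": 28, "Horgan": 30, "Weaver": 27, "Undecided": 15}
vote_intention = {"Clark": 34, "Horgan": 38, "Weaver": 18, "Undecided": 10}

for leader, value in debate_index(won_debate, vote_intention).items():
    print(f"{leader}: {value:.2f}")
```

In this toy example, Weaver's index is well above 1 (he convinced people who vote for other parties), while Clark and Horgan sit below 1 (not even all of their own voters think they won).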

There is no question that Andrew Weaver won. We already knew that he generated a lot of searches and interest on Google during and after the debate, but we now have another confirmation (or evidence if you prefer) that he "won" the debate.

You have people who don't vote Green (at least not yet?) who think Weaver won the debate, while a good share of NDP voters even admit Horgan didn't win (using the first approach). Notice, however, that the measure here might be biased, as I have observed that people tend to be nicer to the leaders of small parties (think Elizabeth May at the federal level). But Weaver and the Green are polling at 20% and are at times ahead on the Island; they aren't "small" anymore, or at least not currently. Still, the bias might exist.

As for Christy Clark, her index isn't good, but the same poll indicates that her favourability increased (the numbers here are terrible for Horgan, who ends up being barely more popular than Clark). With that said, the numbers from Mainstreet are hardly an indication that she did enough to climb back in the polls (of course, these polls could be wrong as they were in 2013, but I'm assuming they aren't).

Winning a debate is often half the battle; the other half is winning the post-debate spin. We'll see how the Green and Andrew Weaver fare in the next few days. But for now, I'd confidently say that Andrew Weaver won the debate. After that, there is no guarantee that he (or his party) will indeed increase in the polls in the coming days. But he's the most likely to benefit from his performance.

Update: it seems I'm not the only one to reach this conclusion, since the UBC Prediction Market (where people bet actual money) has seen a sharp increase in the number of seats for the BC Green (up to 9 as I'm writing this line).

After the BC debate...

Just one short blog post while waiting for the Mainstreet post-debate poll.

I went to Google Trends to see which leader generated more searches. Here is the result:


(Note: I did an exact term search instead of a topic search because this method doesn't work for John Horgan. Results were similar when I tried to do a topic search for Clark and Weaver).

Update: the graph above is updated directly by Google, so the numbers by the time you look at them might be different than when I posted them. So here is a screenshot of what it looked like after the debate:

Weaver has been behind the other two ever since the election started, so it's pretty significant for him to be ahead of Horgan - he was even tied with Clark right after the debate.

We'll wait and see if this matches with what Mainstreet will find. But based on Google Trends and keeping in mind the relative levels of support for each party (and the fact that Christy Clark just always generates more searches), it looks pretty good for Andrew Weaver.

And you, what did you think? Leave a comment and/or answer the non-scientific poll below:

Who do you think won the BC debate?

Christy Clark from the BC Liberals
John Horgan from the BC NDP
Andrew Weaver from the BC Green
Nobody won

Who votes for the BC Green Party?

This post takes a deep look at who is voting for the Green Party and the implications for the projections. It'll mostly be a long, technical, boring post. You have been warned!

1. The Green Party can attract Liberal voters

Let's look at 2013. The Green technically got a smaller share of the vote than in 2009 (8.13% vs 8.21%), but this is highly misleading because they were running fewer candidates: only 61 out of 85, while they had a full slate in 2009. This is important because it means the Green swing was actually positive in many ridings.

In the 24 ridings that had a Green candidate in 2009 but not in 2013, the Green got 31,358 votes in 2009. That's about 23% of all the votes received by the Green Party that year. You can see why it's important to take that into account when estimating coefficients or looking at the results. If the Green managed to stay around 8% province-wide while losing 23% of their votes in some ridings, it must mean the party went up in the other electoral districts, significantly in many places.

Without the votes in these 24 ridings, the Green would have been at only 6.3% province-wide in 2009. So when they got 8.13% in 2013, it actually means an average swing of almost 2 points!
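The arithmetic behind that "almost 2 points" can be checked with the numbers from this post alone. The only simplification is the one implied above: the 2009 province-wide denominator is kept as-is when those ridings' Green votes are removed.

```python
green_share_2009 = 8.21     # % province-wide in 2009 (full slate)
green_share_2013 = 8.13     # % province-wide in 2013 (61 candidates)
share_of_votes_lost = 0.23  # ~23% of 2009 Green votes came from the 24 ridings

# 2009 Green share excluding the 24 ridings without a 2013 candidate
# (same province-wide denominator).
comparable_2009 = green_share_2009 * (1 - share_of_votes_lost)
swing = green_share_2013 - comparable_2009

print(f"comparable 2009 share: {comparable_2009:.1f}%, swing: +{swing:.1f} pts")
```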

The big question here is obviously: what happened to these votes in 2013? Did the people who voted Green in 2009 in these 24 ridings simply not vote? Did they vote for another party? This is where I become useful and where statistics and regressions can be used. Finding the answer to this question can give us some indication as to who these Green voters are and which other party they could support.

I tried to look at it with different approaches. First of all, I looked at the 2013 results in each riding and tried to explain the vote shares of the Liberals and NDP with the vote shares of each party in 2009. Doing so revealed that a Liberal candidate's share in 2013 was highly correlated with the share in that riding 4 years earlier, which is logical. It's the same for the NDP. But for the Green share, I found significant effects on the Liberal shares but not on the NDP's. Specifically, my estimations indicate that about half the votes for the Green in these 24 ridings went to the Liberals in 2013, while almost none went to the NDP. There are a lot of potential issues with my method here, I'm fully aware of it (I've done enough econometrics in my life to know that), but it's still interesting.

Then I tried to use variations instead (so the swing in each riding between 2009 and 2013). There as well, the results showed that the BC Liberals got about half the votes while the NDP didn't get anything. This is very significant because it explains, at least partially, why the BC Liberals increased in some ridings/regions while dropping overall province-wide. One way to make sense of this is to realize that the BC Liberals maybe received some votes from Green voters in ridings where the Green didn't have a candidate anymore (in the Okanagan Valley, for instance), while the Liberals dropped more in other ridings/regions. In ridings where there was a Green candidate in both 2009 and 2013, the Liberals dropped much more than in the ridings without a Green in 2013 (where they actually increased on average!).
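For readers wondering what "using variations" can look like in practice, here is a stripped-down sketch: regress the Liberal swing on the 2009 Green share across ridings that lost their Green candidate, and read the slope as the points the Liberals gained per point of orphaned Green vote. Everything here is an assumption for illustration: the data are synthetic (generated with a true slope of 0.5 plus noise), the single-regressor setup omits the controls and regional effects a real estimation would include, and it is not the exact specification behind the results above.

```python
import random


def ols_intercept_slope(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


random.seed(1)
# Synthetic 2009 Green shares (%) in 24 ridings that lost their Green candidate.
green_2009 = [random.uniform(4, 12) for _ in range(24)]
# Synthetic Liberal swing: half of the orphaned Green vote, a -2 point
# provincial swing and riding-level noise (all invented for illustration).
lib_swing = [0.5 * g - 2 + random.gauss(0, 0.5) for g in green_2009]

intercept, slope = ols_intercept_slope(green_2009, lib_swing)
print(f"estimated share of orphaned Green votes going to the Liberals: {slope:.2f}")
```

With real data, the same regression run on the NDP swing is what would show a slope near zero.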

Again, the econometrics is very limited but the fact that I got the same results with both methods is encouraging.

On average, it appears that the missing Green candidates gave the Liberals a bonus of 3.7 points (again, in the 24 ridings).

This can seem very counter-intuitive. Most people usually assume that Green voters have the NDP as second choice. But polls have shown Green voters to be more divided than that. The final Mustel poll in 2009 was showing exactly that: Green voters were split regarding their second choices.

Obviously, the current polls are showing a very different situation. Ipsos Reid shows that 42% of Green voters have the NDP as their second choice while only 14% have the Liberals. The latest Mainstreet poll even has the NDP as the second choice of 74%(!) of Green voters, a number much higher than in previous polls from the same firm, though. On the other hand, many Mainstreet polls have shown Liberal voters having the Green as their main second choice.

What this should at least show is that Green voters don't simply come from or go to the NDP. The flows between parties are more complex than that. The simple fact that current polls show the NDP at its 2013 level while the Liberals are down should be further evidence of that.

Another example of this is Oak Bay-Gordon Head, the riding of the Green leader, Andrew Weaver. He won his seat in 2013 thanks to an incredible personal effect. My estimations show that his result was 27 points above what a "generic" Green candidate would have been expected to receive. It was obviously thanks to him and the incredible campaign the party ran for him there. Still, as far as I can tell, this is the biggest personal effect I've seen, bigger than the one for Elizabeth May.

Yet, the same estimations show this effect was taken equally from the Liberals and the NDP, both losing between 13 and 15 points. And remember, this effect is estimated while also including regional effects. In Victoria-Beacon Hill, where former Green leader Jane Sterk was running, I also find that her personal effect (around 12 points) was taken much more from the Liberals than the NDP.

Of course, there can be many explanations. For instance, it's possible that BC Liberal voters on the Island are more likely to vote Green than, say, Liberal voters in the Interior. Or, in the case of Victoria-Beacon Hill, some Liberals might have tried to make the NDP lose this safe seat by voting Green. But again, it at least shows that the Green Party seems capable of attracting votes from both parties.

2. Implications for the projections this year

There are mostly two elements to this part. The first one is the inverse of the first part of this post: what will happen in the ridings where the Green are now running a candidate but weren't 4 years ago? Surely the swing there will be different from the swing elsewhere. Will we observe a "catching up" effect, where the 2017 results would be more along the lines of "what should have happened in 2013 + the 2013-2017 swing"? If that is the case (and looking at what happened to the BC Conservatives in 2009 in ridings with new candidates, we have reasons to believe we'll indeed observe that), then the provincial swing for the Green Party will be misleading again.

The Green Party is running 83 candidates this year (out of 87 ridings). That's a significant increase in the number of candidates. So when we see the Green polled 10-12 points above their 2013 score, some of this increase will likely come only from the ridings with a new candidate. This could be fairly significant for the Green's chances of electing MLAs this year. If one makes projections looking only at the provincial swing, this will likely overestimate the swing in ridings where there was already a candidate, and those ridings are the ones the Green can win. I admit I haven't fully accounted for this yet, and I'll do so by the end of the week. Make no mistake, this is a lot of work and it requires some assumptions.
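A back-of-the-envelope sketch shows how a province-wide swing can overstate the swing in already-contested ridings. Several assumptions are made purely for illustration: 85 equal-sized ridings (ignoring the redistribution to 87), 61 of them contested by the Green in 2013, every riding contested now, a provincial Green level of 19% (inside the 10-12 point increase mentioned above) and a hypothetical 15% in the newly contested ridings.

```python
def swing_in_old_ridings(prov_now, prov_before, n_old, n_new, new_share_now):
    """Green swing (points) in ridings that already had a candidate.

    Assumes equal-sized ridings, so the provincial share is the average of
    riding shares, and that newly contested ridings had 0% last time.
    """
    n = n_old + n_new
    old_share_before = prov_before * n / n_old  # new ridings contributed 0%
    old_share_now = (prov_now * n - new_share_now * n_new) / n_old
    return old_share_now - old_share_before


naive = 19.0 - 8.13  # the swing a purely provincial approach would apply everywhere
swing = swing_in_old_ridings(prov_now=19.0, prov_before=8.13,
                             n_old=61, n_new=24, new_share_now=15.0)
print(f"naive swing: +{naive:.1f} pts, swing in old ridings: +{swing:.1f} pts")
```

Under these made-up assumptions, the swing where the Green already had a candidate is about 1.6 points smaller than the naive provincial swing, and those are exactly the ridings the Green can win.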

The second part: in the ridings with a new Green candidate, where will the vote come from? Could we see the opposite of what we saw between 2009 and 2013, meaning the Liberals could be the party most affected by these new candidates? Here too, I'll try to adjust the model by the end of the week.


There are really two things to take away from this blog post. First of all, the Green Party is likely attracting voters from both parties. It is indeed taking seats from the BC NDP on the Island, but the voters aren't all coming from the NDP.

Second of all, the fact that the Green Party is running more candidates than last time means projections could overestimate the Green swing in ridings where they already had a candidate. Given that every single seat the Green are aiming for is in this situation, you can see the potential for making mistakes.

BC NDP takes a commanding lead (well, maybe)

After a good 10 days without polls, here are two new ones. First of all is yesterday's Justason one. And today we have a new one from Mainstreet. Both polls show the NDP ahead, although its lead is significantly bigger according to Mainstreet.

I updated the projections and you can find more details on the BC Election 2017 page (have you bookmarked it already?!).

(Graphs: voting intentions; seat projections with confidence intervals; chances of winning the most seats)

The title uses the term "maybe". This is because there are a couple of things to keep in mind before calling for a guaranteed NDP majority. First of all, the Justason poll. The sample size is technically 1128 respondents, but a whopping 37.5% either don't know, are not eligible or wouldn't vote. That's almost 40% of the sample! So the numbers they published are among decided voters, and there are only 712 of them. The firm then goes on to break down the results into 6 sub-regions. That's completely ridiculous. Victoria, for instance, has a grand total of 60 respondents! Why would a firm even bother publishing numbers based on such ridiculously small samples? That reminds me of a poll during the last federal election that literally had 10 respondents for the CPC in all of Atlantic Canada. Here's my advice: if your sample is small, don't publish regional breakdowns; they're pretty useless.

As for the Mainstreet one, while it doesn't suffer from a tiny sample, there are a couple of things you should know. First of all, they dropped the BC Conservatives, which makes sense. But because they don't even offer an "other" option, Lib+NDP+Green sum to 100%. Elections in BC usually have a higher share of "others" than in other provinces. We can expect at least 2-3% in this category (including the 10 BC Conservative candidates). I really wish Mainstreet would offer "other party" as a choice. Secondly, Mainstreet is showing a really important swing in favour of the BC NDP (+5 points) and against the Liberals (-3). Is that really what is happening? Is John Horgan's strategy of pretty much staying in Greater Vancouver paying off? Mainstreet sure indicates so, since the NDP is now ahead by 16 points in this region. But is the swing real, or is it simply the result of random sampling? Justason (with tiny samples) has the NDP barely ahead in the same region. Also, the number of undecided actually increased since the last poll (especially in the rest of BC). That doesn't make any sense; we shouldn't see more undecided after 2 weeks of campaign. Finally, Liberal voters' main second choice has massively shifted from the Green to the NDP. I personally think this might be the result of a small sub-sample (it's only for undecided but Liberal-leaning voters).
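When a poll forces Lib+NDP+Green to sum to 100%, one simple correction is to rescale the three parties proportionally to leave room for "other". The shares below are hypothetical, the 2.5% for "other" is just a value inside the 2-3% range mentioned above, and proportional rescaling is itself an assumption (in practice, "other" voters may come disproportionately from one party).

```python
def add_other_share(shares, other=2.5):
    """Rescale party shares (summing to ~100%) so that `other` points go to
    other parties and independents, keeping the parties' relative proportions."""
    total = sum(shares.values())
    factor = (100 - other) / total
    scaled = {party: value * factor for party, value in shares.items()}
    scaled["Other"] = other
    return scaled


# Hypothetical three-party numbers that sum to 100% (NOT the actual poll).
poll = {"Liberal": 33.0, "NDP": 46.0, "Green": 21.0}
adjusted = add_other_share(poll)
for party, value in adjusted.items():
    print(f"{party}: {value:.1f}%")
```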

Anyway, both polls have been added to the average. Thank god for the Justason poll; otherwise my average would now be based on Mainstreet alone, since all the other polls are getting old. (Note: I have nothing against Mainstreet, quite the contrary, but I don't like it when my average is driven too much by a single firm or poll.)

My projections have changed quite a bit. I personally don't like it when they change drastically overnight, but there isn't much I can do with so few polls. What was a close race between the NDP and the Liberals is now turning into a comfortable lead for the former, though "comfortable" may be an overstatement. With a 31% chance of winning the most seats, Christy Clark is still in this race, but her odds are decreasing.

The Green Party (which revealed its full platform yesterday) is definitely sticking around. We are at the halfway mark of this campaign and its support appears solid. Mainstreet even shows a big increase in the number of "strong supporters" of this party (at 67% strong supporters, it is barely behind the other two). But the electoral system is still hurting the party quite hard. Under a PR system, Andrew Weaver would be on his way to around 17 MLAs instead of the 3 projected here. To be fair, the simulations indicate that the Green could elect as many as 11 people, but that is obviously an unlikely, best-case scenario. They do, however, have around a 25% chance of getting 5 MLAs or more. To do so, they will need either to beat their poll numbers or, more likely, to rely on strong local campaigns and candidates.

As a final note, people naturally haven't forgotten how wrong the polls were in 2013. It's too early to look at these polls (especially the Mainstreet one) and declare an NDP victory. But the big difference so far is that the BC Liberals don't appear to be climbing back during the campaign as they did four years ago. To be fair, they also didn't start as low. I'm not saying we are immune to another polling error, but the dynamic is at least different. The next big event is, of course, Wednesday's debate. Hopefully we'll get some polls asking who won.

What if BC had the French electoral system?

Yesterday was a good day for the polls in France and, by extension, for my projections and myself. The polls for the presidential election in France were spot on and therefore so was I. To be fair, it's much easier to simply aggregate the polls than to convert them into seats anyway.

Let's not forget the BC election, though. We haven't had many polls (yet?), but I made some substantial updates to the model. First, I took into account where the Green and especially the BC Conservatives are running candidates. The Green have 83 candidates (out of 87) while the BC Conservatives only have 10. Honestly, I should just drop the Conservatives from my projections, but they were already included, so it's actually less work to leave them there.

I also made adjustments to some ridings. Some come from the lack of a Conservative candidate (in ridings where this party got over 10% last time around); others come from reasons to believe my projections were wrong. One example is Victoria-Beacon Hill, which I previously had leaning Green. But that was with the built-in bonus of the Green leader running there, which is no longer the case, and my estimates show that Jane Sterk did benefit from a significant bonus. On the other hand, I now have the Green ahead in Cowichan Valley, as it seems fairly obvious the Green are running a better campaign than the BC NDP (whose base was kind of split during the nomination process, with the former campaign manager now running as an independent!).

You can find the updated projections on the BC Election 2017 page and you can use the updated model in the simulator.

I wanted to try a little exercise related to the French election. France uses a run-off system where, if no candidate gets more than 50% of the vote in the first round, the top 2 then go to a run-off 2 weeks later. I wanted to simulate the effects on the BC election.

To do so, I used my projections as the baseline results and I redistributed the votes of the other parties in every riding where no candidate was getting a majority. I used the second choices provided by Mainstreet and Ipsos.
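The riding-level rule can be sketched as below. This is an illustration, not the model's code: the riding shares and second-choice fractions are hypothetical (not the Mainstreet or Ipsos figures), and voters whose second choice did not qualify are assumed to abstain.

```python
def runoff_result(shares, second_choice):
    """Apply a French-style two-round rule to one riding.

    shares: party -> first-round vote share (%).
    second_choice: party -> {party: fraction of its voters choosing that party second}.
    Voters whose second choice did not make the run-off are assumed to abstain.
    """
    total = sum(shares.values())
    ranked = sorted(shares, key=shares.get, reverse=True)
    if shares[ranked[0]] > total / 2:  # outright majority: no run-off needed
        return ranked[0], dict(shares)
    finalists = ranked[:2]
    final = {p: shares[p] for p in finalists}
    for party, votes in shares.items():
        if party in finalists:
            continue
        prefs = second_choice.get(party, {})
        for f in finalists:
            final[f] += votes * prefs.get(f, 0.0)
    return max(final, key=final.get), final


# Hypothetical riding and second choices, for illustration only.
riding = {"Lib": 42.0, "NDP": 38.0, "Green": 17.0, "Other": 3.0}
second = {"Green": {"NDP": 0.55, "Lib": 0.25}, "Other": {"Lib": 0.5, "NDP": 0.2}}
winner, final = runoff_result(riding, second)
print(winner, final)
```

Here the Liberals lead the first round, but once Green and other votes are transferred the NDP edges ahead (about 47.95 to 47.75), the kind of flip this exercise is looking for.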

The results? See below:

The BC NDP would go from having a small lead to being pretty much guaranteed a majority. For the Green it wouldn't change anything, because they are winning their three ridings "comfortably" and they are the main second choice of the two main parties.

Only 39 ridings are currently projected to have a candidate over 50%. As for finishing in the top two (i.e. qualifying for the run-off), the Liberals qualify 79 times, the BC NDP 85 and the Green 10. These numbers might seem crazy, but remember there are really only three parties, which limits the possibilities.

An example of a riding that would flip from the Liberals to the NDP under the French system is Boundary-Similkameen. Five ridings are affected in total: that one plus Courtenay-Comox, Maple Ridge-Mission, Surrey-Panorama and Vancouver-Langara.

Beyond the fun exercise, I think this post illustrates how close this election is: flip five ridings and you have an NDP majority. If things stay like this until May 9, my final projections will be quite uncertain.

D-Day in France: the very latest trends

Here we are: the first round of the French presidential election. French friends, I hope you will go vote today. Your expats in Montreal and Vancouver (and elsewhere) voted yesterday and often had to wait for hours.

If you want to see my final projections and a detailed analysis, please refer to my previous post.

We will get the first estimates officially (and normally) at 8 p.m. Paris time. By following Swiss or Belgian media, we often get the trends earlier.

What do we know so far? Turnout in the overseas territories is down compared to 2012. High abstention should normally help Marine Le Pen and François Fillon.

There was also a poll published yesterday in Belgium (publishing it in France is prohibited) showing Le Pen ahead (26%) and Fillon second (22% against 21% for Macron). An effect of last week's shooting? Or just a bogus poll? After all, people on Twitter were suggesting that this poll may have come from the Fillon camp. The Tribune de Genève also had a last-minute poll, but that one showed the same numbers as the other polls.

On Google Trends, the trend over the last 48 hours favours Macron and Mélenchon.

Buzzpol/Filteris still sees a Marine Le Pen vs. François Fillon run-off, based on online "buzz".

Finally, betting markets have a Macron vs. Le Pen run-off at about 65% and Le Pen vs. Fillon at 18%, which roughly matches the average of the models (see my previous post; my model is one of those with the lowest uncertainty).

That's it. I'll see whether I post an update during the day: on Twitter for sure, but not necessarily here.

Final projections for the 2017 French presidential election

This Sunday is the first round of the French presidential election, held every five years. After long months of a campaign that was objectively crazier than usual, the projections show a four-way race, albeit with two favourites. For those who don't know, the French system has two rounds: the two candidates who finish on top on Sunday will qualify for the second round two weeks later.

As I was saying, we have a four-way race, which is unprecedented in France. Traditionally, the two favourites are easy to predict. Of course there can be surprises, as in 2002, but the polls back then didn't show such a close race, and certainly not a four-way one. That being said, the convergence among the four candidates seems to have stopped toward the end, with Emmanuel Macron even showing a slight upward trend at the very end. So although "anything is possible" (including any combination of the four candidates in the run-off), some scenarios are far more likely than others.

1. Poll-based projections

Below are the chances of reaching the second round and the confidence intervals for the first-round results of the five main candidates.

Not much has changed since my last update, apart from Macron's rebound. Note in passing that Macron is favoured against any other candidate in the second round (in many cases by a wide margin). So for him, the real race is this Sunday: if he finishes in the top two, he will almost certainly be the next president.

Only a few weeks ago, Front National's Marine Le Pen was all but assured of a place in the second round. But her relatively poor campaign (toward the end), her weak debate performances and Mélenchon's rise have ultimately hurt her. Her chances remain high thanks to very committed voters, but it's hard not to conclude that she botched her campaign (she had been at 25% or more for two years).

If François Fillon makes the top two this Sunday, his perseverance will deserve credit. He was the clear favourite after the right's primary, but the fake-jobs allegations undermined his campaign. He nevertheless decided to stay in the race, and it seems this wasn't necessarily a bad idea. His electoral base is solid at around 19%. His voters are motivated and generally older (so they turn out). Let's be honest: without "Penelopegate", he would surely be in the lead.

The story of the end of this campaign is of course Mélenchon's meteoric rise. Over the last few weeks he has cannibalized Hamon's support, but also (a little) Macron's and Le Pen's. It may seem surprising that a "radical" left candidate can take votes from the hard or far right, but this kind of populism often transcends the left-right divide. He had already enjoyed a solid rise in 2012 but ended up with only 11% of the vote. His support seems more solid this time, but his progression has still not pushed him above 20% on average.

Finally, Benoît Hamon has had a nightmare campaign. After his surprise win in the left's primary, he never managed to get back into the race for the second round, and he collapsed toward the end. He has no chance of making the run-off and could actually finish under 5%, a catastrophic result for the PS, whose campaign expenses would then no longer be reimbursed!

Let's look at the distribution of possible outcomes:

Macron has a "wider", more spread-out curve, which means there is more uncertainty for this candidate. My model is a bit different from the others (see below) because it takes vote certainty into account (the share of a candidate's voters who say their choice is final). Maybe it's a mistake to use this variable, but at the same time it seemed odd not to use this information. And on that front, Macron does worse than Le Pen or Fillon: they can count on over 80% of their voters being certain, while Macron is around 70%. On the other hand, Macron can potentially attract support from both his left and his right (which also means he can lose on both sides!). So I would argue that a model that doesn't show a wider interval for Macron is incomplete. Yes, I had to make assumptions to build in this uncertainty, but the results are consistent with what one would expect. Also, remember that Macron has no established party and has never been elected before. His victory would be an incredible event in French politics. So I think it's normal to have greater uncertainty for this candidate.

My model also includes some correlation between candidates because it uses vote transfers and second choices. For example, Mélenchon's result is negatively correlated with the Hamon, Macron and even Le Pen votes, but not with the Fillon vote (or only very weakly). The two strongest correlations are between Mélenchon and Macron, and between Macron and Fillon. For the first, there is a group of left-wing voters who seem willing to vote for Macron, Hamon or Mélenchon. The Harris poll shows this clearly, with PS voters split among the three. I sincerely believe this group is probably the key to Sunday's vote. If it mostly backs Macron, he will be elected. But if these voters stay home or vote for Hamon and Mélenchon, even in proportions only slightly larger than in the polls, expect a surprise. The fact that Macron's score appears in both of the strongest correlations is another way to see why his interval is wider: he has a very high ceiling but a low floor.

The other key electorate is the centre-right, Bayrou supporters in particular. Right now they are mostly with Macron, but voting Fillon is far from impossible for them. There is also the 4% for Nicolas Dupont-Aignan, which would be very useful to Fillon.

The chart above clearly shows that the four-way race is real. Remember that the model's uncertainty is calibrated to the accuracy of polls since 2002, which means average margins of error of about 3.8 points. Since the gap between Mélenchon (4th) and Macron (1st) is barely 4.6 points, several scenarios are clearly possible, all the more so since we have reasons to think the polls may be wrong (see below).
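As a back-of-the-envelope check, a 95% margin of 3.8 points implies a per-candidate standard deviation of about 3.8 / 1.96 ≈ 1.94 points. The sketch below assumes independent, normally distributed errors (a simplification; the model described here deliberately builds in correlations) and asks how often the 4th-placed candidate would overtake the leader on a 4.6-point gap.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


effective_moe = 3.8           # points, 95% level (from the text)
sigma = effective_moe / 1.96  # implied per-candidate standard deviation
gap = 4.6                     # points between 1st (Macron) and 4th (Mélenchon)

# Assumption: independent errors, so the gap's standard deviation is sigma * sqrt(2).
p_reversal = 1.0 - normal_cdf(gap / (sigma * math.sqrt(2.0)))
print(f"sigma ≈ {sigma:.2f} points, P(4th overtakes 1st) ≈ {p_reversal:.1%}")
```

That single head-to-head reversal comes out around 5%. It is only one of several ways to get an unexpected top two, and a negative correlation between two candidates widens the spread of their gap, which is why the full model's probability of a surprise is much larger.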

As for the possible run-offs, here is the chart with that information:

Basically, there is a one-in-four chance that the run-off won't be the expected match-up between the two poll favourites (first-round favourites, of course). That's a high probability. The fact that Macron is in the three most likely scenarios shows his lead over his opponents. Still, six scenarios are possible this Sunday, a first in France.
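Scenario probabilities like these come from counting, across simulations, how often each pair of candidates finishes in the top two. A minimal sketch: the four toy simulations below are made up and stand in for the model's 50,000.

```python
from collections import Counter


def runoff_pair_frequencies(simulations, names):
    """Share of simulations producing each (unordered) top-two pairing."""
    counts = Counter()
    for shares in simulations:
        ranked = sorted(range(len(names)), key=lambda i: shares[i], reverse=True)
        pair = tuple(sorted((names[ranked[0]], names[ranked[1]])))
        counts[pair] += 1
    return {pair: c / len(simulations) for pair, c in counts.most_common()}


names = ["Macron", "Le Pen", "Fillon", "Mélenchon"]
toy_sims = [  # hypothetical first-round shares, one row per simulation
    [23, 22, 19, 18],
    [20, 22, 24, 18],
    [23, 22, 19, 18],
    [24, 19, 18, 21],
]
freqs = runoff_pair_frequencies(toy_sims, names)
expected = ("Le Pen", "Macron")
print(freqs)
print(f"P(unexpected run-off) = {1 - freqs.get(expected, 0.0):.0%}")
```

In this toy example the expected Macron vs. Le Pen pairing occurs in half of the simulations, so the "surprise" probability is 50%; with real simulation output the same count yields the one-in-four figure quoted above.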

2. The polls

As I just mentioned, French polls have had an effective margin of error of 3.8 points since 2002. Another way to look at polling uncertainty: the mean absolute error is 1.6 points among the top five candidates. Also, still among the top five, polls have tended to significantly underestimate one candidate in every election (Jean-Marie Le Pen by 3.5 points in 2002, Sarkozy by 3 points in 2007 and Marine Le Pen by 2 points in 2012). And one candidate has always been overestimated (Jospin in 2002, Jean-Marie Le Pen in 2007 and Mélenchon in 2012). The fact that French pollsters weight by past vote may explain these errors for candidates who are falling or rising. If the yo-yo continues this year, Mélenchon should be underestimated and Le Pen overestimated.

Speaking of pollsters, they are being accused of "herding", i.e. not publishing their real numbers but sticking close to the other polls so as not to be wrong. It must be said that if you look at the polls over the last few weeks, the volatility between polls is really low. Le Pen, for example, seems to be systematically between 22 and 23%, which shouldn't happen with random samples. Except that French pollsters don't use random samples but quota samples, and they then adjust the results based on past vote. This tends to reduce volatility. That said, the observed volatility is really low, and if there is a surprise on Sunday, the pollsters will have some explaining to do.
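A simple way to quantify "too little volatility" is to divide the observed spread across polls by the spread that pure random sampling would produce. The poll readings and the 1,000-respondent sample size below are hypothetical, chosen only to sit inside the 22-23% band mentioned above.

```python
import math
import statistics


def dispersion_ratio(poll_shares, n):
    """Observed standard deviation across polls divided by the standard deviation
    that simple random samples of size n around the same mean would produce."""
    p = statistics.mean(poll_shares) / 100
    expected_sd = 100 * math.sqrt(p * (1 - p) / n)
    return statistics.stdev(poll_shares) / expected_sd


# Hypothetical Le Pen readings and sample size, for illustration only.
le_pen = [22.0, 22.5, 23.0, 22.5, 22.0, 23.0, 22.5, 22.5]
print(f"dispersion ratio ≈ {dispersion_ratio(le_pen, 1000):.2f}")
```

A ratio near 1 is what chance alone would give; here it comes out around 0.29. As noted above, quota sampling and past-vote weighting legitimately shrink the spread, so a low ratio is a warning sign rather than proof of herding.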

Also, all the polls are conducted online except one! And that one has relatively different results, with Fillon much weaker and Mélenchon second.

So, will the polls be right? We won't know until Sunday, but in the meantime we can look at other indicators such as Google Trends searches. There is often a correlation between a candidate's popularity and their presence on the search engine.

According to Google Trends, Mélenchon is in the lead. If you run the search yourself and look at the last 30 or 90 days, you see the same thing.

Compared with 2007 or 2012, one might think Mélenchon is overestimated there; his campaign team is good at generating buzz. At the same time, Macron should be a little worried: the non-poll data aren't good for him, not only Google Trends but media buzz in general.

So we have reasons to think Macron is overestimated while Fillon and Mélenchon are possibly underestimated. If that were the case, French TV networks might not be able to tell us who the top two are at 8 p.m. on Sunday (note: French networks always have estimates based on samples from polling stations and publish them at 8 p.m., when the polls close. These are only estimates, however, and in a close race we would have to wait for the final results).

Finally, I have to mention this week's terrorist attack. We can't look at the polls to measure its consequences (there was one poll conducted after the attack, but its sample is small). Still, one can imagine it will benefit the right-wing candidates (Le Pen and Fillon), since voters generally see them as better on security issues. I admit I can't predict the impact of this attack, and it may change nothing. But in the context of a highly uncertain election, let's just say it doesn't help!

3. The other models

I had never made projections for France before, and I started this model mostly "for fun". Also, unlike in Canada or the United States, projection and prediction models seem less common in France.

There is Depuis1958.fr. Unlike me, it doesn't look only at the first round. If we compare our predictions for Sunday, we agree quite a bit: it gives Macron a 90% chance of qualifying, Le Pen 75%, Fillon 16.7% and Mélenchon 17.7%. It doesn't account for vote redistribution or voter certainty. As for the second-round scenarios, we also agree fairly well: a 66% chance of Macron vs. Le Pen and 12% for Macron vs. Fillon. We don't really agree on the chances of Macron vs. Mélenchon, however: I give this scenario a 7% chance, while he has it at nearly 13%.

The Economist gives Macron a 74% chance of reaching the second round, Le Pen 65%, Fillon 30% and Mélenchon 30%. I admit I find this model has a bit too much uncertainty and plays it a bit too "safe" to avoid being wrong. Polls since 2002 haven't been that bad. At the same time, it may be prudent not to rely on only three elections, especially since polls have had their share of errors recently (Brexit, Trump, etc.).

As for The Crosstab, its probabilities are very close to The Economist's. Here too, however, the model is calibrated with a larger average error than that of recent elections.

Finally, Contesdefaits also gives Le Pen lower chances than I do. I think the difference really comes from the fact that I'm the only one to account for Le Pen's vote being much more locked in, at least according to the polls. Maybe that's a mistake on my part; we'll see on Sunday. I've already mentioned the somewhat strange result of Marine Le Pen having over 80% "definite" voters while her score has fallen from 25% to 22%.


This is by far the most uncertain election I've covered since I started this blog. Between the four-way race, possible polling errors and the recent terrorist shooting, there is major uncertainty about the results of this first round. That being said, make no mistake: Macron is the favourite, and by far. If he finishes first (or second) on Sunday, it will be quite an accomplishment. We're talking about a young man who has never been elected and launched his movement a year ago! At the same time, the polls clearly show that people aren't necessarily big fans of his platform. He is often seen as the default candidate.

Surprises can happen, but note that Macron has literally been on top in every poll!

If I had to give my subjective opinion on which surprise could happen, I think Fillon is probably underestimated. I think his chances of making the second round are better than my probabilities suggest, but I'm not going to start tweaking the numbers subjectively. The real question is who will be overestimated, and there I wouldn't be surprised if it were Macron. So I think a Fillon vs. Le Pen run-off is far from impossible. For that to happen, it would essentially take two ingredients: high abstention (which favours Le Pen and would surely hurt Macron) and the traditional right returning to Fillon. We're talking about 2-3 points that Fillon could pick up from Nicolas Dupont-Aignan and the centre-right. The right has always been strong in France, and on the strength of his platform alone, Fillon should be higher than 19-20%.

Methodological notes

Here is how the model works.

1. Compute a weighted average of the polls, giving more weight to recent polls.
2. Average, for each candidate, the percentage of voters who say their choice is final (over 80% for Le Pen, about 70% for Macron, for example).
3. Run 50,000 simulations (with a standard deviation of about 1.3 points). So here, Macron (at 23% on average) is sometimes at 25% and sometimes at 21%.
4. In each simulation, split each candidate's voters into those who are certain and those who may change their mind.
5. Among those who may change, the assumption is that half, on average, will end up voting for this candidate (normal distribution with a mean of 50%). The other half will either not vote (based on the expected abstention rate) or vote for their second choice. This part is difficult because few polls provide this information, so I had to be somewhat subjective. Essentially, I relied on ideological affinities (Hamon and Mélenchon share many voters), the second choices from some polls and second-round vote transfers. Another assumption: apart from the five main candidates, all the others can only lose votes (i.e. transfer them to the top five); they can't receive any.
6. Add up the certain votes, the undecided voters who finally stick with this candidate and the transfers from other candidates. This gives total margins of error of about 3.8 points, which exactly matches the effective margins of error of polls since 2002. In other words, the model contains the right amount of uncertainty, but it is introduced in several ways (the initial simulations, the share of undecided voters who vote for another candidate, abstention, etc.).
7. Count the number of times each candidate finishes in the top two.
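The seven steps above can be sketched in Python with NumPy. This is a minimal sketch, not the model's actual code: the poll numbers, the certainty shares other than the roughly 70% (Macron) and over-80% (Le Pen, Fillon) quoted above, the second-choice matrix, the 20% abstention rate among movers, the 7-day half-life in step 1 and the 0.15 spread of the "stay" fraction in step 5 are all hypothetical placeholders. Step 2 is represented by passing in already-averaged certainty shares.

```python
import numpy as np

# Candidate order; "Others" pools every minor candidate.
NAMES = ["Macron", "Le Pen", "Fillon", "Mélenchon", "Hamon", "Others"]


def weighted_average(polls, half_life_days=7.0):
    """Step 1: recency-weighted poll average (exponential decay is an assumption)."""
    ages = np.array([age for age, _ in polls], dtype=float)
    shares = np.array([s for _, s in polls], dtype=float)
    weights = 0.5 ** (ages / half_life_days)
    return weights @ shares / weights.sum()


def simulate_top_two(avg, certainty, second_choice, abstain_rate=0.2,
                     n_sims=50_000, sd=1.3, seed=0):
    """Steps 3-7: each main candidate's probability of finishing in the top two."""
    rng = np.random.default_rng(seed)
    k = len(avg)
    # Step 3: perturb the average (sd of about 1.3 points per candidate).
    sims = np.clip(avg + rng.normal(0.0, sd, size=(n_sims, k)), 0.0, None)
    # Step 4: split each candidate's voters into firm and soft.
    firm = sims * certainty
    soft = sims - firm
    # Step 5: on average half of the soft voters stay (normal around 50%)...
    stay = soft * np.clip(rng.normal(0.5, 0.15, size=(n_sims, k)), 0.0, 1.0)
    # ...the rest abstain or move to a second choice. Rows of second_choice sum
    # to 1 and the "Others" column is zero, so minor candidates only lose votes.
    movers = (soft - stay) * (1.0 - abstain_rate)
    transfers = movers @ second_choice
    # Step 6: add everything up and renormalize to 100% of votes cast.
    total = firm + stay + transfers
    shares = 100.0 * total / total.sum(axis=1, keepdims=True)
    # Step 7: count how often each main candidate (not "Others") is in the top two.
    top_two = np.argsort(-shares[:, : k - 1], axis=1)[:, :2]
    return np.array([(top_two == j).any(axis=1).mean() for j in range(k - 1)])


polls = [  # (days old, shares): hypothetical numbers for illustration only
    (1, [24.0, 22.5, 19.5, 18.5, 7.5, 8.0]),
    (4, [23.0, 23.0, 19.0, 19.5, 7.5, 8.0]),
]
avg = weighted_average(polls)
certainty = np.array([0.70, 0.80, 0.80, 0.70, 0.60, 0.50])  # partly hypothetical
second_choice = np.array([  # hypothetical; row = from, column = to
    [0.00, 0.15, 0.35, 0.25, 0.25, 0.0],  # from Macron
    [0.20, 0.00, 0.50, 0.30, 0.00, 0.0],  # from Le Pen
    [0.50, 0.40, 0.00, 0.10, 0.00, 0.0],  # from Fillon
    [0.35, 0.15, 0.00, 0.00, 0.50, 0.0],  # from Mélenchon
    [0.40, 0.00, 0.00, 0.60, 0.00, 0.0],  # from Hamon
    [0.25, 0.30, 0.25, 0.20, 0.00, 0.0],  # from Others
])
probs = simulate_top_two(avg, certainty, second_choice)
for name, p in zip(NAMES, probs):
    print(f"{name}: {p:.1%}")
```

Because every simulation has exactly two qualifiers, the five probabilities always sum to 2. Spreading the uncertainty over the initial perturbation, the soft voters and the transfers is what lets a candidate with a less certain electorate (Macron here) end up with a wider interval than one with a locked-in vote.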

Presidential 2017: four days to go!

The situation seems to have stabilized in France. Macron has stopped falling (he even seems to be rising a bit in the latest polls), while Mélenchon and Fillon have stopped climbing (Mélenchon's climb being by far the bigger one). As for Marine Le Pen, she also seems stable. Unless we see very different trends in the next two days, I expect my final projections to look a lot like the ones in this post.

Here is a chart with these polls and the trends:

Basically, Macron and Le Pen remain the favourites to reach the second round, but both Fillon and Mélenchon have their chances. Given the reliability of French polls since 2002 and the average error (in particular the tendency of polls to be off for two of the five main candidates), we can't rule out a surprise on Sunday. The gap between the top two and the others is just wide enough that a surprise remains a scenario with less than a 50% chance of happening (yes, I know, that's the very definition of a surprise, but you get what I mean).

First, let's look at the probabilities of reaching the second round and the confidence intervals for the first-round results.

At over 85% each, Macron and Le Pen remain the favourites. Yes, there are margins of error and uncertainty in general, but being ahead in almost every poll is better than having to count on a systematic underestimation in the polls. Marine Le Pen continues to have the lowest uncertainty because her electorate remains the most committed. That said, she already had 80% of voters declaring their choice final when she was at 25%. She is now below 23% and still has 80% solid voters. That's a bit odd, and it shows that a voter can declare their choice final and still change their mind after a while. Of all the models, I think mine is the only one taking this variable into account. Maybe that's a mistake, but this figure is available in every poll, and it would be just as strange to ignore it. Moreover, it seems logical that Macron has more uncertainty, since he can gain (and lose!) on both sides.

Let's compare my probabilities with other models. Depuis1958.fr gives Macron 87%, Marine Le Pen 82%, Mélenchon 16% and Fillon 15%. That's honestly very close (especially since our methodologies are very different). The Crosstab has more uncertainty than I do: Macron and Le Pen are below 70%, while Fillon is at 35% and Mélenchon at 29%. They have wider margins of error because they calibrated their model on poll accuracy over a longer period than I did (polls have done relatively well since 2002). Finally, The Economist is very similar to The Crosstab, except that Mélenchon's chances are even higher there.

In the end, the models agree on the big picture but not on the details.

We should also mention that French pollsters are currently accused of "herding", the tendency of pollsters to all show the same numbers. This happens when pollsters, instead of publishing the numbers they actually obtained (which differ from the average), simply copy the polling average. Why? That way you can't be the only firm that's wrong! On the flip side, if the polls are wrong, they will all be wrong together!

I admit I don't know whether this is actually happening in France, but one can't help noticing the extremely low variation between polls.

Note that only one firm polls by telephone (Le Terrain), and its results differ from the online polls. I mention this because I find it unfortunate that almost all French pollsters use the same data-collection method.

In terms of distributions, here are the latest simulations.

I think this image best shows how much of a four-way race this is. Not an equal race with everyone at 25%, but an election where four candidates have a real chance of reaching the second round.

Speaking of the second round, here are the possible scenarios:

The Macron vs. Le Pen scenario remains by far the most likely, but we now have four other scenarios with non-negligible chances. I'm not currently projecting the second round, but most of these match-ups would probably not be very close (Macron wins almost all his duels; the closest could be Fillon vs. Le Pen or Mélenchon vs. Macron).

I'll post my final update by the end of the week, once we have all the polls.
