Chicago Fanatics Message Board
https://mail.chicagofanatics.com/

We might have something: using stats to predict the NCAA
https://mail.chicagofanatics.com/viewtopic.php?f=101&t=92855

Author:  Psycory [ Wed Mar 18, 2015 10:18 am ]
Post subject:  We might have something: using stats to predict the NCAA

A grad school friend and I do this every year: we each put in a couple of dollars and enter the model's bracket in my pool (if you are interested in joining my pool, it is cheap, $5; message me). It never wins, but it can give you some general trends on who to like or avoid. We have data going back the last 15 years for all the teams in the tournament and some stats (Sagarin rating, SOS, RPI, conference Sagarin, conference RPI, and offensive and defensive statistics), which we use to create a profile of winners and losers for each round based on the past 13 years of data. When two teams both fit the "winners" profile, we have two different decision rules for the tiebreaker: the absolute value of the winner profile score, or the difference between the winner profile score and the loser profile score. The absolute value tends to be more liberal, while the difference score tends to be more conservative.

Trends: for the first time ever, it predicted only 1 and 2 seeds in the Final Four. It likes the Big 12 and Big Ten in the early rounds and hates all double-digit seeds.

Final Fours:
Conservative: Kentucky, Arizona, Virginia, and Duke. Champion: Duke over Kentucky.
Liberal: Kentucky, Wisconsin, Virginia, and Gonzaga. Champion: Kentucky over Gonzaga.

In three weeks make fun of it when Villanova defeats Kansas or something.

Author:  newper [ Wed Mar 18, 2015 6:54 pm ]
Post subject:  Re: We might have something: using stats to predict the NCAA

I think what makes bracket pools so interesting compared with a computer simulation or stats model is this: the two are aligned in that you need to predict the games correctly, but the key to winning a pool is correctly predicting match-ups that others are not predicting. A stats model may say Team A is an 80% favorite in a game, and if they played the game 20 times, Team A might very well win 16 of those. However, a tournament is a one-shot deal, so how do you get a stats model to say "I like Team A quite a bit, but I'm choosing Team B for the hell of it"? If you have it pick based on straight percentages, it will almost always lose the pool, because the majority of other people are going to pick that same way, and someone will have picked the other way in a game where the upset hits. If you set it up so it picks based on probability (i.e., it runs the contest 100 times and picks one out of those 100 as the game of record), then you run the risk of zigging when you shouldn't have, and it kind of blows up the rest of the bracket.
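
To make the two picking styles above concrete, here is a minimal Python sketch. It is only an illustration: the function names, the team labels, and the 0.80 win probability are assumptions taken from the "80% favorite" example, not part of anyone's actual model.

```python
import random

def chalk_pick(team_a, team_b, p_a):
    """Straight-percentage rule: always take whichever team is favored."""
    return team_a if p_a >= 0.5 else team_b

def sampled_pick(team_a, team_b, p_a, rng):
    """Probability rule: take team_a with probability p_a, otherwise team_b."""
    return team_a if rng.random() < p_a else team_b

# Hypothetical matchup: Team A is an 80% favorite over Team B.
rng = random.Random(2015)  # fixed seed so the run is repeatable
chalk = chalk_pick("Team A", "Team B", 0.80)

# Draw the sampled pick many times to see how often it sides with the underdog.
picks = [sampled_pick("Team A", "Team B", 0.80, rng) for _ in range(10_000)]
share_a = picks.count("Team A") / len(picks)

print("chalk pick:", chalk)
print("share of sampled picks that took Team A:", share_a)
```

The chalk rule never takes Team B, while the sampled rule takes Team B in roughly one draw out of five. A real bracket only gets one draw per game, which is the "zigging when you shouldn't have" risk.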

I don't know if your model produces a win percentage (or maybe you could infer one from the point spreads that you come up with), but it would be interesting to look at matchups that you have historically called in various ranges. For example, across all the games where you had one team winning at a 75-85% rate, did your favored team actually win 80% of the time or so?
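
The check described above is a calibration check, and it could be sketched roughly as follows. Everything here is an assumption for illustration: the `observed_rate` helper is hypothetical, and the `history` list is made-up toy data, not real results from any model.

```python
def observed_rate(history, lo, hi):
    """Return (n_games, observed win rate) for past predictions with lo <= p < hi.

    history: list of (predicted_win_probability, favorite_won) pairs.
    Returns (0, None) when no prediction falls in the range.
    """
    outcomes = [won for p, won in history if lo <= p < hi]
    if not outcomes:
        return 0, None
    return len(outcomes), sum(outcomes) / len(outcomes)

# Made-up toy data: (predicted probability for the favorite, did the favorite win?)
history = [
    (0.78, True), (0.81, True), (0.76, False), (0.84, True), (0.80, True),
    (0.62, True), (0.91, True), (0.58, False),
]

n_games, rate = observed_rate(history, 0.75, 0.85)
print(n_games, rate)
```

If the model is well calibrated, the observed rate in the 75-85% range should land near 80% once there are enough games in that range to trust the number.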

Author:  Psycory [ Wed Mar 18, 2015 7:14 pm ]
Post subject:  Re: We might have something: using stats to predict the NCAA

WARNING: Stats geeking out ahead...

Our model doesn't work that way. What we do is create profiles of winners/losers based on the stats of previous winners, so we have the stats for every team in the tournament since 1998. The profile analysis creates an algebraic equation for each profile, and if a team's 'winner' profile score is higher than its 'loser' profile score, it is classified as a winner. When both teams in a game are categorized as winners (or both as losers), we go to the tiebreak decision rules (the conservative one is the overall higher winner score, the liberal one is the difference between winner and loser score). Our hope is to predict the early-round upsets; however, as we get more data (power), the model is getting more and more conservative and picking fewer potential upsets. This year it picked only one team seeded 10 or higher to win in the first round. We've been tossing around the idea of only going back 10 years. We might play around with it more next year - I'll be on sabbatical.
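
As a rough sketch of the decision logic described above (not the actual model), the Python below takes each team's winner and loser profile scores as given and applies the classification plus the two tiebreak rules. The class name, rule names, and all score values are hypothetical; in the real analysis the scores would come from the fitted equations.

```python
from dataclasses import dataclass

@dataclass
class ProfileScores:
    team: str
    winner_score: float  # score from the 'winner' profile equation (assumed given)
    loser_score: float   # score from the 'loser' profile equation (assumed given)

    @property
    def fits_winner(self):
        # A team is classified as a winner when its winner score is the higher one.
        return self.winner_score > self.loser_score

    @property
    def margin(self):
        return self.winner_score - self.loser_score

def pick(a, b, rule):
    """Pick a game: use the classification if it separates the teams, else tiebreak."""
    if a.fits_winner != b.fits_winner:
        return a.team if a.fits_winner else b.team
    if rule == "absolute":        # higher overall winner score
        key = lambda t: t.winner_score
    elif rule == "difference":    # larger winner-minus-loser gap
        key = lambda t: t.margin
    else:
        raise ValueError(f"unknown rule: {rule}")
    # On an exact tie, max() keeps the first team passed in.
    return max((a, b), key=key).team

# Hypothetical scores, chosen so the two tiebreak rules disagree.
team_a = ProfileScores("Team A", winner_score=12.0, loser_score=9.0)
team_b = ProfileScores("Team B", winner_score=10.0, loser_score=4.0)

print(pick(team_a, team_b, "absolute"))    # Team A has the higher winner score
print(pick(team_a, team_b, "difference"))  # Team B has the larger gap
```

Running both rules over a whole bracket is what would produce two different sets of picks from the same profiles.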

The fun thing is to look at whether the equations are significant or not (i.e., whether they should do well at predicting).
First round analysis is significant (p<.001)
Second round analysis is significant (p<.02)
Third round analysis is significant (p=.049)
Fourth round analysis is not significant (p=.245)
Fifth round analysis is not significant (p=.322)
Sixth round analysis is significant (p=.032)
We joke that if we can just get to the final round, we will have the winner. In fact, a couple of times my friend was in Vegas for Final Four weekend and we put some money down on our winners. We won once and lost the other time, so we basically broke even.

Author:  Psycory [ Wed Mar 18, 2015 7:16 pm ]
Post subject:  Re: We might have something: using stats to predict the NCAA

One other thing: when the equations are significant, they should predict better than chance. In other words, the profiles fit previous winners and losers at a good rate.
Round 1 has 75% accuracy of fitting teams into their appropriate profile
Round 2 has 77% accuracy
Round 3 has 74% accuracy
Round 6 has 100% accuracy - again, just get us to the last round.
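
For a feel of what "better than chance" means for accuracy figures like these, here is a small Python sketch of an exact one-sided binomial check against a 50% coin flip. This is a simpler, different test from whatever produced the p-values quoted in the thread, and the sample size used below (64 teams, 48 classified correctly, i.e. 75%) is a hypothetical number chosen only for illustration.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more hits by luck alone."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 48 of 64 teams (75%) placed in the correct profile.
n_teams, n_correct = 64, 48
p_value = binomial_tail(n_teams, n_correct)
print(p_value)
```

The same accuracy on a much smaller sample would be far less convincing, which is worth keeping in mind for the later rounds, where only a handful of teams play each year.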

All times are UTC - 6 hours [ DST ]