Offseason-ish thread: Grading the CFP Committee rankings after 12 seasons (deep dive)

Posted by lostinbr
Baton Rouge, LA
Member since Oct 2017
12697 posts
Posted on 1/3/26 at 2:11 pm
TRIGGER WARNING - LONG POST AHEAD
TL;DR: Look at the graphs. I don't know how to talk about this in less than 1,000 words. I am who I am. Sorry.

Around this time last year, I posted this topic analyzing the disparity in conference schedule strength among SEC teams in 2024. At the time, I thought it would be interesting to do an expanded strength of schedule/strength of record analysis looking at not just SEC teams, but all of FBS. One thing I was particularly curious about was how the CFP Committee rankings compare with calculated strength of record over the years.

There are various places to find this information - for example, FPI has strength of record data that you can compare to the CFP rankings - but I really wanted a data source that I could dive into beyond some FPI numbers on a web page. So... I built my own.

I'll add a separate post detailing the process, but here's the short(ish) version: I pulled historical SP+ ratings, CFP and poll rankings, and game results from collegefootballdata.com. I pulled this data for the entire CFP era to date - 2014 through 2025. I then built a tool to calculate strength of schedule, using SP+ data, for every FBS team over that 12-season period. My tool also calculates strength of record using the same SP+ data, and there are several levers I can pull to tweak the parameters and evaluate the results.
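
For anyone who wants to pull the same data, here's a minimal sketch of what that looks like (endpoint paths per the public collegefootballdata.com API docs; you'd swap in your own free API key for YOUR_API_KEY - this is illustrative, not my exact tool):

```python
import requests

API = "https://api.collegefootballdata.com"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # free key from the CFBD site

def pull_season(year):
    """Grab one season's SP+ ratings, weekly rankings, and game results."""
    def get(path, **params):
        resp = requests.get(f"{API}{path}", headers=HEADERS, params=params)
        resp.raise_for_status()
        return resp.json()

    sp_ratings = get("/ratings/sp", year=year)
    rankings = get("/rankings", year=year, seasonType="regular")
    games = get("/games", year=year, seasonType="regular")
    return sp_ratings, rankings, games

# The whole CFP era to date.
data = {year: pull_season(year) for year in range(2014, 2026)}
```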

Methodology
Before I get into some of the results, a couple of quick notes & definitions just to make sure it's clear what we are looking at:

Snapshot in Time - End of Regular Season, Prior to Conference Championships
This is probably the most critical piece of the puzzle. You see, one of the issues with evaluating the CFP Committee rankings is that there's a subjective value placed on conference championships. There's no way for me to tell analytically whether this subjective value makes sense, and it really muddies the waters. To deal with this, all of the analyses that follow are based on the end of the regular season, prior to conference championship weekends. The entire snapshot for each season - records, rankings, schedule strength, etc. - is based on the end of the regular season.

Strength of Record
Strength of record, at its simplest, is a measure of how a team performed relative to the strength of its schedule. In this case, strength of record is reported as the probability that the team had a better record than an average top-12 team (in the given season) would have against the same schedule.

The Data
So with that out of the way, let's look at some data. My biggest question going into this was "is the CFP Committee focusing too much on W/L record and not enough on schedule?" So my first step was to take a look at calculated SOR vs. the CFP Committee rankings. Here is what that dataset looks like for the CFP top 25 over the past 12 seasons:

[Chart: CFP committee ranking vs. normalized SOR for the CFP top 25, 2014-2025]

Note that these SOR values have been further normalized and re-centered, which allows comparison across multiple years (as long as we focus that comparison near the re-centering point, which is around the #10 team in this case). Here's what the same data looks like without that normalization and re-centering, for reference:

[Chart: the same data without normalization/re-centering - raw SOR vs. CFP ranking, top 25, 2014-2025]

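(For the curious: conceptually, the re-centering amounts to something like the sketch below - shift each season's SOR values so the committee's #10 team lands at a common zero point. This is a simplified illustration of the idea, not the exact scaling in my tool.)

```python
def recenter(sor, cfp_rank):
    """Shift one season's SOR values so the committee's #10 team sits at zero.

    sor and cfp_rank are dicts keyed by team name. Simplified sketch -
    a fuller version could also rescale for year-to-year spread.
    """
    anchor = next(s for team, s in sor.items() if cfp_rank.get(team) == 10)
    return {team: s - anchor for team, s in sor.items()}
```
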
So going back to the normalized chart, I chose the top 10 as my center point for analysis. Originally I was looking at the top 11 - my logic was that most years, the top 11 teams in the CFP rankings should make the 12-team playoff. However, as it turned out, that wasn't the case in either of the first two years of the expanded playoff. So I figured the top 10 might make more sense.

The data points in magenta represent teams who were ranked in the top 10 by the committee, but did not have a top 10 strength of record. The data points in green represent teams ranked outside the top 10 by the committee, despite having strength of record in the top 10.

So the next question is... who were these teams? Let's take a look:

[Tables: committee top-10 teams without a top-10 SOR (magenta) and top-10 SOR teams ranked outside the committee top 10 (green), by season]

Some of these are interesting. 2022 LSU obviously jumps out, but if you look at the SOR you'll notice it's very low compared to the rest of the list. LSU had the 9th-best SOR at the end of the '22 regular season primarily because the field was pretty weak in 2022. It's also worth noting that, since this is a snapshot before the SECCG, LSU very well may not have made a 12-team CFP in 2022 even if they had been "properly" ranked by the committee.

Another that jumps out is 2025 BYU. Their 0.627 SOR means that their record, given the teams they played this year, is 62.7% likely to be better than that of an average top-12 team playing the same competition. They had a top-4 SOR, but the committee had them ranked #11 prior to the conference championships. Ouch.

Here is another way of visualizing the same data:

[Chart: scatter of SOS (x-axis) vs. SOR (y-axis) for all teams, with CFP top-10 teams in magenta]

The magenta dots represent teams that were ranked in the CFP top 10 at the end of the regular season. The x-axis is strength of schedule (schedules get harder as you go to the right) while the y-axis is strength of record (resume gets better as you go up).

I think this plot kind of tells the story I expected, but only if you squint at it just right. The story would be that teams are better off at 10-2 with a weaker schedule than 9-3 with a harder schedule, even when that 9-3 record is actually better once you account for the schedule difficulty. But we aren't talking about that many cases, and it's really at the margins (in that 0.2-0.4 SOR range, near the bottom of the expected CFP field).

The last thing I thought about was the reality that the CFP committee probably didn't care that much who was ranked #10 back in 2015. The 12-team playoff puts a higher level of scrutiny on the #8-12 (or so) teams in the rankings. So what if we only look at the two years so far of the 12-team playoff?

[Charts: the same SOR vs. CFP ranking comparison, 2024 and 2025 seasons only]

I think this looks a bit tighter. Again the biggest outlier is 2025 BYU, who really seems to have been screwed in the penultimate rankings.

Conclusions
All in all, I would say this analysis makes the CFP rankings look... better than I expected, actually. There are some clear head-scratchers, but overall it seems fairly reasonable considering we are looking at 12 seasons of data here. I have to admit, I was a bit surprised.

One thing that this analysis does not tackle, though, is how the rankings change following conference championship weekend. This is much harder to objectively analyze as I mentioned before. How do you put an objective value on a conference championship, beyond simply adding it to the win total/SOS calculation? It's also worth noting that some of the most controversial CFP committee decisions - particularly moving FSU out of the top 4 in 2023 - happened after conference championship weekend.
Posted by lostinbr
Baton Rouge, LA
Member since Oct 2017
12697 posts
Posted on 1/3/26 at 2:11 pm to
Part Two - Methodology

I suspect most people won't care about this, but for those who do: I wanted to explain where the numbers come from.

Strength of record is a way of looking at a team's record and asking "how would other top teams fare against that schedule?" Generally it's reported as a probability. In this case, I'm reporting it as the probability that the team's record is better than the record of an average top-12 team against the same schedule.

In order to build up strength of schedule and strength of record, you need some sort of predictive metric. There are several of them out there, and I'd say ESPN FPI and Bill Connelly's SP+ are the two big ones. I chose to use SP+ because Connelly has been pretty open about how the ratings are built up, which gives me a lot more confidence in them.

You also need some sort of "reference team" to measure SOS/SOR against. Usually you will see published SOS/SOR metrics use either "an average FBS team" or "an average top-25 team." The reference that you use can make a big difference in the calculations. Here's a simplified example to illustrate the issue:

[Table: toy 4-game schedules - Team A plays four average FBS opponents; Team B plays three very weak opponents and one elite opponent]

An average FBS team would be expected to win 50% of their games against team A's schedule, because all 4 games are against other average FBS teams. However, they would be expected to win 55% of their games against team B's schedule because 3 of the opponents are really bad. In other words, team A has a stronger SOS for an average FBS team.

However, a better reference team (in this case an average top-12 team) is expected to win almost all of its games against mediocre opponents. As such, team B's schedule is actually more difficult - and therefore has a stronger SOS - for an average top-12 team.
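
Here's that toy example worked out in code, using the same normal-CDF win-probability math described further down. The ratings are made-up placeholders (0 = average FBS, +20 = average top-12, with a 14-point spread), not the exact numbers from the table above:

```python
from math import erf, sqrt

def win_prob(rating_diff, stdev=14.0):
    """P(win) given a rating differential, with margin error ~ Normal(0, stdev)."""
    return 0.5 * (1 + erf(rating_diff / (stdev * sqrt(2))))

# Hypothetical SP+-style ratings: 0 = average FBS team, +20 = average top-12 team.
sched_a = [0, 0, 0, 0]         # Team A: four average opponents
sched_b = [-15, -15, -15, 25]  # Team B: three bad opponents and one monster

for ref_rating, label in [(0, "avg FBS"), (20, "avg top-12")]:
    ew_a = sum(win_prob(ref_rating - opp) for opp in sched_a)
    ew_b = sum(win_prob(ref_rating - opp) for opp in sched_b)
    print(f"{label}: expected wins vs. A's schedule = {ew_a:.2f}, vs. B's = {ew_b:.2f}")
```

With those numbers the flip shows up immediately: the average team banks more expected wins against B's schedule, while the top-12 team banks more against A's - which schedule is "harder" depends entirely on the reference.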

Here's a real-world example using Texas' and Oklahoma's 2025 regular-season schedules:

[Table: 2025 regular-season SOS for Texas and Oklahoma, measured against an average FBS team and against an average top-12 team]

An average FBS team would find Oklahoma's schedule more difficult, but an average top-12 team would find Texas' schedule more difficult.

I actually looked at three different reference points for this analysis: average FBS team, average top-25 team, and average top-12 team. Here is the distribution of 2025 strength of record based on each reference point:

[Chart: distribution of 2025 SOR using each reference point - average FBS, average top-25, average top-12]

Ultimately I found that there wasn't a ton of difference between using top-12 and top-25 as the reference. The most notable difference happens when you use average FBS instead. I went with top-12 because to me, it makes logical sense when you're trying to compare top-12 teams.

So how do we actually calculate this stuff? Basically it comes down to calculating game-by-game win probabilities using the predictive metric of choice (SP+ in my case). We can convert the SP+ differential between two teams (our reference team and each opponent on the schedule) to a Z-score. To do this, we need the standard deviation. In the past I've used 17 points as the standard deviation for SP+. However, now I actually have enough data to calculate it, since I'm already looking at 12 seasons' worth of games anyway:

[Chart: distribution of (actual margin - SP+ predicted margin) for all games, 2014-2025; the standard deviation comes out to roughly 14 points]

This is also how I went about verifying home field advantage, which remained at 2.5 points as expected. So using our ~14 point standard deviation and 2.5 point home advantage, we can calculate a Z-score for any matchup and then convert that to a win probability. That's actually the easy part.
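
In code, the whole conversion is only a few lines. A sketch (margin_errors is a hypothetical variable holding actual-minus-predicted margins for every game in the dataset):

```python
from math import erf, sqrt
from statistics import pstdev

# margin_errors = [actual margin - SP+ predicted margin] for every game,
# 2014-2025. Hypothetical variable - in my tool it falls out of the games pull.
# SP_STDEV = pstdev(margin_errors)   # comes out around 14 points
SP_STDEV = 14.0
HOME_EDGE = 2.5  # points of home-field advantage

def matchup_win_prob(team_sp, opp_sp, site=0):
    """P(team wins). site: +1 for home, -1 for away, 0 for neutral."""
    expected_margin = (team_sp - opp_sp) + site * HOME_EDGE
    z = expected_margin / SP_STDEV           # the Z-score
    return 0.5 * (1 + erf(z / sqrt(2)))      # standard normal CDF
```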

The hard part is then crunching the numbers on 12 seasons of data. In the past, when I looked at SEC schedules only (and for only one year), I used a Monte Carlo simulation. But I really didn't use enough discrete simulations then, and doing enough discrete simulations now takes a long arse time because of the size of the dataset.

As it turns out, it was easier to solve everything analytically. I wrote a script that generates every win/loss permutation of a given team's schedule, at which point I can use the single-game probabilities to determine the overall probability of each win/loss record.
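
A minimal sketch of that enumeration - for a 12-game slate it's only 2^12 = 4,096 permutations, so brute force is cheap:

```python
from itertools import product

def record_distribution(win_probs):
    """P(exactly k wins) for k = 0..n, enumerating all 2^n W/L permutations."""
    dist = [0.0] * (len(win_probs) + 1)
    for outcome in product((1, 0), repeat=len(win_probs)):
        p = 1.0
        for won, wp in zip(outcome, win_probs):
            p *= wp if won else (1.0 - wp)
        dist[sum(outcome)] += p
    return dist
```

From there, strength of record is just a tail comparison: take the team's actual win total and add up the probability that the reference team finishes below it (how you split exact ties is a judgment call). Something like sum(record_distribution(probs)[10:]) gives P(10+ wins), which is exactly the kind of number in the LSU/Ole Miss comparison below.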

I tested my script by running some sample probability distributions:

[Charts: sample win-total probability distributions used to sanity-check the script]

By the way, this is why I've been ragging on Ole Miss' schedule for 2 years now. An average top-12 team would have just under 50% probability to win 10+ games against LSU's 2025 regular season schedule, but would have over 50% probability to win 11+ games against Ole Miss' 2025 schedule. In other words, the schedule difference between Ole Miss and LSU is basically equivalent to spotting an entire game. Wild stuff.

Anywho, once I know the script works, I can run it over the entire 12-season period and then start comparing data with the CFP rankings.

There is one issue that I've noticed - as I mentioned in the OP, I used the penultimate CFP rankings (prior to conference championship weekend) to remove the somewhat subjective value of conference championships from the analysis. However, the Big 12 did not play a conference championship game from 2011 through 2016. Instead, their final regular-season game happened during conference championship weekend. So unfortunately, the snapshot catches Big 12 teams before they had actually completed their regular season (at least from 2014-2016). I don't really have an elegant solution for this problem, so at this point it is what it is.

No idea whether anybody actually cares about any of this crap, but it's a side project I've been working on for a while (because I'm a nerd) and I have nowhere else to share it.
Posted by BallChamp00
Member since May 2015
7401 posts
Posted on 1/3/26 at 2:18 pm to
I think the 400th 0 is out of place and should be replaced by .05 just to make sure nobody is lost.
Posted by blacroix
Member since Sep 2019
549 posts
Posted on 1/3/26 at 3:34 pm to
quote:

because I'm a nerd


You got that right!
Posted by Yeti_Chaser
Member since Nov 2017
11916 posts
Posted on 1/3/26 at 3:40 pm to
Got any cliff notes?
Posted by Lptigerfan
Jeff Davis Parish
Member since May 2015
815 posts
Posted on 1/3/26 at 4:10 pm to
Is this a dissertation for a Math Statistics PhD?