September is coming, and NFL fans are excited about the arrival of another season. Many Fantasy players are also eagerly awaiting their Fantasy Draft. I was one of them, and with little time to follow the news of a busy offseason, I decided to use my Data Science skills to optimize my choices.

My idea was to use Machine Learning to build a better performance predictor than the most widely used ones (such as the FantasyPros projections), and then use statistics to evaluate whether picking a player in a given round is a good option, based on his projected points.

So, the project pipeline was:

- Get data from NFL players’ past seasons (2018–2022, at least).
- Process this data to create relevant features for each position.
- Create an ML model for each position (QB, RB, WR, and TE).
- Calculate the probability of a player being selected in each round and evaluate if it’s a good pick based on his Projected Points.

All the code for the project is in the GitHub Repo, if you want to follow the implementation in detail.

My first step was to get the data and process it to create features for the ML model. I decided to get all the data from the FantasyPros website. It is probably the best-known site of its kind, and it provided everything I needed:

- Average Draft Position (ADP) of players for the upcoming season and for past seasons, to build the model and to predict whether a player is a good pick.
- Stats for the players for each position, to create the model.
- A points projection for the past season, to be used as a comparison to test my model.

Since they provide ADP data from the 2018 season through the most recent one, 2022, I used these seasons to train the model.

After getting the data, the next step was processing it. I wanted to use Points Per Game (PPG) as the target, so I expressed every stat column on a per-game basis, using the games each player played in the season. With that, I created the following features for each season from 2018 to 2022:

- Simple Moving Average (SMA) of each stat for the past 3 seasons.
- Exponential Moving Average (EMA) of each stat for the past 3 seasons.
- YDS/ATT, TD/ATT for rushing stats.
- YDS/REC, YDS/TGT, REC/TGT, and TD/TGT for receiving stats.
- ADP and years of career experience.
- SMA and EMA of Fantasy Points for each player.

The goal was to use the features above as a baseline, check whether they had predictive power, and then create new features specifically for each model.
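To make the baseline features concrete, here is a minimal sketch of the per-game normalization and the 3-season SMA and EMA. The function names, the smoothing factor `2 / (window + 1)`, and the season totals are assumptions for illustration only; the actual implementation lives in the repo and may differ.

```python
def per_game(total, games):
    """Convert a season total into a per-game value (0.0 if no games played)."""
    return total / games if games else 0.0


def sma(values, window=3):
    """Simple moving average over the last `window` seasons (fewer if unavailable)."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def ema(values, window=3):
    """Exponential moving average over the last `window` seasons.

    Assumption: the common smoothing factor alpha = 2 / (window + 1),
    seeded with the oldest value in the window.
    """
    recent = values[-window:]
    alpha = 2 / (window + 1)
    result = recent[0]
    for value in recent[1:]:
        result = alpha * value + (1 - alpha) * result
    return result


# Hypothetical (receiving yards, games played) per season, oldest first.
season_totals = [(960, 16), (1200, 16), (1530, 17)]
yds_per_game = [per_game(total, games) for total, games in season_totals]

features = {
    "yds_sma3": sma(yds_per_game),
    "yds_ema3": ema(yds_per_game),
}
print(features)
```

When building the feature row for a given season, only seasons before it should go into the window, so that the target season never leaks into its own features.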

For the modeling step, I ran the same analysis for every position. I looked at the correlation between the baseline features and the target, and at the correlation between the target and versions of those features weighted by ADP, EMA Fantasy Points, and SMA Fantasy Points. The results can be summarised as follows:

- Weighting features by past Fantasy Points was effective for all positions.
- Weighting features by season ADP was effective only for the WR position.

The idea of weighting the features came to me because ADP and the SMA/EMA of past Fantasy Points were significantly correlated with the target, while the player’s raw stats weren’t. With the weighting, I could see a clear improvement in the relevance of the stat features. Below is the difference for some WR stats.
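To make the weighting idea concrete, here is a minimal sketch on made-up numbers (not the project's data). It assumes that "weighting" means multiplying a stat by the player's EMA of Fantasy Points, which is one plausible reading rather than a confirmed detail, and it compares the Pearson correlation of the raw and weighted feature with the target.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally sized samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant column has zero variance and would divide by zero here.
    return cov / sqrt(var_x * var_y)


def weight_feature(feature, weights):
    """Assumed weighting scheme: element-wise product of stat and weight."""
    return [f * w for f, w in zip(feature, weights)]


# Made-up values for five hypothetical WRs.
yds_per_tgt = [8.0, 9.5, 7.0, 10.0, 8.5]      # raw efficiency stat
ema_points = [14.0, 9.0, 16.0, 8.0, 12.0]     # EMA of past Fantasy Points
target_ppg = [13.5, 9.5, 15.0, 9.0, 11.5]     # next-season PPG (the target)

raw_corr = pearson(yds_per_tgt, target_ppg)
weighted_corr = pearson(weight_feature(yds_per_tgt, ema_points), target_ppg)
print(f"raw: {raw_corr:.3f}  weighted: {weighted_corr:.3f}")
```

Running the same comparison for every stat and every candidate weight (ADP, SMA, EMA) gives a quick table of which weighted features are worth passing to the model.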

With the features done, I started to develop the models. For all positions, Random Forest models performed best; only the hyperparameters changed between positions, as shown below.

To assess model performance, I compared my results for the 2022 season with the FantasyPros projections. The result for the RB position can be seen below:

The comparison shows that my model performed better for the RB position, with lower error and a higher R² score. This gave us the baseline we wanted for our draft.
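For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two metrics. The post does not say which error metric was used, so mean absolute error is an assumption, and the numbers below are placeholders rather than the project's results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Placeholder PPG values for four hypothetical RBs (not real results).
actual_ppg = [18.0, 14.0, 11.0, 9.0]
projections = {
    "my_model": [17.0, 13.5, 12.0, 9.5],
    "reference": [15.0, 15.0, 13.0, 11.0],
}

for name, predicted in projections.items():
    mae = mean_absolute_error(actual_ppg, predicted)
    r2 = r2_score(actual_ppg, predicted)
    print(f"{name}: MAE={mae:.2f}  R2={r2:.2f}")
```

Both projections need to be scored on exactly the same set of players and the same season for the comparison to be fair.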

The only remaining step before predicting player performance for the 2023 season was dealing with the rookies. Since they are in their first year in the league, we can’t calculate EMAs and SMAs for them. So, my idea was to replace their past performance with an average of the players closest to their ADP over the last 5 seasons. For example, Bijan Robinson’s ADP is 8.7, so I averaged the stats of the RBs with ADP closest to 8.7 in the last 5 seasons to get an estimate of his performance. There is no consensus on how to deal with rookies, and this is one of the biggest challenges in NFL predictions.

After that, I predicted the expected Points Per Game for the 2023 season and created a board with the projection for each player. The result is shown in the image below.

The AVG column is the player’s current average ADP and the STD column is the standard deviation of his ADP. Both are provided by FantasyPros and are used to calculate the probability of a player being drafted in each round.
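One way to read the rookie rule is "for each past season, take the player at the same position whose ADP was closest to the rookie's, then average across seasons". The sketch below implements that interpretation; the function name, the row layout, and the sample rows are hypothetical, and the repo may select the comparable players differently.

```python
def rookie_baseline(rookie_adp, position, history, stat="ppg"):
    """Average `stat` over the closest-ADP player of each past season.

    `history` is a list of dicts with keys: season, position, adp, and `stat`.
    """
    closest_by_season = {}
    for row in history:
        if row["position"] != position:
            continue
        best = closest_by_season.get(row["season"])
        if best is None or abs(row["adp"] - rookie_adp) < abs(best["adp"] - rookie_adp):
            closest_by_season[row["season"]] = row
    if not closest_by_season:
        raise ValueError(f"no historical players at position {position!r}")
    picks = closest_by_season.values()
    return sum(row[stat] for row in picks) / len(picks)


# Made-up historical rows, for illustration only.
history = [
    {"season": 2021, "position": "RB", "adp": 8.0, "ppg": 18.0},
    {"season": 2021, "position": "RB", "adp": 20.0, "ppg": 12.0},
    {"season": 2022, "position": "RB", "adp": 9.5, "ppg": 16.0},
    {"season": 2022, "position": "WR", "adp": 8.7, "ppg": 20.0},
]

print(rookie_baseline(8.7, "RB", history))  # averages the 2021 and 2022 picks
```

The same function can be called once per stat (or per SMA/EMA column) to fill in every feature the model expects for a rookie.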

To get an optimal draft we need to balance two things:

- Get the player with the highest projected points.
- Avoid getting players that will be available in the following round.

The first one is pretty obvious, since we want to maximize the points the team will score. The second is straightforward too: even if a player is good, if his chance of being available in the next round is high, it’s better to pick another player who is just as good but won’t be available later.

To address these needs, I’ll work with two stats: VOR and Next Round Probability.

**What’s VOR?** Value Over Replacement (VOR) is probably the most used statistic to evaluate players in American leagues nowadays. It’s quite simple to understand: each position (in this case QB, TE, RB, and WR) has a “replacement player”, which can be chosen in many ways. I chose my replacement players with an ADP threshold. That is, I set an ADP threshold, and for each position the closest player above this threshold becomes the replacement player. With the replacement player chosen, the VOR calculation is very simple: it’s the difference between the player’s projected points and the replacement player’s projected points. I used an ADP of 100 as the threshold, and these are the replacement players’ projected points:

The good thing about VOR is that it captures the relative value of each position. For example, the average expected PPG of a QB is higher than that of an RB, but the expected PPG of the replacement QB is also higher. That’s why it’s more common to pick RBs over QBs in Fantasy Drafts.
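The VOR calculation with an ADP threshold can be sketched as follows. The player rows are made up, the helper names are hypothetical, and treating "above the threshold" as ADP greater than or equal to 100 is an assumption.

```python
def replacement_points(players, threshold=100.0):
    """Projected PPG of each position's replacement player.

    Assumption: the replacement is the player with the lowest ADP
    that is still >= threshold, chosen separately for each position.
    """
    replacement = {}
    for player in players:
        if player["adp"] < threshold:
            continue
        current = replacement.get(player["position"])
        if current is None or player["adp"] < current["adp"]:
            replacement[player["position"]] = player
    return {pos: player["ppg"] for pos, player in replacement.items()}


def add_vor(players, threshold=100.0):
    """Return copies of the player rows with a `vor` field added."""
    baseline = replacement_points(players, threshold)
    return [{**p, "vor": p["ppg"] - baseline[p["position"]]} for p in players]


# Made-up projections, for illustration only.
players = [
    {"name": "RB A", "position": "RB", "adp": 5.0, "ppg": 18.0},
    {"name": "RB B", "position": "RB", "adp": 101.0, "ppg": 9.0},
    {"name": "RB C", "position": "RB", "adp": 130.0, "ppg": 7.0},
    {"name": "QB A", "position": "QB", "adp": 30.0, "ppg": 22.0},
    {"name": "QB B", "position": "QB", "adp": 110.0, "ppg": 17.0},
]

for row in add_vor(players):
    print(row["name"], row["vor"])
```

In this toy board the QB projects for more points than the RB (22 vs. 18), yet the RB has the higher VOR (9 vs. 5) because the replacement QB is also strong. Note that `add_vor` raises a `KeyError` if a position has no player at or beyond the threshold.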

**Next Round Probability** can be calculated using the cumulative normal distribution. This is easy to understand if we treat each player’s draft position as a normally distributed random variable, with its own average and standard deviation. That way, if we evaluate the cumulative distribution function of a player’s draft position at the target pick, we get the probability of the player being picked before that pick. Subtracting this probability from 1 gives the probability of the player still being available at the target pick, which will be our pick in the next round.

With these two statistics, I can create what I call **VOR Loss**.
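The availability probability can be sketched with `statistics.NormalDist` from the Python standard library. The function name and the example ADP values are assumptions for illustration; the only inputs are the ADP mean (AVG), its standard deviation (STD), and the pick number we are asking about.

```python
from statistics import NormalDist


def availability_probability(adp_mean, adp_std, pick_number):
    """Probability that a player is still on the board at `pick_number`.

    Models the player's draft slot as Normal(adp_mean, adp_std), so
    P(available) = 1 - CDF(pick_number).
    """
    if adp_std <= 0:
        # Degenerate case: no spread, so the player goes exactly at his ADP.
        return 1.0 if adp_mean > pick_number else 0.0
    return 1.0 - NormalDist(mu=adp_mean, sigma=adp_std).cdf(pick_number)


# Hypothetical player with ADP 20 and a standard deviation of 5 picks.
for pick in (10, 20, 30):
    print(pick, round(availability_probability(20.0, 5.0, pick), 3))
```

A player is a coin flip at the pick equal to his ADP, very likely available ten picks earlier, and very likely gone ten picks later.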

VOR Loss is basically how much VOR I lose if I do not select the player in this round. It is based on the average VOR of the top 3 players at the same position with more than a 50% chance of being available in the next round. The higher a player’s VOR Loss in a given round, the higher his value in that round. So, our last step is to create a framework that simulates the draft, calculates the VOR Loss round by round, and keeps track of how we are doing.

The logic is to take the draft position as input, calculate VOR Loss, and display two tables: the first contains players with more than a 30% chance of being available in the current round, and the second contains the players whose ADP falls inside this round. Below I show an example of these two tables using Position #5 as input.

The final task was creating a tracker that receives the selected player as input and stores his information, along with the accumulated VOR Loss for each position at the moment you select him. The result is shown below.
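The exact formula is not spelled out here, so the sketch below assumes one plausible reading: VOR Loss is the player's VOR minus the average VOR of the top 3 players at his position who have more than a 50% chance of still being available at my next pick. The function names, the fallback to 0.0 when nobody qualifies, and the board rows are all hypothetical.

```python
from statistics import NormalDist


def availability(player, pick_number):
    """P(player still available at pick_number); assumes adp_std > 0."""
    dist = NormalDist(mu=player["adp"], sigma=player["adp_std"])
    return 1.0 - dist.cdf(pick_number)


def vor_loss(player, board, next_pick, top_n=3, min_prob=0.5):
    """Assumed definition: player's VOR minus the expected next-round fallback VOR."""
    alternatives = sorted(
        (
            p["vor"]
            for p in board
            if p["position"] == player["position"]
            and availability(p, next_pick) > min_prob
        ),
        reverse=True,
    )[:top_n]
    # Assumption: with no likely alternatives, fall back to replacement level (0.0).
    fallback = sum(alternatives) / len(alternatives) if alternatives else 0.0
    return player["vor"] - fallback


# Made-up board, for illustration only.
board = [
    {"name": "RB A", "position": "RB", "adp": 5.0, "adp_std": 2.0, "vor": 9.0},
    {"name": "RB B", "position": "RB", "adp": 30.0, "adp_std": 5.0, "vor": 6.0},
    {"name": "RB C", "position": "RB", "adp": 35.0, "adp_std": 5.0, "vor": 4.0},
    {"name": "RB D", "position": "RB", "adp": 40.0, "adp_std": 6.0, "vor": 2.0},
    {"name": "WR A", "position": "WR", "adp": 28.0, "adp_std": 4.0, "vor": 8.0},
]

for player in board:
    print(player["name"], round(vor_loss(player, board, next_pick=20), 2))
```

Under this reading, a player who is himself likely to be available at the next pick counts among the alternatives, which keeps his VOR Loss low and matches the goal of not spending a pick on someone who can be had a round later.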

I can say that the project results were very satisfying. I had two goals: the first was to practice my DS skills, and the second was to save time by not having to read all the news and follow every update of the offseason. I think I succeeded in both.

With the FantasyPros scraper, I can easily track all the changes in the draft logic, and also use the stats as a baseline for my Draft Framework. I’m also satisfied with the ML model I created. I wasn’t used to working with regression problems, so this was a very good project to practice them.

As improvements, I think three things would have a positive impact, although they would make the project much harder: automating the Draft Framework with live information from your draft website, improving the ML model with more advanced stats, and creating a predictor of NFL performance for rookies based on their draft and college stats. These changes would take a long time to make, but they would make the project considerably more sophisticated.

Finally, if you want to see the code of the project in detail, this is the GitHub Repo. If you want to reach me, this is my LinkedIn. Thanks!