Update 1: Complex Invasion Sports and Complex Models

Having now come back from my two years in Scotland as part of the Joint Degree programme, I am excited to get started on my summer research on the added value of defensive players in the NFL. I decided to begin with a literature review so that I could see the variety of approaches researchers have taken to analyzing added value in sports. With little previous work on NFL players to look back on, and even less on the defensive side of the ball, I have had to start with the first approaches made in baseball and then move to the wider body of work on “complex invasion” sports: sports that lack discrete outcomes and have a high level of interdependence among players (e.g. soccer, basketball, and football).

One major similarity runs through all of these approaches: the so-called Moneyball approach. This is the idea that the actions of players can be connected to the overall goal of winning in ways that sports organizations can benefit from. In baseball, this approach led to the discovery that the rate at which a player reached base was both a reliable predictor of games won and systematically undervalued in MLB. Thus, the Oakland Athletics, early adopters of this approach, were able to construct a team that won more games than the New York Yankees for a fraction of the cost.

Since then, teams in other sports leagues have tried to find a similarly simple indicator of success, but it has proven elusive, partly because the difference between baseball and complex invasion sports is so great. In soccer, there are trillions of combinations of players and passes that could result in the same score line, and the final score line is a byproduct of the entire team’s ability to move the ball. By contrast, a final score of 1-0 in baseball can be the product of one player hitting a home run. As a result, the models for complex invasion sports have to be more complex than those traditionally used in baseball in order to isolate the effect of a single player. I have seen approaches using logistic regression (Yurko et al. 2018), computer algorithms to analyze space control (Rein et al. 2017), and linear combinations of box-score statistics (Arthur 2017). It is harder, perhaps impossible, to find a single statistic or rating that accurately connects a player’s actions with victory in complex invasion sports.
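To make the simplest of these ideas concrete, here is a minimal sketch of a rating built as a linear combination of box-score statistics. Everything in it is made up for illustration: the statistics chosen, the weights, the player lines, and the name `linear_rating` are my own assumptions and do not come from Arthur (2017) or any other model cited above.

```python
# Hypothetical weights, chosen only for illustration.
# A real model would estimate them from data.
WEIGHTS = {
    "tackles": 1.0,
    "sacks": 4.0,
    "interceptions": 6.0,
    "missed_tackles": -2.0,
}


def linear_rating(box_score, weights=WEIGHTS):
    """Return the weighted sum of a player's box-score counts.

    Statistics missing from the box score are treated as zero.
    """
    return sum(w * box_score.get(stat, 0) for stat, w in weights.items())


# Invented stat lines for two hypothetical defensive players.
players = {
    "Player A": {"tackles": 8, "sacks": 1, "interceptions": 0, "missed_tackles": 2},
    "Player B": {"tackles": 5, "sacks": 0, "interceptions": 1, "missed_tackles": 0},
}

ratings = {name: linear_rating(stats) for name, stats in players.items()}

for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name}: {rating:.1f}")
```

Even this toy version shows the limitation discussed above: the rating only sees what a player did in the box score, not what teammates made possible, which is exactly the interdependence that makes complex invasion sports hard to model.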

As a result, this first part of my research has been slightly overwhelming. I have examined a lot of statistical models, yet I cannot yet tell whether they are robust or whether they contain anything I can use in my own model. The most I can do right now is continue to read and then categorize the material in ways I can use. I expect that by my next post I will have a clearer idea of the direction I want my model to take.