They Aren’t Who We Thought They Were: Why the NFL is So Difficult to Analyze

Oct 20, 2013; Indianapolis, IN, USA; Indianapolis Colts quarterback Andrew Luck (12) throws a pass as he is hit by Denver Broncos defensive tackle Terrance Knighton (94) during the first half in the game at Lucas Oil Stadium. Mandatory Credit: Thomas J. Russo-USA TODAY Sports

Let’s play a game.

Who’s the better all-time quarterback? Dan Marino, Brett Favre or Steve Young? How about Johnny Unitas, Joe Montana or John Elway? Good luck reaching a consensus. A quick Google search reveals that respected analysts from around the country and across the globe have engaged in this exercise, each reaching different conclusions in some form or fashion, often dramatically so. What do you weight more: Super Bowls or MVP awards? Wins or quarterback rating? Signature wins or signature losses? Everybody will have a slightly different answer to these questions, and with good reason: it’s a subjective exercise with subjective methodologies and, as such, invariably subjective results.

Of course, that’s just one of 22 positions on the field, and that figure doesn’t include kickoff and punt coverage units, punt returners, kick returners, long snappers, holders and kickers, all of whom are important cogs in the winning or losing of an NFL game.

Players often call the NFL “the ultimate team game,” and yet many of those same players, upon retiring into life as analysts, fall into the same rhetorical traps that so many fans do: pitting this player against that player and comparing stats side by side, as if equivalent statistical totals somehow represent equivalent ability.

This, we know, is an illusion. It’s a mirage created by a sports culture that demands instant debate and firm, unequivocal conclusions based on limited evidence. But the NFL isn’t like other sports, and I’m going to tell you why.

Small Sample Sizes

82, 162, 82, 16

No, this isn’t the start of a LOST joke (at least not a good one, assuming such a thing exists). Those are the numbers of regular-season games in the four major American professional sports leagues: the NHL, MLB, NBA and NFL, respectively. I don’t think I need to point out which one is not like the others.

“Sample size” is a popular buzzword in NFL analysis. For many, though, I think it goes right over their heads: a term that makes sense as two words put together but whose significance is largely misunderstood. In statistics, a sample that is too small is one with too few observations for random noise to average out, so it cannot support reliable inferences. Yet in the NFL the phrase is often used as a sort of disclaimer, a footnote: “It’s a small sample size BUUUTTTT, here’s why RGIII should be benched.”

Think about the absurdity of Bulls fans and media suggesting that Derrick Rose be benched two games into the NBA season because he looked rusty coming back from an ACL injury. Yet this is exactly what happens in the NFL.

Yes, when you play only 16 games in a season, every one of them takes on added importance, but that’s not the same as saying the outcome of every game is somehow more significant as a predictor of future performance. Nobody lost their head after the Miami Heat lost to a far inferior Sixers team; people simply accepted it as one game of many. But in the NFL? Good luck getting a levelheaded analysis had Denver lost to Jacksonville (hell, Denver was roundly questioned after merely failing to cover a 28-point spread in that game).

Consider the 2006 NFL season, in which the Colts lost an embarrassing Week 14 game to Jacksonville, 44-17, giving up nearly 400 rushing yards in the process despite the opposing QB posting a QBR of 8.7. Following that game, Bill Polian famously told Jim Irsay, “It’s over.” A month and a half later, they hoisted the Lombardi Trophy as Super Bowl champions.

Was this some crazy fluke? Absolutely not. The NFL is filled with countless examples of heavily favored teams losing in the playoffs. That’s the nature of a single-elimination format and a sample size too small to support reliable predictions: fantastic drama, but the “best” team rarely wins the championship.
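To put rough numbers on the sample-size problem, here is a minimal Python sketch built on a deliberately crude toy model: every game is treated as an independent coin flip that a team wins with the same fixed probability, ignoring opponents, injuries, home field and everything else discussed in this article. The 0.600 win probability is a hypothetical “genuinely good” team, not any real club. The sketch computes the standard error of that team’s observed win percentage and the exact binomial probability that it finishes at .500 or worse over seasons of 16, 82 and 162 games.

```python
from math import comb, sqrt


def win_pct_std_error(p: float, games: int) -> float:
    """Standard error of an observed win percentage under the toy model:
    each game is an independent win with probability p."""
    return sqrt(p * (1 - p) / games)


def prob_at_or_below_500(p: float, games: int) -> float:
    """Exact binomial probability that the team wins at most half its games
    (i.e., finishes at .500 or worse) under the same toy model."""
    return sum(
        comb(games, wins) * p**wins * (1 - p) ** (games - wins)
        for wins in range(games // 2 + 1)
    )


if __name__ == "__main__":
    TRUE_WIN_PROB = 0.6  # hypothetical "genuinely good" team
    for league, games in [("NFL", 16), ("NBA/NHL", 82), ("MLB", 162)]:
        se = win_pct_std_error(TRUE_WIN_PROB, games)
        risk = prob_at_or_below_500(TRUE_WIN_PROB, games)
        print(f"{league:8} {games:4} games: win% std. error = {se:.3f}, "
              f"P(.500 or worse) = {risk:.3f}")
```

Under these assumptions, the standard error for a 16-game season works out to about .122, or roughly two wins either way, versus about .038 over 162 games, and the computed chance of a good team finishing .500 or worse shrinks as the season lengthens. Real games are neither independent nor identically distributed, so treat this only as an illustration of scale, not a model of any actual season.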


The Complexity of the Sport

The NFL rulebook is a mind-numbing 121 pages long, and we’re not talking brochure pages in 16-point font; we’re talking 121 8.5×11 PDF pages with font so small you need a magnifying glass.

But aside from the rules, the game itself is staggeringly complex. I was once asked to write a “brief” explanation of the NFL game for someone who knew nothing about football. The final manuscript was 3,000 words, and I didn’t even touch on scheme, formations, personnel packages or any number of other things.

Let’s say the outcome of a play is an incomplete pass. Your friend, who has never seen football before, asks you, “Hey (insert your name), what just happened?” The simple answer of course is, “The quarterback tried to throw the ball to his receiver but it was incomplete.” Great. But we know that this response barely scratches the surface of what may or may not have happened to cause that relatively pedestrian incompletion.

Even setting aside all the preparation and game planning that go on during the week leading up to a game, which could (and does) fill volumes, here’s just a small sample of what happens on nearly every play in the NFL.

Pre-snap: The offensive coordinator has to decide what play to call based on down and distance, field position, the current game situation, defensive tendencies and his own personnel (e.g., injuries during the game, unexpected protection issues), and he has only a handful of seconds to make that decision. The defensive coordinator is doing the same. This is the chess match of football.

Assuming the play is relayed correctly and all the players in the huddle understand the call (far from a given in a thunderously loud stadium), the quarterback then reads how the defense has lined up against the called formation. It is his responsibility to make the correct read based on the defensive personnel and alignment. If the defense shows a nickel package, for example, the QB may need to audible into a run.

The center is also making reads, calling out possible protection shifts, identifying the Mike linebacker and making adjustments based on information that may or may not be a feint by the defense. Wide receivers are responding to coverage as well, often with several route options on a given play depending on how they read the defense. If the QB, center and WRs aren’t on the same page, disaster can ensue: a wrong route or an incorrect read can become a game-changing pick-6.

For their part, the defense is also reading the offense. It’s a never-ending contest of strategy, perception and skill.

During the play: All of that is just the pre-snap read. Now comes the fun part. Even if everything goes according to plan, you still have to execute it. A perfect play call is meaningless if the execution is sloppy. In fact, the Colts spent years running the same concepts over and over and executed them with such precision that they remained effective anyway.

Certainly, some positions have more impact than others. The quarterback, for example, can overcome many mistakes made by other players on a play, but when analyzing outcomes it’s impossible to ignore the importance of a team’s collective success or failure.

We know this play resulted in an incomplete pass (remember our hypothetical football-ignorant friend?), but why that pass was incomplete could come down to many factors, often ones known only to the coaches and players. Did the quarterback and center correctly identify the protections? Did the offensive line and running backs execute those protections? Did the QB throw the ball to the proper spot based on a correct pre-snap read of the defense? Did the WR run the correct route? Perhaps the defender simply made an excellent play, perhaps the pass was slightly off target, or maybe the receiver just dropped the ball.

The possibilities here are nearly endless, but the point of all this is that no play in any game exists in a vacuum. Certainly this is true of other sports as well, but in football it is amplified tenfold. A single mistake by any player on the field can domino into a failed play, which in turn can domino into a failed series, which can domino into a lost game, and with only 16 games in a season, every one is precious. Then again, sometimes a failed play is just a failed play.

Not all plays are created equal, and not all outcomes should be weighted equally; the conundrum of NFL analysis is figuring out which is which. Saying, “We just can’t know for sure” simply isn’t good enough for fans demanding answers and expecting solutions.

Numbers Lie

In response to this complexity, many advanced statistical techniques have cropped up in an attempt to quantify performance in a sport with nearly infinite variability. Websites like Football Outsiders, Advanced NFL Stats, ESPN Stats and Information, Cold Hard Football Facts, Pro Football Focus, and of course Pro Football Reference, to name but a few, have taken to measuring the most minute details of every NFL snap in hopes of determining exactly what leads to winning and how it can be predicted in the future.

These tools are invaluable in getting to the heart of what really happens on an NFL football field; without them the task seems almost futile, like taking shots in the dark.

But as useful as these tools undeniably are, none of them is truly capable of cutting through the veil of complexity that shrouds the NFL game, and even taken collectively they can sometimes create more confusion than clarity. Parsing out the roles of coach vs. player on a given play is hard enough; parsing out one player vs. another is even more difficult.

Take Andrew Luck, for example. Depending on which metric you use to measure overall QB performance, he falls anywhere from 4th to 11th, and that doesn’t even include traditional metrics like passer rating. It’s also unclear, given the proprietary nature of many of these metrics, exactly how these determinations are reached or what factors are ignored. Does a QB with a porous offensive line get some kind of bonus for his performance? Does an INT that bounces off a WR’s hands count the same as one thrown directly to a defender? Sometimes the answer appears to be yes, sometimes no. Then, of course, there are things that stats cannot possibly account for, such as a WR running the wrong route so that a throw made in anticipation of the correct one gets intercepted.

Aside from the opaque nature of many of these measurements, there’s also the sheer volume. Here’s a quick, and incomplete, list of advanced WR metrics:

Catch rate, drop rate, route depth, yards per route run, estimated points added, yards after catch, yards after catch per reception, target percentage, wide receiver rating, deep pass target percentage, deep pass catch rate, yards above replacement, defense-adjusted yards above replacement, defense-adjusted value over average, effective yards, win probability added, success rate.

That’s just for WRs, and I left many off. Good luck making a list of every QB metric out there.

Arguments for or against a player can almost certainly be made by picking and choosing whichever metrics fit your particular point of view, a practice that is so prevalent among fans and analysts that it’s just expected at this point.

And these are just individual metrics; team metrics that try to quantify overall team performance have also proliferated prodigiously in recent years. They are certainly better than win-loss record alone, but the NFL game is just too complicated and unpredictable to accept any one number as anything more than a best guess based on a small sample.


The purpose of this article was not to argue that all NFL analysis and discussion is a futile exercise in intellectual self-aggrandizement (though admittedly a lot of it is), but rather to illustrate, in some small way, the caution one must take in drawing conclusions from data with so many inherent flaws and stemming from such incredible complexity.

Further evidence could certainly be proffered, such as the surprisingly high failure rate of draft picks, or the fluctuation in player performance from one season to the next, but I’m willing to bet at this point you’ve heard enough.

The NFL game is stacked with drama, fascinating in its strategic diversity, and as much fun to watch as anything on TV, but as a member of a fan community and a writer myself (however humble) my fervent hope is that the future of NFL analysis will be marked by measured criticism and not reactionary pulp. We can all have fun debating who the best QB of all-time is or which team is going to win the Super Bowl, but it’s worth remembering that room for merited dissension is not only an essential element of human discourse but a necessary and warranted aspect of any NFL discussion.
