Monday, February 20, 2012

Analyzing college football's biggest games: part 2

In part 1 of this piece, we saw:
  • the ER model of the data
  • the MySQL implementation of the model
  • how the data was analyzed using a 3-layer view approach
In this second part, we're going to look at some of the data from the analysis, and answer these questions:
  1. which year were the games the most exciting?
  2. which year were the games the least exciting?
  3. did this past BCS season see any total scoring records set?
  4. what was 2004's claim to fame?

BCS bowl games, by year

Here's the data in chart form:













And here's that same data in tabular form, which we will call "year_report":

+------+----------------+---------------+---------+--------------+
| year | points, winner | points, loser | margin  | total points |
+------+----------------+---------------+---------+--------------+
| 1999 |        29.0000 |       17.7500 | 11.2500 |      46.7500 |
| 2000 |        32.2500 |       23.2500 |  9.0000 |      55.5000 |
| 2001 |        31.2500 |       13.7500 | 17.5000 |      45.0000 |
| 2002 |        44.5000 |       21.7500 | 22.7500 |      66.2500 |
| 2003 |        32.2500 |       17.0000 | 15.2500 |      49.2500 |
| 2004 |        25.0000 |       17.5000 |  7.5000 |      42.5000 |
| 2005 |        36.0000 |       19.0000 | 17.0000 |      55.0000 |
| 2006 |        34.7500 |       29.0000 |  5.7500 |      63.7500 |
| 2007 |        36.2000 |       20.2000 | 16.0000 |      56.4000 |
| 2008 |        40.0000 |       20.0000 | 20.0000 |      60.0000 |
| 2009 |        27.4000 |       16.6000 | 10.8000 |      44.0000 |
| 2010 |        31.0000 |       17.2000 | 13.8000 |      48.2000 |
| 2011 |        32.4000 |       19.2000 | 13.2000 |      51.6000 |
| 2012 |        40.0000 |       25.8000 | 14.2000 |      65.8000 |
+------+----------------+---------------+---------+--------------+

Now let's get to work on our quiz.

Question 1:  which year's games were the most exciting?

To answer this, we need the year where the margin of victory was smallest.

So let's try:

select * 
from year_report 
order by `margin` asc 
limit 1;

And let's see what that gives us:

+------+----------------+---------------+--------+--------------+
| year | points, winner | points, loser | margin | total points |
+------+----------------+---------------+--------+--------------+
| 2006 |        34.7500 |       29.0000 | 5.7500 |      63.7500 |
+------+----------------+---------------+--------+--------------+ 
 
It looks like 2006 was the best year to watch the BCS games -- on average, less than a touchdown separated the victors from the vanquished!

Question 2: which year's games were the least exciting?

This is very similar to question, so we'll slightly modify our query to

select * 
from year_report 
order by `margin` desc    -- now it's "desc" instead of "asc"!
limit 1;

Which returns:

+------+----------------+---------------+---------+--------------+
| year | points, winner | points, loser | margin  | total points |
+------+----------------+---------------+---------+--------------+
| 2002 |        44.5000 |       21.7500 | 22.7500 |      66.2500 |
+------+----------------+---------------+---------+--------------+

Wow, look at that!  In 2002, the winners had more than twice the points of the losers, and were winning by more than three touchdowns!  Snore!

Question 3: did this past BCS season see any total scoring records set?

 Let's try:

select * from year_report order by `total points` desc

Which returns:

+------+----------------+---------------+---------+--------------+
| year | points, winner | points, loser | margin  | total points |
+------+----------------+---------------+---------+--------------+
| 2002 |        44.5000 |       21.7500 | 22.7500 |      66.2500 |
| 2012 |        40.0000 |       25.8000 | 14.2000 |      65.8000 |
| 2006 |        34.7500 |       29.0000 |  5.7500 |      63.7500 |
| 2008 |        40.0000 |       20.0000 | 20.0000 |      60.0000 |
| 2007 |        36.2000 |       20.2000 | 16.0000 |      56.4000 |
| 2000 |        32.2500 |       23.2500 |  9.0000 |      55.5000 |
| 2005 |        36.0000 |       19.0000 | 17.0000 |      55.0000 |
| 2011 |        32.4000 |       19.2000 | 13.2000 |      51.6000 |
| 2003 |        32.2500 |       17.0000 | 15.2500 |      49.2500 |
| 2010 |        31.0000 |       17.2000 | 13.8000 |      48.2000 |
| 1999 |        29.0000 |       17.7500 | 11.2500 |      46.7500 |
| 2001 |        31.2500 |       13.7500 | 17.5000 |      45.0000 |
| 2009 |        27.4000 |       16.6000 | 10.8000 |      44.0000 |
| 2004 |        25.0000 |       17.5000 |  7.5000 |      42.5000 |
+------+----------------+---------------+---------+--------------+
14 rows in set (0.05 sec)

Ooh what a bummer!  We missed out on setting a new scoring record by 0.45 points per game!  That's 0.225 points per team per game!  In fact, if LSU had managed a single field goal against Alabama in the championship game, there would have been an average of 66.4 points per game, setting a new record!

Question 4: what was 2004's claim to fame?

How about a little hint:  you can see it in the answer to Question 3.  Did you get it?

Here it is:  in 2004, the fewest points per game were scored.  Not only that, but the winning teams scored a paltry, record-low 25 points per game; in fact, the winners in 2004 scored fewer points per game than the losers in 2012.  Makes you wonder whether defenses are getting worse or offenses better.

In fact, let's find all the years where winners scored less than losers in another year:

We'll use this query:

select l.year as `winner year`, r.year as `loser year`, l.`points, winner`, r.`points, loser` 
from year_report l 
inner join year_report r 
  on l.`points, winner` < r.`points, loser`;


Which results in:
+-------------+------------+----------------+---------------+
| winner year | loser year | points, winner | points, loser |
+-------------+------------+----------------+---------------+
|        2004 |       2006 |        25.0000 |       29.0000 |
|        2009 |       2006 |        27.4000 |       29.0000 |
|        2004 |       2012 |        25.0000 |       25.8000 |
+-------------+------------+----------------+---------------+
3 rows in set (0.05 sec)

Wow! There are three cases where winners have done worse than losers (in a different year, of course!)!