I really don't like the word Blog: Statistics

Showing posts with label Statistics.

Tuesday, March 09, 2010

Tableau and Football Dynasties

I have been working on two of the things that I like at once, football and spiffy graphics.

The above graph is made with Tableau, a free data visualization program. The data is my NFL Dynasty data (more about it here from last year). From the graph we see that the Colts, even though they lost the Super Bowl, are now the "biggest" team in the NFL, beating out the plummeting Steelers.

Click around on the graphic; it should have interactive filters.

Wednesday, February 25, 2009

Building a Better Football Poll

One of the things I have been thinking about lately is a way to build a completely objective football ranking system that accounts for the quality of play on the field and cannot be "gamed" by coaches.

The original way to include quality of win was to look at the win margin. It seems obvious that a team that won by a large margin is much better than a team that won by only a small margin against the same opponent. However, football coaches knew that this was a factor, so when they played easy teams they would pile on the points to help out their rankings. I believe the computer components of the BCS are now not allowed to take the margin of victory into account when they make their rankings. So this led me to think about ways we can measure how good a team is by more than just who they beat.

The first measurement I want to propose is to look at the "Time of Win". That is, how much time was left on the game clock when the score that eventually won the game was made. So if a team runs back the opening kickoff and then holds their opponents to less than a touchdown for the whole game, their Time of Win would be essentially 60 minutes, whereas a Hail Mary at the end of the game would result in a Time of Win of a couple of seconds. This measurement would allow polls to take into consideration how much one team dominated another without encouraging the scoring to spiral out of hand. It would, however, make the teams play defense throughout the game.
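Here is a minimal sketch of how Time of Win could be computed from a list of scoring plays. It rests on one assumption that the description above leaves open: the "winning score" is taken to be the first score that pushes the winner's running total above the loser's final total. The function name `time_of_win` and the play format are made up for illustration, and overtime is ignored.

```python
from typing import List, Optional, Tuple

# (seconds left in regulation, scoring team, points on the play)
ScoringPlay = Tuple[int, str, int]


def time_of_win(plays: List[ScoringPlay],
                teams: Tuple[str, str]) -> Optional[Tuple[str, int]]:
    """Return (winner, seconds left at the winning score), or None for a tie.

    `plays` must be in chronological order. The winning score is assumed to
    be the first score that lifts the winner above the loser's FINAL total.
    """
    final = {team: 0 for team in teams}
    for _, team, points in plays:
        final[team] += points

    a, b = teams
    if final[a] == final[b]:
        return None  # no winner, so no Time of Win
    winner, loser = (a, b) if final[a] > final[b] else (b, a)

    running = 0
    for seconds_left, team, points in plays:
        if team == winner:
            running += points
            if running > final[loser]:
                return winner, seconds_left
    return None  # not reachable when the totals above are consistent


# Opening kickoff returned for a touchdown, opponent held to a field goal.
blowout = [(3585, "A", 7), (1200, "B", 3)]
# Trailing team wins on a Hail Mary with two seconds left.
hail_mary = [(3000, "B", 7), (1800, "A", 3), (2, "A", 7)]

print(time_of_win(blowout, ("A", "B")))
print(time_of_win(hail_mary, ("A", "B")))
```

The two toy games are invented to mirror the kickoff-return and Hail Mary cases described above; they are not real results.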

I thought of a second good measurement last night while lying in bed: Offensive Yards per Point. This number would incorporate a few aspects of a team's performance into one figure. It would measure how well a team can finish off long drives, reward teams for creating turnovers, and also give a bonus to a team that has a good kick return game. It also would be somewhat difficult to game, because scoring as many points as possible with the fewest yards is already a primary goal for coaches, so they can't really change their game plan much in response to this being measured.

Other factors that I would consider would be power rankings, like the Massey rankings, which look at who beats whom, and whether the game was home or away. I would also like to include a couple of other measurements, but it is kind of hard to think of relevant ones that can't be taken advantage of. Anyone else have any good ideas for ways to build a better football polling system?
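As a quick sketch, Offensive Yards per Point is just a ratio, and lower is better. The function name `yards_per_point` and the numbers below are hypothetical, and returning infinity for a scoreless game is one possible convention rather than part of the original idea.

```python
def yards_per_point(offensive_yards: float, points: int) -> float:
    """Offensive yards gained per point scored (lower is better)."""
    if points <= 0:
        # A scoreless team has no finite ratio; treat it as worst possible.
        return float("inf")
    return offensive_yards / points


# Illustrative numbers only: the first team needed fewer yards per point,
# for example thanks to short fields after turnovers or long kick returns.
print(yards_per_point(350, 28))
print(yards_per_point(420, 21))
```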

Wednesday, February 11, 2009

Sports Dynasties

Some questions that I have when thinking about what makes a dynasty in sports.

1. Can there be more than one dynasty at a time?

2. How important is the regular season, and how important are the playoffs?

3. You can't just look at one year at a time, so how many years do we look at?

4. Is winning the championship the only quality that makes a dynasty?

5. Can a dynasty last for only one season? (This question is somewhat related to the first question since only allowing one dynasty at a time means that some dynasties will get clipped to a very short time frame.)

I will give my answers soon to these questions when I think about them a bit more.

Sidenote: Is there anything that Obama can't do?

Thursday, October 16, 2008

Wanna Bet?

Lately I have been working on various ways to predict scores in football for my fantasy football league. But since I am trying to predict something that doesn't really happen (i.e. fantasy football games), I can always claim that my models' predictions are right. So I decided to give a whirl at predicting some NFL football games versus the spread. So let's see how this goes.


Game     M1   Conf.  M2   Conf.
SD@Buf   Buf  .36    Buf  .73
NO@Car   NO   .25    NO   .66
Min@Chi  Min  .28    Min  .52
Pit@Cin  Cin  .78    Pit  .09
Ten@KC   KC   .20    KC   .75
Bal@Mia  Mia  .12    Mia  .95
SF@NYG   SF   .44    SF   1.00
Dal@StL  Dal  1.00   Dal  .30
Det@Hou  Det  .31    Det  .32
Ind@GB   Ind  .45    GB   .45
NYJ@Oak  Oak  .47    NYJ  .39
Cle@Was  Was  .74    Cle  .09
Sea@TB   TB   .89    Sea  .70
Den@NE   NE   .79    NE   .20

M1 and M2 are based on the same model; M2 just adds blocking on team defense. Conf. is the scaled confidence that each method has in its pick.

I wonder if I can pick better than chance doing this.

Update: If I had bet $100 multiplied by my confidence on each pick, I would have ended the week with totals of -$322 from the M1 system and -$199 from the M2 system. Not so good. I'll give it another try this week.
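For anyone who wants to run the same kind of tally, here is a rough sketch of confidence-weighted betting. It assumes an even-money payout (win the stake or lose the stake, no bookmaker's cut) and ignores pushes, neither of which is spelled out above. The function name `weekly_total` and the example picks are hypothetical, not the actual results from this week.

```python
from typing import List, Tuple


def weekly_total(picks: List[Tuple[float, bool]], unit: float = 100.0) -> float:
    """Net result of staking `unit * confidence` on each pick.

    Each pick is (confidence, covered), where `covered` is True if the pick
    beat the spread. Assumes even-money payouts and no pushes.
    """
    total = 0.0
    for confidence, covered in picks:
        stake = unit * confidence
        total += stake if covered else -stake
    return total


# Made-up example: one medium-confidence win, two losses.
example = [(0.5, True), (0.25, False), (1.0, False)]
print(weekly_total(example))
```

One thing the sketch makes plain: a single wrong pick at confidence 1.00 wipes out several correct low-confidence picks.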

Friday, September 19, 2008

Pro Screw Yourself

My job is sort of like a statistical game show, where I get asked random questions about random statistical things. Most of the time I don't have a problem, but sometimes I have no clue about the procedure the customer is asking about and I have to do a bit of research.

Unfortunately, if I haven't heard about a procedure at all it means that it is either new or not used. And unfortunately that means that information on it is limited. To add further complications, the information that I can find is usually provided by somebody who holds the idea dear. And to add one more layer of suckage, if the idea is flawed then the person will compensate by complicating it with fluff words to describe the process. And since that is the only resource for the topic, I must read it and sift through the piles of crap words.

For example, today I did some searching about a topic that some economists developed. The person writing the article felt that he needed to sound smarter than the average bear, so he kept on using the term "pro rata", which I had never heard before. It turns out that pro rata means in proportion. Would it have been that hard to just say in proportion?

General rule of thumb: "If the reader understands it, it is written correctly. If there is any confusion then it is written incorrectly." Inserting pro rata and a priori and mumbo-jumbo does not help you prove your point; it just confuses the reader about what you are actually doing.

I guess that might be the point.

Monday, September 08, 2008

Waist high and knee deep

I currently am [title] in NFL data. I figure that I need a data set that I know the quirks of so I can do some better testing, so I have decided to type up the NFL data for this year. Since week one is almost done I have enough data to start playing around so I have been seeing how things look. Here is an example of a graph I have made for factor loadings to predict wins:

Basically the way to read this graph is that the further right you go, the more that stat helps you win NFL games, so Rushing Attempts are good, Kick Return Yards are bad. (Think about why getting Kick Return Yards is bad; it makes sense.)

In other news: Lollipop by Mika is a song that is one huge hook and nothing else. I'm not saying that the song is good, but it is brilliant. And to top it off, the music video is a visual version of the song. This is as close to candy as you can get while listening to music (diabetics beware):

Thursday, February 28, 2008

The more you know.

The more I know about science the more I know scientists don't do it. And usually it isn't that bad of a thing; if we go down the wrong path for a couple of years it's no big deal. However, when there is a political agenda behind the science, and the scientists themselves are in a position where they are out of a job if they find out they are wrong, we have a really good combination for really bad science. Can anybody guess which science I am talking about? That's right, the one that is put in charge of proving that man is causing global warming. This is the same one that predicted that there would be significantly more hurricanes last year, and that 2007 would be the hottest on record.

I don't really think the scientists are being devious when they make their conclusions, but I do think that they are selectively ignoring some facts while using shaky statistics to prove their ideas. A really good example of a guy that knows enough statistics to get himself into trouble is this guy. My main problem with his analysis is that he chooses which lag to use by picking the one that fits the model best, and then goes on to say that after shifting the data to the best lag the data fits the model really well. Well duh.

Also, if you look at his final projection of the temperature, it is quite within the bounds of the prediction that we will always be cooler than (or at least the same temperature as) right now, i.e. the best model still does not predict a significant rise in temperature.
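To see why picking the lag by best fit is circular, here is a small sketch using two series of pure random noise that have nothing to do with each other. It is not a reconstruction of the analysis mentioned above; the helper names (`pearson`, `lagged_corr`, `best_lag`), the series length, the seed, and the range of lags are all arbitrary choices for illustration.

```python
import random
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def lagged_corr(x: List[float], y: List[float], lag: int) -> float:
    """Correlate x[t] with y[t + lag]."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def best_lag(x: List[float], y: List[float], max_lag: int) -> int:
    """The lag in 0..max_lag with the largest absolute correlation."""
    return max(range(max_lag + 1), key=lambda k: abs(lagged_corr(x, y, k)))


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.gauss(0, 1) for _ in range(30)]
y = [rng.gauss(0, 1) for _ in range(30)]  # generated independently of x

k = best_lag(x, y, 10)
print("correlation at lag 0:   ", round(lagged_corr(x, y, 0), 3))
print("correlation at best lag:", round(lagged_corr(x, y, k), 3), "(lag", k, ")")
```

By construction the "best" lag can never look worse than lag 0, even though the two series are unrelated, so a good fit at the chosen lag is not independent evidence for the model.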

I wish it was just this one guy, but the more I learn about statistics the more I see how they are just using it to prove their point, not to discover the truth.

Wednesday, November 14, 2007

MCMC Hammer

After working on some personal research related to music, I think I have found a way to better explain the MCMC method. Below is a diagram I have made that might just revolutionize the world of music and statistics:

I have so much faith in this that I have opened up a CafePress store for it. Check it out.

(Thanks Nicole for the idea! More about the MCMC method, more about MC Hammer.)