Tuesday, February 09, 2010

VORP, EqA and the folly of numbers: baseball stats that go nowhere


    I am a baseball fan who kind of likes box score stats. Not that stats say everything about a team or player, but it is natural to want to keep watching a stat like batting average or home runs, or, for a pitcher, earned run average, or how many times a team has beaten another team in a given year. Those are the stats I like to keep track of, and they are what has made fantasy leagues more appealing as well. Fast forward to the recent days of ‘new age’ stats, which a lot of baseball fans have never heard of: advanced stats that use complicated math to come up with numbers that are supposed to mean something. Bill James makes a lot of money each year publishing books full of numbers and stats that would make your head spin, and a lot of it seems like snake oil to me.

     Newly coined stats include VORP, defined as follows (from baseballprospectus.com):
Value Over Replacement Player. The number of runs contributed beyond what a replacement-level player at the same position would contribute if given the same percentage of team plate appearances. VORP scores do not consider the quality of a player's defense.
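To make the definition concrete, here is a deliberately simplified Python sketch of the "runs above replacement, given the same playing time" idea. It assumes you already have a run estimate for the player and a per-plate-appearance run rate for a replacement-level player at his position; both inputs, and the function name `vorp`, are hypothetical stand-ins, not Baseball Prospectus's actual calculation.

```python
def vorp(player_runs: float, player_pa: int, replacement_runs_per_pa: float) -> float:
    """Runs contributed beyond a replacement-level player with the same plate appearances.

    Simplified illustration: `player_runs` and `replacement_runs_per_pa` are
    assumed to come from some other run estimator (hypothetical inputs).
    Defense is ignored, as in the quoted definition.
    """
    # Runs a replacement-level player would produce given the same playing time
    replacement_runs = replacement_runs_per_pa * player_pa
    # VORP is the difference between the player and that baseline
    return player_runs - replacement_runs
```

For example, a player credited with 100 runs in 600 plate appearances, against a hypothetical replacement rate of 0.10 runs per plate appearance, comes out to 40 runs of VORP (100 minus 60).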

        OK, I can sort of follow what that stat might tell me: how much more offensive value a certain player provides than a replacement-level player would. I'll buy this one, in a way. Still, when I look at a player's VORP, I wonder what all went into making it up. At this point, consider me very skeptical of what the stat, and the quality of its makeup, actually means.

          Then you have the ‘way out there’ ‘new age’ stats like this one:

  EqA: the meaning and computation go like this (also from baseballprospectus.com):

Equivalent Average. A measure of total offensive value per out, with corrections for league offensive level, home park, and team pitching. EQA considers batting as well as baserunning, but not the value of a position player's defense. The EqA adjusted for all-time also has a correction for league difficulty. The scale is deliberately set to approximate that of batting average. League average EqA is always equal to .260.
EqA is derived from Raw EqA, which is
Any variables which are either missing or which you don't want to use can simply be ignored (be sure you ignore it for both the individual and league, though). You'll also need to calculate the RawEqa for the entire league (LgEqA).
Convert RawEqA into EqR, taking into account the league EqA LgEqA, league runs per plate appearance, the park factor PF, an adjustment pitadj for not having to face your own team's pitchers, and the difficulty rating. Again, you can ignore some of these as the situation requires. xmul can simply be called "2", while the PF, diffic, and pitadj can be set to "1".
EQAADJ=xmul*(RawEqa/LgEqa)* ((1+1/diffic)/2) + (1-xmul)
To get the final, fully adjusted EqA, we need to place this into a team environment.
This is an average team:
AVGTM=Lg(R/Out)*Lg(Outs/game)*PF*Games*(DH adjustment)
The DH adjustment is for playing in a league with a DH. "Games" is the number of games played by this player.
Replacing one player on the average team with our test subject:
Get pythagorean exponent
Calculate win percentage
Convert into adjusted space, where the Pythagorean exponent is set to 2.
Fully adjusted EqR:
EQR=.17235*((NEWTM-1)*27.*Games + Outs)
Fully adjusted EqA
EQA= (EQR/5/Outs)** 0.4
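To see what the surviving formulas above actually do, here they are transcribed into Python. This is a sketch under stated assumptions: the RawEqA and NEWTM formulas are missing from the quoted text, so both are taken as inputs; the function names are illustrative, not Baseball Prospectus code; `xmul` defaults to 2 and `diffic` to 1 as the quote suggests, and the DH adjustment defaults to 1.0 (no adjustment), which is an assumption since the quote does not give its value.

```python
# Transcription of the EqA steps quoted above. RawEqA and NEWTM come from
# formulas that did not survive in the quote, so they are plain inputs here.
# Function names are illustrative, not Baseball Prospectus code.


def eqa_adj(raw_eqa: float, lg_eqa: float, xmul: float = 2.0, diffic: float = 1.0) -> float:
    """EQAADJ = xmul*(RawEqa/LgEqa)*((1+1/diffic)/2) + (1-xmul)."""
    return xmul * (raw_eqa / lg_eqa) * ((1 + 1 / diffic) / 2) + (1 - xmul)


def avg_team_runs(lg_runs_per_out: float, lg_outs_per_game: float,
                  park_factor: float, games: int, dh_adjustment: float = 1.0) -> float:
    """AVGTM = Lg(R/Out)*Lg(Outs/game)*PF*Games*(DH adjustment).

    dh_adjustment defaults to 1.0 (no adjustment): an assumption.
    """
    return lg_runs_per_out * lg_outs_per_game * park_factor * games * dh_adjustment


def eqr(newtm: float, games: int, outs: int) -> float:
    """Fully adjusted EqR: EQR = .17235*((NEWTM-1)*27*Games + Outs)."""
    return 0.17235 * ((newtm - 1) * 27.0 * games + outs)


def eqa(eqr_value: float, outs: int) -> float:
    """Fully adjusted EqA: EQA = (EQR/5/Outs)**0.4."""
    return (eqr_value / 5 / outs) ** 0.4
```

One thing the transcription makes easy to check: when NEWTM equals 1, EQR reduces to .17235 × Outs, and `eqa(eqr(1.0, games, outs), outs)` comes out at about .260 regardless of games or outs, which is the league-average figure the definition quotes. Likewise, with the default `xmul` and `diffic`, a player whose RawEqA equals the league's gets an EQAADJ of exactly 1.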

             Now this stat is deep, but it's one I wouldn't give a second glance: it's too complicated, and what does it prove? Do you see the computation above? This isn't rocket science, it's baseball. I don't care for algebra and am not a fan of these types of wild computations. They sell books, but in reality they're a bunch of useless computations that go nowhere. There are many, many more of these zany, brainy computations, each with a name to go with it. There have been recent reports of teams buying into the value of some of these numbers and stats. I don't see how you can give them much credence.

             Give me a team's press notes and some stats from the box scores, and I'm good to go. What do I care if Player A's number is 1.2354789 and Player B's number for the same thing is 1.2235647? Am I supposed to think less of Player B because of these sabermetrics? I don't think so.


  1. Rich,

    To steal a line from one of my favorite bloggers of all time, "here's the thing about statistics, which to me seems self-evident, but to pseudonymous blowhards might not: you don't have to use them, if you don't want to." - Ken Tremendous

    If you don't like these complicated statistics and inventive ways of thinking about baseball, don't worry about them. However, just because you don't like them doesn't mean you should discredit them because they require a little bit of brain power.

    They are not magic. Do you think it's a coincidence that the Red Sox hired Bill James around 2003 and subsequently won the World Series in 2004 and 2007? Additionally, they have been competitive ever since.

    The same goes for the A's under Billy Beane. Is it a coincidence that, with a team payroll under $50 million, they won over 100 games twice?

    Also, while I have not done the math myself, I have seen the research. It is easy to check whether these numbers are meaningful. All that is necessary is to take a large sample and test how strongly each stat correlates with the goal of offense in baseball (scoring runs). You can see for yourself.
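A minimal sketch of the check described above, using the Pearson correlation coefficient: pair a stat (say, team OBP or team AVG) with team runs scored over many team-seasons, and the closer the result is to 1, the more closely the stat tracks scoring. No real data is included here; you would supply it from actual records. Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample has no variation")
    return cov / math.sqrt(var_x * var_y)
```

To compare stats, you would call it once per stat against the same list of team runs, for example `pearson_r(team_obp, team_runs)` versus `pearson_r(team_avg, team_runs)`, where both lists are hypothetical names for data you gather yourself.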

  2. In my opinion, the calculations behind these stats leave out the human factor. Maybe the player didn't feel well in certain games, or he was slightly injured when a play wasn't made, skewing his numbers in these stats.

    The point of my article was basically: how much credence can you put in these types of stats? I've leafed through books of them year after year myself, and they just leave me doubting most of the really complicated stats. As they say with any numbers problem (and something I learned even in computer programming): garbage in, garbage out.

    Thanks for the comment though!

  3. Rich,

    "The validity of all the calculations in these stats are leaving out the human factor in my opinion. Maybe the player didn't feel well on certain games, he may have been slightly injured at the time a play may not have been made, causing his numbers on these certain stats to be skewed."

    That's the point of stats. They're supposed to take the human element out of it (even though it probably balances out over time anyway). It is exactly the same for the stats you like (AVG, ERA, etc.). Anyone who knows anything about statistics knows that small sample sizes are bad (your occasional day where a player doesn't feel well, or whatever). That's why these statistics are compiled over seasons and seasons' worth of games, so those small aberrations get averaged out.
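The small-sample point can be put in numbers with the binomial standard error of an observed average, sqrt(p(1 - p) / n). This sketch makes a simplifying assumption: every at-bat is an independent trial with the same hit probability. Under that assumption, a true .300 hitter's observed average has a standard error of about .102 over 20 at-bats but only about .019 over 600, so one bad week moves a full season's number very little.

```python
import math


def batting_avg_std_error(true_avg: float, at_bats: int) -> float:
    """Binomial standard error of an observed batting average.

    Assumes each at-bat is an independent trial with the same hit
    probability `true_avg` (a simplification of real baseball).
    """
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)


# Spread of a true .300 hitter's observed average at different sample sizes
for ab in (20, 100, 600):
    print(ab, round(batting_avg_std_error(0.300, ab), 3))
```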

    Also, you can put more credence in these stats because they are more highly correlated with winning. That has been shown by testing the stats against actual results. Just for the record, when it comes to "simple" stats, OBP (on-base percentage) and SLG (slugging percentage) are MUCH, MUCH, MUCH better barometers than AVG.
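For reference, here are the standard formulas for the three "simple" stats, as a short Python sketch. AVG counts every hit the same and ignores walks; OBP also credits walks and hit-by-pitches; SLG weights extra-base hits by total bases. The function names are just illustrative.

```python
def batting_average(h: int, ab: int) -> float:
    # AVG = H / AB
    return h / ab


def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(h: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    # SLG = total bases / AB, where singles count 1 base and home runs count 4
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab
```

A hypothetical player with 500 AB, 150 H (30 doubles, 5 triples, 20 home runs), 60 BB, 5 HBP and 5 SF hits .300, gets on base at .377, and slugs .500; AVG alone would not show the walks or the power.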