A Derivation Of The Pythagorean Won Loss Formula In Baseball


Abstract.
It has been noted that in many professional sports leagues a good predictor of a team's end of season won-loss percentage is Bill James' Pythagorean Formula $\frac{{\rm RS_{\rm obs}}^{\gamma}}{{\rm RS_{\rm obs}}^{\gamma}+{\rm RA_{\rm obs}}^{\gamma}}$, where ${\rm RS_{\rm obs}}$ (resp. ${\rm RA_{\rm obs}}$) is the observed average number of runs scored (allowed) per game and $\gamma$ is a constant for the league; for baseball the best agreement is when $\gamma$ is about $1.82$. This formula is often used in the middle of a season to determine if a team is performing above or below expectations, and to estimate its future standings.

We provide a theoretical justification for this formula and value of $\gamma$ by modeling the number of runs scored and allowed in baseball games as independent random variables drawn from Weibull distributions with the same $\beta$ and $\gamma$ but different $\alpha$; the probability density is

$$f(x;\alpha,\beta,\gamma)\ =\ \begin{cases}\frac{\gamma}{\alpha}\ ((x-\beta)/\alpha)^{\gamma-1}\ e^{-((x-\beta)/\alpha)^{\gamma}} & \text{if $x\geq\beta$}\\ 0 & \text{otherwise.}\end{cases}$$
This model leads to a predicted won-loss percentage of $\frac{({\rm RS}-\beta)^{\gamma}}{({\rm RS}-\beta)^{\gamma}+({\rm RA}-\beta)^{\gamma}}$; here ${\rm RS}$ (resp. ${\rm RA}$) is the mean of the Weibull random variable corresponding to runs scored (allowed), and ${\rm RS}-\beta$ (resp. ${\rm RA}-\beta$) is an estimator of ${\rm RS_{\rm obs}}$ (resp. ${\rm RA_{\rm obs}}$). An analysis of the 14 American League teams from the 2004 baseball season shows that (1) given that the runs scored and allowed in a game cannot be equal, the runs scored and allowed are statistically independent; (2) the best fit Weibull parameters attained from a least squares analysis and the method of maximum likelihood give good fits. Specifically, least squares yields a mean value of $\gamma$ of $1.79$ (with a standard deviation of $.09$) and maximum likelihood yields a mean value of $\gamma$ of $1.74$ (with a standard deviation of $.06$), which agree beautifully with the observed best value of $1.82$ attained by fitting $\frac{{\rm RS_{\rm obs}}^{\gamma}}{{\rm RS_{\rm obs}}^{\gamma}+{\rm RA_{\rm obs}}^{\gamma}}$ to the observed winning percentages.

Key words and phrases:
Pythagorean Won-Loss Formula, Weibull Distribution, Hypothesis Testing

2000 Mathematics Subject Classification:
46N30 (primary), 62F03, 62P99 (secondary).

1. Introduction

The goal of this paper is to derive Bill James' Pythagorean Formula (see [Ja], as well as [An, Ol]) from reasonable assumptions about the distribution of scores. Given a sports league, if the observed average number of runs a team scores and allows are ${\rm RS_{\rm obs}}$ and ${\rm RA_{\rm obs}}$, then the Pythagorean Formula predicts the team's won-loss percentage should be $\frac{{\rm RS_{\rm obs}}^{\gamma}}{{\rm RS_{\rm obs}}^{\gamma}+{\rm RA_{\rm obs}}^{\gamma}}$ for some $\gamma$ which is constant for the league. Initially in baseball the exponent $\gamma$ was taken to be $2$ (which led to the name), though fitting $\gamma$ to the observed records from many seasons led to the best $\gamma$ being about $1.82$. Often this formula is applied part way through a season to estimate a team's end of season standings. For example, if halfway through a season a team has far more wins than this formula predicts, analysts often claim the team is playing over their heads and predict they will have a worse second half.
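As a rough illustration of how the formula is applied part way through a season, the following Python sketch evaluates the Pythagorean Formula and compares the prediction with a team's record. The function name `pythagorean_pct`, the run averages, the number of games and the win total are all hypothetical and chosen only for illustration; the default exponent is the empirical value $1.82$ quoted above.

```python
def pythagorean_pct(rs_obs: float, ra_obs: float, gamma: float = 1.82) -> float:
    """Predicted won-loss percentage from average runs scored and allowed per game."""
    if rs_obs <= 0 or ra_obs <= 0:
        raise ValueError("average runs scored and allowed must be positive")
    rs_pow = rs_obs ** gamma
    ra_pow = ra_obs ** gamma
    return rs_pow / (rs_pow + ra_pow)


# Hypothetical mid-season numbers (not real data): 81 games played,
# 5.0 runs scored and 4.5 runs allowed per game, 50 actual wins.
games_played = 81
actual_wins = 50
predicted_pct = pythagorean_pct(5.0, 4.5)
predicted_wins = predicted_pct * games_played

# A positive surplus means the team has won more games than the formula predicts.
surplus_wins = actual_wins - predicted_wins
print(f"predicted pct = {predicted_pct:.3f}, surplus wins = {surplus_wins:+.1f}")
```

Under these made-up numbers the surplus is positive, which is the situation in which analysts would say the team is playing over its head.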

Rather than trying to find the best $\gamma$ by looking at many teams' won-loss percentages, we take a different approach and derive the formula and optimal value of $\gamma$ by modeling the runs scored and allowed each game for a team as independent random variables drawn from Weibull distributions with the same $\beta$ and $\gamma$ but different $\alpha$ (see §3 for an analysis of the 2004 season which shows that, subject to the condition that the runs scored and allowed in a game must be distinct integers, the runs scored and allowed are statistically independent, and §4 for additional comments on the independence). Recall the three-parameter Weibull distribution (see also [Fe2]) is

$$f(x;\alpha,\beta,\gamma)\ =\ \begin{cases}\frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1}e^{-((x-\beta)/\alpha)^{\gamma}} & \text{if $x\geq\beta$}\\ 0 & \text{otherwise.}\end{cases} \tag{1.1}$$
We denote the means by ${\rm RS}$ and ${\rm RA}$, and we show below that ${\rm RS}-\beta$ (resp. ${\rm RA}-\beta$) is an estimator of the observed average number of runs scored (resp. allowed) per game. The estimator of the observed average runs scored per game is ${\rm RS}-\beta$ and not ${\rm RS}$ because of the discreteness of the runs scored data; this is described in greater detail below. Our main theoretical result is proving that this model leads to a predicted won-loss percentage of

$$\mbox{Won-Loss Percentage}({\rm RS},{\rm RA},\beta,\gamma)\ =\ \frac{({\rm RS}-\beta)^{\gamma}}{({\rm RS}-\beta)^{\gamma}+({\rm RA}-\beta)^{\gamma}}; \tag{1.2}$$
note for all $\gamma$ that if ${\rm RS}={\rm RA}$ in (1.2) then, as we would expect, the won-loss percentage is $50\%$.
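The two structural features of (1.2) can be checked directly. The sketch below is only an illustration: the helper name `won_loss_pct` and the sample means are assumptions, not part of the paper. It confirms that equal means give $50\%$ for several exponents, and that only the differences ${\rm RS}-\beta$ and ${\rm RA}-\beta$ enter the formula.

```python
def won_loss_pct(rs: float, ra: float, beta: float, gamma: float) -> float:
    """Evaluate (1.2): (RS-beta)^gamma / ((RS-beta)^gamma + (RA-beta)^gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if rs <= beta or ra <= beta:
        raise ValueError("the means RS and RA must exceed beta")
    rs_pow = (rs - beta) ** gamma
    ra_pow = (ra - beta) ** gamma
    return rs_pow / (rs_pow + ra_pow)


# Equal means give 50% for every exponent tried (illustrative values).
even = [won_loss_pct(4.8, 4.8, -0.5, g) for g in (1.0, 1.74, 1.79, 1.82, 2.0)]

# Only RS - beta and RA - beta matter, so shifting RS, RA and beta by the
# same amount leaves the prediction unchanged.
base = won_loss_pct(5.0, 4.5, -0.5, 1.82)
shifted = won_loss_pct(5.0 + 20, 4.5 + 20, -0.5 + 20, 1.82)
print(even, base, shifted)
```

With $\beta=0$ the function reduces to the original form of the Pythagorean Formula in the observed averages.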

In §3 we analyze in great detail the 2004 baseball season for the 14 teams of the American League. Complete results of each game are readily available (see for example [Al]), which greatly facilitates curve fitting and error analysis. For each of these teams we used the method of least squares and the method of maximum likelihood to find the best fit Weibulls to the runs scored and allowed per game (with each having the same $\gamma$ and both having $\beta=-.5$; we explain why this is the right choice for $\beta$ below). Standard $\chi^{2}$ tests (see for example [CaBe]) show our fits are adequate. For continuous random variables representing runs scored and runs allowed, there is zero probability of both having the same value; the situation is markedly different in the discrete case. In a baseball game runs scored and allowed cannot be entirely independent, as games do not end in ties; however, modulo this condition, modified $\chi^{2}$ tests (see [BF, SD]) do show that, given that runs scored and allowed per game must be distinct integers, the runs scored and allowed per game are statistically independent. See [Ci] for more on the independence of runs scored and allowed.

Thus the assumptions of our theoretical model are met, and the Pythagorean Formula should hold for some exponent $\gamma$. Our main experimental result is that, averaging over the 14 teams, the method of least squares yields a mean of $\gamma$ of $1.79$ with a standard deviation of $.09$ (the median is $1.79$ as well); the method of maximum likelihood yields a mean of $\gamma$ of $1.74$ with a standard deviation of $.06$ (the median is $1.76$). This is in line with the numerical observation that $\gamma=1.82$ is the best exponent.

In order to obtain simple closed form expressions for the probability of scoring more runs than allowing in a game, we assume that the runs scored and allowed are drawn from continuous and not discrete distributions. This allows us to replace discrete sums with continuous integrals, and in general integration leads to more tractable calculations than summations. Of course assumptions of continuous run distribution cannot be correct in baseball, but the hope is that such a computationally useful assumption is a reasonable approximation to reality; it may be more reasonable in a sport such as basketball, and this would make an additional, interesting project.

Closed form expressions for the mean, variance and probability that one random variable exceeds another are difficult for general probability distributions; however, the integrations that arise from a Weibull distribution with parameters $(\alpha,\beta,\gamma)$ are very tractable. Further, as the three-parameter Weibull is a very flexible family and takes on a variety of different shapes, it is not surprising that for an appropriate choice of parameters it is a good fit to the runs scored (or allowed) per game. What is fortunate is that we can get good fits to both runs scored and allowed simultaneously, using the same $\gamma$ for each; see [BFAM] for additional problems modeled with Weibull distributions. As an example of this flexibility, $\gamma=1$ is the exponential and $\gamma=2$ is the Rayleigh distribution. Note the great difference in behavior between these two distributions. The exponential's density is largest at $x=\beta$, whereas the Rayleigh's density is zero at $x=\beta$. Additionally, for any $M>\beta$ any Weibull has a non-zero probability of a team scoring (or allowing) more than $M$ runs, which is of course absurd in the real world. The tail probabilities of the exponential are significantly greater than those of the Rayleigh, which indicates that perhaps something closer to the Rayleigh than the exponential is the truth for the distribution of runs.
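The contrast between $\gamma=1$ and $\gamma=2$ can be seen numerically. The following is a minimal sketch, assuming the illustrative parameters $\alpha=1$, $\beta=0$ and the hypothetical helper names `weibull_pdf` and `weibull_tail`. The tail formula ${\rm Prob}(X>M)=e^{-((M-\beta)/\alpha)^{\gamma}}$ used here follows from integrating the density (1.1) from $M$ to $\infty$.

```python
import math


def weibull_pdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Three-parameter Weibull density (1.1)."""
    if x < beta:
        return 0.0
    z = (x - beta) / alpha
    if z == 0.0 and gamma < 1.0:
        return math.inf  # the density is unbounded at x = beta when gamma < 1
    return (gamma / alpha) * z ** (gamma - 1.0) * math.exp(-(z ** gamma))


def weibull_tail(m: float, alpha: float, beta: float, gamma: float) -> float:
    """Prob(X > m): exp(-((m - beta)/alpha)^gamma) for m > beta, and 1 otherwise."""
    if m <= beta:
        return 1.0
    return math.exp(-(((m - beta) / alpha) ** gamma))


alpha, beta = 1.0, 0.0  # illustrative parameters only
for name, gamma in (("exponential", 1.0), ("Rayleigh", 2.0)):
    density_at_beta = weibull_pdf(beta, alpha, beta, gamma)
    tail_at_3_alpha = weibull_tail(beta + 3.0 * alpha, alpha, beta, gamma)
    print(f"{name}: density at beta = {density_at_beta}, tail beyond 3*alpha = {tail_at_3_alpha:.2e}")
```

The comparison is made three scale lengths above $\beta$; for thresholds within one scale length of $\beta$ the ordering of the two tails is reversed, so the statement about heavier exponential tails concerns large $M$.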

We have incorporated a translation parameter $\beta$ for several reasons. First, it facilitates applying this model to sports other than baseball. For example, in basketball no team scores fewer than 20 points in a game, and it is not unreasonable to look at the distribution of scores above a baseline. A second consequence of $\beta$ is that adding $P$ points to both the runs scored and runs allowed each game does not change the won-loss percentage; this is reflected beautifully in (1.2), and indicates that it is more natural to measure scores above a baseline (which may be zero). Finally, and most importantly, as remarked there are issues in the discreteness of the data and the continuity of the model. In the least squares and maximum likelihood curve fitting we bin the runs scored and allowed data into bins of length $1$; for example, a natural choice of bins is

$$[0,1)\ \cup\ [1,2)\ \cup\ \cdots\ \cup\ [9,10)\ \cup\ [10,12)\ \cup\ [12,\infty). \tag{1.3}$$
As baseball scores are non-negative integers, all of the mass in each bin is at the left endpoint. If we use untranslated Weibulls (i.e., $\beta=0$) there would be a discrepancy in matching up the means.

For example, consider a simple case when in half the games the team scores 0 runs and in the other half they score 1. Let us take as our bins $[0,1)$ and $[1,2)$, and for ease of exposition we shall find the best fit function constant on each bin. Obviously we take our function to be identically $\frac{1}{2}$ on $[0,2)$; however, the observed mean is $\frac{1}{2}\cdot 0+\frac{1}{2}\cdot 1=\frac{1}{2}$ whereas the mean of our piecewise constant approximant is $1$. If instead we chose $[-.5,.5)$ and $[.5,1.5)$ as our bins then the approximant would also have a mean of $\frac{1}{2}$. Returning to our model, we see a better choice of bins is

$$[-.5,.5]\ \cup\ [.5,1.5]\ \cup\ \cdots\ \cup\ [7.5,8.5]\ \cup\ [8.5,9.5]\ \cup\ [9.5,11.5]\ \cup\ [11.5,\infty). \tag{1.4}$$
An additional advantage of the bins of (1.4) is that we may consider either open or closed endpoints, as there are no baseball scores that are half-integral. Thus, in order to have the baseball scores in the center of their bins, we take $\beta=-.5$ and use the bins in (1.4). In particular, if the mean of the Weibull approximating the runs scored (resp. allowed) per game is ${\rm RS}$ (resp. ${\rm RA}$) then ${\rm RS}-\beta$ (resp. ${\rm RA}-\beta$) is an estimator of the observed average number of runs scored (resp. allowed) per game.
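The toy example and the bins of (1.4) can be reproduced in a few lines. This is a sketch only: the helper names `piecewise_constant_mean` and `bin_counts` and the list `sample_scores` are made up for illustration and are not data from the 2004 season.

```python
import bisect
import math


def piecewise_constant_mean(edges, probs):
    """Mean of a density that is constant on each bin and carries mass probs[i] there."""
    return sum(p * (lo + hi) / 2.0 for p, lo, hi in zip(probs, edges, edges[1:]))


# Toy example from the text: 0 runs in half the games, 1 run in the other half.
observed_mean = 0.5 * 0 + 0.5 * 1
mean_integer_bins = piecewise_constant_mean([0.0, 1.0, 2.0], [0.5, 0.5])
mean_centered_bins = piecewise_constant_mean([-0.5, 0.5, 1.5], [0.5, 0.5])

# Bin edges of (1.4): unit bins centered on 0, ..., 9, then [9.5, 11.5] and [11.5, inf).
EDGES = [k - 0.5 for k in range(11)] + [11.5, math.inf]


def bin_counts(scores):
    """Count how many non-negative integer scores fall in each bin of (1.4)."""
    counts = [0] * (len(EDGES) - 1)
    for s in scores:
        counts[bisect.bisect_right(EDGES, s) - 1] += 1
    return counts


sample_scores = [0, 3, 3, 10, 11, 12, 25]  # made-up scores, for illustration only
counts = bin_counts(sample_scores)
print(observed_mean, mean_integer_bins, mean_centered_bins, counts)
```

Because no score is half-integral, it does not matter whether `bisect_right` or `bisect_left` is used, which mirrors the remark about open or closed endpoints.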

2. Theoretical Model and Predictions

We determine the mean of a Weibull distribution with parameters $(\alpha,\beta,\gamma)$, and then use this to prove our main result, the Pythagorean Formula (Theorem 2.2). Let $f(x;\alpha,\beta,\gamma)$ be the probability density of a Weibull with parameters $(\alpha,\beta,\gamma)$:

$$f(x;\alpha,\beta,\gamma)\ =\ \begin{cases}\frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1}e^{-((x-\beta)/\alpha)^{\gamma}} & \text{if $x\geq\beta$}\\ 0 & \text{otherwise.}\end{cases} \tag{2.1}$$
For $s\in\mathbb{C}$ with the real part of $s$ greater than $0$, recall the $\Gamma$-function (see [Fe1]) is defined by

$$\Gamma(s)\ =\ \int_{0}^{\infty}e^{-u}u^{s-1}\,{\mathrm{d}}u\ =\ \int_{0}^{\infty}e^{-u}u^{s}\,\frac{{\mathrm{d}}u}{u}. \tag{2.2}$$
Letting $\mu_{\alpha,\beta,\gamma}$ denote the mean of $f(x;\alpha,\beta,\gamma)$, we have

$$\begin{aligned}\mu_{\alpha,\beta,\gamma}\ &=\ \int_{\beta}^{\infty}x\cdot\frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1}e^{-((x-\beta)/\alpha)^{\gamma}}\,{\mathrm{d}}x \\ &=\ \int_{\beta}^{\infty}\alpha\,\frac{x-\beta}{\alpha}\cdot\frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1}e^{-((x-\beta)/\alpha)^{\gamma}}\,{\mathrm{d}}x\ +\ \beta.\end{aligned} \tag{2.3}$$
We change variables by setting $u=\left(\frac{x-\beta}{\alpha}\right)^{\gamma}$. Then ${\mathrm{d}}u=\frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1}{\mathrm{d}}x$ and we have

$$\begin{aligned}\mu_{\alpha,\beta,\gamma}\ &=\ \int_{0}^{\infty}\alpha u^{\gamma^{-1}}\cdot e^{-u}\,{\mathrm{d}}u\ +\ \beta \\ &=\ \alpha\int_{0}^{\infty}e^{-u}u^{1+\gamma^{-1}}\,\frac{{\mathrm{d}}u}{u}\ +\ \beta \\ &=\ \alpha\Gamma(1+\gamma^{-1})\ +\ \beta.\end{aligned} \tag{2.4}$$

A similar calculation determines the variance. We record these results:

Lemma 2.1.

The mean $\mu_{\alpha,\beta,\gamma}$ and variance $\sigma^{2}_{\alpha,\beta,\gamma}$ of a Weibull with parameters $(\alpha,\beta,\gamma)$ are

$$\begin{aligned}\mu_{\alpha,\beta,\gamma}\ &=\ \alpha\Gamma(1+\gamma^{-1})+\beta \\ \sigma^{2}_{\alpha,\beta,\gamma}\ &=\ \alpha^{2}\Gamma\left(1+2\gamma^{-1}\right)-\alpha^{2}\Gamma\left(1+\gamma^{-1}\right)^{2}.\end{aligned} \tag{2.5}$$
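The closed forms in Lemma 2.1 can be sanity-checked against direct numerical integration of the density. The sketch below uses a plain midpoint rule from the standard library only; the parameter values $\alpha=5$, $\beta=-.5$, $\gamma=1.8$, the helper names and the truncation point are assumptions made for illustration.

```python
import math


def weibull_pdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Three-parameter Weibull density (2.1); only evaluated at x > beta below."""
    if x < beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1.0) * math.exp(-(z ** gamma))


def closed_form_moments(alpha: float, beta: float, gamma: float):
    """Mean and variance from Lemma 2.1."""
    g1 = math.gamma(1.0 + 1.0 / gamma)
    g2 = math.gamma(1.0 + 2.0 / gamma)
    return alpha * g1 + beta, alpha ** 2 * g2 - alpha ** 2 * g1 ** 2


def numeric_moments(alpha: float, beta: float, gamma: float,
                    n: int = 200_000, z_max: float = 8.0):
    """Midpoint-rule mean and variance, truncating the integral at beta + z_max*alpha.

    The truncation is adequate for gamma near 1.8; a much smaller gamma has a
    heavier tail and would need a larger z_max.
    """
    h = z_max * alpha / n
    m1 = m2 = 0.0
    for i in range(n):
        x = beta + (i + 0.5) * h  # midpoints never land exactly on x = beta
        w = weibull_pdf(x, alpha, beta, gamma) * h
        m1 += w * x
        m2 += w * x * x
    return m1, m2 - m1 * m1


params = (5.0, -0.5, 1.8)  # illustrative (alpha, beta, gamma)
cf_mean, cf_var = closed_form_moments(*params)
num_mean, num_var = numeric_moments(*params)
print(cf_mean, num_mean, cf_var, num_var)
```

Agreement to several decimal places is what one would expect here; it is a check on the formulas, not a substitute for the derivation.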

We can now prove our main result:

Theorem 2.2 (Pythagorean Won-Loss Formula).

Let the runs scored and runs allowed per game be two independent random variables drawn from Weibull distributions with parameters $(\alpha_{\rm RS},\beta,\gamma)$ and $(\alpha_{\rm RA},\beta,\gamma)$ respectively, where $\alpha_{\rm RS}$ and $\alpha_{\rm RA}$ are chosen so that the means are ${\rm RS}$ and ${\rm RA}$. If $\gamma>0$ then

$$\mbox{Won-Loss Percentage}({\rm RS},{\rm RA},\beta,\gamma)\ =\ \frac{({\rm RS}-\beta)^{\gamma}}{({\rm RS}-\beta)^{\gamma}+({\rm RA}-\beta)^{\gamma}}. \tag{2.6}$$
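Before turning to the proof, the statement of Theorem 2.2 can be checked numerically by evaluating ${\rm Prob}(X>Y)=\int_{\beta}^{\infty}f(x;\alpha_{\rm RS},\beta,\gamma)\,{\rm Prob}(Y<x)\,{\mathrm{d}}x$ with a midpoint rule. This is a sketch under stated assumptions: the means ${\rm RS}=5.0$, ${\rm RA}=4.5$, the values $\beta=-.5$, $\gamma=1.8$, the helper names and the truncation point are illustrative choices, and the scale parameters are obtained by inverting the mean formula of Lemma 2.1.

```python
import math


def weibull_pdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Three-parameter Weibull density (2.1); only evaluated at x > beta below."""
    if x < beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1.0) * math.exp(-(z ** gamma))


def weibull_cdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Prob(Y < x) = 1 - exp(-((x - beta)/alpha)^gamma) for x > beta."""
    if x <= beta:
        return 0.0
    return 1.0 - math.exp(-(((x - beta) / alpha) ** gamma))


def alpha_from_mean(mean: float, beta: float, gamma: float) -> float:
    """Invert Lemma 2.1: mean = alpha * Gamma(1 + 1/gamma) + beta."""
    return (mean - beta) / math.gamma(1.0 + 1.0 / gamma)


def prob_x_exceeds_y(rs: float, ra: float, beta: float, gamma: float,
                     n: int = 200_000, z_max: float = 8.0) -> float:
    """Midpoint-rule estimate of Prob(X > Y), truncated at beta + z_max*alpha_RS."""
    a_rs = alpha_from_mean(rs, beta, gamma)
    a_ra = alpha_from_mean(ra, beta, gamma)
    h = z_max * a_rs / n
    total = 0.0
    for i in range(n):
        x = beta + (i + 0.5) * h
        total += weibull_pdf(x, a_rs, beta, gamma) * weibull_cdf(x, a_ra, beta, gamma) * h
    return total


def formula_2_6(rs: float, ra: float, beta: float, gamma: float) -> float:
    """Right-hand side of (2.6)."""
    return (rs - beta) ** gamma / ((rs - beta) ** gamma + (ra - beta) ** gamma)


rs, ra, beta, gamma = 5.0, 4.5, -0.5, 1.8  # illustrative values only
numeric = prob_x_exceeds_y(rs, ra, beta, gamma)
predicted = formula_2_6(rs, ra, beta, gamma)
print(numeric, predicted)
```

The two numbers should agree up to the quadrature and truncation error, which is small for exponents in the range considered here.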

Proof.

Let $X$ and $Y$ be independent random variables with Weibull distributions $(\alpha_{\rm RS},\beta,\gamma)$ and $(\alpha_{\rm RA},\beta,\gamma)$ respectively, where $X$ is the number of runs scored and $Y$ the number of runs allowed per game. As the means are ${\rm RS}$ and ${\rm RA}$, by Lemma 2.1 we have

$$\begin{aligned}{\rm RS}\ &=\ \alpha_{\rm RS}\Gamma(1+\gamma^{-1})+\beta \\ {\rm RA}\ &=\ \alpha_{\rm RA}\Gamma(1+\gamma^{-1})+\beta.\end{aligned} \tag{2.7}$$
Equivalently, we have

$$\begin{aligned}\alpha_{\rm RS}\ &=\ \frac{{\rm RS}-\beta}{\Gamma(1+\gamma^{-1})} \\ \alpha_{\rm RA}\ &=\ \frac{{\rm RA}-\beta}{\Gamma(1+\gamma^{-1})}.\end{aligned} \tag{2.8}$$

We need only calculate the probability that $X$ exceeds $Y$. Below we repeatedly use the fact that the integral of a probability density is $1$. We have

$$\begin{aligned}\mbox{Prob}(X>Y)\ &=\ \int_{x=\beta}^{\infty}\int_{y=\beta}^{x}f(x;\alpha_{\rm RS},\beta,\gamma)f(y;\alpha_{\rm RA},\beta,\gamma)\,{\mathrm{d}}y\;{\mathrm{d}}x \\ &=\ \int_{x=\beta}^{\infty}\int_{y=\beta}^{x}\frac{\gamma}{\alpha_{\rm RS}}\left(\frac{x-\beta}{\alpha_{\rm RS}}\right)^{\gamma-1}e^{-((x-\beta)/\alpha_{\rm RS})^{\gamma}}\frac{\gamma}{\alpha_{\rm RA}}\left(\frac{y-\beta}{\alpha_{\rm RA}}\right)^{\gamma-1}e^{-((y-\beta)/\alpha_{\rm RA})^{\gamma}}\,{\mathrm{d}}y\;{\mathrm{d}}x\end{aligned}$$
