Darwins of the Diamond

Alan Schwarz chronicles baseball's lifelong fascination with statistics and the people who turned it into a science.

Originally Published: July 8, 2004
By Alan Schwarz | Special to ESPN.com

Most fans, players and even team executives assume that baseball's infatuation with statistics is simply a byproduct of the information age, a phenomenon that blossomed only after the arrival of Bill James and computers in the 1980s. They couldn't be more wrong.

Alan Schwarz, the senior writer of Baseball America and a weekly contributor to ESPN.com, will forever change that misperception with his new book, "The Numbers Game," just published by St. Martin's Press. Schwarz provides the first-ever history of baseball statistics, showing how baseball and its numbers have been inseparable ever since the pastime's birth in 1845, and telling the story of this obsession through the characters who felt it most: Henry Chadwick, the 19th-century writer who invented the first box score and harped endlessly about which statistics mattered and which did not; George Lindsey, a Canadian military officer who spent hundreds of hours scoring games to figure out how often runners on second base score on a single; Hal Richman, who invented Strat-O-Matic baseball when he was 11 years old; and dozens more.

In this exclusive excerpt, Schwarz travels back to the early 1960s to profile the work of Earnshaw Cook, a kooky metallurgist (and consultant to the U.S. government's efforts to build the atom bomb), who retired to while away his days pursuing his theories about baseball statistics -- while Bill James was still in junior high. So you think baseball stats are just a modern fan fixation? Read on ...

Through the 1950s, college professors and graduate students began to pepper academic journals with sophisticated analyses of baseball. In 1952, Frederick Mosteller of Harvard University used binomial probability theory to prove that the best-of-seven World Series was a horribly unreliable way to determine baseball's champion. Four years later, an article in American Statistician discussed a method to adjust standings for the strength of each team's schedule. And in 1960, two Stanford grad students presented a paper to the American Statistical Association called "The Distribution of Runs in the Game of Baseball," which appears to be the first advanced attempt to combine the probabilities of hits, walks, outs, and more into a model of how runs score.
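
To get a feel for the kind of binomial reasoning behind the World Series finding, here is a minimal Python sketch. It is not Mosteller's own calculation: it assumes every game is an independent trial that the stronger team wins with a fixed probability p, and the function name and the sample value p = 0.55 are illustrative assumptions rather than figures from the 1952 paper.

```python
from math import comb


def series_win_probability(p, wins_needed=4):
    """Chance that a team winning each game with probability p
    takes a series played until one side has `wins_needed` wins.

    Assumes independent games and a constant p (a simplification).
    """
    q = 1.0 - p
    # The team clinches on its final win after losing k games along the way,
    # k = 0 .. wins_needed - 1; comb() counts the orderings of those losses.
    return sum(
        comb(wins_needed - 1 + k, k) * p ** wins_needed * q ** k
        for k in range(wins_needed)
    )


if __name__ == "__main__":
    # Hypothetical example: the better team wins 55 percent of single games.
    print(round(series_win_probability(0.55), 3))
```

Under those assumptions, a club that beats its opponent 55 percent of the time wins a best-of-seven only about 61 percent of the time, which suggests why so short a series is a shaky test of which team is better.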

All these studies, however, were read only by tiny audiences, many of whom blanched at the idea of allowing so frivolous a subject as baseball into their otherwise important affairs. It would take a commercially published book to carry statistical analysis of baseball beyond academia to the average fan. That book arrived in the summer of 1964: "Percentage Baseball," written by a bowtied Baltimore gentleman named Earnshaw Cook. More than a decade before Bill James' "Baseball Abstract" series began, Cook's "Percentage Baseball" became the first -- and perhaps most befuddling -- full-length book to be written on what we now call sabermetrics.

Cook was a most unlikely sort to devote four years to the study of baseball statistics. Gray-haired in his 60s, the stern and iron-faced Cook was an aristocrat at heart -- conservative to the core, constantly complaining about the "Japs" and inflation, and always, always wearing a smoking jacket, sport shirt, and tie (often a clip-on). He spoke with a faux English accent to lend an elitist aura, and his gruff bearing was so authoritative that his Chesapeake retriever would, on command, fetch the newspaper and then go to the closet and bring down his bedroom slippers. He signed all correspondence with his blueblood pedigree: "Princeton '21."

Cook had played varsity baseball at Old Nassau and was even a distant cousin of George Earnshaw, the former big league pitcher, but he went on to a distinguished career as a metallurgist, studying metals and their alloys down to the protons and electrons at their core, primarily for the American Brake Shoe Co. of Mahwah, N.J. He wrote "Engineering Properties of Heat Resistant Alloys" and other light classics. He served as a consultant on the Manhattan Project and for the Atomic Energy Commission before retiring in 1945 and ultimately nesting in a small estate in Baltimore. With little to occupy his time thereafter, other than training his dogs and playing golf at the nearby Elkridge Club, he decided to prove, once and for all, damn it, that Ty Cobb was better than Babe Ruth.

This Cobb-versus-Ruth, slapper-versus-slugger debate had raged for generations and became almost religious for each sect's most ardent followers. Cobb was the master of the single-and-speed game prevalent during his prime of 1907 to 1922, ringing up a .366 lifetime batting average (still the all-time major league record) and 892 stolen bases (a mark that lasted almost 50 years). Ruth's awesome power diverted focus from batting average to home runs, much to the chagrin of fans clinging to the game's original style. Cobb himself complained in a letter in 1952, "Now they have gone to the hit per distance game. They look as if they will be lucky to hit .340 or maybe less . . . The hit and run, stolen base, bunt, and sacrifice are deteriorating from unuse and they only hit for their amusement and pleasure for the home run." Everyone agreed that a home run was more valuable than a single, but by how much? To what extent did stolen bases and the venerable sacrifice aid in the scoring of runs? Come to think of it, what was the optimal lineup order? How should relief pitchers be used? Cook didn't want to go "by the book," blindly accepting the time-honored answers to these questions -- so he wrote a book of his own. In fits and starts over several years, Cook holed up in his study, overlooking the golf course, and pounded on baseball statistics. His slide rule and colored pencils stood at attention above his angled drafting table, piles of The Sporting News and its "Baseball Register" at the ready.

In his spare time, Cook used his research to teach math to his teenaged nephew, Bryson. Bryson returned the favor with one heck of a public relations coup. Bryson Cook's best friend was a kid named Gil Deford, whose big brother Frank had just, as luck would have it, graduated from college and begun writing for Sports Illustrated. Frank Deford learned of this quirky baseball scientist and mentioned him to his editor, who dispatched the young writer to interview Cook during the winter of 1964. The editor liked the piece so much he made it the lead feature story of the March 23 issue, with the headline "BASEBALL IS PLAYED ALL WRONG."

Deford breathlessly previewed Cook's discoveries: The sacrifice was generally worthless. Platooning was a waste of time. Sluggers should bat first. Games should be started by a "relief" pitcher who would leave for a pinch hitter at the first opportunity, followed by a "starting"-caliber pitcher who would then pitch four or five innings. All this, Cook claimed, would add a total of 250 runs and perhaps 25 wins to a team each season, numbers clearly absurd today but in 1964 as arresting as a Bob Gibson fastball at the noggin. "Right now," Deford gushed in his four-page article, "Earnshaw Cook knows more about baseball than anyone else in the world."

Cook asserted that by employing his theories, a .500 club could instantly become a pennant winner. Deford naturally presented baseball personnel as being alarmed, even humbled, by these findings. "If these figures are correct," Dodgers manager Walter Alston told SI, "Cook must have something … Maybe we've just been playing what we assumed was the proper way." Bill Veeck, the renegade owner of the White Sox, loved the idea of minimizing at-bats by weak-hitting pitchers. "By all means," Veeck raved, "get the pitcher out of there!" But privately most team personnel brushed Cook's theories aside. Pulitzer Prize-winning New York Times columnist Arthur Daley wrote, "It is highly unlikely that Cook, the iconoclast, will influence one manager to alter his pattern to the slightest degree … What was good enough for John McGraw is good enough for them." The recalcitrance of baseball executives only hardened when Cook's full-length book, "Percentage Baseball," came out a few months later. It was barely intelligible to even educated readers; beyond sprinkling the text with haughty Latin phrases and quotations from the likes of Francis Bacon, Cook might as well have put a statistics textbook in a blender. Pages of graphs, equations, and probabilistic gibberish, such as:

Eqv. p.SHS.y = 1.1564 DX.y + 1.4370 DX.z + .1507 divided by 2.3225 DX.z + .2477

made much of his methodology hopelessly inaccessible. (This surely added to Cook's aura as a sort of baseball Merlin. His hometown Baltimore Sun would consult him for pennant-race predictions -- "projections," he insisted -- and use headlines such as "BIRDS TO FINISH 3D, STATISTICIAN DECLARES," and "COOK, BASEBALL'S SEER, IS PITCHING ORIOLES.") Cook loved the attention. While beguiling readers with a blizzard of figures and charts, "Percentage Baseball" represented the first comprehensive effort to model all aspects of baseball through applied probability theory, and was met with apprehensive awe.

Cook rooted his approach in a statistic he developed called the Scoring Index. He examined the relative chances of all offensive events -- single, double, sacrifice, steal, and so on -- and all the ways they can be strung together to score runs before three outs end an inning. (For example, a single can be followed by two straight outs, a steal, and a single. Or a batter can reach on an error and jog home on a later batter's home run, etc …) Cook concluded that the probability of scoring a run was proportional to the chances of reaching first base multiplied by the chances of advancing that runner. His final formula looked like this:

p.R = (K) (p.H + p.BB + p.E.o + p.HB - 2p.SH - p.XBH) (p.TB)

This equation, however disguised in ugly and cumbersome notation, was actually rooted in concepts some 20 to 30 years ahead of general acceptance. K was just a numerical constant. Each "p." merely denoted the probability of the event that followed it (for example, p.BB meant the probability of a base on balls, i.e., walks per plate appearance) so the entire equation can be seen as having plate appearances (PA) as its denominator. Ignoring errors and the subtraction of sacrifices and extra-base hits for the moment, the right-hand side's two terms reduce to:

    ((H + BB + HB) / PA) * (TB / PA)

This is far more recognizable to modern fans of statistics: It's effectively the product of on-base percentage and slugging percentage, and greatly resembles the approach Bill James took to developing his renowned Runs Created formula in the late 1970s.
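
As a rough illustration of that simplified product, here is a short Python sketch. It covers only the reduced form described above (no constant K, no errors, and no subtraction of sacrifices or extra-base hits), so its output is not on the same scale as Cook's published scoring-index figures. The function name and the sample batting line are hypothetical.

```python
def simplified_scoring_index(h, bb, hb, tb, pa):
    """Reduced form of Cook's formula: (H + BB + HB)/PA * TB/PA.

    Leaves out K, errors, sacrifices and the extra-base-hit adjustment,
    so it is only a sketch of the on-base-times-advancement idea.
    """
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    reach_base = (h + bb + hb) / pa   # chance of reaching first base
    advance = tb / pa                 # total bases per plate appearance
    return reach_base * advance


if __name__ == "__main__":
    # Hypothetical season line, invented purely for illustration:
    # 150 hits, 60 walks, 5 hit-by-pitches, 250 total bases, 600 PA.
    print(round(simplified_scoring_index(150, 60, 5, 250, 600), 3))
```

Because the two factors are multiplied rather than added, a hitter who is weak at either reaching base or advancing runners is penalized in both, which is the same intuition behind multiplying an on-base measure by a slugging measure.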

Armed with his scoring index and other statistical tools, Cook examined dozens of baseball strategies and trends. Summarizing them here would require another entire book. But these were the most significant matters Cook analyzed, with his conclusions:

  • The best hitter ever: Lo and behold, Ty Cobb, with a scoring index of .2161, beat out Babe Ruth (.2109), Ted Williams (.1962), Lou Gehrig (.1828) and Mickey Mantle (.1743). Cobb's score was actually .1574, but Cook, citing how Cobb had performed mostly in the dead-ball era when home runs were very rare, adjusted his figure upward by comparing his performance to that of his contemporaries. Cobb and Ruth, at the time widely considered the best two hitters ever, had never before stood Nos. 1 and 2 in any statistic. Scoring index let them rise to the top.

  • The base-out matrix: How many runs score after any point in an inning? For example, Cook's probability equations figured that with one out and runners on first and second, an average of .92 runs were bound to score. Here are his figures for each of the 24 possible situations:

    BASES OCCUPIED

    OUTS |      0      1      2      3    1,2    1,3    2,3   Full
       0 |    .34    .77    .94   1.04   1.36   1.46   1.63   2.06
       1 |    .18    .47    .63    .72    .92   1.01   1.17   1.46
       2 |    .07    .21    .32    .38    .46    .52    .64    .78

  • The stolen base: Cook determined that an average runner trying to swipe second base typically was a bad risk with none or one out, but a good risk with two. Then he assessed how the breakeven points fluctuated depending on the abilities of the runner on first and the hitter at the plate. For example, Minnie Minoso, the speedy outfielder for the White Sox, was successful in just 57 percent of his stolen-base attempts during 1960 and 1961; Cook claimed he should never run while any hitter with a .258 batting average or higher was up.

  • The sacrifice bunt: Cook maintained that this was a bad play, except under certain conditions -- when a hitter was as poor as the typical pitcher, or when just one run was needed in the late innings of a close game. Outs were that precious. "There are two primary objects in baseball," Cook explained. "The first is to score runs. And the second is not to make outs." The latter concept didn't become understood by a majority of baseball executives for 35 years, and continues to baffle more than a few franchises.

  • The batting order: Most lineups put a speedster at the top, a bat-control artist second, and then the three best hitters third, fourth and fifth. But because roughly two-ninths of games end when the first or second batter makes the final out, the best hitters often miss a chance at another at-bat. Cook claimed that the lineup should always just put the best hitter first, followed by the second-best, and so on; even though sluggers such as Hank Aaron or Willie Mays might get fewer RBI because they would bat with fewer runners on base, that cost would be overcome by their getting roughly 40 more at-bats over the course of a season.

  • Platooning pitchers: Cook estimated that pitchers' horrible skills at the plate cost the average team 113 runs per season (of course there was yet no DH), and suggested that those hurlers virtually never be allowed to come to the plate at all. Given that complete games were becoming rarer in the early 1960s, Cook advocated starting a relief-caliber pitcher for two or three innings and pinch hitting for him at the first opportunity, and then using a starting-caliber pitcher to throw the next five innings or so, batting no more than once. Cook designed a 10-pitcher rotation to accomplish this.

  • Other assertions: Baseball's enlargement of the strike zone prior to the 1963 season had decreased offense up to 15 percent; a team's winning percentage should equal .484 times its ratio of runs scored to runs allowed; and the intentional walk is often a poor strategy, one that should be ordered, Cook wrote, "only if the actual 'DX.z' of the third batsman is less than his own equivalent 'DX.z' … "
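
The base-out figures quoted above are enough to sketch how strategies such as the steal and the sacrifice can be weighed. The Python below is a simplified illustration, not Cook's own derivation: it uses only the league-average numbers from the table, ignores the abilities of the particular runner and hitter, and assumes a failed steal or a successful bunt changes nothing except the outs and the bases. All function names are hypothetical, and "empty" stands for the table's "0" column.

```python
# Cook's expected runs for the rest of the inning, copied from the table above.
# Outer key: outs. Inner key: bases occupied ("empty" is the "0" column).
RUN_EXPECTANCY = {
    0: {"empty": .34, "1": .77, "2": .94, "3": 1.04,
        "1,2": 1.36, "1,3": 1.46, "2,3": 1.63, "full": 2.06},
    1: {"empty": .18, "1": .47, "2": .63, "3": .72,
        "1,2": .92, "1,3": 1.01, "2,3": 1.17, "full": 1.46},
    2: {"empty": .07, "1": .21, "2": .32, "3": .38,
        "1,2": .46, "1,3": .52, "2,3": .64, "full": .78},
}


def expected_runs(outs, bases):
    """Look up expected runs; three outs end the inning, so return 0."""
    if outs >= 3:
        return 0.0
    return RUN_EXPECTANCY[outs][bases]


def steal_second_breakeven(outs):
    """Success rate at which a steal of second (runner on first only)
    leaves expected runs unchanged."""
    stay = expected_runs(outs, "1")
    safe = expected_runs(outs, "2")
    caught = expected_runs(outs + 1, "empty")
    return (stay - caught) / (safe - caught)


def sacrifice_bunt_change(outs):
    """Change in expected runs when a bunt trades an out for moving
    a lone runner from first to second."""
    return expected_runs(outs + 1, "2") - expected_runs(outs, "1")


if __name__ == "__main__":
    for outs in (0, 1, 2):
        print(outs, round(steal_second_breakeven(outs), 2))
    print(round(sacrifice_bunt_change(0), 2))
```

On these averages the break-even success rate for stealing second falls from roughly 78 percent with none out to about 66 percent with two out, and a sacrifice with a runner on first and nobody out costs about .14 expected runs. Both results point the same way as Cook's conclusions, though they measure average runs only and say nothing about situations where a single run is all that matters.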

    Cook's rigorous mathematics ultimately failed him on both ends of the reader spectrum. Baseball fans were spooked by the intercept coefficients, distribution curves, and bizarre graphical techniques that recalled too many failed algebra quizzes. Professional statisticians, meanwhile, accused "Percentage Baseball" of sloppy and amateurish use of probability theory. "The book is easier reading for a baseball fan than for a mathematician," Scientific American sniffed. The military science journal Operations Research cited "very inadequate numerical evidence," "no tests of significance, fit or correlation," and other crimes. "Some of the mathematical presentations which form the heart of the book are atrocious," the review sneered. "Aside from a number of outright mistakes in manipulating probabilities … in a desperate attempt to relate [scoring index] to scoring the author is obliged to introduce a 'specific luck factor' that makes one's probability nerves twitch."

    Cook did appeal, however, to one narrow audience: young, mathematically inclined fans who relished his application of probability to baseball, however flawed that application may have been. Several of them -- Carl Morris, Pete Palmer, Eric Walker, and more -- became noted academic and applied statisticians, and were heard from years later in the field of baseball analysis. It was Walker, in fact, who grew up to become Sandy Alderson's early statistics confidant in Oakland, and was the primary catalyst of baseball's modern on-base percentage craze.

    Did major league executives pay attention to Cook at the time? Not really. Those who did were too young to have much influence in their traditionally stodgy industry. Tal Smith of the Houston Astros was a thirtysomething statistics buff who relished Cook's rigorous examination of accepted baseball opinion, but knew that altering that opinion was another matter altogether. "It was difficult to get anybody interested in any statistical information in the '60s -- they only wanted batting average and ERA and not to be bothered with it," recalled Smith, now president of the Astros. "Our general manager, Spec Richardson, was so statistically challenged that I'm not sure he knew what a batting average was. Managers in those days, most of them were old school. [Cook's approach] was nothing they were raised with. When I came across it there wasn't any way to put it into any practical effort."

    Lou Gorman worked in the minor league department of the Baltimore Orioles in the late 1960s, knew of Cook's work from articles in the Sun, and after moving on to the expansion Kansas City Royals in 1969 kept a copy of "Percentage Baseball" on the shelf behind his desk. One day, the Royals' owner, Ewing Kauffman, noticed it.

    "What is that?" he asked.

    "It's a book a scientist wrote about baseball," Gorman said. "He analyzes lineups, strategies, and other things with hard-core math and statistics."

    Kauffman, who made billions in the pharmaceutical industry, took the book in his hands and flipped through the pages. "I want you to have this guy call me collect," he told Gorman. "I want to meet him."

    One month later, Kauffman sat down with Cook in the owner's suite at Kansas City's old Municipal Stadium. "Ewing loved the book and what Cook said in it," Gorman recalled, "but it never got to the point where he would use it to run the club. He stayed out of baseball decisions." Gorman, however, went on to a distinguished career with the Royals, Mariners, Mets, and Red Sox, always keeping "Percentage Baseball" in mind, and on his shelf.

    At least one major league player took notice of "Percentage Baseball" and became an avid devotee. Davey Johnson, a young second baseman for the Orioles, just happened to also have a mathematics degree from Trinity University in Texas. Johnson read the book and noted that Cook lived in Baltimore, so he looked him up and visited his home. The two became friends, playing golf on occasion and always talking baseball. Johnson integrated some of Cook's ideas when he became a manager himself, leading the Mets to the 1986 World Series championship and later managing the Reds, Orioles and Dodgers. Johnson credits Cook with underscoring the importance of on-base percentage (rather than speed) at the top of batting orders and the inadvisability of intentional walks. "I'd played for a lot of managers who would just make out their lineup the old-fashioned way -- they didn't know what they were doing," Johnson remembered. "Cook showed how statistics could be used to run a ballclub better." In the end, the Times's Arthur Daley was proven wrong: Cook did influence a manager, a future one. Like most groundbreaking ideas, Cook's were appreciated most by people young enough not to feel threatened.

    Cook himself did not witness the moderate influence he ultimately had. Eight years after the original publication of "Percentage Baseball" he followed up with a second volume, "Percentage Baseball and the Computer," in which he employed computer-simulation techniques to fine-tune, and usually validate, his theories. This book received considerably less attention than his first. In a 1975 letter Cook wrote, "At the age of 75, I do not expect to survive to witness any changes of attitude … It is my opinion that the application of basic statistical analysis will eventually receive the attention of professional baseball -- as it has in so many other fields of endeavor with increasing use of the ubiquitous digital computer." He spent a few years applying his mathematics to golf -- he tried to prove that the great Bobby Jones was better than newcomer Jack Nicklaus -- but ultimately discarded the idea. He died in late 1987 of a heart attack.

    Before that, however, Cook was asked by the Hall of Fame in Cooperstown to donate the slide rule on which he did some of his calculations for "Percentage Baseball." "I am more than content," he typed in a note of appreciation, "to present my last testament herewith to the Greatest Game of them all."

    Alan Schwarz is the senior writer for Baseball America. His book, "The Numbers Game: Baseball's Lifelong Fascination With Statistics," can be ordered on his website, www.alanschwarz.com.