Outlining The Relationship Between Baseball Data Analytics Portals: A Primer
Like all baseball data nerds, I reached a crucial point this week and had to ask: what the heck are the distinctions between the handful of baseball data analytics sites and tools? We've got MLB proper (Major League Baseball), obviously the King Cheese. But what is the relationship between the MLB and FanGraphs, Baseball Prospectus, Baseball Savant, Baseball Reference, and SABR (Society for American Baseball Research)?
You might be wondering why I have such a niche concern. Well, I'm enrolled in SABR's Advanced Analytics II course, and I want to know where to look for the most reliable data and tools for specific areas of study (e.g., batters' barrels, pitchers' WAR, park adjustments, the effect of the National League introducing the designated hitter in 2022, the average number of home runs hit in 2001 and why, which wOBA formula to use; I could go on).
Failing that, I at least want to understand the limitations of my source choices. What follows is not an exhaustive overview of what each organization does, nor an in-depth study of how they go about doing it. Instead, my goal is simply to outline areas of overlap and distinction, so I know who broadly provides what information. Context is a researcher's best, if only, friend.
Back to the topic at hand: Where can I go to find this info? No single source of truth seems to exist, so I called upon Perplexity.ai for assistance.
I asked: "Can you tell me the relationship between Baseball Savant, Baseball Prospectus, FanGraphs, Baseball Reference, and SABR?"
Perplexity.ai told me that, in the world of baseball analytics and statistics, each holds a distinct position in terms of data type and analysis offered, data sources, focus, and user base, yet they share some interconnectedness. Of course, GenAI isn't interpretive the way the human mind is; it can't articulate the value and reliability of a source according to your specific needs. So other relevant information is included where necessary, drawn from the other sources listed herein. Let's go!
Data Type, Data Source, User Base, and Analysis Offered
BAM! Suddenly a new player enters with the mention of MLBAM (Major League Baseball Advanced Media), the first proper name dropped under Baseball Savant (Figure 3). Of course! MLBAM brings us Statcast, the automated camera and radar technology for tracking and collecting player data. With this comes the realization that Baseball Savant is nested under MLBAM, which is nested under the King Cheese itself, the MLB. If you doubt a direct connection, check out Baseball Savant's web address: baseballsavant.mlb.com. Right there are three identifying letters requiring no further code-cracking. Ergo, a solid relationship exists between the MLB, MLBAM (obviously), and Baseball Savant: Baseball Savant literally lives on a subdomain of MLB.com.
As we now know, Baseball Savant gets Statcast data directly from the MLB, whereas FanGraphs and Baseball Prospectus draw on this same data for their metrics and analyses but, unlike Baseball Savant, are not limited to Statcast data alone. Baseball Savant, though, has access to the raw Statcast data, which means it gets the first crack at it.
FanGraphs and Baseball Prospectus are both independently owned and operated, and both offer advanced statistics and player projections, but FanGraphs is tuned toward more detailed yet approachable Sabermetrics, with robust explainer articles. [Quick point-of-order definition sidebar: Sabermetrics is the statistical analysis of baseball play, centred on comparing and evaluating players, with an aim toward prediction. The term derives from SABR, the founding body, formed in 1971. You might recall the course I mentioned taking, brought to us by SABR. Start here for more.] FanGraphs offers articles, RotoGraphs (for fantasy baseball advice and analysis), a community blog, an encyclopedia of Sabermetric stats and methods, and even podcasts (hello, Effectively Wild), to name a few.
Quick sidebar for bridging: the podcast Effectively Wild originated at Baseball Prospectus but moved to FanGraphs in 2017. Before you gasp at a potential tea-spill situation, the move seems to have resulted from a congenial shift in hosts, with the lineup now including a host from each org. A beautiful baseball bridge was born between FanGraphs and Baseball Prospectus.
More on Baseball Prospectus, then: Baseball Prospectus developed PECOTA (Player Empirical Comparison and Optimization Test Algorithm), among other statistical tools. Perhaps the finest of backronyms, PECOTA came to be in 2003. Not to be too confusing, but PECOTA falls within the rubric of Sabermetrics in its aims, yet its proprietary computational formulas make it distinct from, say, what happens over at FanGraphs or elsewhere. Again, though, there are overlaps: Baseball Prospectus also offers articles, stats reports, fantasy baseball tools, and podcasts, but seems to offer edgier or more avant-garde concepts (I can't vouch for this just yet, but curious reading begins here). To me, it seems Baseball Prospectus assumes you already have the knowledge you need to be a Saberist, while FanGraphs holds your hand with an archive of methodologically illustrative, show-your-work articles.
But of course FanGraphs would provide detailed how-to info! Just compare a player page on their site (Figure 5) with Baseball Prospectus's page for the same player (Figure 6). FanGraphs puts everything on a single page, searchable and clickable, with a scrolling table of contents on the left, while Baseball Prospectus hides detail in click-to-reveal tabs.
So, if Baseball Prospectus and FanGraphs overlap a fair amount in data offerings (albeit with substantial differences), how does one decide between them, especially when each offers paid memberships? It might come down to data detail and layout preferences. Baseball Prospectus looks minimalist, with toggle options to view more, while FanGraphs is maximalist, offering immediate visibility into advanced stats like Statcast data, win probability, and plate discipline metrics. Take a peek at the images above and decide for yourself according to your needs.
Very simply, Baseball Reference is privately owned and operated (by Sports Reference LLC) and is a database of historical and traditional statistics. Its data comes from official league sources, Retrosheet, and historical records, and researchers can contribute data directly. Baseball Reference has the feel of an academic database, which makes perfect sense, as it was founded by a math Ph.D. student while he was working on his dissertation.
A direct note on user base, if it isn't already surmisable from the above: each site's user base potentially overlaps with the others' while remaining distinct. Where Baseball Savant attracts those who only want to see Statcast data, and Baseball Reference is excellent for writing a paper on a player from 1953, FanGraphs and Baseball Prospectus serve everyone from professional analysts to enthusiasts, including those who play fantasy baseball. But where does one go to see if there are conferences or training on various baseball topics, stats or otherwise?
We come back to SABR, the lone wolf of the bunch as the only nonprofit organization. Consider this one an influencer of sorts: SABR brings sports data people and academics together for annual analytics conferences (biometrics, anyone?), in addition to moving the needle with dedicated research committees and conferences highlighting underrepresented groups (shout-out to the watershed Negro League Research Committee and the Women in Baseball Conference). And so, while not operating statistical databases, SABR focuses on collaboration through publishing, convening, and developing aspiring Saberists.
Interconnectedness
Finally, Perplexity does well in summarizing how they all interconnect, so I'll quote it directly (a surprise if you know me, since quoting GenAI is a major no-no for me given its often robotic tone): "While each platform has its unique strengths, they collectively contribute to the advancement of baseball analytics and provide complementary resources to fans, analysts, and industry professionals."
To recap: there is a direct connection between Baseball Savant and the MLB (through MLBAM and Statcast), and Baseball Reference stands alone as a main source for traditional and historical data. Baseball Prospectus and FanGraphs, on the other hand, share a love child in the Effectively Wild podcast, while both provide Sabermetric data nerds and professionals with advanced stat tools and resources for player projections (though you can only access PECOTA through Baseball Prospectus, and FanGraphs piles everything right at your happily overwhelmed fingertips). And SABR is your go-to for Sabermetrics training and thought leadership.
Can You See The Baseball Analytics Tree Yet?
Verbally visualizing the above relationships as a tree is soothing. So here goes:
Soil -> MLB
Roots -> SABR
Trunk -> Baseball Reference
Main branches -> FanGraphs, Baseball Prospectus, Baseball Savant
Big branch growing directly from the soil, but intertwined with the Baseball Savant branch -> MLBAM
And now for the hand-drawn version. Who says data minds can't be artsy?
Have I missed anything integral to this tree? New branches? Should there be leaves? Or a whole forest? OK, now we're getting in the weeds.