

Predicting MLB Success from Triple-A Performance
2026



~5 Minute Read

Overview

In this project, my team and I built machine learning models to predict a batter’s MLB performance using their Triple-A performance. The goal was to determine which offensive skills best transfer to Major League Baseball and identify prospects who may be undervalued by traditional scouting methods.

Question: Can we use Triple-A statistics and Statcast data to predict future MLB batting performance?

The target variable was wRC+ (Weighted Runs Created Plus). It adjusts for park effects and league environment, allowing hitters to be compared on a standardized scale. A wRC+ of 100 is league average, while a wRC+ of 120 indicates a hitter who creates 20% more runs than the league average.


Data Collection

We combined data from: 
  • FanGraphs Triple-A batting statistics
  • Baseball Savant Statcast metrics
  • MLB offensive outcomes

The final datasets consisted of:
  • 502 Triple-A players who later had >= 100 MLB plate appearances
  • 902 Triple-A players with < 100 MLB plate appearances for predictions
  • Data spanning the 2021-2025 seasons

Features included:

Traditional Statistics:
  • Batting Average
  • OBP
  • ISO
  • Home Run Rate
  • Walk Rate
  • Strikeout Rate
Statcast Metrics:
  • Exit Velocity
  • Hard Hit %
  • Barrel Rate
  • Launch Angle
  • Expected Statistics (xBA, xwOBA)
Player Profile Metrics:
  • Age
  • Playing Time (Games, Plate Appearances)
  • Consecutive Triple-A Seasons

Some features were added to standardize for playing time. For example:
  • Strikeout Rate = Strikeouts / Plate Appearances
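
As a minimal sketch of this kind of playing-time normalization, the snippet below converts counting stats into per-plate-appearance rates. The dictionary keys (`pa`, `so`, `bb`, `hr`), the `hr_rate` name, and the sample numbers are illustrative assumptions, not the project's actual columns or data.

```python
def add_rate_features(player):
    """Return a copy of `player` with per-plate-appearance rates added.

    `player` is a dict of counting stats; the keys used here are hypothetical.
    """
    pa = player["pa"]
    rates = dict(player)
    for count_key, rate_key in [("so", "k_rate"), ("bb", "bb_rate"), ("hr", "hr_rate")]:
        # Guard against division by zero for players with no plate appearances.
        rates[rate_key] = player[count_key] / pa if pa > 0 else None
    return rates


# Made-up season line: 400 PA, 100 strikeouts, 40 walks, 20 home runs.
example = add_rate_features({"pa": 400, "so": 100, "bb": 40, "hr": 20})
print(example["k_rate"], example["bb_rate"], example["hr_rate"])
```

Dividing by plate appearances puts a part-time player and an everyday player on the same scale, which is the point of the standardization described above.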

We also:
  • Excluded players with fewer than 100 MLB PA from model training, since their MLB samples are too small to be stable
  • Imputed missing values using median imputation
  • Standardized features where appropriate
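
The imputation and standardization steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the project's pipeline: the helper names and the exit-velocity values are made up, and the write-up does not say which library performed these steps.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


exit_velocity = [88.0, None, 92.0, 90.0, None]  # made-up values with gaps
filled = impute_median(exit_velocity)
scaled = standardize(filled)
print(filled)
```

In a real workflow, the median, mean, and standard deviation would normally be computed on the training split only and then reused on the test split, so that no information from held-out players leaks into training.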

    Modeling

    We trained and compared five models:

    1. Linear Regression
    2. Lasso Regression
    3. Random Forest
    4. Tuned Random Forest
    5. XGBoost

    We also built an ensemble model that averaged predictions from Lasso, Random Forest, and XGBoost.
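
A minimal sketch of prediction averaging is shown below. It assumes equal weights for the three models, which the write-up does not state explicitly, and the prediction values are invented for illustration.

```python
def ensemble_average(*model_predictions):
    """Average predictions element-wise across models (equal weights assumed)."""
    n_models = len(model_predictions)
    return [sum(preds) / n_models for preds in zip(*model_predictions)]


# Made-up wRC+ predictions for three players from each model.
lasso_pred = [95.0, 110.0, 80.0]
rf_pred = [100.0, 104.0, 86.0]
xgb_pred = [105.0, 116.0, 74.0]

ensemble_pred = ensemble_average(lasso_pred, rf_pred, xgb_pred)
print(ensemble_pred)
```

Averaging tends to help when the individual models make different kinds of errors, because those errors partially cancel.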

    Models were evaluated with an 80/20 train-test split using:

    • RMSE
    • MAE
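
The evaluation setup can be sketched as follows. The helper names, the random seed, and the wRC+ values are assumptions for illustration. The only figure taken from this write-up is the 502-player sample size, and the project's actual split sizes may differ.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and hold out `test_fraction` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss, in wRC+ points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


train, test = train_test_split(range(502))  # player indices only
actual = [100.0, 120.0, 80.0, 90.0]      # made-up true wRC+
predicted = [104.0, 112.0, 80.0, 94.0]   # made-up model output
print(len(train), len(test), rmse(actual, predicted), mae(actual, predicted))
```

Reporting both metrics is useful because a gap between RMSE and MAE indicates that a few large misses are driving the error.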

    Results
    Predictive Model Performance on Test Partition



    The ensemble model achieved the strongest overall performance, with the lowest RMSE, lowest MAE, and highest R². Predictive power was still modest, which made us realize that MLB success depends on far more than minor league statistics alone. Other important factors, such as injuries and the coaching a player receives before reaching the Major Leagues, are not captured in these features.

    What Traits Predict MLB Success?

    One of the most interesting findings from this project was that different models emphasize different skills.

    • Lasso Regression


    Lasso highlighted:

    • Walk rate (bb_rate)
    • Singles rate
    • Triples rate
    • Differences between expected and actual offensive production

    The weight placed on walk rate suggests that plate discipline is a skill that carries over to the Major Leagues.
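
Lasso can "highlight" a subset of features because its L1 penalty shrinks weak coefficients to exactly zero. The sketch below shows that mechanism using textbook coordinate descent with soft-thresholding on a tiny made-up dataset. It is not the project's fitting code: the data, the `alpha` value, and the function names are all assumptions.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`, clipping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 without an intercept.

    Assumes the columns of X and the target y are centered and that no
    column is all zeros.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Toy data: the target depends only on the first feature (y = 3 * x1).
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, -1.0]]
y = [-6.0, -3.0, 0.0, 3.0, 6.0]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
print(weights)
```

In this toy case the informative feature keeps a slightly shrunken coefficient while the irrelevant one is set to exactly zero, which is why the nonzero Lasso coefficients can be read as a shortlist of features.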

    • Random Forest & XGBoost



    Tree-based models, on the other hand, placed greater emphasis on:

    • Strikeout rate (k_rate)
    • Exit velocity
    • Barrel rate
    • Hard-hit %
    • Launch speed

    These models favored players with strong power and good contact quality.

    Prospect Archetype Experiment

    To better understand model behavior, we created several hypothetical player profiles.


    The models consistently predicted higher wRC+ values for power hitters than for speed-oriented or pure contact hitters. The key takeaway: power metrics such as barrel rate, exit velocity, and home run rate carried more predictive weight than speed or batting average.

    Note: As expected, elite all-around players had the highest predicted wRC+.

    Top Projected Prospects

    Using the ensemble model, we ranked Triple-A hitters with fewer than 100 MLB plate appearances as of May 22, 2026.
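
The ranking step can be sketched as a filter followed by a sort. The player names, field names (`mlb_pa`, `pred_wrc_plus`), and values below are placeholders, not entries from our actual list.

```python
def rank_prospects(players, max_mlb_pa=100):
    """Keep players below the MLB plate-appearance cutoff and sort by prediction."""
    eligible = [p for p in players if p["mlb_pa"] < max_mlb_pa]
    return sorted(eligible, key=lambda p: p["pred_wrc_plus"], reverse=True)


# Placeholder players with made-up ensemble predictions.
players = [
    {"name": "Player A", "mlb_pa": 0, "pred_wrc_plus": 104.0},
    {"name": "Player B", "mlb_pa": 250, "pred_wrc_plus": 118.0},  # excluded: too many MLB PA
    {"name": "Player C", "mlb_pa": 37, "pred_wrc_plus": 111.0},
]

ranking = rank_prospects(players)
print([p["name"] for p in ranking])
```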



    Predicting a player’s exact MLB wRC+ is challenging, as shown by the relatively high RMSE across all models. However, as an encouraging example, Bryce Eldridge, the highest-ranked prospect on our list, hit a walk-off grand slam against the Nationals on June 10, 2026, shortly after our analysis!

    Credit: STATS 141XP Teammates - Andrew Bush, Fabiola Campuzano, Ashley Chan, and Stewart Fang
    Data Sources: Triple-A and MLB statistics were obtained from FanGraphs, while Statcast batted-ball and quality-of-contact metrics were collected from Baseball Savant.



    sanghyundkim@outlook.com