
Building a Practical Expected Goals (xG) Model Using R and worldfootballR

May 09, 2026

In football analytics, expected goals (xG) has become a central metric because it lets analysts gauge the quality of scoring opportunities rather than just tallying goals. Increasingly sophisticated xG models help teams seeking a competitive advantage, revealing insights about performance that are hidden within the basic statistics of shots and goals. This shift from traditional metrics to a probabilistic model makes a strong case for data-driven approaches to football analysis.

Why Expected Goals Matter

The significance of xG goes beyond merely quantifying attempts at goal. By assigning a probability to each shot based on its characteristics—distance, angle, body part, and game situation—the xG model provides a clearer picture of a player's and a team's scoring potential. For instance, a shot taken from just a few meters out will typically yield a high xG score, while an ambitious attempt from distance will predictably score lower. This probabilistic framework better aligns with how football is played and understood, setting a foundation for more nuanced analyses.

Essential Components of an xG Model

Building an effective xG model relies on various features that encapsulate the shot's context. Key variables include:

  • Shot distance from the goal
  • Angle of the shot
  • Body part utilized
  • Minute of the match
  • Type of play situation (e.g., open play, set pieces)
  • Home or away context

Incorporating these features helps the expected goals output reflect realistic scoring probabilities and capture match context that conventional statistics overlook.

Building Your xG Model in R

For analysts comfortable with R, crafting a reproducible xG model is straightforward, thanks to libraries like worldfootballR. Using packages such as dplyr, ggplot2, and pROC, analysts can streamline their football analytics process:

install.packages(c("tidyverse", "ggplot2", "dplyr", "pROC", "worldfootballR"))

This installation sets you up with the core tools for data analysis. Next, constructing a synthetic dataset that mimics real match events gives you a practical testing ground, so you can check that your workflow runs end to end before applying it to real-world datasets from sources like FBref or StatsBomb.
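One way the synthetic dataset could be built is sketched below, using base R only. Everything in it is an assumption made for illustration: the column names (chosen to match the later snippets), the 120 x 80 coordinate range, the category labels, and especially the invented coefficients that decide which simulated shots become goals.

```r
# Synthetic shot data; goal centre assumed at (120, 40) on a 120 x 80 pitch.
# Column names, category labels and coefficients are illustrative assumptions.
set.seed(42)
n_shots <- 2000

shots <- data.frame(
  x_location = runif(n_shots, min = 85, max = 119),
  y_location = runif(n_shots, min = 15, max = 65),
  body_part  = sample(c("foot", "head"), n_shots, replace = TRUE,
                      prob = c(0.85, 0.15)),
  situation  = sample(c("open_play", "set_piece"), n_shots, replace = TRUE,
                      prob = c(0.75, 0.25)),
  home_away  = sample(c("home", "away"), n_shots, replace = TRUE),
  minute     = sample(1:90, n_shots, replace = TRUE),
  stringsAsFactors = TRUE
)

# Invented "true" scoring process: chances fall with distance, headers are harder.
dist_to_goal <- sqrt((120 - shots$x_location)^2 + (40 - shots$y_location)^2)
true_prob <- plogis(0.3 - 0.12 * dist_to_goal - 0.5 * (shots$body_part == "head"))
shots$goal <- rbinom(n_shots, size = 1, prob = true_prob)

str(shots)
mean(shots$goal)  # overall conversion rate of the simulated shots
```

Because the goals are generated from a known formula, a model fitted to this data should recover a negative distance effect, which makes it a convenient sanity check before moving to real event data.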

Feature Engineering: Crucial for Accurate Predictions

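Feature engineering plays a pivotal role in the accuracy of your xG model. The two most important derived features are the distance from the shot to the goal and the angle of the shot. The snippet places the goal centre at (120, 40), as in a 120 x 80 pitch layout, and measures the angle as the deviation from the line running straight out from the centre of the goal, so 0 degrees is a shot from directly in front and larger values are tighter angles.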

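library(dplyr)

# Goal centre coordinates (consistent with a 120 x 80 pitch)
goal_x <- 120
goal_y <- 40

shots <- shots %>%
  mutate(
    # Straight-line distance from the shot location to the goal centre
    distance_to_goal = sqrt((goal_x - x_location)^2 + (goal_y - y_location)^2),
    # Angle off the central axis in radians: 0 means directly in front of goal
    angle_to_goal = atan2(abs(goal_y - y_location), goal_x - x_location),
    angle_degrees = angle_to_goal * 180 / pi
  )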

These new features feed directly into your model, allowing it to make more informed predictions about whether a shot is likely to result in a goal.
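A quick way to sanity-check these formulas is to run them on a few hand-picked locations where the answer is easy to work out by hand. The three points below are hypothetical examples, not real shots, and the `check_shots` and `checked` names are introduced only for this sketch.

```r
library(dplyr)

goal_x <- 120
goal_y <- 40

# Hand-picked, hypothetical locations: 12 units straight out, 30 units
# straight out, and a tight-angle position 3 units out and 4 units wide.
check_shots <- data.frame(
  label      = c("central_close", "central_long_range", "tight_angle"),
  x_location = c(108, 90, 117),
  y_location = c(40, 40, 36)
)

checked <- check_shots %>%
  mutate(
    distance_to_goal = sqrt((goal_x - x_location)^2 + (goal_y - y_location)^2),
    angle_degrees    = atan2(abs(goal_y - y_location), goal_x - x_location) * 180 / pi
  )

checked
# distance_to_goal: 12, 30, 5
# angle_degrees:    0, 0, about 53.13
```

The two central shots both get an angle of 0, so only distance separates them, while the tight-angle shot is very close to goal but far off the central axis. That is the distinction the model needs both features to capture.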

Logistic Regression for Binary Outcomes

Because the outcome of a shot is binary (it either results in a goal or it does not), logistic regression is a natural modeling choice. It outputs a probability between 0 and 1 for each shot, which is exactly what an xG value is, and it accommodates multiple predictive features of mixed types.

xg_model <- glm(
  goal ~ distance_to_goal + angle_degrees + body_part + situation + home_away + minute,
  data = train_data,
  family = binomial()
)

Each fitted coefficient describes how the log-odds of scoring change with one predictor while the others are held fixed, which gives you an interpretable link between shot characteristics and goal-scoring likelihood.
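The model call refers to `train_data`, and the evaluation step later uses `test_predictions`, but the article does not show how either is built. A minimal sketch of one possible approach, assuming a simple random 80/20 split, follows. The simulated `shots` table is a hypothetical stand-in for your engineered data, and the formula is shortened to three predictors to keep the example small.

```r
set.seed(1)
n <- 1500

# Hypothetical stand-in for the engineered shot table (not real data).
shots <- data.frame(
  distance_to_goal = runif(n, min = 2, max = 35),
  angle_degrees    = runif(n, min = 0, max = 80),
  body_part        = factor(sample(c("foot", "head"), n, replace = TRUE))
)
shots$goal <- rbinom(
  n, size = 1,
  prob = plogis(0.3 - 0.12 * shots$distance_to_goal - 0.01 * shots$angle_degrees)
)

# Random 80/20 split into training and test rows (the ratio is an assumption).
train_idx  <- sample(seq_len(n), size = floor(0.8 * n))
train_data <- shots[train_idx, ]
test_data  <- shots[-train_idx, ]

xg_model <- glm(
  goal ~ distance_to_goal + angle_degrees + body_part,
  data = train_data,
  family = binomial()
)

# type = "response" returns probabilities rather than log-odds.
test_predictions <- test_data
test_predictions$xg <- predict(xg_model, newdata = test_data, type = "response")

summary(test_predictions$xg)
sum(test_predictions$xg)    # total expected goals on the test set
sum(test_predictions$goal)  # actual goals on the test set
```

Comparing the two sums is a coarse first check: if total xG is far from the actual goal count on held-out shots, the model is probably miscalibrated. With real match data, splitting by match or by season rather than by individual shot may be a better choice, since shots within one match are not independent.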

Evaluating and Refining Your Model

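It's critical to assess your xG model’s calibration and predictive power. Two common evaluation metrics are the Brier Score and the area under the ROC curve (AUC). The Brier Score is the mean squared difference between predicted probabilities and actual outcomes, so lower is better. Because only a small share of shots are scored, even a naive model that assigns every shot the average conversion rate gets a low score, so compare your model against that baseline rather than against zero. A good xG model should also produce well-calibrated probabilities: across many shots rated around 0.2, roughly one in five should end up as goals. AUC measures how well the model ranks goals above non-goals, with 0.5 meaning no better than chance.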

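library(pROC)

# test_predictions is assumed to hold the observed outcome (goal, 0/1)
# and the model's predicted probability (xg) for each held-out shot
roc_obj <- roc(response = test_predictions$goal, predictor = test_predictions$xg)
auc(roc_obj)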

Such evaluations tell analysts whether to refine the model by introducing additional features or by switching to alternative modeling techniques, such as random forests or gradient boosting, for higher predictive accuracy.
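The Brier Score and calibration check can be computed in a few lines of base R. The sketch below uses simulated predictions as a hypothetical stand-in for `test_predictions`, with goals drawn from the xG values themselves so that the predictions are well calibrated by construction. The bin boundaries are an arbitrary choice.

```r
set.seed(7)
n <- 5000

# Hypothetical stand-in for test_predictions: skewed xG values, and
# outcomes simulated from them (so calibration is good by design).
test_predictions <- data.frame(xg = rbeta(n, shape1 = 1, shape2 = 8))
test_predictions$goal <- rbinom(n, size = 1, prob = test_predictions$xg)

# Brier Score: mean squared error between predicted probability and outcome.
brier <- mean((test_predictions$xg - test_predictions$goal)^2)

# Baseline: give every shot the overall conversion rate.
base_rate      <- mean(test_predictions$goal)
brier_baseline <- mean((base_rate - test_predictions$goal)^2)

c(model = brier, baseline = brier_baseline)

# Calibration table: mean predicted xG versus observed goal rate per bin.
test_predictions$bin <- cut(
  test_predictions$xg,
  breaks = c(0, 0.05, 0.1, 0.2, 0.4, 1),
  include.lowest = TRUE
)
calibration <- aggregate(cbind(xg, goal) ~ bin, data = test_predictions, FUN = mean)
calibration
```

In the calibration table, the `xg` and `goal` columns should be close to each other within each bin. Large gaps in a particular bin point to a range of shots where the model is systematically over- or under-confident. Bins with few shots will be noisy, so it helps to check the bin counts before reading too much into them.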

Advanced Developments and Future Directions

The initial xG model serves as a springboard for further exploration. Introducing interaction terms can account for the different effects of variables under various circumstances, leading to more tailored insights. For example, a header's conversion probability may respond differently to distance compared to a shot taken with a foot. Future iterations may also harness advancements in machine learning and event data to deepen the analysis—layers that leverage tracking data or player positioning could significantly enhance the model’s reliability.
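The header-versus-foot example can be expressed as an interaction between distance and body part. Here is a minimal sketch on simulated data, where the steeper distance penalty for headers is built in by assumption. The coefficients, sample size, and model names are invented for illustration.

```r
set.seed(11)
n <- 4000

# Simulated shots in which headers lose value faster with distance
# (an assumption baked into this example, not an empirical finding).
shots <- data.frame(
  distance_to_goal = runif(n, min = 2, max = 30),
  body_part = factor(
    sample(c("foot", "head"), n, replace = TRUE, prob = c(0.7, 0.3)),
    levels = c("foot", "head")
  )
)
is_head <- shots$body_part == "head"
shots$goal <- rbinom(
  n, size = 1,
  prob = plogis(0.5 - 0.10 * shots$distance_to_goal -
                  0.15 * shots$distance_to_goal * is_head)
)

# Additive model: one distance slope shared by both body parts.
additive_model <- glm(goal ~ distance_to_goal + body_part,
                      data = shots, family = binomial())

# Interaction model: `*` expands to both main effects plus their interaction,
# so headers get their own distance slope.
interaction_model <- glm(goal ~ distance_to_goal * body_part,
                         data = shots, family = binomial())

coef(interaction_model)

# Likelihood-ratio test: does the extra slope improve the fit?
anova(additive_model, interaction_model, test = "Chisq")
```

The `distance_to_goal:body_parthead` coefficient is the extra change in log-odds per unit of distance for headers relative to foot shots. On real data, the likelihood-ratio test or a comparison on held-out shots can indicate whether the added term is worth keeping.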

Conclusion: The Path Forward in Football Analytics

As football continues to embrace data analytics, the expected goals model stands out as a prime example of how advanced metrics can reshape our understanding of performance. While traditional statistics will always hold value, those working in this space must recognize the broader implications of xG in tactical evaluation and player development. By adopting sophisticated modeling techniques and pursuing a data-driven mindset, football analysts can uncover insights that transcend mere box scores, elevating the analysis to a strategic level that can drive success on the pitch.