
Pi-Ratings vs Elo vs xG: Three Models, One Football Match
Three statistical lenses for the same 90 minutes. Each picks up a different part of the truth — Goalence picks one, but here's why all three matter.
The Three Pillars of Football Modelling
If you want to predict a football match, there are three statistical traditions to choose from: Elo (the chess-rating veteran), Pi-Ratings (the home/away specialist), and xG (the shot-quality measure). Each one captures part of the truth, none captures all of it.
Goalence uses Pi-Ratings + Poisson at the core, but every prediction we ship is influenced by all three traditions. Here's how they compare.
Elo: The Chess Method, Borrowed
Elo was invented for chess in the 1960s. A single number per player. Win, your number goes up; lose, it goes down. The amount of movement depends on whether the result was expected — beating someone 200 points above you is worth more than beating someone 200 points below.
In football, plain Elo gives every team one rating. Strength: simple, robust, well-understood. Weakness: a single number cannot say how much a particular team changes between home and away. Football Elo variants usually add one fixed home bonus, but that bonus is the same for everyone. Real Madrid at the Bernabéu and Real Madrid in Atlético's away dressing room aren't really the same team, and a single Elo number can't tell you that.
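The update rule described above can be sketched in a few lines. This is the textbook Elo formula, not Goalence code; the step size `k=20` and the example ratings are assumptions chosen for illustration.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score for A (1 = win, 0.5 = draw, 0 = loss) on the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return both ratings after one match. k is an assumed step size."""
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)  # large when the result was a surprise
    return rating_a + delta, rating_b - delta


# Beating a side rated 200 points higher vs. beating one rated 200 points lower.
underdog_gain = elo_update(1500.0, 1700.0, 1.0)[0] - 1500.0
favourite_gain = elo_update(1700.0, 1500.0, 1.0)[0] - 1700.0
print(round(underdog_gain, 1), round(favourite_gain, 1))
```

With these assumed numbers the underdog's win is worth about 15.2 points and the favourite's about 4.8. Note that nothing in the function knows where the match was played.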

Pi-Ratings: One Team, Two Numbers
Pi-Ratings (Constantinou & Fenton, 2013) takes the Elo idea and splits each team into a home rating and an away rating. Two numbers per team. Real Madrid at home gets one rating; Real Madrid away gets another. After every match, the ratings update based on the gap between the actual goal difference and the pre-match expectation: the rating for the venue the team just played at moves most, and its other rating moves by a smaller share.
Strength: captures home-field heterogeneity that Elo misses. La Liga's home advantage averages 0.35 goals — Pi-Ratings makes that gap explicit.
Weakness: harder to compare across leagues. A Pi-Rating of 1.5 in the Premier League means something different from 1.5 in MLS. Goalence handles this with league-specific calibration.
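Here is a rough sketch of a two-rating update in the spirit of the published Pi-Ratings method. The constants (`b=10`, `c=3`, `lam=0.035`, `gamma=0.7`) are recalled from the paper and should be treated as assumptions, and the class and function names are hypothetical. Goalence's own parameters and league calibration are not shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class PiTeam:
    home: float = 0.0  # rating used when the team plays at home
    away: float = 0.0  # rating used when the team plays away


def expected_goal_diff(rating: float, b: float = 10.0, c: float = 3.0) -> float:
    """Map a rating to an expected goal difference against an average side."""
    magnitude = b ** (abs(rating) / c) - 1.0
    return math.copysign(magnitude, rating)


def pi_update(home_team: PiTeam, away_team: PiTeam, goals_home: int, goals_away: int,
              lam: float = 0.035, gamma: float = 0.7, c: float = 3.0) -> float:
    """Update both teams in place; return the pre-match predicted goal difference."""
    predicted = expected_goal_diff(home_team.home) - expected_goal_diff(away_team.away)
    error = (goals_home - goals_away) - predicted
    # Diminishing returns: a 5-0 win moves ratings more than 1-0, but not five times more.
    step = math.copysign(c * math.log10(1.0 + abs(error)), error)
    home_delta = step * lam
    away_delta = -step * lam
    home_team.home += home_delta
    home_team.away += home_delta * gamma  # smaller carry-over to the other venue
    away_team.away += away_delta
    away_team.home += away_delta * gamma
    return predicted


hosts, visitors = PiTeam(), PiTeam()
pi_update(hosts, visitors, goals_home=2, goals_away=0)
print(hosts, visitors)
```

The design choice worth noticing is the carry-over factor: a home result mainly moves the home rating, while the away rating only follows part of the way, which is how the two numbers are allowed to drift apart.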
xG: The Shot-Quality Lens
xG (Expected Goals) is the youngest of the three; it became mainstream around 2015. Instead of looking at results, xG looks at the quality of chances. Every shot gets a probability based on its location, angle, body part, and game state. A 0.30 xG shot is a chance that, averaged over many similar shots, is scored about 30% of the time.
Strength: captures process, not just outcome. A team that creates 2.5 xG but loses 0-1 was probably unlucky, and xG flags this where the result alone would not.
Weakness: doesn't directly tell you who wins. You still need a way to turn xG into a scoreline probability — and that's where Poisson comes in.
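To make the Poisson step concrete, here is a minimal sketch that turns two goal expectations into win/draw/loss probabilities. It assumes the two teams' goal counts are independent Poisson variables, which is a common simplification rather than a confirmed detail of Goalence's model. The lambda values 1.6 and 1.1 are made up for the example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_home: float, lam_away: float, max_goals: int = 10):
    """Sum the scoreline grid into (home win, draw, away win) probabilities."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise the tiny mass cut off above max_goals
    return home / total, draw / total, away / total


home_win, draw, away_win = outcome_probabilities(1.6, 1.1)
print(f"home {home_win:.3f}  draw {draw:.3f}  away {away_win:.3f}")
```

The same function explains the "unlucky" case: under this model a side with a goal expectation of 2.5 still fails to score with probability exp(-2.5), roughly 8% of the time.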
Goalence's Choice
We use Pi-Ratings + Poisson at the core because the combination handles home-field correctly, updates after every match, and converts cleanly into outcome probabilities. xG informs the lambda values, but the rating engine is Pi-Ratings.
Is this the best system? Honest answer: no system is. Our forward-tracking accuracy sits around 58% on resolved matches — better than pure Elo (~54%) and pure xG-to-result (~55%) in the same window, comparable to ensemble approaches. The forward-tracking is what matters: we log every prediction before kickoff and compare against the actual result.
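As an illustration of what forward-tracking means in practice, the sketch below scores only predictions that were logged before kickoff and whose match has a result. The record layout and field names are hypothetical, not Goalence's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LoggedPrediction:
    match_id: str
    predicted: str                 # "H", "D" or "A"
    logged_at: datetime            # when the prediction was written to the log
    kickoff: datetime
    actual: Optional[str] = None   # filled in after full time


def forward_accuracy(log: list) -> Optional[float]:
    """Hit rate over resolved matches, ignoring anything logged at or after kickoff."""
    resolved = [p for p in log if p.actual is not None and p.logged_at < p.kickoff]
    if not resolved:
        return None
    hits = sum(p.predicted == p.actual for p in resolved)
    return hits / len(resolved)
```

Filtering on `logged_at < kickoff` is the whole point: a prediction that cannot prove it existed before the match does not count towards the figure.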
None of these models predict the future perfectly. They make the unpredictable a little less so.

Frequently asked questions
Why doesn't Goalence use Elo?
Elo treats every team as a single rating, ignoring home/away asymmetry. In a league like La Liga where home advantage averages 0.35 goals, Elo loses meaningful signal. Pi-Ratings handles this natively.
Is xG more accurate than Pi-Ratings?
Different question, different answer. xG measures process (chance quality); Pi-Ratings measures team strength updated by results. They're complementary — Goalence uses both: xG to inform lambda values, Pi-Ratings to derive base strengths.
What is forward-tracking?
Every Goalence prediction is logged before kickoff and compared against the actual result after full time. No back-fill, no 'what we would have predicted'. The accuracy figure on /methodology reflects exactly that.