jaehyeong.net
2026-04-12·baseball·8 min

Park factors are lying to you (a little)

Why the standard 100-centered park factor is a noisy estimator in KBO's small samples, and what a partial-pooling fix looks like.

This is a placeholder draft. The full essay is in a Notion doc that I haven't been brave enough to publish yet.

The question

Why does KBO's quoted park factor for a given stadium swing 8–12 points year over year? Is the park changing, or is the estimator?

The setup

A standard park factor is the ratio of run scoring in a team's home games to run scoring in its road games, scaled so that 100 is neutral. With ~70 home games per team and ~10 teams, the sample feels large, but the park signal is small next to year-to-year offensive variance. The naive estimator picks up that noise and reports it as if it were a park trait.
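To get a feel for how much of the swing could be pure noise, here is a minimal simulation sketch in base R. It assumes a perfectly neutral park, 72 home and 72 road games, Poisson scoring, and about 10 combined runs per game. All of those numbers are my illustrative assumptions, not fitted KBO values.

```r
# Simulate a perfectly neutral park: home and road games share one scoring rate.
set.seed(1)

n_sims    <- 2000  # simulated seasons (assumption, only to get a stable estimate)
n_games   <- 72    # home games per season; road games assumed equal (assumption)
mean_runs <- 10    # combined runs per game by both teams (assumption)

# Each row is one simulated season, each column one game.
home <- matrix(rpois(n_sims * n_games, mean_runs), nrow = n_sims)
road <- matrix(rpois(n_sims * n_games, mean_runs), nrow = n_sims)

# Naive 100-centered park factor: home runs per game over road runs per game.
pf <- 100 * rowMeans(home) / rowMeans(road)

pf_sd    <- sd(pf)                       # spread that comes from noise alone
pf_range <- quantile(pf, c(0.05, 0.95))  # middle 90% of simulated factors
print(round(c(sd = pf_sd, pf_range), 1))
```

Any spread in `pf` here is sampling noise, because the simulated park has no effect at all. Real run scoring is more variable than a Poisson draw, so this sketch probably understates the noise in an actual single-season factor.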

The fix (sketch)

The fix is partial pooling across years within each park, with an informative prior centered on the league mean and a between-year variance term. The shrinkage is dramatic for parks with mild deviations and gentle for the obvious outliers.

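library(brms)

# Poisson model for runs: a persistent park effect plus a park-by-season deviation
fit <- brm(
  runs ~ 1 + (1 | park) + (1 | season:park),
  data = games,  # assumed: one row per game, with columns runs, park, season
  family = poisson(),
  # half-normal prior on the group-level SDs (log scale)
  prior = prior(normal(0, 0.2), class = "sd")
)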
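For intuition about what pooling across years does to a single number, here is a back-of-the-envelope normal-normal approximation on the log scale. It is not the Poisson model itself. The function name `shrink_pf` and the default values of `sigma` (noise in one season's log park factor) and `tau` (true spread between parks) are hypothetical, picked only for illustration.

```r
# Shrink an observed park factor toward the league-neutral value of 100.
# sigma and tau are on the log scale; both defaults are assumptions.
shrink_pf <- function(pf_obs, n_seasons, sigma = 0.05, tau = 0.04) {
  # Pooling weight: the share of the observed deviation that is kept.
  w <- tau^2 / (tau^2 + sigma^2 / n_seasons)
  # Work on the log scale, where 100 maps to 0, then map back.
  100 * exp(w * log(pf_obs / 100))
}

one_season   <- shrink_pf(112, n_seasons = 1)
five_seasons <- shrink_pf(112, n_seasons = 5)
print(round(c(one_season = one_season, five_seasons = five_seasons), 1))
```

Under these assumed variances, a 112 seen in one season is pulled most of the way back toward 100, while the same 112 averaged over five seasons keeps most of its deviation. That is the sense in which the estimator, rather than the park, can explain a large year-over-year swing.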

What I want to write next