Publication: Characterizing National Soccer Identity via K-means Clustering of World Cup Match Performances
Abstract
This study investigates national playing style identity in professional soccer by applying unsupervised machine learning techniques to match statistics from the 2018 and 2022 FIFA World Cups. Motivated by countries like Spain and Brazil, which have well-known signature playing styles, we explore whether other countries exhibit national playing styles at the World Cup and to what extent these styles have cultural and historical ties. Our dataset, drawn from FBRef.com, comprises 200 match performances from 24 countries, each described by 21 features that capture in-depth match statistics on possession, passing, defensive actions, goalkeeping, and shooting. We implement four variations of k-means clustering, assisted by principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) for clustering and visualization. Our results show seven clusters in the match data, each corresponding to a well-known playing style or strategic approach. We find that countries with a strong national soccer identity more frequently use one playing style, while other countries vary their playing styles between matches. Although we observe some correlation between chosen playing style and geopolitical factors like income, population size, and geographical region, the globalization of soccer markets appears to have diminished these effects. This study demonstrates how national playing styles can be quantitatively identified and used to understand how countries express their identity through professional soccer.
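As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of one possible PCA-assisted k-means variant, assuming scikit-learn. It is not the authors' implementation. The function names, the number of principal components, the random seed, and the synthetic 200 x 21 input are all assumptions made for illustration; only the shape of the data and the choice of seven clusters come from the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_match_performances(X, n_clusters=7, n_components=5, seed=0):
    """Standardize features, project with PCA, then run k-means.

    n_components and seed are illustrative assumptions, not values
    taken from the study.
    """
    # Put all match statistics on a comparable scale before PCA.
    X_scaled = StandardScaler().fit_transform(X)
    # Reduce the feature space to a few principal components.
    Z = PCA(n_components=n_components, random_state=seed).fit_transform(X_scaled)
    # Assign each match performance to one of n_clusters groups.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(Z)
    return Z, labels


def dominant_style_share(labels):
    """Fraction of a team's matches that fall in its most frequent cluster."""
    counts = np.bincount(np.asarray(labels))
    return counts.max() / counts.sum()


# Synthetic placeholder data with the shape reported in the abstract
# (200 match performances, 21 features). Real inputs would be the
# match statistics; these random values carry no soccer meaning.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 21))
Z_demo, labels_demo = cluster_match_performances(X_demo)
```

Applying `dominant_style_share` to the cluster labels of a single country's matches gives one simple way to quantify how consistently that country uses a single playing style: a share near 1 suggests a stable style, while a lower share suggests variation between matches.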