4. K-Means Clustering of Teams Based on Performance

Tara Nguyen edited this page Dec 18, 2020 · 10 revisions


For each league, k-means clustering was performed on all teams to sort them into four clusters. Each team is represented by its season-end position and described by two features: points per game (PPG) and win proportion. Each feature is averaged across all five seasons for that position.
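For readers unfamiliar with the method, here is a minimal sketch of plain k-means (Lloyd's algorithm) on two-dimensional points of the form (PPG, win proportion). It is illustrative only. The page does not say which software, initialization or feature scaling was used, so two choices here are assumptions:

* The deterministic initialization spreads the seeds evenly across the points sorted by PPG.
* Distances are unscaled Euclidean distances.

The sample points are made-up toy values, not the league data.

```python
import math

# A point is (PPG, win proportion); both features are used unscaled (assumption).
Point = tuple[float, float]


def kmeans(points: list[Point], k: int, max_iter: int = 100) -> tuple[list[int], list[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    if not 2 <= k <= len(points):
        raise ValueError("need 2 <= k <= number of points")

    # Deterministic initialization (assumption, chosen for reproducibility):
    # take k points spread evenly through the list sorted by PPG.
    ordered = sorted(points)
    n = len(ordered)
    centroids = [ordered[i * (n - 1) // (k - 1)] for i in range(k)]

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so centroids already equal their cluster means
        labels = new_labels

        # Update step: move each centroid to the mean of its current members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(centroids[c])  # an empty cluster keeps its old centroid
        centroids = updated
    return labels, centroids


# Toy points (made up, NOT the study's data): four visibly separated pairs.
toy = [(2.30, 0.70), (2.20, 0.68), (1.70, 0.48), (1.60, 0.46),
       (1.20, 0.31), (1.20, 0.30), (0.80, 0.19), (0.85, 0.21)]
labels, centroids = kmeans(toy, k=4)
print(labels)  # → [3, 3, 2, 2, 1, 1, 0, 0]
```

In practice, k-means is usually run from several random starts and the lowest-variance result is kept. The single deterministic start here only keeps the sketch short and repeatable.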

Cluster Sizes and Means

  • Bundesliga

    Cluster ID   Size   Mean PPG   Mean win proportion
    1            6      1.60       0.45
    2            7      1.18       0.30
    3            3      0.83       0.20
    4            2      2.26       0.69
  • La Liga

    Cluster ID   Size   Mean PPG   Mean win proportion
    1            4      1.62       0.46
    2            10     1.21       0.31
    3            3      0.78       0.18
    4            3      2.19       0.66
  • Major League Soccer (MLS)

    Cluster ID   Size   Mean PPG   Mean win proportion
    1            5      0.88       0.22
    2            3      1.81       0.53
    3            10     1.49       0.41
    4            6      1.19       0.32
  • Premier League (EPL)

    Cluster ID   Size   Mean PPG   Mean win proportion
    1            5      1.76       0.50
    2            9      1.21       0.32
    3            4      0.82       0.20
    4            2      2.33       0.73

Visualization of the Clusters

Figure: K-means clusters for each league (kmeans.png)

In each of the European leagues, the "best" cluster (i.e., the one with the highest mean PPG and win proportion) is far separated from the other clusters. This cluster has only two members in the Bundesliga and the EPL and three in La Liga. That points to a large gap between the top two or three teams and the rest of the league. In the EPL, there is also a large gap between the "second-best" cluster (denoted by black circles) and the "third-best" one (denoted by red triangles), further demonstrating a lack of competitive balance in the league. In the Bundesliga and La Liga, the gaps among the "non-best" clusters are smaller and similar to one another. By contrast, the MLS clusters are evenly spaced (mean PPG of 1.81, 1.49, 1.19 and 0.88), pointing to a high level of competitive balance in that league.
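The gaps described above can be checked roughly against the Mean PPG column of the tables. The sketch below is an illustrative calculation, not part of the original analysis. It ranks each league's clusters by mean PPG and takes the differences between neighboring clusters. It ignores win proportion and the two-dimensional distances shown in the figure, so it is only a one-dimensional summary.

```python
# Mean PPG of each cluster, copied from the tables above (cluster IDs 1-4 in order).
mean_ppg = {
    "Bundesliga": [1.60, 1.18, 0.83, 2.26],
    "La Liga": [1.62, 1.21, 0.78, 2.19],
    "MLS": [0.88, 1.81, 1.49, 1.19],
    "EPL": [1.76, 1.21, 0.82, 2.33],
}


def consecutive_gaps(means: list[float]) -> list[float]:
    """Differences between neighboring clusters, best cluster first, rounded to 2 dp."""
    ranked = sorted(means, reverse=True)
    return [round(higher - lower, 2) for higher, lower in zip(ranked, ranked[1:])]


for league, means in mean_ppg.items():
    print(f"{league}: {consecutive_gaps(means)}")
# Bundesliga: [0.66, 0.42, 0.35]
# La Liga: [0.57, 0.41, 0.43]
# MLS: [0.32, 0.3, 0.31]
# EPL: [0.57, 0.55, 0.39]
```

On these numbers, the MLS gaps are nearly uniform (0.30 to 0.32). In each European league, the gap below the best cluster (0.57 to 0.66) is the largest. The same function could be applied to the mean win proportions.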

