Discovering the Best Passer in the Bundesliga Using Data and SQL
Jul 11, 2024
Analyzing football performance through data is an exciting endeavor that reveals hidden gems and offers concrete insights into player capabilities. Today, we dive into the Bundesliga's 2015-2016 season to determine who the best passer was. We'll use the power of SQL's GROUP BY
functionality combined with passing data from the StatsBomb API, which provides free data for various competitions.
Setting Up the Data
First, we load the passing data from the StatsBomb API into an SQL server. This dataset contains a wealth of information, but for our purposes, we've created a view that filters out only the passes from the Bundesliga's 2015-2016 season. The key columns we are working with include:
- player_name: Name of the player.
- team_name: Name of the team.
- pass_completed: 1 if the pass was completed, 0 if not.
- pass_attempted: Always 1 since each row represents a single pass event.
- pass_outcome: Indicates whether the pass was complete, incomplete, or another type such as offside or injury clearance.
Filtering Relevant Data
To determine the best passer, we focus on the pass completion ratio. This ratio is calculated using the columns pass_completed
and pass_attempted
. However, not all pass outcomes are relevant. For example, passes marked as offside, injury clearance, or unknown do not contribute to assessing a player's passing ability. Therefore, we exclude these from our analysis:
After identifying the relevant pass outcomes (complete, incomplete, out), we set up our filter:
Grouping and Aggregating Data
To find the best passer, we need to aggregate our data by player and team. We use the GROUP BY
clause to achieve this:
This query gives us the total number of completed and attempted passes for each player. Next, we calculate the pass completion ratio by dividing the total completed passes by the total attempted passes:
Filtering by Minimum Passes Attempted
To ensure our analysis is meaningful, we only consider players who attempted at least 1,000 passes during the season. This filter is applied after the grouping stage using the HAVING
clause:
Combining all these elements, our final query looks like this:
Results and Insights
Running this query reveals that Joshua Kimmich had the highest pass completion ratio among players who attempted at least 1,000 passes, with an impressive 92.5%. This is a remarkable achievement, especially considering his young age during the 2015-2016 season.
For additional insights, we can find the top passer for each team by ordering the results first by team_name
and then by pass_completion_ratio
:
This provides a detailed view of the best passers across different teams in the Bundesliga.
Conclusion
Using SQL and data from the StatsBomb API, we successfully identified Joshua Kimmich as the best passer of the 2015-2016 Bundesliga season based on his pass completion ratio. By leveraging SQL's GROUP BY
and HAVING
clauses, we ensured our analysis was robust and meaningful. This methodology can be applied to other datasets and seasons, offering a powerful tool for football analysts and enthusiasts.
If you have any questions or would like to delve deeper into the data, feel free to leave a comment. I'm here to help!
Are you struggling to remember the SQL syntax?