
This is a project aimed at predicting whether a Formula 1 driver will finish on the podium (top 3) using historical data, advanced feature engineering, and machine learning.


wsiqz/formula-1-podium-predictor



🏎️ Formula 1 Race Outcome Prediction

Predicting whether a Formula 1 driver will finish on the podium (top 3) using historical data, advanced feature engineering, and machine learning.




Introduction

The world of Formula 1 is rich with data—qualifying rounds, race conditions, team strategies, driver performances, and track histories. This project leverages that data to predict one thing: will a driver finish on the podium?


🏁 Key Terminology

  • Driver — The individual competing in the race. Each F1 team has two drivers.
  • Constructor (Team) — The organization that builds and races the car. Examples: Mercedes, Ferrari, Red Bull Racing.
  • Grand Prix (Race, Round) — A single event in the F1 calendar, typically held over a weekend, consisting of practice sessions, qualifying, and the main race.
  • Qualifying — A session that determines the starting grid for the race. A better qualifying position often improves race performance.
  • Grid Position: The starting position of a driver in the race.
  • Podium — The top 3 finishers in a race (1st, 2nd, and 3rd place). These are the drivers who physically stand on the podium after the race and receive trophies.
  • Pole Position — The first position on the starting grid, awarded to the fastest qualifier.
  • Pit Stop — When a driver enters the pit lane to change tyres or have minor repairs made. Pit stops cost time but can be strategically vital: the stationary stop itself ideally takes 2–3 seconds, while the full trip through the pit lane, from entry to exit, lasts about 20–25 seconds.
  • DNF (Did Not Finish) — When a driver does not complete the race due to a crash, mechanical failure, or other issue.

Problem Statement

Goal: Predict whether a driver will finish on the podium using a historical dataset.

This is a binary classification problem with significant class imbalance: only 3 of the roughly 20 starters in each race reach the podium, so only ~15% of driver-race entries belong to the positive class.
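A minimal sketch of how the binary target and its imbalance could be computed with pandas. It assumes a results table with a numeric finishing-position column named `position` (a hypothetical name), with DNFs stored as NaN so that they are labeled as non-podium.

```python
import pandas as pd


def add_podium_label(results: pd.DataFrame, position_col: str = "position") -> pd.DataFrame:
    """Return a copy with a binary `podium` column (1 if the driver finished 1st-3rd)."""
    out = results.copy()
    # NaN positions (DNFs) compare as False, so they get label 0
    out["podium"] = (out[position_col] <= 3).astype(int)
    return out


def positive_rate(labels: pd.Series) -> float:
    """Share of rows in the positive (podium) class."""
    return float(labels.mean())


if __name__ == "__main__":
    # Toy race: 20 starters, two of whom did not finish (NaN position)
    positions = list(range(1, 19)) + [float("nan"), float("nan")]
    race = pd.DataFrame({"driver": [f"D{i}" for i in range(20)], "position": positions})
    labeled = add_podium_label(race)
    print(positive_rate(labeled["podium"]))  # 3 podium finishers out of 20 -> 0.15
```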


Data Sources


Methodology

  1. Data cleaning & merging
  2. Feature engineering (both static and temporal)
  3. Statistical testing
  4. Model training with hyperparameter tuning
  5. Evaluation on a held-out test set of real 2024 races
  6. Deployment-ready pipeline + API
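Step 5 implies a time-based split: training on earlier seasons and testing on 2024. A minimal sketch, assuming a hypothetical `season` column; splitting on time rather than shuffling rows keeps results from future races out of training.

```python
import pandas as pd


def split_by_season(df: pd.DataFrame, test_season: int = 2024, season_col: str = "season"):
    """Train on every season before `test_season`, test on `test_season` only."""
    train = df[df[season_col] < test_season]
    test = df[df[season_col] == test_season]
    return train, test


if __name__ == "__main__":
    data = pd.DataFrame({"season": [2021, 2022, 2023, 2024, 2024], "grid": [1, 5, 3, 2, 8]})
    train, test = split_by_season(data)
    print(len(train), len(test))
```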

Exploratory Data Analysis

Exploration included:

  • Class imbalance check
  • Grid position impact
  • Team and driver podium rates
  • Circuit-based performance
  • Weather condition summaries
  • Global distribution of F1 circuits
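The grid-position check, for instance, can be expressed as the podium rate per starting slot. A minimal pandas sketch, assuming hypothetical `grid` and `podium` columns:

```python
import pandas as pd


def podium_rate_by_grid(df: pd.DataFrame, grid_col: str = "grid", label_col: str = "podium") -> pd.Series:
    """Fraction of starts from each grid slot that ended on the podium."""
    return df.groupby(grid_col)[label_col].mean().sort_index()


if __name__ == "__main__":
    toy = pd.DataFrame({
        "grid":   [1, 1, 1, 2, 2, 10, 10],
        "podium": [1, 1, 0, 1, 0, 0, 0],
    })
    print(podium_rate_by_grid(toy))
```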

Statistical Testing

We used:

  • Chi-Squared Tests: to check whether categorical features are independent of the podium outcome.
  • Mann-Whitney U Tests: to check whether the distribution of a numerical feature differs between podium and non-podium finishers.

The results informed feature selection.
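Both tests are available in SciPy. A minimal sketch, assuming a DataFrame with a binary `podium` column; the feature names (`grid`, `front_row`, `wet`) and the synthetic data are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, mannwhitneyu


def chi2_independence(df: pd.DataFrame, feature: str, label: str = "podium") -> float:
    """p-value of a chi-squared test of independence between a categorical feature and the label."""
    table = pd.crosstab(df[feature], df[label])
    _, p_value, _, _ = chi2_contingency(table)
    return float(p_value)


def mann_whitney(df: pd.DataFrame, feature: str, label: str = "podium") -> float:
    """p-value of a two-sided Mann-Whitney U test comparing the feature across the two classes."""
    pos = df.loc[df[label] == 1, feature]
    neg = df.loc[df[label] == 0, feature]
    _, p_value = mannwhitneyu(pos, neg, alternative="two-sided")
    return float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = rng.integers(1, 21, size=400)
    df = pd.DataFrame({
        "grid": grid,
        "front_row": (grid <= 2).astype(int),     # categorical, tied to the label
        "wet": rng.integers(0, 2, size=400),      # categorical, independent noise
        "podium": (grid <= 3).astype(int),
    })
    print("grid (Mann-Whitney):", mann_whitney(df, "grid"))
    print("front_row (chi2):", chi2_independence(df, "front_row"))
    print("wet (chi2):", chi2_independence(df, "wet"))
```

A small p-value suggests the feature carries information about the label; it says nothing about effect size, so it works best as a filter alongside model-based importance.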

Feature Engineering

Key features:

  • Driver Experience (race count)
  • Recent Performance (last 3 races)
  • Rolling Average Finish
  • Constructor Podium Rate
  • Track-Specific Averages
  • Weather Flags (wet, windy, hot, cold)
  • Binary-Encoded Categorical Features
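Temporal features such as driver experience and the rolling average finish must only use races that happened before the one being predicted. A minimal pandas sketch, assuming one row per driver per race and hypothetical columns `driver`, `race_date` and `position`:

```python
import pandas as pd


def add_driver_form_features(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add leak-free per-driver history features."""
    out = df.sort_values(["driver", "race_date"]).copy()
    by_driver = out.groupby("driver")

    # Driver experience: number of races the driver started before this one
    out["driver_experience"] = by_driver.cumcount()

    # Average finish over the previous `window` races; shift(1) drops the
    # current race, whose result is exactly what we are trying to predict
    out["rolling_avg_finish"] = by_driver["position"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out


if __name__ == "__main__":
    races = pd.DataFrame({
        "driver": ["A"] * 4 + ["B"] * 2,
        "race_date": pd.to_datetime(["2024-03-02", "2024-03-09", "2024-03-24",
                                     "2024-04-07", "2024-03-02", "2024-03-09"]),
        "position": [1, 3, 2, 10, 5, 7],
    })
    print(add_driver_form_features(races))
```

A driver's first race gets NaN for the rolling average; that gap can be imputed or left to a model that handles missing values natively.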

All feature engineering is encapsulated in a reusable F1DataPreprocessor transformer.
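The F1DataPreprocessor source is not shown here, but a scikit-learn-compatible transformer generally learns statistics in `fit` and applies them in `transform`. The sketch below is a hypothetical, simplified stand-in for one such step (`ConstructorPodiumRateEncoder` is an illustrative name, not part of the project): it learns each constructor's podium rate from training data only and falls back to the global rate for unseen constructors.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ConstructorPodiumRateEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical example step: map each constructor to its training-set podium rate."""

    def __init__(self, constructor_col: str = "constructor"):
        self.constructor_col = constructor_col

    def fit(self, X: pd.DataFrame, y):
        labels = pd.Series(np.asarray(y), index=X.index)
        # Statistics are learned from the training data only
        self.rates_ = labels.groupby(X[self.constructor_col]).mean()
        self.global_rate_ = float(labels.mean())
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        out = X.copy()
        # Unseen constructors fall back to the global podium rate
        out["constructor_podium_rate"] = (
            out[self.constructor_col].map(self.rates_).fillna(self.global_rate_)
        )
        return out


if __name__ == "__main__":
    X_train = pd.DataFrame({"constructor": ["Ferrari", "Ferrari", "Williams", "Williams"]})
    enc = ConstructorPodiumRateEncoder().fit(X_train, [1, 0, 0, 0])
    print(enc.transform(pd.DataFrame({"constructor": ["Ferrari", "NewTeam"]})))
```

Fitting inside the transformer, rather than precomputing rates on the full dataset, keeps test-season labels out of the training features.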


Modeling

1. Baseline Models

  • Logistic Regression
  • Random Forest
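A minimal sketch comparing the two baselines by cross-validated ROC AUC. It runs on synthetic data generated with roughly the ~15% positive rate from the problem statement; on real race data, a season-based split is the safer choice because shuffled folds mix future races into training.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_baselines(X, y, seed: int = 42) -> dict:
    """Mean 5-fold ROC AUC for the two baseline models."""
    models = {
        # Scaling helps the linear model converge; tree ensembles do not need it
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    X, y = make_classification(n_samples=1000, n_features=10, weights=[0.85], random_state=0)
    print(compare_baselines(X, y))
```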

2. Class Imbalance Handling

  • Cost-sensitive learning
  • SMOTE and random over-sampling
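A minimal sketch of the two families of techniques using scikit-learn only: class weights for cost-sensitive learning, and random over-sampling of the minority class with `sklearn.utils.resample`. SMOTE itself lives in the separate imbalanced-learn package (`imblearn.over_sampling.SMOTE().fit_resample(X, y)`) and is not shown. The sketch assumes the podium class is labeled 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

# 1) Cost-sensitive learning: weight each class inversely to its frequency
cost_sensitive_rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)


# 2) Random over-sampling: duplicate minority rows until both classes are the same size
def random_oversample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    minority = y == 1
    X_min_up, y_min_up = resample(
        X[minority], y[minority],
        replace=True,
        n_samples=int((~minority).sum()),
        random_state=seed,
    )
    X_bal = np.vstack([X[~minority], X_min_up])
    y_bal = np.concatenate([y[~minority], y_min_up])
    return X_bal, y_bal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = np.array([1] * 15 + [0] * 85)
    X_bal, y_bal = random_oversample(X, y)
    print(X_bal.shape, np.bincount(y_bal))
```

Over-sampling should be applied only to the training portion of each split; duplicated minority rows that end up in a validation fold inflate the scores.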

3. Advanced Models

  • HistGradientBoostingClassifier
  • LightGBM, XGBoost, CatBoost
  • Ensembles with Voting and Stacking

4. Hyperparameter Tuning

  • Optuna, optimizing ROC AUC and F1 score

🏁 Final Model

Best Model: Random Forest with Optuna-tuned hyperparameters

  • Test AUC: 0.937
  • Test F1: 0.72
  • Test Precision: 0.64
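A minimal sketch of how these three metrics can be computed with scikit-learn. ROC AUC uses the predicted probabilities directly, while F1 and precision depend on the decision threshold; 0.5 is assumed here because the threshold behind the reported numbers is not stated.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, roc_auc_score


def evaluate(y_true, proba, threshold: float = 0.5) -> dict:
    """ROC AUC from raw probabilities; F1 and precision from thresholded labels."""
    y_pred = (np.asarray(proba) >= threshold).astype(int)
    return {
        "auc": float(roc_auc_score(y_true, proba)),
        "f1": float(f1_score(y_true, y_pred)),
        "precision": float(precision_score(y_true, y_pred, zero_division=0)),
    }


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 0, 1]
    proba = [0.9, 0.8, 0.7, 0.3, 0.2, 0.4]
    print(evaluate(y_true, proba))
```

Because AUC is threshold-independent, moving the threshold trades precision against recall without changing the AUC figure.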


Deployment

Components

  • Custom Transformer (F1DataPreprocessor)

  • Custom Pipeline (F1Pipeline)

  • Joblib Model Saving:

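    import joblib  # preprocessor and model are the fitted objects from the notebook
    joblib.dump(preprocessor, "f1_preprocessor.pkl")
    joblib.dump(model, "models/model.pkl")  # the models/ directory must already exist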
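At inference time the saved objects are loaded back with `joblib.load` and applied in the same order as during training. A minimal sketch; the helper names are hypothetical and the paths simply mirror the ones used with `joblib.dump`.

```python
import joblib
import pandas as pd


def load_artifacts(preprocessor_path: str = "f1_preprocessor.pkl",
                   model_path: str = "models/model.pkl"):
    """Load the fitted preprocessor and model saved with joblib.dump."""
    return joblib.load(preprocessor_path), joblib.load(model_path)


def predict_podium_proba(preprocessor, model, raw_rows: pd.DataFrame):
    """Apply the saved preprocessing, then return P(podium) for each row."""
    features = preprocessor.transform(raw_rows)
    return model.predict_proba(features)[:, 1]
```

Only load pickle files from trusted sources: unpickling can execute arbitrary code.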

FastAPI Inference Server

A FastAPI service is provided for predicting podium chances in real time.

Dockerized API

The API is dockerized for easy deployment:

docker build -t f1-predictor .
docker run -p 8000:8000 f1-predictor

🛠️ How to Run

  1. Clone the repo:

    git clone https://github.com/wsiqz/formula-1.git
    cd formula-1
  2. Install dependencies:

    pip install -r requirements.txt
  3. Run the pipeline notebook notebooks/f1.ipynb, for example with Jupyter:

    jupyter notebook notebooks/f1.ipynb
  4. Train and export the model:

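    import joblib  # preprocessor and model are the fitted objects from the notebook
    joblib.dump(preprocessor, "f1_preprocessor.pkl")
    joblib.dump(model, "models/model.pkl")  # the models/ directory must already exist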
  5. Start FastAPI server:

    uvicorn app.main:app --reload

🧾 Conclusion

This project demonstrates how domain knowledge, careful feature engineering, and rigorous modeling techniques can be combined to solve real-world predictive problems—even in complex, dynamic environments like Formula 1.

