A modular Python project to scrape, process, and analyze detailed cricket player statistics from ESPN CricInfo. The data is stored in AWS S3 and visualized using Power BI dashboards.
This project builds an end-to-end data pipeline that:
- Scrapes player-level data (batting, bowling, fielding, all-rounder stats, and personal info)
- Transforms and standardizes the data
- Aggregates it into master datasets
- Uploads it to AWS S3 for dashboarding in Power BI or Tableau
The pipeline is structured using modular Python scripts and classes, with extensibility for containerization and orchestration.
| Tool | Purpose |
|---|---|
| Python | Core programming language |
| Selenium | Web scraping from CricInfo |
| pandas | Data transformation |
| boto3 | Interacting with AWS S3 |
| Power BI | Dashboard visualization |
- Scrape data using Selenium for a specific player.
- Transform data into clean, analysis-ready format.
- Aggregate data across multiple players into master datasets.
- Upload to AWS S3 in both raw and transformed forms.
- Visualize insights using Power BI.
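The transform step above amounts to coercing CricInfo's display strings into analysis-ready types. A minimal sketch with pandas (the column names and sample values are illustrative, not the project's actual schema):

```python
import pandas as pd

# Hypothetical raw batting rows as they might come off a CricInfo-style table.
raw = pd.DataFrame({
    "Format": ["Test", "ODI", "T20I"],
    "Runs": ["8,848", "12,898", "4,008"],
    "HS": ["254*", "183", "122*"],
    "Avg": ["49.15", "57.32", "-"],
})

def transform(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Strip thousands separators and cast run totals to integers.
    out["Runs"] = out["Runs"].str.replace(",", "", regex=False).astype(int)
    # A trailing '*' marks a not-out innings; keep it as a separate flag.
    out["NotOut"] = out["HS"].str.endswith("*")
    out["HS"] = out["HS"].str.rstrip("*").astype(int)
    # '-' means no data; coerce to NaN so the column stays numeric.
    out["Avg"] = pd.to_numeric(out["Avg"], errors="coerce")
    return out

clean = transform(raw)
```

The same pattern extends to the bowling, fielding, and all-rounder tables.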
cricketer-stats/
├── research lab/ # Exploratory notebooks or R&D scripts
│   ├── EDA.ipynb # EDA notebook
│   └── check.ipynb # Notebook for experimentation
├── scripts/ # Python module folder with core logic
│   ├── scraper/ # Web scraping module
│   ├── transformer/ # Data cleaning and formatting module
│   ├── loader/ # S3 upload/download logic
│   └── aggregator/ # Aggregation logic to generate master DataFrames
├── scripts.egg-info/ # Auto-generated metadata for Python packaging
├── tests/ # Test scripts for modules
│ ├── aggregator_test.py
│ ├── scraper_test.py
│ └── transformer_test.py
├── visuals/ # Power BI (.pbix) and design elements
├── .env # AWS credentials and other environment setup
├── .gitignore
├── extras.md # Feature backlog and future plans
├── README.md # Project documentation
├── requirements.txt # List of Python dependencies
├── setup.py # Package setup file for pip installation
├── test.py # Optional test runner script
├── workflow.md # Workflow explanation
- 🔍 Scrapes detailed stats: batting, bowling, fielding, all-round, and player profile
- ♻️ Clean modular classes: ScrapeData, TransformData, LoadData, Aggregator
- ☁️ Uses AWS S3 as the cloud data store
- 📊 Output ready for Power BI or Tableau
- 🧱 Code structured for easy scaling and future automation
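The Aggregator's role can be sketched in a few lines of pandas: tag each player's frame with the player's name, then stack them into one master dataset (the schema here is illustrative, not the project's actual one):

```python
import pandas as pd

# Per-player frames as the transform step might produce them.
players = {
    "Player A": pd.DataFrame({"Format": ["Test", "ODI"], "Runs": [5000, 7000]}),
    "Player B": pd.DataFrame({"Format": ["Test", "ODI"], "Runs": [4200, 9100]}),
}

def aggregate(frames: dict) -> pd.DataFrame:
    parts = []
    for name, df in frames.items():
        tagged = df.copy()
        tagged.insert(0, "Player", name)  # keep the player id as the first column
        parts.append(tagged)
    return pd.concat(parts, ignore_index=True)

master = aggregate(players)
```

The resulting master frame is what gets uploaded to S3 for the dashboards.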
1. Clone this repository:

   ```bash
   git clone https://github.com/your-username/cricketer-stats.git
   cd cricketer-stats
   ```

2. Create and configure a `.env` file with your AWS credentials:

   ```env
   AWS_ACCESS_KEY_ID=your_key
   AWS_SECRET_ACCESS_KEY=your_secret
   AWS_DEFAULT_REGION=ap-south-1
   ```

3. Install project dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install the project locally as a package:

   ```bash
   pip install -e .
   ```

5. Run any of the test scripts:

   ```bash
   python tests/aggregator_test.py
   ```

   ⚠️ Make sure to set `player_name` and `bucket_name` before running the tests.
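Once the environment is set up, the loader writes each dataset to S3 in raw and transformed form. One possible key layout, shown here purely as an assumption (the actual LoadData scheme may differ):

```python
def s3_key(player_name: str, stat: str, stage: str = "raw") -> str:
    """Build an S3 object key like 'raw/virat_kohli/batting.csv'.

    This layout is illustrative; the project's loader may use another scheme.
    """
    slug = player_name.strip().lower().replace(" ", "_")
    return f"{stage}/{slug}/{stat}.csv"

# In the real pipeline a key like this would be passed to
# boto3.client("s3").upload_file(local_path, bucket_name, key).
key = s3_key("Virat Kohli", "batting")
```

Keeping raw and transformed objects under separate prefixes makes it easy to point Power BI at only the cleaned data.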
See `extras.md` for upcoming enhancements, including:
- Dynamic ground info scraping
- Incremental loading logic
- Secret manager integration
- Docker + Airflow orchestration
- Custom exception handling
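For instance, the planned incremental loading could diff the players to process against the keys already in S3 and scrape only what is missing. A hypothetical sketch (the `raw/<player>/...` key layout is an assumption, not the project's actual scheme):

```python
def pending_players(requested: set, existing_keys: set) -> set:
    """Return players that have no raw data in S3 yet.

    existing_keys is assumed to hold keys like 'raw/<player>/batting.csv'.
    """
    already_loaded = {
        key.split("/")[1] for key in existing_keys if key.startswith("raw/")
    }
    return requested - already_loaded

# Example: only 'rohit_sharma' still needs scraping.
todo = pending_players(
    {"virat_kohli", "rohit_sharma"},
    {"raw/virat_kohli/batting.csv", "raw/virat_kohli/bowling.csv"},
)
```

In production the key listing would come from `boto3`'s `list_objects_v2` rather than a hard-coded set.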
This project is structured as an installable package using `setup.py`, which lets you import the core modules (`scraper`, `loader`, `transformer`, `aggregator`) from anywhere on your system.
From the project root, run:

```bash
pip install -e .
```

The modules can then be imported directly:

```python
from scraper import ScrapeData
from transformer import TransformData
```

This makes testing and modular development cleaner, especially across notebooks and scripts.
See `workflow.md` for a high-level overview of the entire ETL-to-dashboard pipeline.