🛒 Customer Purchase Behavior Analysis in Retail

This project presents a full-cycle data science solution for analyzing and deriving insights from retail transactional data using Python and Power BI. It involves structured data cleaning, EDA, customer segmentation using RFM + KMeans, Market Basket Analysis using the Apriori algorithm, and business dashboard creation. Outputs are stored in CSV, visualized with Matplotlib/Seaborn, and optionally integrated with a MySQL database or BI dashboards (Power BI/Tableau).


🎯 Business Objective

This project aims to derive strategic insights from customer purchase data in an e-commerce/retail environment by:

  • Identifying customers' purchasing patterns and trends
  • Segmenting customers based on behavioral metrics (Recency, Frequency, Monetary)
  • Generating association rules for product bundling
  • Recommending strategies for targeted marketing and inventory optimization
  • Visualizing insights through an interactive Power BI dashboard

📦 Dataset Overview

| Feature | Description |
| --- | --- |
| InvoiceNo | Unique transaction ID |
| StockCode | Unique product ID |
| Description | Product description |
| Quantity | Quantity purchased |
| InvoiceDate | Date and time of transaction |
| UnitPrice | Price per item |
| CustomerID | Unique customer ID |
| Country | Country of purchase |

⚠️ Note: Full dataset not included due to GitHub file size limits. Sample CSVs are used for demonstration.


🧰 Tools and Technologies

| Layer | Technology |
| --- | --- |
| Language | Python 3.10+ |
| Data Handling | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| ML Algorithms | KMeans (Scikit-learn), Apriori (mlxtend) |
| Database | MySQL (via mysql-connector-python) |
| Notebook | Jupyter Notebook |
| Dashboard (optional) | Power BI (4-page executive dashboard) |

🧱 Project Architecture

customer_purchase_analysis/
├── data/                         # Raw dataset
│   └── online_retail.csv
│
├── scripts/                      # Modular ETL/ML scripts
│   ├── _init_.py
│   ├── utils.py
│   ├── data_cleaning.py
│   ├── mysql_pipeline.py
│   ├── eda_analysis.py
│   ├── rfm_segmentation.py
│   └── market_basket.py
│
├── notebooks/                    # Main pipeline orchestrator
│   └── purchase_analysis.ipynb
│
├── outputs/
│   ├── data/
│   │   ├── clean_online_retail.csv
│   │   ├── rfm_segments.csv
│   │   └── association_rules.csv
│   └── figures/
│       ├── eda_fig/
│       ├── rfm_fig/
│       └── mba_fig/
│
├── logs/
│   └── process_log.log
├── Reports/
│   ├── Customer_Purchase_Analysis.pbix
│   ├── Customer_Purchase_Analysis.pdf
│   ├── Customer_Purchase_Analysis.pptx
│   ├── BI_Executive_Summary.png
│   ├── BI_Sales_Trend.png
│   ├── BI_RFM_Segments.png
│   └── BI_Market_Basket.png
├── requirements.txt
└── README.md

🔁 Project Workflow

📌 Step 1: Data Cleaning

  • Drops duplicate rows
  • Handles missing values (especially CustomerID)
  • Removes rows with missing or invalid values
  • Creates derived columns such as TotalPrice
  • Logs every step and saves the cleaned file

📁 Outputs:

📄 Cleaned dataset: outputs/data/clean_online_retail.csv
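
As a rough illustration of the cleaning steps listed for this stage, the sketch below applies them with pandas to a few made-up rows. The function name `clean_retail` and the rule that non-positive quantities or prices count as invalid are assumptions for illustration only; the project's actual logic lives in scripts/data_cleaning.py and may differ.

```python
import pandas as pd


def clean_retail(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of raw transactions (illustrative rules only)."""
    out = df.drop_duplicates()
    # Rows without a CustomerID cannot be attributed to a customer.
    out = out.dropna(subset=["CustomerID"])
    # Assumption: non-positive quantities or prices are treated as invalid.
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)].copy()
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    # Derived column: revenue of each invoice line.
    out["TotalPrice"] = out["Quantity"] * out["UnitPrice"]
    return out.reset_index(drop=True)


# Made-up sample rows using the column names from the dataset overview.
raw = pd.DataFrame(
    {
        "InvoiceNo": ["1001", "1001", "1002", "1003", "1004"],
        "StockCode": ["A1", "A1", "B2", "C3", "D4"],
        "Description": ["MUG", "MUG", "TEAPOT", "CANDLE", "GIFT WRAP"],
        "Quantity": [6, 6, -2, 8, 3],
        "InvoiceDate": [
            "2011-01-05 10:15:00",
            "2011-01-05 10:15:00",
            "2011-01-05 10:20:00",
            "2011-01-05 11:00:00",
            "2011-01-06 09:30:00",
        ],
        "UnitPrice": [2.5, 2.5, 3.0, 2.75, 7.0],
        "CustomerID": [17850.0, 17850.0, 17850.0, None, 13047.0],
        "Country": ["United Kingdom"] * 5,
    }
)

clean = clean_retail(raw)
print(clean[["InvoiceNo", "Quantity", "UnitPrice", "TotalPrice"]])
```

In this toy input, one exact duplicate, one negative-quantity row, and one row without a CustomerID are dropped, leaving two invoice lines.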

📌 Step 2: MySQL Pipeline

  • Inserts cleaned data into MySQL and retrieves it again
  • Optional; intended for production deployment and data integration
  • Handles deduplication and backup

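
One possible shape for the insert step is sketched below. It is not the code in scripts/mysql_pipeline.py: the helper names, the table name, and the use of `INSERT IGNORE` (which only deduplicates if the table has a suitable unique key) are assumptions. The connection object is expected to follow the Python DB-API, as one returned by `mysql.connector.connect(...)` does.

```python
from typing import Sequence


def build_insert_sql(table: str, columns: Sequence[str]) -> str:
    """Build a parameterised INSERT IGNORE statement (MySQL syntax)."""
    col_list = ", ".join(f"`{c}`" for c in columns)
    # mysql-connector-python uses %s placeholders for parameters.
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT IGNORE INTO `{table}` ({col_list}) VALUES ({placeholders})"


def insert_rows(conn, table: str, columns: Sequence[str], rows: list,
                batch_size: int = 1000) -> int:
    """Send rows in batches through a DB-API connection; return rows sent."""
    sql = build_insert_sql(table, columns)
    cursor = conn.cursor()
    sent = 0
    try:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            cursor.executemany(sql, batch)
            sent += len(batch)
        conn.commit()
    finally:
        cursor.close()
    return sent


# Hypothetical table name, shown only to make the generated SQL visible.
print(build_insert_sql("clean_online_retail", ["InvoiceNo", "StockCode", "Quantity"]))
```

Rows would typically come from the cleaned DataFrame, for example as a list of tuples in the same column order as `columns`.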

📌 Step 3: Exploratory Data Analysis (EDA)

  • Top 10 selling products
  • Monthly & daily revenue trends
  • Hourly purchase patterns (peak times)
  • Country-wise revenue distribution

📁 Outputs: outputs/figures/eda_fig/
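
The aggregations behind these charts can be sketched with pandas `groupby`. This is a minimal example on invented rows, assuming a cleaned frame with a datetime `InvoiceDate` and a `TotalPrice` column; `eda_summaries` is a hypothetical helper, not a function from scripts/eda_analysis.py, and "top selling" is taken here to mean units sold.

```python
import pandas as pd


def eda_summaries(df: pd.DataFrame, top_n: int = 10) -> dict:
    """Compute the tables behind the EDA charts (no plotting)."""
    return {
        # Units sold per product, largest first.
        "top_products": df.groupby("Description")["Quantity"].sum()
        .sort_values(ascending=False).head(top_n),
        # Revenue per calendar month.
        "monthly_revenue": df.groupby(df["InvoiceDate"].dt.to_period("M"))["TotalPrice"].sum(),
        # Distinct invoices per hour of day (peak purchase times).
        "hourly_orders": df.groupby(df["InvoiceDate"].dt.hour)["InvoiceNo"].nunique(),
        # Revenue per country, largest first.
        "country_revenue": df.groupby("Country")["TotalPrice"].sum()
        .sort_values(ascending=False),
    }


# Invented rows for illustration.
sales = pd.DataFrame(
    {
        "InvoiceNo": ["A1", "A1", "A2", "A3"],
        "Description": ["MUG", "TEAPOT", "MUG", "TEAPOT"],
        "Quantity": [4, 1, 2, 3],
        "InvoiceDate": pd.to_datetime(
            ["2011-01-05 10:15", "2011-01-05 10:15", "2011-01-20 10:40", "2011-02-02 14:05"]
        ),
        "TotalPrice": [8.0, 12.0, 4.0, 36.0],
        "Country": ["United Kingdom", "United Kingdom", "Germany", "United Kingdom"],
    }
)

summaries = eda_summaries(sales)
for name, table in summaries.items():
    print(name, table.to_dict(), sep=": ")
```

Each resulting Series can be passed directly to Matplotlib or Seaborn, for example with `summaries["monthly_revenue"].plot(kind="line")`.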

📌 Step 4: Key Performance Indicator (KPI) Calculation

  • Total Revenue
  • Unique Customers
  • Quantity Sold
  • Average Order Value (AOV)

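
A minimal sketch of how these KPIs might be computed from the cleaned line items follows. The README does not define Average Order Value, so the definition used here (total revenue divided by the number of distinct invoices) is an assumption, as are the function name and the sample rows.

```python
import pandas as pd


def compute_kpis(df: pd.DataFrame) -> dict:
    """Headline KPIs from cleaned line items (one row per invoice line)."""
    total_revenue = float(df["TotalPrice"].sum())
    orders = int(df["InvoiceNo"].nunique())
    return {
        "total_revenue": total_revenue,
        "unique_customers": int(df["CustomerID"].nunique()),
        "quantity_sold": int(df["Quantity"].sum()),
        "orders": orders,
        # Assumption: AOV = revenue per distinct invoice.
        "average_order_value": total_revenue / orders if orders else 0.0,
    }


# Invented rows for illustration.
lines = pd.DataFrame(
    {
        "InvoiceNo": ["A1", "A1", "A2", "A3"],
        "CustomerID": [1, 1, 2, 1],
        "Quantity": [4, 1, 2, 3],
        "TotalPrice": [8.0, 12.0, 4.0, 36.0],
    }
)

kpis = compute_kpis(lines)
print(kpis)
```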

📌 Step 5: RFM Segmentation

  • Calculates Recency, Frequency, and Monetary values
  • Removes outliers using IQR
  • Scales features and applies KMeans clustering → 4 customer segments
  • Uses silhouette + elbow methods to determine optimal k
  • Segments labeled for business use:
    • Loyal Valuable Customers
    • Recent High-Spenders
    • Occasional Low-Spenders
    • Inactive Spenders

📁 Outputs: outputs/figures/rfm_fig/

📄 RFM: outputs/data/rfm_segments.csv
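
The segmentation steps for this stage can be sketched end to end with pandas and scikit-learn. This is an illustrative version on randomly generated transactions, not the code in scripts/rfm_segmentation.py: the helper names, the 1.5 × IQR fence, the candidate range for k, and the snapshot date (one day after the last transaction) are all assumptions. On this synthetic data the chosen k need not be 4.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def build_rfm(tx: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """One row per customer: days since last order, order count, total spend."""
    grouped = tx.groupby("CustomerID")
    return pd.DataFrame(
        {
            "Recency": (snapshot - grouped["InvoiceDate"].max()).dt.days,
            "Frequency": grouped["InvoiceNo"].nunique(),
            "Monetary": grouped["TotalPrice"].sum(),
        }
    )


def drop_iqr_outliers(rfm: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Keep customers whose R, F and M all lie within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = rfm.quantile(0.25), rfm.quantile(0.75)
    iqr = q3 - q1
    inside = ((rfm >= q1 - k * iqr) & (rfm <= q3 + k * iqr)).all(axis=1)
    return rfm[inside].copy()


def score_k(X: np.ndarray, k_values) -> pd.DataFrame:
    """Silhouette score and inertia (for an elbow plot) per candidate k."""
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        rows.append(
            {"k": k, "silhouette": silhouette_score(X, model.labels_), "inertia": model.inertia_}
        )
    return pd.DataFrame(rows).set_index("k")


# Synthetic transactions, generated only to make the sketch runnable.
rng = np.random.default_rng(0)
records = []
for cust in range(1, 61):
    for j in range(int(rng.integers(1, 8))):
        records.append(
            {
                "CustomerID": cust,
                "InvoiceNo": f"{cust}-{j}",
                "InvoiceDate": pd.Timestamp("2011-01-01")
                + pd.Timedelta(days=int(rng.integers(0, 300))),
                "TotalPrice": float(rng.uniform(5, 200)),
            }
        )
tx = pd.DataFrame(records)
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = build_rfm(tx, snapshot)
rfm_in = drop_iqr_outliers(rfm)
X = StandardScaler().fit_transform(rfm_in)
scores = score_k(X, range(2, 7))
best_k = int(scores["silhouette"].idxmax())
rfm_in["Segment"] = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X)

print(scores.round(3))
print(rfm_in.groupby("Segment")[["Recency", "Frequency", "Monetary"]].mean().round(1))
```

The per-segment means printed at the end are what one would inspect before attaching business labels such as those listed for this step; the cluster numbers themselves carry no meaning.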

📌 Step 6: Market Basket Analysis

  • Applies Apriori algorithm to find frequent itemsets
  • Generates association rules (support, confidence, lift)
  • Visualizes top rules (bubble chart, lift bar chart)
  • Supports cross-selling and bundling strategies

📁 Outputs: outputs/figures/mba_fig/

📄 Rules: outputs/data/association_rules.csv
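
To make the three rule metrics concrete, here is a small pure-Python sketch restricted to item pairs. It is not the mlxtend-based code the project uses: it only borrows the Apriori pruning idea (candidate pairs are built from items that are frequent on their own) and computes support, confidence and lift by hand. The baskets, thresholds and function name are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Rules A -> B over item pairs with support, confidence and lift."""
    baskets = [set(b) for b in baskets]
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    # Apriori pruning: an infrequent item cannot be part of a frequent pair.
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}

    pair_counts = Counter()
    for b in baskets:
        for pair in combinations(sorted(b & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n  # share of baskets containing both items
        if support < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = both / item_counts[ante]  # P(cons | ante)
            # lift = support(A and B) / (support(A) * support(B))
            lift = both * n / (item_counts[ante] * item_counts[cons])
            if confidence >= min_confidence:
                rules.append(
                    {"antecedent": ante, "consequent": cons,
                     "support": support, "confidence": confidence, "lift": lift}
                )
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Invented baskets (one set of items per invoice).
baskets = [
    {"teacups", "gift wrap", "candle"},
    {"teacups", "gift wrap"},
    {"teacups", "candle"},
    {"gift wrap", "card"},
    {"teacups", "gift wrap", "card"},
]

rules = pair_rules(baskets)
for r in rules:
    print(f'{r["antecedent"]} -> {r["consequent"]}: '
          f'support={r["support"]:.2f} confidence={r["confidence"]:.2f} lift={r["lift"]:.2f}')
```

A lift above 1 means the two items appear together more often than independence would predict; a lift below 1 means less often, even when confidence looks high.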

📊 Step 7: Power BI Dashboard Visualization

  • 4 Pages:
    • Executive Summary
    • Sales Analysis
    • Customer Segments
    • Association Rules

📈 Example Business Insights

| Insight | Value |
| --- | --- |
| 📌 70% of revenue | Comes from the top 20% of customers |
| 🎯 Peak time | 10 AM – 2 PM on weekdays |
| 💰 Best countries | UK (80%), Germany, Netherlands |
| 🛍️ Bundling | "Gift box set" + "Teacups" has 62% confidence |
| 📊 Segment | 4 clusters with tailored marketing strategies |
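
A revenue-concentration figure such as the share contributed by the top 20% of customers can be checked with a few lines of arithmetic. The sketch below uses invented per-customer totals, chosen so that the result happens to be 70%; it does not reproduce the project's data, and `top_share` is a hypothetical helper.

```python
def top_share(revenue_by_customer: dict, top_fraction: float = 0.2) -> float:
    """Share of total revenue contributed by the top `top_fraction` of customers."""
    values = sorted(revenue_by_customer.values(), reverse=True)
    total = sum(values)
    if not values or total == 0:
        return 0.0
    # Number of top customers, rounded to the nearest whole customer (at least 1).
    n_top = max(1, int(round(len(values) * top_fraction)))
    return sum(values[:n_top]) / total


# Invented totals per customer ID.
revenue = {
    "C01": 500, "C02": 200, "C03": 60, "C04": 50, "C05": 40,
    "C06": 40, "C07": 30, "C08": 30, "C09": 25, "C10": 25,
}

share = top_share(revenue)
print(f"Top 20% of customers contribute {share:.0%} of revenue")
```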

💼 Use Cases

  • 📦 Inventory planning based on top co-purchases
  • 🎯 Loyalty programs for high-value customers
  • 📢 Targeted email offers during peak purchase times
  • 📊 Executive dashboards via Power BI (optional)

🤖 Results (Summary)

  • Customer Segments:
    • Loyal Valuable Customers
    • Inactive Spenders
    • Occasional Low-Spenders
    • Recent High-Spenders
  • Example Association Rule:
    • If a customer buys "Set of Teacups", there is a 62% chance (rule confidence) that they also buy "Gift Wrap"

📊 Power BI Dashboard Overview

4-page executive BI dashboard (Reports/):

Page 1: Executive Summary

  • Total Revenue, Orders, Customers
  • Revenue by Country
  • Segment distribution (from RFM)
Screenshot: Reports/BI_Executive_Summary.png

Page 2: Sales Trends

  • Monthly/Weekly Revenue Trend
  • Top Products Sold
  • Peak Hour Purchases
Screenshot: Reports/BI_Sales_Trend.png

Page 3: RFM Customer Segments

  • RFM Cluster Scatter Plots
  • Segment-specific KPIs
Screenshot: Reports/BI_RFM_Segments.png

Page 4: Market Basket Rules

  • Rules Table (A → B)
  • Top Rules by Lift
  • Scatter: Confidence vs Support
Screenshot: Reports/BI_Market_Basket.png

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • MySQL Server (optional)
  • Jupyter Notebook

Installation

git clone https://github.com/Ayesha24banu/Customer-Purchase-Behaviour-Analysis-in-Retail.git
cd Customer-Purchase-Behaviour-Analysis-in-Retail
pip install -r requirements.txt

Run Notebook

jupyter notebook notebooks/purchase_analysis.ipynb

⚠️ Update your MySQL credentials inside purchase_analysis.ipynb and mysql_pipeline.py.
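
As one hedged alternative to editing credentials in place, they could be read from environment variables and passed to the connector. The variable names (with the `RETAIL_DB_` prefix), the default database name, and the helper below are hypothetical; the keys of the returned dictionary match keyword arguments accepted by `mysql.connector.connect`.

```python
import os


def mysql_config_from_env(prefix: str = "RETAIL_DB_") -> dict:
    """Collect MySQL connection settings from environment variables."""
    return {
        "host": os.environ.get(prefix + "HOST", "localhost"),
        "port": int(os.environ.get(prefix + "PORT", "3306")),
        # USER and PASSWORD have no default: a missing value raises KeyError.
        "user": os.environ[prefix + "USER"],
        "password": os.environ[prefix + "PASSWORD"],
        "database": os.environ.get(prefix + "NAME", "retail"),
    }


# Placeholder values so the sketch runs; real values would be set in the shell.
os.environ.setdefault("RETAIL_DB_USER", "analyst")
os.environ.setdefault("RETAIL_DB_PASSWORD", "change-me")

config = mysql_config_from_env()
# Print everything except the password.
print({key: value for key, value in config.items() if key != "password"})
```

With mysql-connector-python installed, the dictionary could then be used as `mysql.connector.connect(**config)`.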


Output videos:

Jupyter Notebook: https://drive.google.com/file/d/1Jip6S2ppr5XhR7zQi5dREoGPfj3NtdXh/view?usp=drive_link

Power BI: https://drive.google.com/file/d/1NLckyX9VrAv5E3ddQYDTrqJXJIr0i5D8/view?usp=drive_link


📝 Conclusion

  • RFM segmentation helps personalize marketing and optimize offers.
  • Market Basket Analysis guides product placement, bundling, and inventory management.
  • Visual outputs can be used by business teams with minimal technical effort.

🔄 Future Enhancements

  • Real-time customer segmentation pipeline using streaming data
  • Recommendation engine using collaborative filtering
  • Customer lifetime value prediction
  • Streamlit app for business teams
  • AutoML for dynamic segmentation
  • NLP analysis on customer reviews
  • API-based deployment via FastAPI or Flask

📎 Deliverables

  • purchase_analysis.ipynb: Master notebook
  • rfm_segments.csv: RFM clustering results
  • association_rules.csv: Market basket rules results
  • Visual charts in /outputs/figures
  • MySQL-ready table insertions (optional)
  • Power BI dashboard images in Reports/

🙏 Acknowledgment

Thanks to the UCI and Kaggle communities for the retail dataset.


👤 Author

Ayesha Banu

  • 🎓 M.Sc. Computer Science | 🏅 Gold Medalist
  • 💼 Data Scientist | Data Analyst | Full-Stack Python Developer | GenAI Enthusiast
  • 📫 LinkedIn
  • Project: Customer Purchase Behavior Analysis in Retail (2025)

📄 License

Distributed under the MIT License. See LICENSE file for details.