- ๐ Project Overview
- ๐ Step 1. Dataset & Table Structure
- ๐งน Step 2. Data Cleaning Pipeline (Steps 01โ07)
- ๐ Step 3. Sales KPIs Analysis
- ๐ฏ Step 4. Customer Segmentation using RFM
- ๐ Step 5. Advanced Sales & Operational Analysis
- ๐ Step 6. Power BI Dashboard
This project presents a comprehensive analysis of sales performance and customer behavior for Faragir Sanat Mehrbin, a laboratory equipment manufacturing company. The objective is to clean raw transactional data, generate actionable KPIs, and create segmentation and trend insights that can drive business decisions.
Using SQL, we processed and analyzed 3 years of transaction-level data (2018โ2020), building views and reports for integration into dashboards (Power BI). Final outputs were saved as .xlsx
files and structured for business intelligence reporting.
The dataset was imported into a SQL database and structured using the following table:
CREATE TABLE sales_data (
InvoiceNumber TEXT,
SalesChannel TEXT,
WarehouseLocationCode TEXT,
ProcurementDate TEXT,
SalesOrderDate TEXT,
DispatchDate TEXT,
DeliveryDate TEXT,
Currency TEXT,
SalesRepID INTEGER,
CustomerID INTEGER,
BranchCode TEXT,
ProductCode INTEGER,
QuantityOrdered INTEGER,
DiscountRate REAL,
UnitSalePrice REAL,
UnitProductionCost REAL
);
The raw data underwent structured cleaning:
-- 01. Remove rows with NULLs in critical fields
DELETE FROM sales_data
WHERE
InvoiceNumber IS NULL OR
SalesOrderDate IS NULL OR
CustomerID IS NULL OR
ProductCode IS NULL;
-- 02. Remove rows where price or quantity is zero or negative
DELETE FROM sales_data
WHERE
UnitSalePrice IS NULL OR UnitSalePrice <= 0 OR
QuantityOrdered IS NULL OR QuantityOrdered <= 0;
-- 03. Standardise date formats (if any slashes exist, replace with dashes)
-- This is only useful if the data was imported with slashes (e.g., '2018/05/31')
UPDATE sales_data
SET SalesOrderDate = REPLACE(SalesOrderDate, '/', '-')
WHERE SalesOrderDate LIKE '%/%';
-- 04. Trim extra spaces from text fields
UPDATE sales_data
SET
InvoiceNumber = TRIM(InvoiceNumber),
SalesChannel = TRIM(SalesChannel),
WarehouseLocationCode = TRIM(WarehouseLocationCode),
ProductCode = TRIM(ProductCode),
ProductGroup = TRIM(ProductGroup),
ProductCategory = TRIM(ProductCategory),
ProductSubCategory = TRIM(ProductSubCategory),
ProductDescription = TRIM(ProductDescription),
Currency = TRIM(Currency);
-- 05. Remove duplicate rows based on InvoiceNumber + CustomerID + ProductCode
-- Keeps the first occurrence only (based on rowid)
DELETE FROM sales_data
WHERE rowid NOT IN (
SELECT MIN(rowid)
FROM sales_data
GROUP BY InvoiceNumber, CustomerID, ProductCode
);
-- 06. Keep only IRR currency rows (filter out other currencies)
DELETE FROM sales_data
WHERE Currency != 'IRR';
-- 07. Keep only rows within the 2018โ2020 range
-- Optional: Adjust this based on project scope
DELETE FROM sales_data
WHERE SalesOrderDate < '2018-01-01' OR SalesOrderDate > '2020-12-31';
Understanding key performance indicators (KPIs) is crucial for any business. For Faragir Sanat Mehrbin, we calculated foundational KPIs using SQL to assess:
- Order volume
- Customer base
- Revenue
- Average order size
-- Overall KPIs: Total Orders, Unique Customers, Revenue & AOV
SELECT
COUNT(DISTINCT InvoiceNumber) AS total_orders,
COUNT(DISTINCT CustomerID) AS unique_customers,
ROUND(SUM(QuantityOrdered * UnitSalePrice), 2) AS total_revenue,
ROUND(AVG(QuantityOrdered * UnitSalePrice), 2) AS avg_order_value
FROM sales_data;
๐ Output Preview: Click to expand
total_orders | unique_customers | total_revenue | avg_order_value |
---|---|---|---|
7991 | 999 | 82,692,726.60 | 10,348.23 |
Understanding the revenue contribution of each sales channel enables Faragir Sanat Mehrbin to optimize resource allocation and tailor sales strategies. This section calculates the number of orders and total revenue per channel.
-- Revenue Breakdown by Sales Channel
SELECT
SalesChannel,
COUNT(DISTINCT InvoiceNumber) AS total_orders,
SUM(QuantityOrdered * UnitSalePrice) AS revenue
FROM sales_data
GROUP BY SalesChannel;
๐ Output
SalesChannel | total_orders | revenue |
---|---|---|
Distributor | 2,038 | 21,388,088.40 |
Online | 2,041 | 20,943,617.10 |
Agency | 1,952 | 20,248,639.50 |
Direct Sales | 1,960 | 20,112,381.60 |
๐ Full dataset available in sales KPI .xlsx
โ Sheet: Sales Channel Revenue
Understanding the revenue contribution of each branch enables Faragir Sanat Mehrbin to identify high-performing locations and guide strategic investment. This section calculates the number of orders, total revenue, and each branchโs percentage contribution to overall sales.
-- Revenue Breakdown by Branchcode
SELECT
BranchCode,
ROUND(SUM(QuantityOrdered * UnitSalePrice), 2) AS total_revenue,
COUNT(DISTINCT InvoiceNumber) AS total_orders,
ROUND(
SUM(QuantityOrdered * UnitSalePrice) * 100.0 /
(SELECT SUM(QuantityOrdered * UnitSalePrice) FROM sales_data), 1
) AS revenue_percent
FROM sales_data
GROUP BY BranchCode
ORDER BY total_revenue DESC;
๐ Output
BranchCode | total_revenue | total_orders | revenue_percent |
---|---|---|---|
MHD004 | 17,577,463.4 | 1,660 | 21.3 |
ISF002 | 16,812,732.1 | 1,647 | 20.3 |
TBZ005 | 16,437,873.8 | 1,563 | 19.9 |
SHZ003 | 16,007,197.8 | 1,565 | 19.4 |
TEH001 | 15,857,459.5 | 1,556 | 19.2 |
๐ Full dataset available in sales KPI .xlsx
โ Sheet: Revenue by Branch Code
Analyzing revenue trends over time helps Faragir Sanat Mehrbin identify seasonality, growth periods, and periods requiring strategic intervention. This query aggregates total monthly revenue based on the SalesOrderDate
.
-- Monthly Revenue Trend
SELECT
SUBSTR(SalesOrderDate, 1, 7) AS month,
ROUND(SUM(QuantityOrdered * UnitSalePrice), 2) AS monthly_revenue
FROM sales_data
GROUP BY month
ORDER BY month;
๐ Output Preview: Click to expand
month | monthly_revenue |
---|---|
2018-05 | 75629.6 |
2018-06 | 2454752.7 |
2018-07 | 2707550.4 |
... | ... |
2020-11 | 2977607.3 |
2020-12 | 3046617.3 |
๐ Full dataset available in sales KPI .xlsx
โ Sheet: Monthly Revenue
Identifying top-paying customers helps Faragir Sanat Mehrbin prioritize key accounts, tailor loyalty strategies, and understand high-value client behavior. This section extracts the top 10 customers based on total purchase value.
-- Top 10 Customers by Total Revenue
SELECT
CustomerID,
SUM(QuantityOrdered * UnitSalePrice) AS customer_revenue
FROM sales_data
GROUP BY CustomerID
ORDER BY customer_revenue DESC
LIMIT 10;
๐ Output Preview: Click to expand
CustomerID | customer_revenue |
---|---|
2701 | 251,591.70 |
2119 | 231,183.50 |
2903 | 225,247.30 |
2052 | 225,247.30 |
2778 | 222,172.00 |
2517 | 218,889.00 |
2678 | 217,059.90 |
2589 | 215,116.90 |
2121 | 210,768.60 |
2995 | 201,830.80 |
๐ Full dataset available in the sheet "Top 10 Customers"
inside the sales KPI.xlsx file.
Segmenting customers based on Recency, Frequency, and Monetary Value (RFM) allows Faragir Sanat Mehrbin to target campaigns more effectively, retain valuable clients, and re-engage churned customers.
-- Step 1: Create RFM base table
CREATE VIEW vw_customer_rfm AS
SELECT
CustomerID,
ROUND(JULIANDAY('now') - JULIANDAY(MAX(SalesOrderDate))) AS recency,
COUNT(*) AS frequency,
ROUND(SUM(QuantityOrdered * UnitSalePrice), 2) AS monetary
FROM sales_data
GROUP BY CustomerID;
-- Step 2: Add RFM Segment Labels
SELECT *,
CASE
WHEN recency <= 30 THEN 'New'
WHEN recency <= 90 THEN 'Active'
WHEN recency <= 180 THEN 'At Risk'
ELSE 'Churned'
END AS recency_segment,
CASE
WHEN frequency >= 10 THEN 'Loyal'
WHEN frequency >= 5 THEN 'Returning'
ELSE 'One-Time'
END AS frequency_segment,
CASE
WHEN monetary >= 100000 THEN 'High-Value'
WHEN monetary >= 50000 THEN 'Mid-Value'
ELSE 'Low-Value'
END AS monetary_segment
FROM vw_customer_rfm;
๐ Click to expand
CustomerID | recency | frequency | monetary | recency_segment | frequency_segment | monetary_segment |
---|---|---|---|---|---|---|
2000 | 1882 | 5 | 65,010.10 | Churned | Returning | Mid-Value |
2001 | 1821 | 7 | 70,169.10 | Churned | Returning | Mid-Value |
2002 | 1759 | 6 | 88,406.50 | Churned | Returning | Mid-Value |
2003 | 1911 | 5 | 31,999.20 | Churned | Returning | Low-Value |
2004 | 1695 | 13 | 153,054.80 | Churned | Loyal | High-Value |
2005 | 1726 | 5 | 70,316.50 | Churned | Returning | Mid-Value |
2006 | 1865 | 11 | 110,154.70 | Churned | Loyal | High-Value |
2007 | 1692 | 10 | 64,722.00 | Churned | Loyal | Mid-Value |
2008 | 2101 | 9 | 114,744.20 | Churned | Returning | High-Value |
2009 | 1689 | 15 | 128,358.60 | Churned | Loyal | High-Value |
2010 | 1696 | 9 | 103,474.80 | Churned | Returning | High-Value |
2011 | 1818 | 6 | 79,830.50 | Churned | Returning | Mid-Value |
2012 | 1805 | 12 | 110,409.30 | Churned | Loyal | High-Value |
....... | ....... | ....... | ....... | ....... | ....... | ....... |
๐ Full segmentation results are available in the customer segmentation.xlsx
file.
To extract deeper insights from the sales dataset, a set of advanced SQL queries were developed to evaluate profitability, delivery efficiency, representative performance, repeat customer rate, and financial health over time.
-- ๐ Identify the top products based on profit (revenue - production cost)
SELECT
ProductCode,
ROUND(SUM((UnitSalePrice - UnitProductionCost) * QuantityOrdered), 2) AS total_profit,
ROUND(SUM(QuantityOrdered * UnitSalePrice), 2) AS total_revenue,
ROUND(SUM(QuantityOrdered * UnitProductionCost), 2) AS total_cost
FROM sales_data
GROUP BY ProductCode
ORDER BY total_profit DESC
LIMIT 10;
-- ๐ Evaluate operational efficiency by measuring delivery time in days
SELECT
BranchCode,
ROUND(AVG(JULIANDAY(DeliveryDate) - JULIANDAY(DispatchDate)), 2) AS avg_delivery_days,
COUNT(*) AS total_orders
FROM sales_data
WHERE DeliveryDate IS NOT NULL AND DispatchDate IS NOT NULL
GROUP BY BranchCode
ORDER BY avg_delivery_days;
-- ๐ผ Assess sales representatives by revenue and how much discount they give
SELECT
SalesRepID,
ROUND(SUM(QuantityOrdered * UnitSalePrice), 2) AS total_revenue,
ROUND(SUM(QuantityOrdered * UnitSalePrice * DiscountRate), 2) AS total_discount_given,
COUNT(DISTINCT CustomerID) AS served_customers
FROM sales_data
GROUP BY SalesRepID
ORDER BY total_revenue DESC;
-- ๐ How many customers placed more than one order?
SELECT
COUNT(*) * 100.0 / (SELECT COUNT(DISTINCT CustomerID) FROM sales_data) AS repeat_customer_percentage
FROM (
SELECT CustomerID
FROM sales_data
GROUP BY CustomerID
HAVING COUNT(DISTINCT InvoiceNumber) > 1
);
-- ๐ Financial health: Revenue vs cost per month
SELECT
DATE(SalesOrderDate, 'start of month') AS month,
ROUND(SUM((UnitSalePrice - UnitProductionCost) * QuantityOrdered), 2) AS gross_profit,
ROUND(SUM(UnitSalePrice * QuantityOrdered), 2) AS revenue,
ROUND(SUM(UnitProductionCost * QuantityOrdered), 2) AS production_cost,
ROUND(
SUM((UnitSalePrice - UnitProductionCost) * QuantityOrdered) * 100.0 /
NULLIF(SUM(UnitSalePrice * QuantityOrdered), 0), 2
) AS gross_margin_percent
FROM sales_data
GROUP BY month
ORDER BY month;
๐ฐ Top 10 Most Profitable Products
ProductCode | total_profit | total_revenue | total_cost |
---|---|---|---|
1010 | 434,924.47 | 1,088,267.60 | 653,343.13 |
1004 | 428,063.67 | 1,149,277.80 | 721,214.13 |
1071 | 424,500.81 | 1,075,772.10 | 651,271.29 |
1091 | 404,410.79 | 986,494.60 | 582,083.81 |
1062 | 395,708.16 | 1,070,371.90 | 674,663.74 |
1041 | 394,044.89 | 1,072,596.30 | 678,551.41 |
1049 | 388,628.41 | 1,008,222.70 | 619,594.29 |
1070 | 388,394.58 | 975,285.50 | 586,890.92 |
1063 | 376,479.50 | 953,550.70 | 577,071.20 |
1088 | 358,813.34 | 979,486.40 | 620,673.06 |
๐ Branch-Level Delivery Efficiency
BranchCode | avg_delivery_days | total_orders |
---|---|---|
TBZ005 | 5.44 | 1,563 |
ISF002 | 5.46 | 1,647 |
SHZ003 | 5.52 | 1,565 |
MHD004 | 5.53 | 1,660 |
TEH001 | 5.57 | 1,556 |
๐ผ Sales Representative Performance
SalesRepID | total_revenue | total_discount_given | served_customers |
---|---|---|---|
26 | 3346569.6 | 366156.17 | 255 |
1 | 3261359 | 435669.68 | 255 |
13 | 3242525.3 | 423124.1 | 276 |
... | ... | ... | ... |
๐ Repeat Customers
๐ Repeat customer rate: 99.89%
๐ Monthly Financial Summary
month | gross_profit | revenue | production_cost | gross_margin_percent |
---|---|---|---|---|
5/1/2018 | 24264.45 | 75629.6 | 51365.15 | 32.08 |
6/1/2018 | 903955.49 | 2454752.7 | 1550797.21 | 36.82 |
7/1/2018 | 1013208.10 | 2707550.4 | 1694342.30 | 37.42 |
8/1/2018 | 1076989.82 | 2909421.4 | 1832431.57 | 37.02 |
... | ... | ... | ... | ... |
๐ Raw results exported as advance analysis.xlsx
for integration with Power BI dashboards and final report delivery.
After cleaning and transforming the data using SQL and RFM techniques, the final datasets were imported into Power BI to create an interactive sales performance dashboard. This dashboard integrates all key metrics, including revenue trends, sales channel analysis, branch performance, customer segmentation, and top customer analysis.
The Power BI dashboard was built using the tables and views created in the previous steps, along with additional DAX formulas for dynamic calculations and segmentation. The dashboard file and published link are both available in the dashboard folder of this repository.
๐ Access the dashboard file and online report:
- Power BI dashboard file:
- View Interactive Power BI Dashboard