EDA stands for Exploratory Data Analysis; this project covers both exploratory and explanatory analysis. I am using an e-commerce dataset of online transaction records with 8 variables and 4,870 rows. Here is a description of each variable:
- InvoiceNo: A unique number/code for each sale or purchase
- StockCode: A unique number/code for each product
- Description: The name or description of each product
- Quantity: The quantity of products bought/sold
- InvoiceDate: The date the invoice was issued
- UnitPrice: The price per unit of the product
- CustomerID: A unique number for each customer
- Country: The country where the transaction took place or the location of the customer who made the purchase
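These variables suggest a basic cleaning and feature-engineering step before any analysis. Below is a minimal pandas sketch. It assumes the data has been loaded into a DataFrame with exactly these column names. The cleaning rules are illustrative assumptions, not necessarily the exact steps used in this project: drop rows without a CustomerID, and keep only positive Quantity and UnitPrice. The same applies to the derived `TotalPrice` column and to the function name `prepare_transactions`. The sample rows are made up.

```python
import pandas as pd


def prepare_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the raw transaction table and add a TotalPrice column.

    Assumed (illustrative) cleaning rules: drop rows with no CustomerID and
    keep only rows where both Quantity and UnitPrice are positive.
    """
    mask = df["CustomerID"].notna() & (df["Quantity"] > 0) & (df["UnitPrice"] > 0)
    out = df.loc[mask].copy()
    # Parse invoice timestamps so they can be grouped by month later
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    # Derived feature: total price of each invoice line
    out["TotalPrice"] = out["Quantity"] * out["UnitPrice"]
    return out


# Made-up rows for illustration only (not taken from the real dataset)
raw = pd.DataFrame({
    "InvoiceNo": ["INV-1", "INV-1", "INV-2", "INV-3"],
    "StockCode": ["P001", "P002", "P003", "P004"],
    "Description": ["MUG", "CANDLE", "LAMP", "VASE"],
    "Quantity": [6, 6, -2, 32],
    "InvoiceDate": ["2010-12-01 08:26", "2010-12-01 08:26",
                    "2010-12-01 08:28", "2010-12-01 08:34"],
    "UnitPrice": [2.55, 3.39, 1.85, 1.69],
    "CustomerID": [1001.0, 1001.0, 1002.0, None],
    "Country": ["United Kingdom"] * 4,
})

clean = prepare_transactions(raw)
print(clean[["InvoiceNo", "Quantity", "UnitPrice", "TotalPrice"]])
```

In this toy sample, the line with a negative quantity and the line without a CustomerID are dropped, leaving two rows.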
The purpose of this analysis is to provide both a broad overview and a deeper understanding of the transactions conducted from December 2010 to December 2011. The aspects analyzed are:
- Products with the highest sales
- Countries with the highest sales
- Correlation between product prices and the quantity sold
- Customers with the highest purchases
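Each of these aspects maps onto a simple group-and-aggregate step. The sketch below is one way to compute them with pandas. It assumes a `TotalPrice` column (Quantity × UnitPrice) already exists. It ranks products by total quantity, and countries and customers by total revenue. It uses the Pearson coefficient for the price/quantity correlation. These choices, and the function name `summarize`, are assumptions, since "highest sales" could also be measured differently. The rows are made up.

```python
import pandas as pd


def summarize(df: pd.DataFrame, n: int = 5) -> dict:
    """Top-n rankings and the price/quantity correlation for a cleaned table."""
    return {
        # Products ranked by total units sold (assumed definition of "sales")
        "top_products": df.groupby("Description")["Quantity"].sum().nlargest(n),
        # Countries and customers ranked by total revenue
        "top_countries": df.groupby("Country")["TotalPrice"].sum().nlargest(n),
        "top_customers": df.groupby("CustomerID")["TotalPrice"].sum().nlargest(n),
        # Pearson correlation (the default method of Series.corr)
        "price_qty_corr": df["UnitPrice"].corr(df["Quantity"]),
    }


# Made-up rows for illustration only
df = pd.DataFrame({
    "Description": ["MUG", "MUG", "CANDLE", "LAMP", "CANDLE"],
    "Quantity": [10, 5, 3, 1, 2],
    "UnitPrice": [1.0, 1.0, 2.0, 15.0, 2.0],
    "CustomerID": [1, 2, 1, 3, 2],
    "Country": ["United Kingdom", "United Kingdom", "France", "Germany", "United Kingdom"],
})
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

result = summarize(df, n=3)
for name, value in result.items():
    print(name, value, sep="\n", end="\n\n")
```

Pearson only measures linear association. If a few very expensive or very large orders dominate, a rank-based (Spearman-style) cross-check such as `df["UnitPrice"].rank().corr(df["Quantity"].rank())` may be worth comparing against it.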
Some insights obtained include:
- November 2011 was the month with the highest sales
- The United Kingdom dominated sales compared with other countries, and its highest sales also occurred in November 2011
- There is a weak negative correlation between Unit Price and Quantity. Higher-priced products tend to sell in slightly lower quantities, but because the relationship is weak, products with low Unit Prices are not always sold in high quantities, and products with high Unit Prices are not always sold in low quantities
- Some customers purchased only a few items but had a high total spend (total price)
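The monthly peak and the "few items, high total price" customers can both be read off simple aggregations. Here is a minimal sketch. It assumes `InvoiceDate` is already parsed as a datetime and that a `TotalPrice` column (Quantity × UnitPrice) exists. The function names are hypothetical and the rows are made up.

```python
import pandas as pd


def monthly_revenue(df: pd.DataFrame) -> pd.Series:
    """Total revenue (sum of TotalPrice) per calendar month."""
    months = df["InvoiceDate"].dt.to_period("M")
    return df.groupby(months)["TotalPrice"].sum()


def customer_items_vs_spend(df: pd.DataFrame) -> pd.DataFrame:
    """Items bought and total spend per customer, highest spend first."""
    summary = df.groupby("CustomerID").agg(
        items=("Quantity", "sum"),
        spend=("TotalPrice", "sum"),
    )
    return summary.sort_values("spend", ascending=False)


# Made-up rows for illustration only
tx = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2011-10-03", "2011-11-14", "2011-11-20", "2011-12-05"]),
    "CustomerID": [1, 2, 1, 3],
    "Quantity": [4, 2, 30, 5],
    "UnitPrice": [2.5, 40.0, 1.0, 2.0],
})
tx["TotalPrice"] = tx["Quantity"] * tx["UnitPrice"]

monthly = monthly_revenue(tx)
print("Peak month:", monthly.idxmax())
print(customer_items_vs_spend(tx))
```

In this toy sample, customer 2 bought only two items but has the highest total spend, which is the kind of pattern the last insight describes.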
These insights can help the company to:
- Maximize profit in countries with high sales volumes and apply the same sales strategies in countries with low sales volumes
- Maintain customer loyalty by rewarding the customers with the highest purchase totals
- Evaluate products with low sales volumes
- Identify market segments for specific products so that recommendation features can be accurately applied to each customer segment
This analysis still needs further development, especially in terms of:
- Evaluating the causes of the significant sales increase in November and the significant decrease in sales in the following month (December)
- Classifying the types of products with low sales and identifying the reasons behind their low sales
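A possible first step toward the November/December question is to quantify the month-over-month change. It also helps to check how many days of each month actually appear in the data, since a month that is only partly covered would show lower totals regardless of customer behavior. This is a hedged sketch, assuming `InvoiceDate` is a datetime column and `TotalPrice` = Quantity × UnitPrice. The function name is hypothetical and the rows are made up.

```python
import pandas as pd


def monthly_change_and_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly revenue, % change vs. the previous month, and days with sales."""
    months = df["InvoiceDate"].dt.to_period("M")
    revenue = df.groupby(months)["TotalPrice"].sum()
    # Percentage change relative to the previous month (NaN for the first month)
    pct_change = (revenue / revenue.shift(1) - 1) * 100
    # Number of distinct calendar days with at least one transaction
    days = df["InvoiceDate"].dt.normalize().groupby(months).nunique()
    return pd.DataFrame({
        "revenue": revenue,
        "pct_change": pct_change,
        "days_with_sales": days,
    })


# Made-up rows for illustration only
tx = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2011-11-01 10:00", "2011-11-02 09:30", "2011-11-02 15:45", "2011-12-01 11:00",
    ]),
    "TotalPrice": [100.0, 50.0, 50.0, 40.0],
})

report = monthly_change_and_coverage(tx)
print(report)
```

A month with far fewer `days_with_sales` than its neighbors would be worth checking before its drop is read as a real decline in demand.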
If you have any suggestions or feedback, please don't hesitate to contact me by direct message on LinkedIn: https://www.linkedin.com/in/novia-anggita-aprilianti/
#python #EDA #featuresengineering #datacleaning #ecommerce