This project implements a house price prediction system using a custom Linear Regression model in C++ and a data cleaning pipeline in Python.
- Data Cleaning: The `clean.py` script processes the raw dataset (`data.csv`) by selecting relevant features, removing missing values and duplicates, and saving the result as `data_cleaned.csv`.
- Model Training: The `LinearRegression.cpp` file contains the C++ implementation of the Linear Regression algorithm. It reads the cleaned dataset, applies a standard scaler (standardizing each feature and the target variable by subtracting the mean and dividing by the standard deviation), trains the model, and evaluates predictions. It also provides a custom `train_test_split` function to split the dataset into training and testing sets.
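
The scaling and splitting steps described above can be sketched as follows. This is a minimal illustration, not the code in `LinearRegression.cpp`: the names `standardize` and `trainTestSplitIndices`, their signatures, the use of the population standard deviation, and the fixed-seed shuffle are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: z-score standardization of a single column.
// Returns the scaled values and writes the mean and standard deviation to the
// out-parameters so predictions can later be mapped back to original units.
std::vector<double> standardize(const std::vector<double>& values,
                                double& mean, double& stddev) {
    mean = 0.0;
    stddev = 1.0;
    if (values.empty()) return {};

    const double n = static_cast<double>(values.size());
    mean = std::accumulate(values.begin(), values.end(), 0.0) / n;

    double squaredDeviations = 0.0;
    for (double v : values) squaredDeviations += (v - mean) * (v - mean);
    stddev = std::sqrt(squaredDeviations / n);  // population standard deviation (assumption)
    if (stddev == 0.0) stddev = 1.0;            // constant column: avoid division by zero

    std::vector<double> scaled;
    scaled.reserve(values.size());
    for (double v : values) scaled.push_back((v - mean) / stddev);
    return scaled;
}

// Hypothetical helper: shuffle row indices with a fixed seed and return
// (train indices, test indices). testRatio is clamped to [0, 1].
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
trainTestSplitIndices(std::size_t rowCount, double testRatio, unsigned seed) {
    std::vector<std::size_t> indices(rowCount);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    std::mt19937 rng(seed);
    std::shuffle(indices.begin(), indices.end(), rng);

    const double ratio = std::min(1.0, std::max(0.0, testRatio));
    const std::size_t testCount = std::min(
        rowCount, static_cast<std::size_t>(static_cast<double>(rowCount) * ratio));

    // The first testCount shuffled indices form the test set; the rest are training rows.
    std::vector<std::size_t> test(indices.begin(), indices.begin() + testCount);
    std::vector<std::size_t> train(indices.begin() + testCount, indices.end());
    return std::make_pair(train, test);
}
```

In a sketch like this, each feature column and the target column would be passed through `standardize` separately, and the index vectors returned by `trainTestSplitIndices` would select the rows used for training and for evaluation. Returning indices rather than copied rows keeps the features and the target aligned.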
- `data.csv`: Raw dataset with housing information.
- `clean.py`: Python script for data preprocessing.
- `data_cleaned.csv`: Clean dataset generated by `clean.py`.
- `LinearRegression.cpp`: C++ source code implementing the Linear Regression model.
- `README.md`: Project documentation.
- Python 3.x with the `pandas` library.
- Install dependencies with:
pip install pandas
- A C++ compiler supporting C++11 or later (e.g., g++).
- Standard C++ libraries.
Run the following command to clean the dataset:
python3 clean.py
This generates `data_cleaned.csv`, which is required by the C++ model.
Compile the C++ source code using:
g++ HousePricePrediction.cpp DataPreprocessing.cpp LinearRegression.cpp -o ./out
Then execute the model with:
./out
- Ensure that `data_cleaned.csv` is available before running the C++ application.
- Adjust the parameters in the source files as needed.
- For any issues during compilation or execution, verify that all dependencies are properly installed.
This project is licensed under the Apache License. See the LICENSE file for details.