mahdikhashan/tu-wien-machine-learning-operation

Machine Learning Operation Course Project

Setup

To run the project, you can either use the provided nix-shell, which sets up an environment for Jupyter notebooks and testing, or create a custom Python environment from the requirements file provided in the test path.

Run Tests

python -m pytest tests/test_dataset_quality.py

Dataset Quality Tests

To make sure the data flow is consistent and the data meets the minimum required quality, I have defined the following test cases:

| Test Case | Description | Range |
| --- | --- | --- |
| test_load_csv | Ensures the dataset is successfully loaded and not empty. | N/A (checks file presence and data existence) |
| test_validate_no_missing_values_in_title | Ensures every job posting has a title. | N/A (ensures completeness) |
| test_validate_no_missing_values_in_description | Ensures every job posting has a description. | N/A (ensures completeness) |
| test_validate_column_in_range (max_salary) | Ensures salaries are within a reasonable range. | 0.0 (no negative salaries) to 1,000,000.0 |
| test_validate_column_values_in_list (pay_period) | Ensures pay_period contains only valid values. | One of: ["BIWEEKLY", "HOURLY", "MONTHLY", "WEEKLY", "YEARLY"] |
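
The checks listed above could be sketched roughly as follows. This is a minimal illustration, not the code in tests/test_dataset_quality.py: the helper names (`load_csv`, `missing_values`, `out_of_range`, `not_in_list`), the inline sample data, and the use of the standard-library `csv` module are all assumptions made for this example. It also assumes that empty salary or pay-period cells are skipped rather than treated as failures.

```python
import csv
import io

# Allowed pay_period values, taken from the test case list above.
VALID_PAY_PERIODS = {"BIWEEKLY", "HOURLY", "MONTHLY", "WEEKLY", "YEARLY"}

# Hypothetical two-row sample standing in for the real dataset file.
SAMPLE = """title,description,max_salary,pay_period
Data Engineer,Builds data pipelines,120000.0,YEARLY
Barista,Prepares coffee,25.0,HOURLY
"""


def load_csv(source):
    """Read CSV rows from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def missing_values(rows, column):
    """Return indices of rows whose value in `column` is empty or absent."""
    return [i for i, row in enumerate(rows) if not (row.get(column) or "").strip()]


def out_of_range(rows, column, low, high):
    """Return indices of rows whose numeric value lies outside [low, high].

    Assumption: empty cells are skipped, not reported.
    """
    bad = []
    for i, row in enumerate(rows):
        raw = (row.get(column) or "").strip()
        if raw and not (low <= float(raw) <= high):
            bad.append(i)
    return bad


def not_in_list(rows, column, allowed):
    """Return indices of rows whose non-empty value is not in `allowed`."""
    bad = []
    for i, row in enumerate(rows):
        raw = (row.get(column) or "").strip()
        if raw and raw not in allowed:
            bad.append(i)
    return bad


# pytest-style test functions mirroring the names in the list above.
def test_load_csv():
    assert len(load_csv(io.StringIO(SAMPLE))) > 0


def test_validate_no_missing_values_in_title():
    assert missing_values(load_csv(io.StringIO(SAMPLE)), "title") == []


def test_validate_no_missing_values_in_description():
    assert missing_values(load_csv(io.StringIO(SAMPLE)), "description") == []


def test_validate_column_in_range():
    rows = load_csv(io.StringIO(SAMPLE))
    assert out_of_range(rows, "max_salary", 0.0, 1_000_000.0) == []


def test_validate_column_values_in_list():
    rows = load_csv(io.StringIO(SAMPLE))
    assert not_in_list(rows, "pay_period", VALID_PAY_PERIODS) == []
```

Each helper returns the offending row indices instead of a bare boolean, so a failing assertion shows which postings violated the rule.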

Dataset

  1. https://www.kaggle.com/datasets/arshkon/linkedin-job-postings

About

TU Wien Machine Learning Operation (194.182) Course Project
