To run the project, you can either use the provided nix-shell, which sets up an environment for Jupyter notebooks and testing, or create a custom Python environment from the requirements file provided in the tests directory. The data-quality tests can then be run with:
python -m pytest tests/test_dataset_quality.py
To make sure the data flow is consistent and the data meets the minimum required quality, I decided on the following test cases:
| Test Case | Description | Range |
|---|---|---|
| `test_load_csv` | Ensures the dataset is successfully loaded and not empty. | N/A (checks file presence and data existence). |
| `test_validate_no_missing_values_in_title` | Ensures every job posting has a title. | N/A (ensures completeness). |
| `test_validate_no_missing_values_in_description` | Ensures every job posting has a description. | N/A (ensures completeness). |
| `test_validate_column_in_range` (`max_salary`) | Ensures salaries are within a reasonable range. | `0.0` (no negative salaries) to `1,000,000.0`. |
| `test_validate_column_values_in_list` (`pay_period`) | Ensures `pay_period` contains only valid values. | Must be one of: `["BIWEEKLY", "HOURLY", "MONTHLY", "WEEKLY", "YEARLY"]`. |
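
As a rough illustration of how checks like these could be written, here is a minimal sketch using only the standard library. It is not the actual content of tests/test_dataset_quality.py: the helper functions, the in-memory sample data, and the choice to skip blank values in the range and list checks are all assumptions made for the example. Only the test names, the column names, and the ranges come from the table above.

```python
import csv
import io

# Hypothetical in-memory sample standing in for the real dataset file.
SAMPLE_CSV = """title,description,max_salary,pay_period
Data Engineer,Build pipelines,120000.0,YEARLY
Barista,Serve coffee,18.5,HOURLY
"""

VALID_PAY_PERIODS = ["BIWEEKLY", "HOURLY", "MONTHLY", "WEEKLY", "YEARLY"]


def load_csv(source):
    """Read CSV rows from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def _non_blank(rows, column):
    """Return the non-blank string values found in `column`."""
    return [row[column] for row in rows if (row.get(column) or "").strip()]


def validate_no_missing_values(rows, column):
    """True if every row has a non-blank value in `column`."""
    return len(_non_blank(rows, column)) == len(rows)


def validate_column_in_range(rows, column, low, high):
    """True if every non-blank value in `column` lies in [low, high].

    Assumption: blank values are skipped rather than treated as failures.
    """
    return all(low <= float(v) <= high for v in _non_blank(rows, column))


def validate_column_values_in_list(rows, column, allowed):
    """True if every non-blank value in `column` is one of `allowed`."""
    return all(v in allowed for v in _non_blank(rows, column))


def test_load_csv():
    rows = load_csv(io.StringIO(SAMPLE_CSV))
    assert len(rows) > 0


def test_validate_no_missing_values_in_title():
    rows = load_csv(io.StringIO(SAMPLE_CSV))
    assert validate_no_missing_values(rows, "title")


def test_validate_no_missing_values_in_description():
    rows = load_csv(io.StringIO(SAMPLE_CSV))
    assert validate_no_missing_values(rows, "description")


def test_validate_column_in_range():
    rows = load_csv(io.StringIO(SAMPLE_CSV))
    assert validate_column_in_range(rows, "max_salary", 0.0, 1_000_000.0)


def test_validate_column_values_in_list():
    rows = load_csv(io.StringIO(SAMPLE_CSV))
    assert validate_column_values_in_list(rows, "pay_period", VALID_PAY_PERIODS)
```

Keeping each validator as a small function that returns a boolean means the same checks could be reused outside pytest, for example in a notebook before running an analysis.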