MDLE First Assignment

The objective of this project was to implement the A-Priori algorithm to find the most frequent itemsets in a list of conditions for a large set of patients, and then to derive associations between conditions by extracting rules of the forms (X) -> Y and (X, Y) -> Z. A second goal was to implement and apply LSH (locality-sensitive hashing) to identify similar news articles in a dataset.

Course

This project was developed for the Mining Large Scale Datasets course at the University of Aveiro.

How to run

Exercise 1

For each value of k (2 or 3), run the following command from inside the /src/ directory, passing that value as <K>:

spark-submit conditions.py <K> ../data/conditions.csv

For a quicker sample run on the truncated dataset, execute:

spark-submit conditions.py <K> ../data/conditions_truncated.csv

The results can be found inside the /results/ directory.
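
As a rough illustration of what Exercise 1 computes, here is a minimal A-Priori sketch in plain Python. It is not the code in conditions.py, which runs on Spark; the function names `frequent_itemsets` and `rules`, the `min_support` and `min_confidence` parameters, and the toy baskets are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, k, min_support):
    """Return {sorted tuple: count} for all frequent itemsets of size 1..k."""
    baskets = [frozenset(b) for b in baskets]
    # Pass 1: count single items and keep those meeting the support threshold.
    counts = Counter((item,) for basket in baskets for item in basket)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    all_frequent = dict(frequent)
    for size in range(2, k + 1):
        # Only items found in a frequent (size-1)-itemset can appear in a larger one.
        alive = {item for itemset in frequent for item in itemset}
        counts = Counter()
        for basket in baskets:
            for cand in combinations(sorted(basket & alive), size):
                # Monotonicity: every (size-1)-subset must itself be frequent.
                if all(sub in frequent for sub in combinations(cand, size - 1)):
                    counts[cand] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent.update(frequent)
    return all_frequent


def rules(frequent, k, min_confidence):
    """Return sorted (antecedent, consequent, confidence) rules from size-k itemsets."""
    found = []
    for itemset, count in frequent.items():
        if len(itemset) != k:
            continue
        for consequent in itemset:
            antecedent = tuple(i for i in itemset if i != consequent)
            # confidence(A -> c) = support(A and c) / support(A)
            confidence = count / frequent[antecedent]
            if confidence >= min_confidence:
                found.append((antecedent, consequent, confidence))
    return sorted(found)


# Hypothetical toy data: each basket is one patient's list of conditions.
baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
pairs = frequent_itemsets(baskets, k=2, min_support=3)
pair_rules = rules(pairs, k=2, min_confidence=0.7)
```

With k=2 the rules have the form (X) -> Y, and with k=3 the form (X, Y) -> Z. The antecedent lookup in `rules` relies on monotonicity: every subset of a frequent itemset is itself frequent, so its count is always available.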

Exercise 2

Run the following command from inside the /src/ directory:

spark-submit lsh.py ../data/covid_news_truncated.json <R> <B>
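
To give a feel for how the two parameters interact, the sketch below shows the standard LSH banding step in plain Python. It assumes, following the usual convention, that <R> is the number of rows per band and <B> the number of bands; that reading, the function names, and the toy signatures are assumptions for this example and are not taken from lsh.py.

```python
from collections import defaultdict
from itertools import combinations


def candidate_probability(s, r, b):
    """Probability that two documents with similarity s share at least one band."""
    return 1.0 - (1.0 - s ** r) ** b


def lsh_candidates(signatures, r, b):
    """Return pairs of document ids whose signatures agree on all r rows of some band."""
    candidates = set()
    for band in range(b):
        buckets = defaultdict(list)
        for doc_id, signature in signatures.items():
            # The r values of this band act as the bucket key.
            key = tuple(signature[band * r:(band + 1) * r])
            buckets[key].append(doc_id)
        for docs in buckets.values():
            candidates.update(combinations(sorted(docs), 2))
    return candidates


# Hypothetical MinHash signatures of length r * b = 4.
signatures = {
    "doc_a": [1, 2, 3, 4],
    "doc_b": [1, 2, 9, 9],
    "doc_c": [7, 7, 7, 7],
}
pairs = lsh_candidates(signatures, r=2, b=2)
```

Increasing r makes each band harder to match, which reduces false positives, while increasing b gives more chances to match, which reduces false negatives. Candidate pairs would normally still be checked against the actual similarity before being reported.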

Grade

This project's grade was 16.7 out of 20.

Authors

Eduardo Santos: eduardosantoshf
Pedro Bastos: bastos-01

