This repository was archived by the owner on Jul 27, 2023. It is now read-only.

MDLE First Assignment - The objective of this project was to implement the A-Priori algorithm to find the most frequent itemsets in a list of conditions for a large set of patients, then derive associations between conditions by extracting rules, and also to implement and apply LSH to identify similar news articles in a dataset.


eduardosantoshf/most-frequent-itemsets


MDLE First Assignment

The objective of this project was to implement the A-Priori algorithm to find the most frequent itemsets in a list of conditions for a large set of patients, and then to derive associations between conditions by extracting rules of the forms (X) -> Y and (X, Y) -> Z. A second goal was to implement and apply LSH (locality-sensitive hashing) to identify similar news articles in a dataset.

Course

This project was developed for the Mining Large Scale Datasets course at the University of Aveiro.

How to run

Exercise 1

For each value of K (2 or 3), run the following command from inside the /src/ directory:

spark-submit conditions.py <K> ../data/conditions.csv

For a sample run, execute:

spark-submit conditions.py <K> ../data/conditions_truncated.csv

The results can be found inside the /results/ directory.
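
As a rough illustration of what Exercise 1 computes, here is a minimal single-machine sketch of A-Priori and of rule confidence in plain Python. It is not the Spark code in conditions.py: the function names (apriori, confidence), the min_support threshold, and the toy condition names are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, k, min_support):
    """Return {itemset (sorted tuple): count} for frequent itemsets of size 1..k."""
    baskets = [frozenset(b) for b in baskets]
    # Pass 1: count single items and keep the frequent ones.
    counts = Counter((item,) for basket in baskets for item in basket)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    for size in range(2, k + 1):
        prev = set(frequent)
        counts = Counter()
        for basket in baskets:
            # Only frequent single items can be part of a frequent itemset.
            items = sorted(i for i in basket if (i,) in result)
            for cand in combinations(items, size):
                # Monotonicity: every (size - 1)-subset must itself be frequent.
                if all(sub in prev for sub in combinations(cand, size - 1)):
                    counts[cand] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
    return result


def confidence(supports, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    lhs = tuple(sorted(lhs))
    both = tuple(sorted(lhs + (rhs,)))
    return supports[both] / supports[lhs]


# Hypothetical toy data: one set of conditions per patient.
baskets = [
    {"asthma", "flu", "cough"},
    {"asthma", "flu"},
    {"asthma", "cough"},
    {"flu", "cough"},
    {"asthma", "flu", "cough"},
]
supports = apriori(baskets, k=3, min_support=2)
rule_x_y = confidence(supports, ("asthma",), "flu")          # (X) -> Y
rule_xy_z = confidence(supports, ("asthma", "flu"), "cough")  # (X, Y) -> Z
```

Itemsets are stored as sorted tuples so that the same set of conditions always maps to the same dictionary key, which keeps the subset checks and the confidence lookups simple.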

Exercise 2

Run the following command from inside the /src/ directory:

spark-submit lsh.py ../data/covid_news_truncated.json <R> <B>
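
The README does not define <R> and <B>. Assuming they follow the usual LSH banding convention (R rows per band, B bands over a MinHash signature of length R * B), the sketch below shows how the two values trade off. The function names and the sample signatures are hypothetical and are not taken from lsh.py.

```python
def candidate_probability(s, r, b):
    """Probability that two documents with Jaccard similarity s share at least
    one identical band, given b bands of r rows each: 1 - (1 - s**r)**b."""
    return 1.0 - (1.0 - s ** r) ** b


def bands(signature, r, b):
    """Split a signature of length r * b into b bands of r values each."""
    if len(signature) != r * b:
        raise ValueError("signature length must equal r * b")
    return [tuple(signature[i * r:(i + 1) * r]) for i in range(b)]


def is_candidate_pair(sig_a, sig_b, r, b):
    """Two signatures form a candidate pair if any band matches exactly."""
    return any(x == y for x, y in zip(bands(sig_a, r, b), bands(sig_b, r, b)))


# Hypothetical signatures of length 4, split into 2 bands of 2 rows.
same_first_band = is_candidate_pair([1, 2, 3, 4], [1, 2, 9, 9], r=2, b=2)
no_band_matches = is_candidate_pair([1, 2, 3, 4], [1, 9, 3, 9], r=2, b=2)
p = candidate_probability(0.5, r=2, b=2)
```

Under this assumption, raising R makes each band harder to match and so reduces false positives, while raising B gives similar pairs more chances to collide and so reduces false negatives.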

Grade

This project received a grade of 16.7 out of 20.

Authors
