This repository contains three different approaches to parallelising a provided simple solver for the Navier-Stokes equations, which is used to model laminar flows of viscous, incompressible fluids. The three technologies used are OpenMP, CUDA, and MPI.
gprof was used initially to identify the most time-consuming section of the code. This was the poisson function, which accounted for approximately 96.53% of the program's runtime.

A Python script, validator.py, was used to validate the output of each port by comparing its VTK files to the original's. The script categorised each value in the files as exact, close (±0.02), or wrong.
Port | Wrong | Close (±0.02) | Exact | Cosine Similarity | Valid? |
---|---|---|---|---|---|
OpenMP | 0 | 0 | 267,302 | 100 | ✅ |
CUDA | 0 | 0 | 267,302 | 100 | ✅ |
MPI | 0 | 0 | 267,302 | 100 | ✅ |
Example output of the validation script:

```
Comparing implementation (original.vtk) to parallel implementation (openmp.vtk):
WRONG: 0/267302 → 0.0000%
CLOSE: 0/267302 → 0.0000%
EXACT: 267302/267302 → 100.0000%
Note: Close values are determined using a tolerance value of 0.02. Percentages are calculated to 4 decimal places.
Cosine Similarity: 100.0
PASS: Both files are an exact match → successful parallel implementation.
```
Loops in various locations were parallelised with the following pragmas, which share loop iterations between threads:

```c
#pragma omp parallel for collapse(2) reduction(+:p0)
#pragma omp parallel for collapse(2) reduction(+:res)
#pragma omp parallel for collapse(2) reduction(max:umax)
#pragma omp parallel for collapse(2) reduction(max:vmax)
```


To ensure consistent conditions, all ports were evaluated on Viking, the University of York's supercomputer. The main evaluations measured the total time taken by the main loop, using a code timer, across 20 problem sizes. Each size was tested multiple times and the results averaged to reduce the effect of outliers. CUDA experiments were run with and without checkpoints to assess their overhead, and an additional OpenMP experiment evaluated the effect of thread count. All profiling was performed on Viking.



Unfortunately, while the MPI port preserved the validity of the solution, the approach mentioned previously could not be completed successfully, hence the lack of a significant speedup.


Port | Average Time (s) | Speedup |
---|---|---|
Original | 135.26 | - |
OpenMP | 13.96 | 9.7× |
CUDA | 22.84 | 5.95× |
MPI | 131.81 | 1.02× |
All code was submitted as part of a master's module at the University of York: High-Performance Parallel and Distributed Systems.