Skip to content

piotr-skotnicki/tc-optimizer

Repository files navigation

TC Optimizing Compiler 0.4.2

Latest release

Introduction

TC is an automatic source-to-source optimizing compiler for affine loop nests, generating sequential or parallel tiled code based on the application of the transitive closure of a data dependence graph, and combining the Polyhedral Model and Iteration Space Slicing frameworks. TC utilizes a state-of-the-art polyhedral compilation toolchain, which is:

  • Polyhedral Extraction Tool [3] for extracting polyhedral representations of original loop nests,
  • Integer Set Library [1] for performing dependence analysis, manipulating sets and relations, as well as generating output code,
  • Barvinok library [2] for calculating set cardinality and processing its representation.

In order to optimize a loop nest, one should be surrounded by the #pragma scop and #pragma endscop directives:

int main()
{
  int N;
  int A[N+2][N+2];
#pragma scop
  for (int i = 1; i <= N; ++i) {
    for (int j = 1; j <= N; ++j) {
S1:   A[i][j] = A[i][j+1] + A[i+1][j] + A[i+1][j-1];
    }
  }
#pragma endscop
}

Important

The source file containing the loop nest should be valid C code, and simplified as much as possible. Array accesses must not exceed array bounds. Since version 0.3.0, iterators of a for loop must be declared inside that for loop itself, otherwise they will create a dependency for outer loops.

TC implements a number of tiling transformation algorithms as well as schedulers and code generators (including parallel generators utilizing OpenMP), all available to choose from through command line options. One is encouraged to experiment with various combinations of tiling schemes, schedulers and code generators, as well as tile sizes and transitive closure algorithms.

Note

TC is primarily used for studying algorithms utilizing transitive closure. Despite being able to generate efficient tiled code, some features are still in development.

For the example loop nest, the correction technique can be applied with a tile size of 32x32:

/* TC Optimizing Compiler 0.4.2 */
/* ./tc ../examples/other/correction.scop.c --correction-tiling --lex-scheduling --serial-codegen -b 32 */
#define min(x,y)    ((x) < (y) ? (x) : (y))
#define max(x,y)    ((x) > (y) ? (x) : (y))
#define floord(n,d) (((n)<0) ? -((-(n)+(d)-1)/(d)) : (n)/(d))
#pragma scop
for (int ii0 = 0; ii0 <= floord(N - 1, 32); ii0 += 1) {
  for (int ii1 = 0; ii1 <= (N - 1) / 32; ii1 += 1) {
    for (int i0 = 32 * ii0 + 1; i0 <= min(N, 32 * ii0 + 32); i0 += 1) {
      for (int i1 = max(1, 32 * ii0 + 32 * ii1 - i0 + 2); i1 <= 32 * ii1; i1 += 1) {
        A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
      }
      if (32 * ii1 + 32 >= N) {
        for (int i1 = 32 * ii1 + 1; i1 <= N; i1 += 1) {
          A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
        }
      } else {
        for (int i1 = 32 * ii1 + 1; i1 <= 32 * ii0 + 32 * ii1 - i0 + 33; i1 += 1) {
          A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
        }
      }
    }
  }
}
#pragma endscop

Tip

It is recommended to enable the --debug command-line option, not only to inspect intermediate representations of the sets, but also to monitor the progress of the compilation pipeline.

The codes generated by TC for the studied kernels can be found in the results directory of the compiler’s repository.

See CHANGELOG to follow the latest changes.

Installation

Dependencies:

automake autoconf libtool pkg-config libgmp3-dev libclang-dev
llvm libntl-dev g++ make git clang zlib1g-dev libglpk-dev

Downloading:

git clone https://github.com/piotr-skotnicki/tc-optimizer.git tc
cd tc
git submodule update --init --recursive

Compilation:

./autogen.sh
./configure
make

Manual

Usage:

tc <input.c> <tiling> <scheduling> <codegen> [<closure>] [<options>...]

Tip

Use source scripts/tc-completion.bash to enable bash completions.

Tiling:

--diamond-tiling             Diamond tiling for stencils
--semi-diamond-tiling        Diamond tiling without tile expansion
--strip-tiling               Iterative layer removal from stencils
--rectangular-tiling         Classic hyper-rectangular tiling
--correction-tiling          Correction of lexicographically smaller tiles
--inv-correction-tiling      Correction of lexicographically greater tiles
--scc-correction-tiling      Correction of strongly connected components
--merge-tiling               Merging of strongly connected components
--split-tiling               Splitting based on problematic iterations

Scheduling:

--lex-scheduling               Lexicographic execution order
--isl-scheduling               Integer set library scheduler
--isl-wave-scheduling          Integer set library scheduler with wavefronting
--feautrier-scheduling         Integer set library scheduler (Feautrier scheduling)
--sfs-single-scheduling        Tiling of synchronization-free slices with single sources
--sfs-multiple-scheduling      Tiling of synchronization-free slices with multiple sources
--sfs-tile-scheduling          Tile-wise synchronization-free slices
--free-scheduling              Free scheduling based on R^+
--free-rk-scheduling           Free scheduling based on R^k
--free-finite-scheduling       Exact free scheduling for finite graphs
--dynamic-free-scheduling      Dynamic free scheduling

Code generator:

--serial-codegen       Serial code generator
--omp-for-codegen      OpenMP parallel for generator
--omp-task-codegen     OpenMP parallel task generator
--omp-gpu-codegen      OpenMP offloading to GPU target

Transitive closure:

--isl-map-tc           ISL normalized map transitive closure (default)
--isl-union-map-tc     ISL union map transitive closure
--floyd-warshall-tc    Floyd-Warshall algorithm
--iterative-tc         Iterative algorithm
--omega-map-tc         Omega normalized map transitive closure
--omega-union-map-tc   Omega union map transitive closure
--tarjan-tc            Tarjan algorithm for finite graphs

Options:

-b <value>           Tile size, e.g. -b 256 -b S1:128,128 (default: 32)
--out     | -o       Output file (default: stdout)
--debug   | -d       Verbose mode
--report             Generate tile statistics report (use -R for each parameter)
--inline             Always inline loop bounds expressions
-D <name>=<value>    Define parameter value, e.g. -D M=2000 -D N=2600
-R <name>=<value>    Set parameter value for report generation, e.g. --report -R M=2000 -R N=2600
--cache <value>      Cache line length in bytes (default: 64)
--use-macros         Use macro definitions in place of statements
--yes     | -y       Non-interactive mode
--version | -v       Print compiler info
--help    | -h       Print help

Examples

./tc ../examples/polybench/gesummv.scop.c --rectangular-tiling --sfs-single-scheduling --omp-for-codegen --isl-map-tc -b 32 --debug --inline
./tc ../examples/pluto/heat-1d.scop.c --semi-diamond-tiling --omp-for-codegen --iterative-tc -b 128 --debug --inline --drop-bounds
./tc ../examples/polybench/trisolv.scop.c --merge-tiling --free-scheduling --omp-task-codegen -y -b S1:16 -b S2:16,8 -b S3:16 --debug
./tc ../examples/polybench/mvt.scop.c --rectangular-tiling --lex-scheduling --serial-codegen -b 32 --report -R _PB_N=128

User guide

Getting started

Effective use of the offered capabilities requires expertise in identifying loop nest classes or a series of trial-and-error attempts using the available combinations. The below suggestions should facilitate the process of finding an optimal transformation:

  1. Initially, use tile sizes -b 32 or -b 64 or -b 16. This order of evaluation should indicate if the tile size should increase or decrease.
  2. Start with --correction-tiling; the algorithm corrects the rectangular tiles if necessary, without the overhead of identifying strongly connected components. Experiment with other options only after generating the code for this scheme and finding the results suboptimal.
  3. Check the number of synchronization-free slices found by --sfs-single-scheduling/--sfs-multiple-scheduling/--sfs-tile-scheduling. If the degree of parallelism is not satisfactory, try to find a --free-scheduling for tiles execution. --isl-wave-scheduling is a viable option as well.
  4. Prefer the --iterative-tc method over other algorithms. --isl-map-tc is the secondary option if the iterative approach fails to produce a result within a reasonable time frame.
  5. If any of the transitive closure algorithms returns an inexact result, another algorithm may succeed instead. Note that --omega-map-tc may result in an under-approximation and invalid results.

Troubleshooting

./tc: No such file or directory

Meaning: The compiler was not built or the working directory is invalid.

Solution:

  • Make sure you have followed the Installation guide and the build was successful.
  • Change the working directory to the src folder.
error: no such file or directory

Meaning: The input file does not exist or the path is invalid.

Solution:

  • Check the path to the input file.
Error: No SCoP was found in file

Meaning: The input file exists, but a loop nest was not found.

Solution:

  • Verify if the input file is valid C code.
  • Identify the loop nest and add #pragma scop/#pragma endscop directives around it.
Segmentation fault (core dumped)

Meaning: A fatal error has caused the process to crash.

Solution:

  • Reconfigure TC with the --enable-debug=yes parameter and rebuild the project, or add CFLAGS="-O0 -g" to configuration flags.
  • Attach a debugger or analyze the core file.
Error: Backward relation detected.

Meaning: The inter-tile relation is not lexicographically forward, therefore the resultant tiling scheme does not allow for lexicographic execution of tiles, or the hyper-rectangular tiling is not valid for the dependence pattern.

Solution:

  • Use another scheduling algorithm, i.e., --free-scheduling, --free-rk-scheduling, --isl-wave-scheduling, --isl-scheduling.
  • Choose a tiling algorithm other than --rectangular-tiling.
  • Use a tiling algorithm that ensures lexicographic execution order, i.e., --correction-tiling, --inv-correction-tiling, --split-tiling.
Warning: Inexact R^+. The results can be non-optimal. Restart TC with a different transitive closure method.

Meaning: Computing the transitive closure has resulted in an overapproximation that may degrade tiling quality or limit parallelism.

Solution:

  • Continue the compilation pipeline typing y in the console.
  • or, restart the compilation with another transitive closure algorithm.
Error: Missing required option

Meaning: One of the command-line parameters is missing an associated value.

Solution:

  • Check the proper syntax consulting the ./tc --help output.

Exit codes

Exit code Meaning
0 Compilation successful
1 Computational failure
2 Assertion failure (debug build only)
3 Internal error
4 Invalid parameter or missing parameter value
5 Inexact transitive closure
6 Invalid input file / SCoP extraction failed
7 Inter-tile relation is not lexicographically forward

Contact

In case of questions/problems/bugs, please contact:

Piotr Skotnicki <[email protected]>

West Pomeranian University of Technology
Faculty of Computer Science and Information Technology
ul. Zolnierska 49, 71-210 Szczecin, Poland

References

[1] Verdoolaege S (2010) ISL: an integer set library for the polyhedral model. In: Mathematical software--ICMS 2010, Lecture notes in computer science. vol 6327. Springer, Berlin, pp 299--302

[2] Verdoolaege S, Seghir R, Beyls K et al. Counting integer points in parametric polytopes using barvinok’s rational functions. Algorithmica (2007) 48: 37

[3] Verdoolaege S, Grosser T (2012) Polyhedral extraction tool. In: Proceedings of the 2nd international workshop on polyhedral compilation techniques. Paris, France