TC is an automatic source-to-source optimizing compiler for affine loop nests, generating sequential or parallel tiled code based on the application of the transitive closure of a data dependence graph, and combining the Polyhedral Model and Iteration Space Slicing frameworks. TC utilizes a state-of-the-art polyhedral compilation toolchain, which is:
- Polyhedral Extraction Tool [3] for extracting polyhedral representations of original loop nests,
- Integer Set Library [1] for performing dependence analysis, manipulating sets and relations, as well as generating output code,
- Barvinok library [2] for calculating set cardinality and processing its representation.
In order to optimize a loop nest, one should be surrounded by the #pragma scop
and #pragma endscop
directives:
int main()
{
int N;
int A[N+2][N+2];
#pragma scop
for (int i = 1; i <= N; ++i) {
for (int j = 1; j <= N; ++j) {
S1: A[i][j] = A[i][j+1] + A[i+1][j] + A[i+1][j-1];
}
}
#pragma endscop
}
Important
The source file containing the loop nest should be valid C code, and simplified as much as possible. Array accesses must not exceed array bounds. Since version 0.3.0, iterators of a for loop must be declared inside that for loop itself, otherwise they will create a dependency for outer loops.
TC implements a number of tiling transformation algorithms as well as schedulers and code generators (including parallel generators utilizing OpenMP), all available to choose from through command line options. One is encouraged to experiment with various combinations of tiling schemes, schedulers and code generators, as well as tile sizes and transitive closure algorithms.
Note
TC is primarily used for studying algorithms utilizing transitive closure. Despite being able to generate efficient tiled code, some features are still in development.
For the example loop nest, the correction technique can be applied with a tile size of 32x32:
/* TC Optimizing Compiler 0.4.2 */
/* ./tc ../examples/other/correction.scop.c --correction-tiling --lex-scheduling --serial-codegen -b 32 */
#define min(x,y) ((x) < (y) ? (x) : (y))
#define max(x,y) ((x) > (y) ? (x) : (y))
#define floord(n,d) (((n)<0) ? -((-(n)+(d)-1)/(d)) : (n)/(d))
#pragma scop
for (int ii0 = 0; ii0 <= floord(N - 1, 32); ii0 += 1) {
for (int ii1 = 0; ii1 <= (N - 1) / 32; ii1 += 1) {
for (int i0 = 32 * ii0 + 1; i0 <= min(N, 32 * ii0 + 32); i0 += 1) {
for (int i1 = max(1, 32 * ii0 + 32 * ii1 - i0 + 2); i1 <= 32 * ii1; i1 += 1) {
A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
}
if (32 * ii1 + 32 >= N) {
for (int i1 = 32 * ii1 + 1; i1 <= N; i1 += 1) {
A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
}
} else {
for (int i1 = 32 * ii1 + 1; i1 <= 32 * ii0 + 32 * ii1 - i0 + 33; i1 += 1) {
A[i0][i1] = ((A[i0][i1 + 1] + A[i0 + 1][i1]) + A[i0 + 1][i1 - 1]);
}
}
}
}
}
#pragma endscop
Tip
It is recommended to enable the --debug
command-line option, not only to inspect intermediate representations of the sets, but also to monitor the progress of the compilation pipeline.
The codes generated by TC for the studied kernels can be found in the results
directory of the compiler’s repository.
See CHANGELOG
to follow the latest changes.
automake autoconf libtool pkg-config libgmp3-dev libclang-dev
llvm libntl-dev g++ make git clang zlib1g-dev libglpk-dev
git clone https://github.com/piotr-skotnicki/tc-optimizer.git tc
cd tc
git submodule update --init --recursive
./autogen.sh
./configure
make
tc <input.c> <tiling> <scheduling> <codegen> [<closure>] [<options>...]
Tip
Use source scripts/tc-completion.bash
to enable bash completions.
--diamond-tiling Diamond tiling for stencils
--semi-diamond-tiling Diamond tiling without tile expansion
--strip-tiling Iterative layer removal from stencils
--rectangular-tiling Classic hyper-rectangular tiling
--correction-tiling Correction of lexicographically smaller tiles
--inv-correction-tiling Correction of lexicographically greater tiles
--scc-correction-tiling Correction of strongly connected components
--merge-tiling Merging of strongly connected components
--split-tiling Splitting based on problematic iterations
--lex-scheduling Lexicographic execution order
--isl-scheduling Integer set library scheduler
--isl-wave-scheduling Integer set library scheduler with wavefronting
--feautrier-scheduling Integer set library scheduler (Feautrier scheduling)
--sfs-single-scheduling Tiling of synchronization-free slices with single sources
--sfs-multiple-scheduling Tiling of synchronization-free slices with multiple sources
--sfs-tile-scheduling Tile-wise synchronization-free slices
--free-scheduling Free scheduling based on R^+
--free-rk-scheduling Free scheduling based on R^k
--free-finite-scheduling Exact free scheduling for finite graphs
--dynamic-free-scheduling Dynamic free scheduling
--serial-codegen Serial code generator
--omp-for-codegen OpenMP parallel for generator
--omp-task-codegen OpenMP parallel task generator
--omp-gpu-codegen OpenMP offloading to GPU target
--isl-map-tc ISL normalized map transitive closure (default)
--isl-union-map-tc ISL union map transitive closure
--floyd-warshall-tc Floyd-Warshall algorithm
--iterative-tc Iterative algorithm
--omega-map-tc Omega normalized map transitive closure
--omega-union-map-tc Omega union map transitive closure
--tarjan-tc Tarjan algorithm for finite graphs
-b <value> Tile size, e.g. -b 256 -b S1:128,128 (default: 32)
--out | -o Output file (default: stdout)
--debug | -d Verbose mode
--report Generate tile statistics report (use -R for each parameter)
--inline Always inline loop bounds expressions
-D <name>=<value> Define parameter value, e.g. -D M=2000 -D N=2600
-R <name>=<value> Set parameter value for report generation, e.g. --report -R M=2000 -R N=2600
--cache <value> Cache line length in bytes (default: 64)
--use-macros Use macro definitions in place of statements
--yes | -y Non-interactive mode
--version | -v Print compiler info
--help | -h Print help
./tc ../examples/polybench/gesummv.scop.c --rectangular-tiling --sfs-single-scheduling --omp-for-codegen --isl-map-tc -b 32 --debug --inline
./tc ../examples/pluto/heat-1d.scop.c --semi-diamond-tiling --omp-for-codegen --iterative-tc -b 128 --debug --inline --drop-bounds
./tc ../examples/polybench/trisolv.scop.c --merge-tiling --free-scheduling --omp-task-codegen -y -b S1:16 -b S2:16,8 -b S3:16 --debug
./tc ../examples/polybench/mvt.scop.c --rectangular-tiling --lex-scheduling --serial-codegen -b 32 --report -R _PB_N=128
Effective use of the offered capabilities requires expertise in identifying loop nest classes or a series of trial-and-error attempts using the available combinations. The below suggestions should facilitate the process of finding an optimal transformation:
- Initially, use tile sizes
-b 32
or-b 64
or-b 16
. This order of evaluation should indicate if the tile size should increase or decrease. - Start with
--correction-tiling
; the algorithm corrects the rectangular tiles if necessary, without the overhead of identifying strongly connected components. Experiment with other options only after generating the code for this scheme and finding the results suboptimal. - Check the number of synchronization-free slices found by
--sfs-single-scheduling
/--sfs-multiple-scheduling
/--sfs-tile-scheduling
. If the degree of parallelism is not satisfactory, try to find a--free-scheduling
for tiles execution.--isl-wave-scheduling
is a viable option as well. - Prefer the
--iterative-tc
method over other algorithms.--isl-map-tc
is the secondary option if the iterative approach fails to produce a result within a reasonable time frame. - If any of the transitive closure algorithms returns an inexact result, another algorithm may succeed instead. Note that
--omega-map-tc
may result in an under-approximation and invalid results.
./tc: No such file or directory
Meaning: The compiler was not built or the working directory is invalid.
Solution:
- Make sure you have followed the Installation guide and the build was successful.
- Change the working directory to the
src
folder.
error: no such file or directory
Meaning: The input file does not exist or the path is invalid.
Solution:
- Check the path to the input file.
Error: No SCoP was found in file
Meaning: The input file exists, but a loop nest was not found.
Solution:
- Verify if the input file is valid C code.
- Identify the loop nest and add
#pragma scop
/#pragma endscop
directives around it.
Segmentation fault (core dumped)
Meaning: A fatal error has caused the process to crash.
Solution:
- Reconfigure TC with the
--enable-debug=yes
parameter and rebuild the project, or addCFLAGS="-O0 -g"
to configuration flags. - Attach a debugger or analyze the core file.
Error: Backward relation detected.
Meaning: The inter-tile relation is not lexicographically forward, therefore the resultant tiling scheme does not allow for lexicographic execution of tiles, or the hyper-rectangular tiling is not valid for the dependence pattern.
Solution:
- Use another scheduling algorithm, i.e.,
--free-scheduling
,--free-rk-scheduling
,--isl-wave-scheduling
,--isl-scheduling
. - Choose a tiling algorithm other than
--rectangular-tiling
. - Use a tiling algorithm that ensures lexicographic execution order, i.e.,
--correction-tiling
,--inv-correction-tiling
,--split-tiling
.
Warning: Inexact R^+. The results can be non-optimal. Restart TC with a different transitive closure method.
Meaning: Computing the transitive closure has resulted in an overapproximation that may degrade tiling quality or limit parallelism.
Solution:
- Continue the compilation pipeline typing
y
in the console. - or, restart the compilation with another transitive closure algorithm.
Error: Missing required option
Meaning: One of the command-line parameters is missing an associated value.
Solution:
- Check the proper syntax consulting the
./tc --help
output.
Exit code | Meaning |
---|---|
0 | Compilation successful |
1 | Computational failure |
2 | Assertion failure (debug build only) |
3 | Internal error |
4 | Invalid parameter or missing parameter value |
5 | Inexact transitive closure |
6 | Invalid input file / SCoP extraction failed |
7 | Inter-tile relation is not lexicographically forward |
In case of questions/problems/bugs, please contact:
Piotr Skotnicki <[email protected]>
West Pomeranian University of Technology
Faculty of Computer Science and Information Technology
ul. Zolnierska 49, 71-210 Szczecin, Poland
[1] Verdoolaege S (2010) ISL: an integer set library for the polyhedral model. In: Mathematical software--ICMS 2010, Lecture notes in computer science. vol 6327. Springer, Berlin, pp 299--302
[2] Verdoolaege S, Seghir R, Beyls K et al. Counting integer points in parametric polytopes using barvinok’s rational functions. Algorithmica (2007) 48: 37
[3] Verdoolaege S, Grosser T (2012) Polyhedral extraction tool. In: Proceedings of the 2nd international workshop on polyhedral compilation techniques. Paris, France