How can load imbalance be addressed with OpenMP?
How can OpenMP improve the scaling of an all-MPI
Many applications cannot always be written to access the innermost dimension in the inner DO loop. For example, consider an
application that performs FFTs across the vertical lines of a grid
and then performs the transform across the horizontal lines of
a grid. In the FFT example, code developers may use a transpose to reorder the data for better performance when doing the
FFT. The extra work introduced by the transpose is only worth
the effort if the FFT is large enough to benefit from the ability
to work on local contiguous data. Using a library FFT call that
allows strides in the data, measure the performance of the two
approaches (a) not using a transpose and (b) using a transpose
and determine the size of the FFT where (b) beats (a).
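
One possible way to structure the experiment is sketched below in Python, using only the standard library. It is an illustration of the two code paths, not a benchmark-quality implementation: the hand-written radix-2 routine `fft_strided` is a hypothetical stand-in for the library FFT call that accepts a stride, and the helper names (`transpose`, `column_ffts_strided`, `column_ffts_transposed`, `time_call`) are assumptions made for this sketch. It assumes a square, row-major grid whose side is a power of two. Interpreter overhead will likely hide the cache effects the exercise is about, so the real measurement should be made with a compiled library FFT.

```python
import cmath
import time


def fft_strided(data, start, stride, n):
    """Radix-2 FFT of the n elements data[start], data[start + stride], ...

    Stand-in for a library FFT that accepts a stride; n must be a power of two.
    """
    if n == 1:
        return [data[start]]
    even = fft_strided(data, start, 2 * stride, n // 2)
    odd = fft_strided(data, start + stride, 2 * stride, n // 2)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(grid, rows, cols):
    """Return the transpose of a row-major rows x cols grid as a new flat list."""
    return [grid[r * cols + c] for c in range(cols) for r in range(rows)]


def column_ffts_strided(grid, rows, cols):
    """Approach (a): transform each column where it lies, with stride = cols."""
    return [fft_strided(grid, c, cols, rows) for c in range(cols)]


def column_ffts_transposed(grid, rows, cols):
    """Approach (b): transpose first, then transform contiguous data (stride = 1)."""
    t = transpose(grid, rows, cols)
    return [fft_strided(t, c * rows, 1, rows) for c in range(cols)]


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time over a few repeats of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    # Sweep the FFT length and look for the size where (b) overtakes (a).
    for n in (8, 32, 128):
        grid = [complex(i % 7, i % 3) for i in range(n * n)]
        ta = time_call(column_ffts_strided, grid, n, n)
        tb = time_call(column_ffts_transposed, grid, n, n)
        print(f"n={n:4d}  (a) strided={ta:.5f}s  (b) transposed={tb:.5f}s")
```

Note that the time for approach (b) includes the transpose itself, since that is the extra work being weighed against the benefit of contiguous access. If the results must end up in the original layout, a second transpose would also need to be timed.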