In particular, suppose that we have an n0 by n1 array in
row-major order, block-distributed across the n0 dimension. To
transpose this into an n1 by n0 array block-distributed
across the n1 dimension, we would create a plan by calling the
following function:
fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                  double *in, double *out,
                                  MPI_Comm comm, unsigned flags);
The input and output arrays (in and out) can be the
same. The transpose is actually executed by calling
fftw_execute on the plan, as usual.
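As a minimal sketch of how these calls fit together (the function name transpose_in_place is ours, not part of FFTW, and data is assumed to be a buffer large enough for both the local input and output blocks, as returned by the local-size function described below), an in-place transpose could look like:

#include <fftw3-mpi.h>

/* Sketch only: `data` is assumed to already hold the local_n0 x n1
   input block and to be large enough for the local_n1 x n0 output
   block as well (see the local-size function below). */
void transpose_in_place(ptrdiff_t n0, ptrdiff_t n1, double *data)
{
     fftw_plan plan;

     /* in == out is allowed, so the transpose is performed in place;
        FFTW_ESTIMATE does not overwrite the data during planning */
     plan = fftw_mpi_plan_transpose(n0, n1, data, data,
                                    MPI_COMM_WORLD, FFTW_ESTIMATE);
     fftw_execute(plan);
     fftw_destroy_plan(plan);
}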
The flags are the usual FFTW planner flags, but support
two additional flags: FFTW_MPI_TRANSPOSED_OUT and/or
FFTW_MPI_TRANSPOSED_IN. What these flags indicate, for
transpose plans, is that the output and/or input, respectively, are
locally transposed. That is, on each process the input data is
normally stored as a local_n0 by n1 array in row-major
order, but for an FFTW_MPI_TRANSPOSED_IN plan the input data is
stored as n1 by local_n0 in row-major order. Similarly,
FFTW_MPI_TRANSPOSED_OUT means that the output is n0 by
local_n1 instead of local_n1 by n0.
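To illustrate what this means for indexing (the helper functions below are ours, not part of FFTW), an element of the transposed result at global indices (i1, i0) is addressed differently in the two output layouts:

#include <stddef.h>

/* Default layout: the local output block is local_n1 x n0, row-major,
   holding rows local_1_start .. local_1_start + local_n1 - 1 of the
   transposed n1 x n0 array. */
static double get_output_default(const double *out, ptrdiff_t n0,
                                 ptrdiff_t local_1_start,
                                 ptrdiff_t i1, ptrdiff_t i0)
{
     return out[(i1 - local_1_start) * n0 + i0];
}

/* FFTW_MPI_TRANSPOSED_OUT: the same data, but stored locally as an
   n0 x local_n1 array in row-major order. */
static double get_output_transposed(const double *out, ptrdiff_t local_n1,
                                    ptrdiff_t local_1_start,
                                    ptrdiff_t i1, ptrdiff_t i0)
{
     return out[i0 * local_n1 + (i1 - local_1_start)];
}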
To determine the local size of the array on each process before and
after the transpose, as well as the amount of storage that must be
allocated, one should call fftw_mpi_local_size_2d_transposed,
just as for a 2d DFT as described in the previous section:
ptrdiff_t fftw_mpi_local_size_2d_transposed
                (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Again, the return value is the local storage to allocate, which in
this case is the number of real (double) values rather
than complex numbers as in the previous examples.
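Putting these pieces together, a complete sketch of an in-place MPI transpose might look like the following (the sizes are illustrative, FFTW_ESTIMATE planning is assumed, and error checking is omitted):

#include <mpi.h>
#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
     const ptrdiff_t n0 = 128, n1 = 256;   /* illustrative sizes */
     ptrdiff_t alloc_local, local_n0, local_0_start;
     ptrdiff_t local_n1, local_1_start;
     double *data;
     fftw_plan plan;
     ptrdiff_t i, j;

     MPI_Init(&argc, &argv);
     fftw_mpi_init();

     /* local sizes and required storage (a count of doubles, not bytes) */
     alloc_local = fftw_mpi_local_size_2d_transposed
                        (n0, n1, MPI_COMM_WORLD,
                         &local_n0, &local_0_start,
                         &local_n1, &local_1_start);
     data = fftw_alloc_real(alloc_local);

     /* plan an in-place transpose (FFTW_ESTIMATE does not touch data) */
     plan = fftw_mpi_plan_transpose(n0, n1, data, data,
                                    MPI_COMM_WORLD, FFTW_ESTIMATE);

     /* initialize the local_n0 x n1 input block, row-major */
     for (i = 0; i < local_n0; ++i)
          for (j = 0; j < n1; ++j)
               data[i*n1 + j] = (double) ((local_0_start + i) * n1 + j);

     fftw_execute(plan);   /* data now holds the local_n1 x n0 block */

     fftw_destroy_plan(plan);
     fftw_free(data);
     fftw_mpi_cleanup();
     MPI_Finalize();
     return 0;
}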