    USE fastTranspo
    integer :: N, numUnit, verbosity
    real*8  :: a(N,N), b(N,N)
    ! ...
    call initFastTranspose(N, numUnit, verbosity)  ! once per program, before any fastTranspose call
    ! ...
    call fastTranspose(b, a, N)                    ! b = transpose(a)

This computes b = transpose(a).

initFastTranspose must be called before fastTranspose to initialise the library; calling it once for the whole program is sufficient. The larger verbosity is, the more output is written to unit numUnit (6 for the screen).

initFastTranspose determines which block transpose is fastest for your system and matrix size (this can take some time) and saves its results in the file .offtWisdom. It also verifies that the transpose is computed correctly. Every subsequent fastTranspose call in the program uses these optimized values. When you run the program a second time, the long optimization is skipped because the values are read directly from .offtWisdom. The more you use OFFT, the smarter it becomes, as more and more matrix sizes are added to the file. Delete .offtWisdom to erase the wisdom data.
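OFFT's own blocking strategy and timing procedure are not described here, so the following is only a minimal sketch, under our own assumptions, of the general idea behind initFastTranspose: a tiled (blocked) transpose whose tile size is chosen by timing a few candidates. The module and the names blockedTranspose, pickBlockSize and the candidate list are hypothetical and are not part of OFFT's API.

```fortran
! Hypothetical sketch of a blocked transpose with a naive timing-based
! choice of block size. Not OFFT's implementation.
module blocked_transpose_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! b = transpose(a), processed in bs x bs tiles so that the reads from a
  ! and the writes to b of one tile stay close together in memory.
  subroutine blockedTranspose(b, a, n, bs)
    integer, intent(in) :: n, bs
    real(dp), intent(in) :: a(n, n)
    real(dp), intent(out) :: b(n, n)
    integer :: ii, jj, i, j

    do jj = 1, n, bs
      do ii = 1, n, bs
        ! min() clips the last tile when n is not a multiple of bs.
        do j = jj, min(jj + bs - 1, n)
          do i = ii, min(ii + bs - 1, n)
            b(j, i) = a(i, j)
          end do
        end do
      end do
    end do
  end subroutine blockedTranspose

  ! Time one transpose of a random n x n matrix for each candidate block
  ! size and return the candidate with the smallest elapsed clock count.
  function pickBlockSize(n, candidates) result(best)
    integer, intent(in) :: n, candidates(:)
    integer :: best
    real(dp), allocatable :: a(:, :), b(:, :)
    integer(8) :: t0, t1, bestTicks
    integer :: k

    allocate(a(n, n), b(n, n))
    call random_number(a)
    best = candidates(1)
    bestTicks = huge(bestTicks)
    do k = 1, size(candidates)
      call system_clock(t0)
      call blockedTranspose(b, a, n, candidates(k))
      call system_clock(t1)
      if (t1 - t0 < bestTicks) then
        bestTicks = t1 - t0
        best = candidates(k)
      end if
    end do
  end function pickBlockSize

end module blocked_transpose_sketch
```

A single timing per candidate is noisy; a practical tuner would repeat each measurement and store the winning result keyed by matrix size so the search runs only once, which is the role .offtWisdom plays in OFFT.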
OFFT is distributed under the GNU General Public License: you can use, modify and redistribute it under the conditions stated in the license.