1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396
|
FFTW 3.2.2
* Improve performance of some copy operations of complex arrays on
x86 machines.
* Add configure flag to disable alloca(), which is broken in mingw64.
* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
between fftw-3.1.3 and 3.2. This regression has now been fixed.
FFTW 3.2.1
* Performance improvements for some multidimensional r2c/c2r transforms;
thanks to Eugene Miloslavsky for his benchmark reports.
* Compile with icc on MacOS X, use better icc compiler flags.
* Compilation fixes for systems where snprintf is defined as a macro;
thanks to Marcus Mae for the bug report.
* Fortran documentation now recommends not using dfftw_execute,
because of reports of problems with various Fortran compilers;
it is better to use dfftw_execute_dft etcetera.
* Some documentation clarifications, e.g. of fact that --enable-openmp
and --enable-threads are mutually exclusive (thanks to Long To),
and document slightly odd behavior of plan_guru_r2r in Fortran
(thanks to Alexander Pozdneev).
* FAQ was accidentally omitted from 3.2 tarball.
* Remove some extraneous (harmless) files accidentally included in
a subdirectory of the 3.2 tarball.
FFTW 3.2
* Worked around apparent glibc bug that leads to rare hangs when freeing
semaphores.
* Fixed segfault due to unaligned access in certain obscure problems
that use SSE and multiple threads.
* MPI transforms not included, as they are still in alpha; the alpha
versions of the MPI transforms have been moved to FFTW 3.3alpha1.
FFTW 3.2alpha3
* Performance improvements for sizes with factors of 5 and 10.
* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Emmenlauer and Phil Dumont.
* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
for the suggestions.
* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
* Fixed incorrect type prefix in MPI code that prevented wisdom routines
from working in single precision (thanks to Eric A. Borisch for the report).
* Added 'make check' for MPI code (which still fails in a couple corner
cases, but should be much better than in alpha2).
* Many other small fixes.
FFTW 3.2alpha2
* Support for the Cell processor, donated by IBM Research; see README.Cell
and the Cell section of the manual.
* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
function with the same semantics, but which takes fftw_iodim64 instead of
fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
ptrdiff_t integer types as parameters, which is a 64-bit type on
64-bit machines. This is only useful for specifying very large transforms
on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
regardless of what API you choose.)
* Experimental MPI support. Complex one- and multi-dimensional FFTs,
multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
distributed transpose operations, with 1d block distributions.
(This is an alpha preview: routines have not been exhaustively
tested, documentation is incomplete, and some functionality is
missing, e.g. Fortran support.) See mpi/README and also the MPI
section of the manual.
* Significantly faster r2c/c2r transforms, especially on machines with SIMD.
* Rewritten multi-threaded support for better performance by
re-using a fixed pool of threads rather than continually
respawning and joining (which nowadays is much slower).
* Support for MIPS paired-single SIMD instructions, donated by
Codesourcery.
* FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
available and return NULL otherwise.
* Removed k7 support, which only worked in 32-bit mode and is
becoming obsolete. Use --enable-sse instead.
* Added --with-g77-wrappers configure option to force inclusion
of g77 wrappers, in addition to whatever is needed for the
detected Fortran compilers. This is mainly intended for GNU/Linux
distros switching to gfortran that wish to include both
gfortran and g77 support in FFTW.
* In manual, renamed "guru execute" functions to "new-array execute"
functions, to reduce confusion with the guru planner interface.
(The programming interface is unchanged.)
* Add missing __declspec attribute to threads API functions when compiling
for Windows; thanks to Robert O. Morris for the bug report.
* Fixed missing return value from dfftw_init_threads in Fortran;
thanks to Markus Wetzstein for the bug report.
FFTW 3.1.1
* Performance improvements for Intel EMT64.
* Performance improvements for large-size transforms with SIMD.
* Cycle counter support for Intel icc and Visual C++ on x86-64.
* In fftw-wisdom tool, replaced obsolete --impatient with --measure.
* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
* Windows DLL support for Fortran API (added missing __declspec(dllexport)).
* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
CPUs lacking a CPUID instruction; thanks to Eric Korpela.
FFTW 3.1
* Faster FFTW_ESTIMATE planner.
* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
* Faster in-place non-square transpositions (FFTW uses these internally
for in-place FFTs, and you can also perform them explicitly using
the guru interface).
* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
* SIMD support for split complex arrays.
* Much faster Altivec/VMX performance.
* New fftw_set_timelimit function to specify a (rough) upper bound to the
planning time (does not affect ESTIMATE mode).
* Removed --enable-3dnow support; use --enable-k7 instead.
* FMA (fused multiply-add) version is now included in "standard" FFTW,
and is enabled with --enable-fma (the default on PowerPC and Itanium).
* Automatic detection of native architecture flag for gcc. New
configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
for people distributing compiled binaries of FFTW (see manual).
* Automatic detection of Altivec under Linux with gcc 3.4 (so that
same binary should work on both Altivec and non-Altivec PowerPCs).
* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Solaris/Intel.
* Various documentation clarifications.
* 64-bit clean. (Fixes a bug affecting the split guru planner on
64-bit machines, reported by David Necas.)
* Fixed Debian bug #259612: inadvertent use of SSE instructions on
non-SSE machines (causing a crash) for --enable-sse binaries.
* Fixed bug that caused HC2R transforms to destroy the input in
certain cases, even if the user specified FFTW_PRESERVE_INPUT.
* Fixed bug where wisdom would be lost under rare circumstances,
causing excessive planning time.
* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
* Fixed accidentally exported symbol that prohibited simultaneous
linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
* Support Win32 threads under MinGW (thanks to Alessio Massaro).
* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
* Fix build failure if no Fortran compiler is found (thanks to Charles
Radley for the bug report).
* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
detection of icc architecture flag (e.g. -xW).
* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
but its malloc is 16-byte aligned).
* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
reports/fixes). Added x86-64 cycle counter for PGI compilers,
courtesy Cristiano Calonaci.
* Fix compilation problem in test program due to C99 conflict.
* Portability fix for import_system_wisdom with djgpp (thanks to Juan
Manuel Guerrero).
* Fixed compilation failure on MacOS 10.3 due to getopt conflict.
* Work around Visual C++ (version 6/7) bug in SSE compilation;
thanks to Eddie Yee for his detailed report.
Changes from FFTW 3.1 beta 2:
* Several minor compilation fixes.
* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
fftw_set_timelimit function. Make wisdom work with time-limited plans.
Changes from FFTW 3.1 beta 1:
* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
* Fixed more 64-bit problems, thanks to John Pavel for the bug report.
* Further speed improvements for Altivec/VMX.
* Further speed improvements for non-square transpositions.
* Many minor tweaks.
FFTW 3.0.1
* Some speed improvements in SIMD code.
* --without-cycle-counter option is removed. If no cycle counter is found,
then the estimator is always used. A --with-slow-timer option is provided
to force the use of lower-resolution timers.
* Several fixes for compilation under Visual C++, with help from Stefane Ruel.
* Added x86 cycle counter for Visual C++, with help from Morten Nissov.
* Added S390 cycle counter, courtesy of James Treacy.
* Added missing static keyword that prevented simultaneous linkage
of different-precision versions; thanks to Rasmus Larsen for the bug report.
* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
preprocessor limits; thanks to Peter Vouras for the bug report.
* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
thanks to Nicolas Decoster for the patch.
* Added 'make smallcheck' target in tests/ directory, at the request of
James Treacy.
FFTW 3.0
Major goals of this release:
* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
* Complete rewrite, to make it easier to add new algorithms and transforms.
* New API, to support more general semantics.
Other enhancements:
* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
(With special thanks to Franz Franchetti for many experimental prototypes
and to Stefan Kral for the vectorizing generator from fftwgel.)
* True in-place 1d transforms of large sizes (as well as compressed
twiddle tables for additional memory/cache savings).
* More arbitrary placement of real & imaginary data, e.g. including
interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
* Efficient prime-size transforms of real data.
* Multidimensional transforms can operate on a subset of a larger matrix,
and/or transform selected dimensions of a multidimensional array.
* By popular demand, simultaneous linking to double precision (fftw),
single precision (fftwf), and long-double precision (fftwl) versions
of FFTW is now supported.
* Cycle counters (on all modern CPUs) are exploited to speed planning.
* Efficient transforms of real even/odd arrays, a.k.a. discrete
cosine/sine transforms (types I-IV). (Currently work via pre/post
processing of real transforms, ala FFTPACK, so are not optimal.)
* DHTs (Discrete Hartley Transforms), again via post-processing
of real transforms (and thus suboptimal, for now).
* Support for linking to just those parts of FFTW that you need,
greatly reducing the size of statically linked programs when
only a limited set of transform sizes/types are required.
* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
with a command-line tool (fftw-wisdom) to generate/update it.
* Fortran API can be used with both g77 and non-g77 compilers
simultaneously.
* Multi-threaded version has optional OpenMP support.
* Authors' good looks have greatly improved with age.
Changes from 3.0beta3:
* Separate FMA distribution to better exploit fused multiply-add instructions
on PowerPC (and possibly other) architectures.
* Performance improvements via some inlining tweaks.
* fftw_flops now returns double arguments, not int, to avoid overflows
for large sizes.
* Workarounds for automake bugs.
Changes from 3.0beta2:
* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
we replaced it with a slower routine that is more accurate.
* The guru planner and execute functions now have two variants, one that
takes complex arguments and one that takes separate real/imag pointers.
* Execute and planner routines now automatically align the stack on x86,
in case the calling program is misaligned.
* README file for test program.
* Fixed bugs in the combination of SIMD with multi-threaded transforms.
* Eliminated internal fftw_threads_init function, which some people were
calling accidentally instead of the fftw_init_threads API function.
* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
* Support AMD x86-64 SIMD and cycle counter.
* Support SSE2 intrinsics in forthcoming gcc 3.3.
Changes from 3.0beta1:
* Faster in-place 1d transforms of non-power-of-two sizes.
* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
transforms.
* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
default distribution only includes hard-coded size-8 DCT-II/III, however.
* Many minor improvements to the manual. Added section on using the
codelet generator to customize and enhance FFTW.
* The default 'make check' should now only take a few minutes; for more
strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
the latter uses stdout.
* Fixed ability to compile with a C++ compiler.
* Fixed support for C99 complex type under glibc.
* Fixed problems with alloca under MinGW, AIX.
* Workaround for gcc/SPARC bug.
* Fixed multi-threaded initialization failure on IRIX due to lack of
user-accessible PTHREAD_SCOPE_SYSTEM there.
|