IronHollow 0.5: SIMD FFT and convolution reverb
2026-03-18 · Sam Redmond
IronHollow 0.5 is the most significant release since the project started. It adds a SIMD-accelerated FFT and a full convolution reverb implementation.
SIMD FFT
The previous FFT was a clean Cooley-Tukey implementation with no platform-specific code. It was correct and readable but left a lot of performance on the table. Version 0.5 adds AVX2 and SSE4.1 kernels dispatched at runtime via CPUID, with a fallback to the scalar path for older hardware.
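As a rough illustration of what runtime dispatch can look like, here is a minimal sketch using the standard library's feature-detection macro, which queries CPUID on x86 targets. The `FftKernel` enum, `select_kernel`, and `kernel_name` are hypothetical names invented for this example; IronHollow's actual internals may be organised differently.

```rust
/// Hypothetical kernel tags for illustration; not IronHollow's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FftKernel {
    Avx2,
    Sse41,
    Scalar,
}

/// Pick the widest kernel the running CPU supports, falling back to scalar.
pub fn select_kernel() -> FftKernel {
    // Feature detection only exists on x86/x86_64; other targets skip this
    // block at compile time and go straight to the scalar fallback.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return FftKernel::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse4.1") {
            return FftKernel::Sse41;
        }
    }
    FftKernel::Scalar
}

/// Human-readable label, handy for logging which path was chosen.
pub fn kernel_name(kernel: FftKernel) -> &'static str {
    match kernel {
        FftKernel::Avx2 => "avx2",
        FftKernel::Sse41 => "sse4.1",
        FftKernel::Scalar => "scalar",
    }
}
```

Checking the widest instruction set first means a CPU that supports both AVX2 and SSE4.1 gets the AVX2 path, and anything older lands on the scalar code.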
On a Zen 3 machine, a 1024-point complex FFT drops from 4.2 µs to 0.9 µs, roughly a 4.7x speedup. The benchmarks are in the repository and run with cargo bench.
Convolution reverb
Partitioned overlap-add convolution lets you apply an impulse response of arbitrary length without the latency that single-block convolution would otherwise require. The implementation follows Gardner's 1995 partitioning scheme, adapted for Rust's ownership model. Partition sizes default to the FFT size used for analysis; they're configurable if you're targeting a specific hardware latency budget.
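To make the structure concrete, here is a minimal sketch of uniformly partitioned overlap-add convolution. It is not the code in this release. It assumes a single fixed partition size, and it convolves each block/partition pair directly in the time domain so the example stays self-contained, where a real implementation would multiply FFT spectra instead. The function names are hypothetical.

```rust
/// Direct time-domain convolution of two short slices.
/// Stands in for the per-partition FFT multiply of a real implementation.
pub fn convolve_direct(x: &[f32], h: &[f32]) -> Vec<f32> {
    if x.is_empty() || h.is_empty() {
        return Vec::new();
    }
    let mut y = vec![0.0f32; x.len() + h.len() - 1];
    for (i, &xi) in x.iter().enumerate() {
        for (j, &hj) in h.iter().enumerate() {
            y[i + j] += xi * hj;
        }
    }
    y
}

/// Uniformly partitioned overlap-add convolution (offline sketch).
/// `block` is the partition size in samples and must be non-zero.
pub fn partitioned_convolve(input: &[f32], ir: &[f32], block: usize) -> Vec<f32> {
    assert!(block > 0, "partition size must be non-zero");
    if input.is_empty() || ir.is_empty() {
        return Vec::new();
    }
    let mut out = vec![0.0f32; input.len() + ir.len() - 1];
    // Split both the signal and the impulse response into `block`-sized pieces.
    for (b, x_block) in input.chunks(block).enumerate() {
        for (p, h_part) in ir.chunks(block).enumerate() {
            // Input block b convolved with IR partition p starts at this offset.
            let offset = (b + p) * block;
            let partial = convolve_direct(x_block, h_part);
            // Overlap-add: neighbouring partial results overlap and are summed.
            for (k, &v) in partial.iter().enumerate() {
                out[offset + k] += v;
            }
        }
    }
    out
}
```

In a streaming setting each input block can be processed as soon as it arrives, so the added delay is on the order of one partition, not the full impulse response length. For any partition size the result should match `convolve_direct(input, ir)` up to floating-point rounding.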
A small set of test impulse responses is bundled in the repository under test_data/irs/.