Research
Prompt
I have a Raspberry Pi Pico 2 that has a dual core RP2350 system on a chip. Each core is a Cortex-M33 with floating point and DSP support.
I want to run sound signal processing on these devices and run algorithms like the FFT that leverage the new floating point and the new DSP support.
Can you find any documentation and sample FFT code that has been finely tuned for the Pico 2?
I am already aware of Peter Hinch's implementation of FFT in ARM assembly from his GitHub repo. However, his code is over 10 years old and clearly does not use any of the new features on the Pico 2.
Note that Peter's FFT in ARM assembly is here: https://github.com/peterhinch/micropython-fourier/blob/master/dft.py Note that it has not changed since 2015.
I am looking for authoritative information and working code that will take advantage of every feature of the Pico 2 including using dual-cores to read in sound and write FFT spectrums to a display. I would prefer code that also has a test suite and benchmarking tools on synthetic data.
Great, I’ll look for documentation and optimized FFT code for the Raspberry Pi Pico 2 (RP2350) that is compatible with MicroPython and designed to leverage its dual-core Cortex-M33 architecture, FPU, and DSP support. I’ll also search for examples that include performance benchmarks or testing tools, ideally with MicroPython calling into assembly or C. I’ll get back to you shortly with what I find.
Raspberry Pi Pico 2 FFT: Documentation & Optimized Implementations
Hardware DSP Capabilities of RP2350 (Pico 2)
The Raspberry Pi Pico 2 is built on the RP2350 microcontroller, which features dual Arm Cortex-M33 cores running up to 150 MHz. In contrast to the original Pico’s M0+ cores, the M33 cores include a single-precision FPU (Floating Point Unit) and Armv8-M DSP instructions (Introducing the RP2350 | Stephen Smith's Blog). This means the Pico 2 can perform floating-point math in hardware and use SIMD-like DSP operations (e.g. single-cycle multiply-accumulate) for faster signal processing. Official datasheets and ARM’s documentation note these enhancements, which significantly improve algorithms like FFT on this platform (Introducing the RP2350 | Stephen Smith's Blog). (The RP2350 also has a dual-core RISC-V cluster, but only one of the two architectures can run at a time.)
Authoritative references
- RP2350 Datasheet / Tech Docs: Raspberry Pi publishes documentation for the RP2350, detailing its Cortex-M33 cores, FPU support, and DSP instruction set. These instructions accelerate common DSP operations (fixed-point arithmetic, saturating Q15/Q31 math, etc.), benefiting FFT and filtering routines.
- ARM Cortex-M33 Docs: ARM’s official manuals describe the M33’s instruction set and extensions. Notably, the M33’s DSP extension adds optimized arithmetic instructions, and the single-precision FPU executes IEEE754 float operations in hardware (Introducing the RP2350 | Stephen Smith's Blog). Together, these enable much faster FFT computation than on the original Pico (which lacked an FPU or DSP extensions).
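To make the "saturating Q15 math" mentioned above concrete, here is a minimal host-side sketch in plain Python. It only models the arithmetic (clamping instead of wrapping); it does not use the M33 DSP instructions, and the helper names are hypothetical, not taken from any library.

```python
# Host-side model of Q15 fixed-point arithmetic (illustrative helper names).
Q15_ONE = 1 << 15        # scale factor: 1.0 corresponds to 32768
Q15_MAX = Q15_ONE - 1    # 0x7FFF, largest representable value (about 0.99997)
Q15_MIN = -Q15_ONE       # -0x8000, represents exactly -1.0


def saturate_q15(x):
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, x))


def float_to_q15(x):
    """Convert a float to Q15, saturating out-of-range input."""
    return saturate_q15(int(round(x * Q15_ONE)))


def q15_to_float(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def q15_add(a, b):
    """Saturating add: overflow clamps to the nearest representable value."""
    return saturate_q15(a + b)


def q15_mul(a, b):
    """Saturating multiply: the product has 30 fractional bits, so shift back by 15."""
    return saturate_q15((a * b) >> 15)
```

For example, `q15_mul(float_to_q15(0.5), float_to_q15(0.5))` gives 8192, which is 0.25 in Q15, and adding 1 to `Q15_MAX` stays at `Q15_MAX` rather than wrapping to a negative value.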
Optimized FFT Libraries for Cortex-M33 (RP2350)
ARM CMSIS-DSP Library: The CMSIS-DSP library is an ARM-developed collection of highly optimized DSP functions for Cortex-M cores (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico). It includes FFT implementations for various data types (32-bit float, 64-bit float, Q15, Q31 fixed-point) that automatically leverage the hardware FPU and DSP instructions when available. For example, CMSIS-DSP provides functions like arm_rfft_fast_f32 (for real FFT on floats) and arm_cfft_q15 (complex FFT on Q15 fixed-point) among many others. On the Pico 2’s M33, these routines run very efficiently, using the FPU for floating-point FFTs or SIMD fixed-point arithmetic for Q15/Q31 FFTs. ARM’s documentation explains that the “Fast” real-FFT algorithms exploit symmetry to halve the computation versus naive complex FFTs (CMSIS-DSP: Real FFT Functions).
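The symmetry trick behind real-input FFTs can be sketched in plain Python. The version below packs an N-point real signal into an N/2-point complex signal, runs one half-size complex FFT, and then untangles the result. It is a textbook illustration only, not the CMSIS-DSP implementation, and the function names are hypothetical. It assumes N is a power of two and at least 2.

```python
import cmath


def fft_complex(z):
    """Recursive radix-2 FFT; len(z) must be a power of two."""
    n = len(z)
    if n == 1:
        return list(z)
    even = fft_complex(z[0::2])
    odd = fft_complex(z[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_fft_from_half_size(x):
    """Return bins 0..N/2 of the FFT of real x using one N/2-point complex FFT."""
    n = len(x)
    half = n // 2
    # Pack even-indexed samples into the real part, odd-indexed into the imaginary part.
    packed = [complex(x[2 * i], x[2 * i + 1]) for i in range(half)]
    spectrum = fft_complex(packed)
    bins = []
    for k in range(half):
        zk = spectrum[k]
        zc = spectrum[(half - k) % half].conjugate()
        even_part = (zk + zc) / 2    # FFT of the even-indexed samples
        odd_part = (zk - zc) / 2j    # FFT of the odd-indexed samples
        bins.append(even_part + cmath.exp(-2j * cmath.pi * k / n) * odd_part)
    # Nyquist bin: the twiddle factor is -1 there.
    bins.append(complex(spectrum[0].real - spectrum[0].imag, 0.0))
    return bins
```

The output matches the first N/2 + 1 bins of a full complex FFT of the same data; the remaining bins are the complex conjugates of these and carry no extra information for real input.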
Key resources:
- CMSIS-DSP Documentation – The user manual (on ARM’s GitHub pages) details each FFT function and usage (CMSIS-DSP: Real FFT Functions). It covers initialization (e.g. arm_rfft_fast_init_f32), input/output data formats, and optimized algorithms for real data. This is an authoritative reference for how to use these functions and what hardware features they exploit.
- CMSIS-DSP Example Code – There are many examples of using CMSIS-DSP FFT on microcontrollers (for instance, STM32 examples and application notes (STM32 Fast Fourier Transform (CMSIS DSP FFT) - Phil's Lab #111)). These can be adapted to Pico 2. A community project by jptrainor benchmarked CMSIS-DSP on the original Pico, demonstrating how to integrate the library (in C/C++) and measure performance (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico). (On an M0+ Pico, float FFTs had to rely on software, but on the M33 Pico 2 the same code will use the hardware FPU for a big speedup.)
KISS FFT (Pico FFT Library): Another option is the lightweight KISS FFT (“Keep It Simple FFT”) library, which is written in portable C. A community member created a Pico-oriented wrapper called pico_fft based on KISS FFT (GitHub - Googool/pico_fft: A lightweight and efficient FFT (Fast Fourier Transform) library for the Raspberry Pi Pico, based on the KISS FFT library.). It simplifies capturing ADC data and computing an FFT on the Pico. While KISS FFT isn’t as optimized as CMSIS-DSP, it’s easier to understand and has a small footprint. The pico_fft project includes documentation and examples of connecting an analog microphone to the Pico and performing an FFT on the samples. All examples (in C using the Pico SDK) have been tested, and they even provide a quick-start guide for hardware setup and code building. This could be useful if you prefer a self-contained library. Keep in mind that on the RP2350, a KISS FFT (written in C) will still benefit from the M33’s speed and can use the FPU for float math (via the compiler), but it won’t explicitly use SIMD DSP instructions unless manually optimized.
Performance notes: On the Pico 2, a well-optimized FFT is quite fast. For instance, Peter Hinch’s assembly FFT (discussed below) is reported to compute a 1024-point FFT in about 7 ms on a Pico 2 (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.). A comparable floating-point FFT on the original Pico might take tens of milliseconds or more, since its M0+ cores lack an FPU and must do float math in software (python - Raspberry Pi Pico(RP2040 or RP2350) ASM PIO microPython for FFT, DSP - Signal Processing Stack Exchange). CMSIS-DSP’s routines should achieve similar performance to the assembly approach, as they are built to leverage the same hardware features. The CMSIS library also supports smaller FFT sizes efficiently (e.g. 256-point, 512-point), often using mixed-radix algorithms for speed. Published CMSIS-DSP benchmarks show large speedups over naive FFT implementations thanks to these optimizations (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico).
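Timings like the ones above are easiest to trust when they come with a correctness check on synthetic data. The following is a minimal sketch of such a harness in plain Python: it generates a sine that falls exactly on one FFT bin, verifies that the spectrum peaks there, and reports the best of several timed runs. The FFT here is a simple reference implementation standing in for whichever optimized routine is under test, and all function names are hypothetical. It uses `time.perf_counter`, which is a desktop Python call; on MicroPython the timing would instead be taken with `time.ticks_us()` and `time.ticks_diff()`.

```python
import cmath
import math
import time


def fft_radix2(samples):
    """Reference recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft_radix2(samples[0::2])
    odd = fft_radix2(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def synthetic_tone(n, bin_index, amplitude=1.0):
    """Sine wave with exactly bin_index cycles in n samples (no spectral leakage)."""
    return [amplitude * math.sin(2 * math.pi * bin_index * i / n) for i in range(n)]


def peak_bin(spectrum):
    """Index of the largest magnitude among bins 0..N/2-1."""
    half = len(spectrum) // 2
    return max(range(half), key=lambda k: abs(spectrum[k]))


def benchmark(n, repeats=5, fft=fft_radix2):
    """Check the FFT on a known tone and return the best run time in seconds."""
    expected = n // 8
    samples = synthetic_tone(n, expected)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        spectrum = fft(samples)
        best = min(best, time.perf_counter() - start)
    if peak_bin(spectrum) != expected:
        raise AssertionError("FFT peak is in the wrong bin")
    return best
```

Calling `benchmark(1024)` returns the fastest of five runs. Passing a different callable as `fft` lets the same check and timing loop be reused for another implementation, as long as it accepts a list of samples and returns a list of complex bins.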
Dual-Core Audio Processing Examples
One advantage of the RP2350 (Pico 2) is its dual-core CPU, which you can use to parallelize tasks, ideal for real-time audio capture and FFT display. A great example is Iwatake’s dual-core spectrum analyzer project. It runs on the original Pico (RP2040) but demonstrates the concept well: Core0 handles data acquisition from a microphone via ADC (using DMA) and drives an SPI display, while Core1 performs the FFT calculations on the incoming data stream (Iwatake Turns the Raspberry Pi Pico Into a Dual-Core, FFT-Calculating Live Spectrum Analyzer - Hackster.io). This way, the heavy math on Core1 doesn’t stall sampling or screen updates on Core0. The project was written in C++ using the official Pico SDK, and Iwatake has released the full source code on GitHub. (The code uses ADC + DMA to fill buffers, triggers an IRQ or flag when a buffer is ready, then Core1 processes the buffer with an FFT and sends the result to Core0 for display.) This design achieved a functional real-time spectrum display, although the author noted an occasional bug causing freezes (likely a synchronization issue). It’s a useful reference for structuring dual-core tasks and using the Pico’s multicore FIFO or interrupts to communicate between cores.
Another community example is a project by Van Hunter Adams at Cornell: Realtime Audio FFT to VGA on RP2040. While it does its processing on a single core of the RP2040, with heavy use of DMA and PIO, it illustrates high-speed ADC capture (10 kHz) with DMA and a 1024-point fixed-point FFT computed on the fly (FFT). On the Pico 2, one could adapt this idea and split the work across cores for even better throughput (e.g., Core0 managing continuous DMA ADC sampling, Core1 doing the FFT and sending results to a VGA or LCD driver). The code and write-up for that project are available on Adams’s site and could serve as a starting point for handling real-time data streams.
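The sample rate and FFT size quoted above fix the frequency resolution of the spectrum. A short sketch of that arithmetic, using hypothetical helper names and assuming the 10 kHz rate and 1024-point size are exact:

```python
def bin_width_hz(sample_rate_hz, fft_size):
    """Spacing between adjacent FFT bins in Hz."""
    return sample_rate_hz / fft_size


def bin_center_hz(k, sample_rate_hz, fft_size):
    """Centre frequency of bin k."""
    return k * sample_rate_hz / fft_size


def nearest_bin(freq_hz, sample_rate_hz, fft_size):
    """Index of the bin whose centre is closest to freq_hz."""
    return round(freq_hz * fft_size / sample_rate_hz)


# 10 kHz sampling with a 1024-point FFT:
width = bin_width_hz(10_000, 1024)          # 9.765625 Hz per bin
nyquist = bin_center_hz(512, 10_000, 1024)  # 5000.0 Hz, the highest usable frequency
a440 = nearest_bin(440, 10_000, 1024)       # bin 45
```

So at these settings each bin is a little under 10 Hz wide, and only content below 5 kHz can be represented.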
For a more hobbyist-friendly approach, there’s an Arduino-based spectrum analyzer for RP2040 by Bodmer that uses one core but cleverly combines DMA and an SPI TFT update. It samples ADC at ~14 kHz, computes an FFT (64–512 points), and draws the spectrum on a 320x240 TFT. The code uses the Earle Philhower Arduino core (which has built-in support for the RP2040) and likely leverages the CMSIS-DSP library under the hood (the Arduino mbed_rp2040 core includes CMSIS-DSP) or a simple FFT routine. It achieves about 54 FPS for a 256-point FFT visualization (and up to 81 FPS for 64-point) by using DMA for sampling and fast SPI for drawing (GitHub - Bodmer/ADC_DMA_FFT: RP2040: sample ADC, run FFT, display on TFT). The source is available on GitHub as “ADC_DMA_FFT” with documentation in the README. While this was tested on RP2040, the same code on an RP2350 could be tweaked to use both cores or simply enjoy the extra performance headroom (e.g. to increase FFT size or sample rate). It’s a good, well-documented reference for implementing an end-to-end audio FFT pipeline on the Pico family.
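A quick way to sanity-check frame rates like these is to compute the ceiling imposed by sampling alone: a new frame cannot be shown until enough fresh samples have arrived. The sketch below does that arithmetic. It assumes the sample rate is exactly 14 kHz and that frames do not overlap, neither of which is stated in the project's README, and the function names are hypothetical.

```python
def capture_time_ms(fft_size, sample_rate_hz):
    """Time needed to collect one full frame of samples, in milliseconds."""
    return 1000.0 * fft_size / sample_rate_hz


def max_frames_per_second(fft_size, sample_rate_hz, overlap=0.0):
    """Upper bound on display rate when each frame needs fresh samples.

    overlap is the fraction of each frame reused from the previous one
    (0.0 means every frame is built from entirely new samples).
    """
    new_samples_per_frame = fft_size * (1.0 - overlap)
    return sample_rate_hz / new_samples_per_frame


fps_256 = max_frames_per_second(256, 14_000)  # 54.6875
fps_64 = max_frames_per_second(64, 14_000)    # 218.75
```

Under these assumptions the 256-point ceiling is about 54.7 frames per second, which would be consistent with the reported ~54 FPS being limited by sampling, while the 64-point ceiling is far above the reported 81 FPS, suggesting that other costs such as drawing dominate there.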
Key takeaways for dual-core design: Use DMA to offload data movement (e.g. ADC to memory), use one core exclusively for I/O (sensor reads, drawing to display) and the other for computation, and use thread-safe queues or interrupts to hand off data between cores. The Pico SDK provides the multicore_fifo_push_blocking and multicore_fifo_pop_blocking functions, along with higher-level primitives such as queues and mutexes, to coordinate the two cores. In FreeRTOS or other RTOS environments, you could assign tasks to cores and use mutexes/queues. The examples above show that a dual-core Pico can comfortably handle real-time audio FFT at audio-rate sampling (8–44 kHz) and update a display, especially when each core’s workload is optimized.
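The buffer handoff described above is often built as a ping-pong (double) buffer: the producer fills one frame while the consumer works on the other. The following single-threaded Python model shows just the bookkeeping, including how an overrun is detected when the consumer falls behind. It is a simulation with hypothetical names; on real hardware DMA and the second core would play the producer and consumer roles, and the handoff would go through a multicore-safe primitive.

```python
class PingPongBuffer:
    """Two fixed-size frames: one being filled, one available for processing."""

    def __init__(self, frame_len):
        self._frames = [[0] * frame_len, [0] * frame_len]
        self._fill = 0        # index of the frame the producer is writing
        self._pos = 0         # write position inside that frame
        self._ready = None    # index of a completed frame not yet taken
        self.overruns = 0     # completed frames replaced before being taken

    def write_sample(self, sample):
        """Producer side: store one sample and swap frames when one is full."""
        frame = self._frames[self._fill]
        frame[self._pos] = sample
        self._pos += 1
        if self._pos == len(frame):
            if self._ready is not None:
                self.overruns += 1   # consumer was too slow; older frame is lost
            self._ready = self._fill
            self._fill ^= 1          # switch to the other frame
            self._pos = 0

    def take_frame(self):
        """Consumer side: return the completed frame, or None if none is ready.

        The returned list is reused by the producer later, so the consumer
        must finish with it before two more frames have been filled.
        """
        if self._ready is None:
            return None
        index = self._ready
        self._ready = None
        return self._frames[index]
```

A consumer would call `take_frame()` in its loop and run the FFT whenever it returns a frame. A non-zero `overruns` count is the signal that processing is not keeping up with sampling.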
MicroPython and Assembly Integration
MicroPython on the Pico 2 lets you write high-level code, but Python by itself is too slow for real-time FFT on microcontrollers. The good news is MicroPython supports calling assembly-optimized routines for performance-critical parts. There are a few ways to do this:
- Inline ARM Thumb Assembly: MicroPython has an inline assembler for ARM Cortex-M. By decorating a function with @micropython.asm_thumb, you can write pure assembly instructions that MicroPython will assemble and execute natively (Adding Assembly Language to MicroPython | Stephen Smith's Blog). You can pass a few arguments in registers (r0–r3) and use the subset of Thumb-2 instructions that the assembler supports, including floating-point instructions on the M33 (MicroPython’s assembler includes opcodes like vadd.f32, which use the FPU). This is ideal for writing a custom FFT inner loop or other DSP routines. The official docs and tutorials (e.g. Damien George’s hints, or Stephen Smith’s blog) cover how to structure these functions. Keep in mind you must manage registers and follow the MicroPython calling convention, but you can mix Python and assembly seamlessly (call the asm function from Python with normal arguments).
- FFI / Native Modules: Another approach is to compile C or assembly code as a native module and import it in MicroPython. MicroPython’s build system allows adding C extensions that become importable as Python modules. For example, one could compile the CMSIS-DSP library (or just the needed FFT function) into the firmware and expose a Python-callable function. This requires customizing the firmware build, but there are community discussions and examples on doing this (for instance, wrapping CMSIS-DSP functions in a MicroPython module) (CMSIS DSP and NN micropython wrappers · micropython · Discussion #10200 · GitHub). There isn’t an off-the-shelf MicroPython “numpy” with DSP yet (though it’s been contemplated (DSP CMSIS - MicroPython Forum (Archive))), but the ulab module is a notable alternative. ulab is a NumPy-like module for MicroPython (written in C) that provides vectorized operations and an FFT routine (ulab.numpy.fft.fft) in a Pythonic way (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.). It’s not specific to the RP2350, but it’s optimized C code, so it runs much faster than Python loops and can handle moderate-sized FFTs. If using MicroPython, you might consider using ulab’s FFT for simplicity: it won’t explicitly use the M33 DSP instructions, but in C it will still be fast, and you avoid writing assembly by hand.
- Peter Hinch’s Assembly FFT Library: For an authoritative, ready-made solution, Peter Hinch’s micropython-fourier library is highly recommended. This is a MicroPython library implementing a fast FFT (exposed as a DFT class) almost entirely in ARM assembly, specifically tuned for boards with an FPU (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.). It was originally written for the Pyboard (STM32) and has been updated to support the Pico 2’s M33 core. The library computes single-precision FFTs in-place, uses precomputed twiddle factors, and does no heap allocation, meaning it can even be called in an interrupt handler for real-time use. It also offers conveniences like window functions and conversion to magnitude or dB. According to the author’s benchmarks, a 1024-point FFT takes ~6.97 ms on the Pico 2. The transform itself for 256 points should take only a fraction of that; the ~17 ms reported for a 256-point FFT in the Reddit post cited below includes Python-side overhead (FFT sound spectrum analyzer running on a Raspberry Pi Pico 2. : r/raspberrypipico). Hinch’s code uses the M33’s FPU for arithmetic, so it is floating-point based (more dynamic range than fixed-point). The repository includes documentation (README with usage notes) and a test script. To use it, you can copy the .py files to your Pico and import the DFT class in MicroPython. This gives you an object where you can populate data and run the transform to get the spectrum. (If you want to integrate with live ADC data, you’d sample into a Python array or array.array('f'), then call the assembly routine to transform it.)
  Reference: “Fast Fourier transform in MicroPython’s inline ARM assembler” by Peter Hinch (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.). This is the library’s README, which explains the design and usage. It explicitly notes it “requires an ARM platform with FPU supporting Arm Thumb V7 assembler (e.g. Pyboard D, Pico 2)”. In other words, it’s tailored for the Cortex-M4/M7/M33 class of devices. The README also provides a performance section comparing runtime on different boards, which puts the Pico 2’s FPU-backed result in context.
- MicroPython Dual-Core: It’s worth noting that MicroPython on the RP2040 and RP2350 runs on a single core by default. The rp2 port’s _thread module can start one additional thread, which runs on the second core, but the module is documented as experimental and sharing data safely between the two cores takes care. Advanced users have also experimented with launching assembly code on the second core (via machine.mem writes to the second core’s start address), but this is complex. If tightly controlled dual-core operation is a requirement, writing that portion in C (and maybe invoking it via MicroPython’s FFI) or using an RTOS might be easier. However, given the Pico 2’s speed, many audio FFT applications can be done on one core in MicroPython if the heavy math is in optimized assembly. For instance, user Dan McCreary demonstrated a MicroPython spectrum analyzer on the Pico 2: using an I2S microphone at 8 kHz, a 256-point FFT in assembly (~17 ms), and updating an OLED display (FFT sound spectrum analyzer running on a Raspberry Pi Pico 2. : r/raspberrypipico). His results showed the FFT using only ~22% of the frame time (with the display update taking more time), so the optimized routine left plenty of CPU headroom. This suggests that offloading the math to assembly was enough to meet real-time requirements without needing the second core in MicroPython. Documentation for his project is available on his “Learning MicroPython” site, and it specifically credits Peter Hinch’s FFT library for the speedup.
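For readers who do want to try a second thread of execution from Python, here is a minimal sketch of a single-slot handoff between a main loop and a worker. It uses only `_thread.start_new_thread` and `_thread.allocate_lock`, which exist in desktop Python and in MicroPython builds that include `_thread`; on the rp2 port that thread runs on the second core. The `process` function is a placeholder for the real FFT call, and every name in the block is hypothetical. This is a pattern illustration that has been reasoned through for desktop Python, not a tested Pico 2 program.

```python
import _thread
import time

lock = _thread.allocate_lock()
state = {"pending": None, "results": [], "stop": False, "finished": False}


def process(frame):
    """Placeholder for the FFT: returns the signal energy of the frame."""
    return sum(s * s for s in frame)


def worker():
    """Runs in the second thread: take a frame if one is pending, else idle."""
    while True:
        lock.acquire()
        frame = state["pending"]
        state["pending"] = None
        stop = state["stop"]
        lock.release()
        if frame is not None:
            result = process(frame)       # heavy work happens outside the lock
            lock.acquire()
            state["results"].append(result)
            lock.release()
        elif stop:
            break
        else:
            time.sleep(0.001)
    lock.acquire()
    state["finished"] = True
    lock.release()


def submit(frame):
    """Main side: wait until the single slot is free, then hand over a frame."""
    while True:
        lock.acquire()
        free = state["pending"] is None
        if free:
            state["pending"] = frame
        lock.release()
        if free:
            return
        time.sleep(0.001)


def run_demo(frames):
    """Start the worker, feed it all frames, wait for it to finish."""
    state["pending"] = None
    state["results"] = []
    state["stop"] = False
    state["finished"] = False
    _thread.start_new_thread(worker, ())
    for frame in frames:
        submit(frame)
    lock.acquire()
    state["stop"] = True
    lock.release()
    while True:
        lock.acquire()
        finished = state["finished"]
        lock.release()
        if finished:
            return list(state["results"])
        time.sleep(0.001)
```

Calling `run_demo([[1, 2], [3, 4]])` returns `[5, 25]`. The lock is held only while the shared dictionary is read or written, so the sampling side is never blocked for the duration of a transform.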
Conclusion and Further Resources
In summary, the Raspberry Pi Pico 2’s RP2350 chip brings substantial DSP capability through its dual M33 cores with FPU/DSP extensions. To exploit this for fast FFTs:
- Use optimized libraries like CMSIS-DSP for C/C++ projects. These give you proven, assembly-tuned FFT functions that harness the hardware (floating or fixed-point) (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico). The official ARM docs and Raspberry Pi’s examples (e.g. CMSIS integration in the SDK (New RP2040 CMSIS Pack - Raspberry Pi) or Arduino’s CMSIS-DSP package) can guide you through setup.
- Study real-time examples (like Iwatake’s dual-core analyzer (Iwatake Turns the Raspberry Pi Pico Into a Dual-Core, FFT-Calculating Live Spectrum Analyzer - Hackster.io) or Bodmer’s DMA FFT display (GitHub - Bodmer/ADC_DMA_FFT: RP2040: sample ADC, run FFT, display on TFT)) to understand how to pipeline data acquisition and processing. They illustrate techniques like double-buffering ADC data with DMA, using one core per task, and optimizing I/O throughput – all relevant to high-performance audio applications on the Pico 2.
- Leverage MicroPython’s extensibility if you prefer Python at a high level. MicroPython by itself is not fast enough for DSP, but by injecting assembly (either inline or via a library) you can achieve performance close to a C implementation. Peter Hinch’s FFT library is an excellent, well-documented starting point for MicroPython (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.). Additionally, the ulab module can handle many numeric tasks in C for you, including FFT, which might simplify development for prototyping.
By combining these resources, you can capture audio in real-time, compute an FFT with minimal latency, and display the spectrum – fully utilizing the Pico 2’s hardware acceleration. Be sure to refer to the official Raspberry Pi Pico C/C++ SDK documentation for details on using the ADC, DMA, PIO, and multicore features, as those will be the backbone for any high-speed real-time processing on this platform. With the right approach, the tiny $5 Pico 2 can perform serious signal processing tasks that were previously possible only on much more expensive or power-hungry systems.
Sources: Documentation and code from Raspberry Pi and ARM, and community projects and libraries on GitHub: RP2350 hardware specs (Introducing the RP2350 | Stephen Smith's Blog), ARM CMSIS-DSP library API (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico), dual-core FFT example (iwatake) (Iwatake Turns the Raspberry Pi Pico Into a Dual-Core, FFT-Calculating Live Spectrum Analyzer - Hackster.io), MicroPython assembly FFT by Peter Hinch (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.), and others as cited above.