
Research

Prompt

I have a Raspberry Pi Pico 2 that has a dual core RP2350 system on a chip. Each core is a Cortex-M33 with floating point and DSP support.

I want to run sound signal processing on these devices and run algorithms like the FFT that leverage the new floating point and the new DSP support.

Can you find any documentation and sample FFT code that has been finely tuned for the Pico 2?

I am already aware of Peter Hinch's implementation of FFT in ARM assembly from his GitHub repo. However, his code is over 10 years old and clearly does not use any of the new features on the Pico 2.

Note that Peter's FFT in ARM assembly is here: https://github.com/peterhinch/micropython-fourier/blob/master/dft.py. Note that it has not changed since 2015.

I am looking for authoritative information and working code that will take advantage of every feature of the Pico 2, including using both cores to read in sound and write FFT spectrums to a display. I would prefer code that also has a test suite and benchmarking tools on synthetic data.

Great, I’ll look for documentation and optimized FFT code for the Raspberry Pi Pico 2 (RP2350) that is compatible with MicroPython and designed to leverage its dual-core Cortex-M33 architecture, FPU, and DSP support. I’ll also search for examples that include performance benchmarks or testing tools, ideally with MicroPython calling into assembly or C. I’ll get back to you shortly with what I find.

Raspberry Pi Pico 2 FFT: Documentation & Optimized Implementations

Hardware DSP Capabilities of RP2350 (Pico 2)

The Raspberry Pi Pico 2 is built on the RP2350 microcontroller, which features dual Arm Cortex-M33 cores running at up to 150 MHz. In contrast to the original Pico's M0+ cores, the M33 cores include a single-precision FPU (Floating Point Unit) and the Armv8-M DSP instructions (Introducing the RP2350 | Stephen Smith's Blog). This means the Pico 2 can perform floating-point math in hardware and use SIMD-like DSP operations (e.g. single-cycle multiply-accumulate) for faster signal processing. Official datasheets and ARM's documentation describe these enhancements, which significantly speed up algorithms like the FFT on this platform (Introducing the RP2350 | Stephen Smith's Blog). (The RP2350 also contains two Hazard3 RISC-V cores, but only two of its four cores can run at any one time.)

Authoritative references

  • RP2350 Datasheet / Tech Docs: The Raspberry Pi Foundation provides documentation for the RP2350, detailing its Cortex-M33 cores, FPU support, and DSP instruction set. These instructions accelerate common DSP operations (fixed-point arithmetic, saturating Q15/Q31 math, etc.), benefiting FFT and filtering routines.
  • ARM Cortex-M33 Docs: ARM's official manuals describe the M33's instruction set and extensions. Notably, the M33's DSP extension adds optimized arithmetic instructions, and the single-precision FPU executes IEEE 754 float operations in hardware (Introducing the RP2350 | Stephen Smith's Blog). Together, these enable much faster FFT computation than on the original Pico, which lacked both an FPU and the DSP extension.
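To make the Q15 terminology concrete, here is a pure-Python model of the saturating fixed-point arithmetic these DSP instructions perform. It is an illustrative sketch with hypothetical helper names: on the M33, a C compiler or CMSIS-DSP would emit instructions such as SMLAD (dual 16-bit multiply-accumulate) and QADD16 (saturating 16-bit add), whereas this code only reproduces their arithmetic so results can be checked on any Python.

```python
# Pure-Python model of Q15 fixed-point arithmetic. On the M33 a compiler or
# CMSIS-DSP would use DSP instructions (e.g. SMLAD, QADD16); this code only
# reproduces the arithmetic so results can be checked on any Python.
Q15_MAX = 32767
Q15_MIN = -32768


def sat16(x):
    """Clamp an integer to the signed 16-bit range (saturation)."""
    return max(Q15_MIN, min(Q15_MAX, x))


def float_to_q15(f):
    """Convert a float in [-1, 1) to Q15, rounding and saturating."""
    return sat16(int(round(f * 32768)))


def q15_to_float(q):
    """Convert a Q15 integer back to a float."""
    return q / 32768


def q15_mul(a, b):
    """Q15 * Q15 -> Q15 with rounding; (-1) * (-1) saturates to 32767."""
    return sat16((a * b + (1 << 14)) >> 15)


def dual_mac(acc, x_pair, y_pair):
    """Model of an SMLAD-style step: acc + x0*y0 + x1*y1, with both
    products kept at full (Q30) precision before accumulation."""
    return acc + x_pair[0] * y_pair[0] + x_pair[1] * y_pair[1]


def q15_dot(xs, ys):
    """Dot product of equal-length Q15 vectors, two samples per step.
    Returns a Q30 value; shift right by 15 to get back to Q15."""
    acc = 0
    for i in range(0, len(xs) - 1, 2):
        acc = dual_mac(acc, (xs[i], xs[i + 1]), (ys[i], ys[i + 1]))
    if len(xs) % 2:
        acc += xs[-1] * ys[-1]          # leftover odd sample
    return acc
```

Keeping the accumulator wide and only narrowing (with saturation) at the end is the same design choice fixed-point DSP code makes on the hardware: intermediate sums do not overflow, and clipping happens once, predictably.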

Optimized FFT Libraries for Cortex-M33 (RP2350)

ARM CMSIS-DSP Library: The CMSIS-DSP library is an ARM-developed collection of highly optimized DSP functions for Cortex-M cores (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico). It includes FFT implementations for various data types (32-bit float, 64-bit float, Q15, Q31 fixed-point) that automatically leverage the hardware FPU and DSP instructions when available (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico) (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico). For example, CMSIS-DSP provides functions like arm_rfft_fast_f32 (real FFT on floats) and arm_cfft_q15 (complex FFT on Q15 fixed-point) among many others (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico). On the Pico 2's M33, these routines run efficiently, using the FPU for floating-point FFTs and the DSP instructions for Q15/Q31 FFTs. ARM's documentation explains that the "Fast" real-FFT algorithms exploit the conjugate symmetry of a real signal's spectrum to roughly halve the computation compared with running a complex FFT of the same length (CMSIS-DSP: Real FFT Functions) (CMSIS-DSP: Real FFT Functions).
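The symmetry trick behind real-input FFTs can be sketched as follows: pack the N real samples into N/2 complex values (even samples as real parts, odd samples as imaginary parts), run one N/2-point complex FFT, then separate the even and odd spectra and recombine them with twiddle factors. This is the textbook packing method in plain Python, offered to illustrate the idea; CMSIS-DSP's internal implementation of arm_rfft_fast_f32 may differ in detail. It assumes N/2 is a power of two and uses Python complex numbers (available on most MicroPython ports, but treat that as an assumption).

```python
import math


def _twiddle(k, n):
    """exp(-2*pi*j*k/n) built from math.cos/math.sin."""
    a = -2 * math.pi * k / n
    return complex(math.cos(a), math.sin(a))


def _conj(z):
    """Complex conjugate without relying on complex.conjugate()."""
    return complex(z.real, -z.imag)


def fft(z):
    """Recursive radix-2 complex FFT; len(z) must be a power of two."""
    n = len(z)
    if n == 1:
        return [complex(z[0])]
    even = fft(z[0::2])
    odd = fft(z[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = _twiddle(k, n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def rfft(x):
    """Bins 0..N/2 of the DFT of real input x (N even, N/2 a power of two),
    using one N/2-point complex FFT plus a split step."""
    n = len(x)
    m = n // 2
    # Pack: even samples -> real parts, odd samples -> imaginary parts.
    z = fft([complex(x[2 * i], x[2 * i + 1]) for i in range(m)])
    out = []
    for k in range(m + 1):
        zk = z[k % m]
        zr = _conj(z[(m - k) % m])
        even = (zk + zr) / 2       # spectrum of x[0], x[2], x[4], ...
        odd = (zk - zr) / 2j       # spectrum of x[1], x[3], x[5], ...
        out.append(even + _twiddle(k, n) * odd)
    return out
```

Only N/2 + 1 bins are returned because the remaining bins of a real signal's spectrum are complex conjugates of these and carry no new information.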

Key resources:
- CMSIS-DSP Documentation – The user manual (on ARM’s GitHub pages) details each FFT function and usage (CMSIS-DSP: Real FFT Functions). It covers initialization (e.g. arm_rfft_fast_init_f32), input/output data formats, and optimized algorithms for real data (CMSIS-DSP: Real FFT Functions) (CMSIS-DSP: Real FFT Functions). This is an authoritative reference for how to use these functions and what hardware features they exploit.
- CMSIS-DSP Example Code – There are many examples of using CMSIS-DSP FFT on microcontrollers (for instance, STM32 examples and application notes (STM32 Fast Fourier Transform (CMSIS DSP FFT) - Phil's Lab #111)). These can be adapted to Pico 2. A community project by jptrainor benchmarked CMSIS-DSP on the original Pico, demonstrating how to integrate the library (in C/C++) and measure performance (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico) (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico). (On an M0+ Pico, float FFTs have to rely on software floating point; on the Pico 2, the same code rebuilt for the RP2350's Cortex-M33 target with the FPU enabled uses the hardware FPU for a big speedup.)
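When a CMSIS-DSP real FFT runs in C and the spectrum is then displayed or checked elsewhere, its packed output has to be unpacked into per-bin magnitudes. The helper below assumes the layout as we read the CMSIS-DSP documentation for arm_rfft_fast_f32 (the purely real DC and Nyquist values in the first two slots, then interleaved real/imaginary pairs for bins 1 to N/2 - 1); verify it against the CMSIS version you build. The function names are hypothetical.

```python
import math


def unpack_rfft_fast(packed):
    """Convert a packed N-float real-FFT buffer into N/2 + 1 magnitudes
    (bins 0..N/2). Assumed layout (check against your CMSIS-DSP version):
      packed[0]                -> Re X[0]   (DC, purely real)
      packed[1]                -> Re X[N/2] (Nyquist, purely real)
      packed[2k], packed[2k+1] -> Re, Im of X[k] for k = 1 .. N/2 - 1
    """
    n = len(packed)
    mags = [abs(packed[0])]
    for k in range(1, n // 2):
        re = packed[2 * k]
        im = packed[2 * k + 1]
        mags.append(math.sqrt(re * re + im * im))
    mags.append(abs(packed[1]))
    return mags


def bin_frequency(k, n, fs_hz):
    """Centre frequency in Hz of bin k for an n-point FFT at sample rate fs_hz."""
    return k * fs_hz / n
```

For example, with a 1024-point FFT at 48 kHz, bin 3 sits at 3 × 48000 / 1024 = 140.625 Hz.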

KISS FFT (Pico FFT Library): Another option is the lightweight KISS FFT (“Keep It Simple FFT”) library, which is written in portable C. A community member created a Pico-oriented wrapper called pico_fft based on KISS FFT (GitHub - Googool/pico_fft: A lightweight and efficient FFT (Fast Fourier Transform) library for the Raspberry Pi Pico, based on the KISS FFT library.). It simplifies capturing ADC data and computing an FFT on the Pico. While KISS FFT isn’t as optimized as CMSIS-DSP, it’s easier to understand and has a small footprint. The pico_fft project includes documentation and examples of connecting an analog microphone to the Pico and performing an FFT on the samples (GitHub - Googool/pico_fft: A lightweight and efficient FFT (Fast Fourier Transform) library for the Raspberry Pi Pico, based on the KISS FFT library.). All examples (in C using the Pico SDK) have been tested, and they even provide a quick-start guide for hardware setup and code building (GitHub - Googool/pico_fft: A lightweight and efficient FFT (Fast Fourier Transform) library for the Raspberry Pi Pico, based on the KISS FFT library.) (GitHub - Googool/pico_fft: A lightweight and efficient FFT (Fast Fourier Transform) library for the Raspberry Pi Pico, based on the KISS FFT library.). This could be useful if you prefer a self-contained library. Keep in mind that on the RP2350, a KISS FFT (written in C) will still benefit from the M33’s speed and can use the FPU for float math (via the compiler) – but it won’t explicitly use SIMD DSP instructions unless manually optimized.
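Whichever FFT library is used, raw ADC readings usually need conditioning first: removing the microphone's DC bias and applying a window to reduce spectral leakage. A minimal sketch, assuming the samples come from MicroPython's machine.ADC.read_u16() (which reports readings scaled to 0..65535) and that a periodic Hann window is acceptable; the helper names are hypothetical and this is not part of pico_fft.

```python
import math


def hann(n):
    """Periodic Hann window of length n, a common choice for spectra."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def condition(samples_u16, window):
    """Turn unsigned 16-bit ADC readings into windowed, zero-mean floats.

    Subtracting the mean removes the DC bias a microphone module sits at
    (roughly mid-rail), and dividing by 32768 maps the 0..65535 range to
    about [-1, 1) before the window is applied."""
    n = len(samples_u16)
    mean = sum(samples_u16) / n
    return [(s - mean) / 32768 * w for s, w in zip(samples_u16, window)]
```

On the device the samples could be gathered with a loop over adc.read_u16(), although a Python loop gives uneven sample spacing; the C examples use DMA for exactly that reason.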

Performance notes: On the Pico 2, a well-optimized FFT is quite fast. For instance, Peter Hinch's assembly FFT can compute a 1024-point real FFT in about 7 ms on a Pico 2 (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.). In comparison, an FFT of the same size on a classic Pico (M0+) has to emulate floating point in software and might take tens of milliseconds or more (python - Raspberry Pi Pico(RP2040 or RP2350) ASM PIO microPython for FFT, DSP - Signal Processing Stack Exchange) (python - Raspberry Pi Pico(RP2040 or RP2350) ASM PIO microPython for FFT, DSP - Signal Processing Stack Exchange). CMSIS-DSP's routines should perform in the same range as the assembly approach, since they are built to use the same hardware features, though this is worth confirming by measurement. The CMSIS library also handles smaller FFT sizes (e.g. 256-point, 512-point) efficiently, using mixed-radix stages for speed. Benchmarks such as jptrainor's cmsis-sandbox give a sense of how much these optimizations gain over naive implementations (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico).
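Figures like "about 7 ms" are easiest to check with a small benchmark on synthetic data. The harness below is a sketch: it times a plain iterative radix-2 FFT on a tone centred on a known bin, keeps the best of several runs, and checks that the spectral peak lands in the expected bin. It uses time.ticks_us/time.ticks_diff under MicroPython and falls back to time.perf_counter_ns on desktop Python. The pure-Python fft_inplace is only a stand-in that you would swap for the assembly, CMSIS-backed, or other implementation under test, and host timings say nothing about Pico 2 performance.

```python
import math
import time

try:  # MicroPython: wrap-around-safe microsecond ticks
    _now_us = time.ticks_us
    _diff_us = time.ticks_diff
except AttributeError:  # desktop Python fallback for host-side testing
    def _now_us():
        return time.perf_counter_ns() // 1000

    def _diff_us(end, start):
        return end - start


def fft_inplace(re, im):
    """Iterative radix-2 decimation-in-time FFT on separate re/im lists."""
    n = len(re)
    j = 0
    for i in range(1, n):              # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            re[i], re[j] = re[j], re[i]
            im[i], im[j] = im[j], im[i]
    size = 2
    while size <= n:                   # butterfly stages
        half = size // 2
        step = -2 * math.pi / size
        for k in range(half):
            wr, wi = math.cos(step * k), math.sin(step * k)
            for a in range(k, n, size):
                b = a + half
                tr = wr * re[b] - wi * im[b]
                ti = wr * im[b] + wi * re[b]
                re[b], im[b] = re[a] - tr, im[a] - ti
                re[a] += tr
                im[a] += ti
        size *= 2


def benchmark(n=1024, fs=8000, k_tone=37, repeats=5):
    """Time fft_inplace on a synthetic tone centred on bin k_tone.
    Returns (best time in microseconds, index of the peak bin)."""
    f = k_tone * fs / n
    base = [math.sin(2 * math.pi * f * t / fs) for t in range(n)]
    best = None
    for _ in range(repeats):
        re, im = list(base), [0.0] * n
        t0 = _now_us()
        fft_inplace(re, im)
        dt = _diff_us(_now_us(), t0)
        best = dt if best is None or dt < best else best
    mags = [re[k] * re[k] + im[k] * im[k] for k in range(n // 2)]
    return best, mags.index(max(mags))
```

Reporting the best of several runs filters out one-off delays such as a garbage collection landing inside the timed region; the peak-bin check doubles as a minimal correctness test for whatever FFT is plugged in.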

Dual-Core Audio Processing Examples

One advantage of the RP2350 (Pico 2) is its dual-core CPU, which you can use to parallelize tasks, ideal for real-time audio capture and FFT display. A great example is Iwatake's dual-core spectrum analyzer project. It runs on the original Pico (RP2040) but demonstrates the concept well: Core0 handles data acquisition from a microphone via ADC (using DMA) and drives an SPI display, while Core1 performs the FFT calculations on the incoming data stream (Iwatake Turns the Raspberry Pi Pico Into a Dual-Core, FFT-Calculating Live Spectrum Analyzer - Hackster.io). This way, the heavy math on Core1 doesn't stall sampling or screen updates on Core0. The project was written in C++ using the official Pico SDK, and Iwatake has released the full source code on GitHub (Iwatake Turns the Raspberry Pi Pico Into a Dual-Core, FFT-Calculating Live Spectrum Analyzer - Hackster.io) (Iwatake Turns the Raspberry Pi Pico Into a Dual-Core, FFT-Calculating Live Spectrum Analyzer - Hackster.io). (The code uses ADC + DMA to fill buffers and raises an IRQ or flag when a buffer is ready; Core1 then processes the buffer with an FFT and sends the result to Core0 for display.) This design achieved a functional real-time spectrum display, although the author noted an occasional bug causing freezes (likely a synchronization issue) (Iwatake Turns the Raspberry Pi Pico Into a Dual-Core, FFT-Calculating Live Spectrum Analyzer - Hackster.io). It's a useful reference for structuring dual-core tasks and using the Pico's multicore FIFO or interrupts to communicate between cores.
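The same core split can be sketched in MicroPython with the _thread module. On the rp2 port, as we understand it, _thread.start_new_thread runs the new function on the second core, so the worker below plays the role of Iwatake's Core1 while the main loop stands in for acquisition and display. This is a hypothetical, simplified sketch (FrameQueue, fft_worker and run_pipeline are invented names; a real version would fill frames from ADC/DMA and call an optimized FFT as `process`). It also runs unchanged on desktop Python, which is convenient for testing.

```python
import _thread
import time


class FrameQueue:
    """Tiny lock-protected FIFO for passing frames or results between cores."""

    def __init__(self, depth):
        self.items = []
        self.depth = depth
        self.lock = _thread.allocate_lock()

    def put(self, item):
        """Append item; return False if the queue is full."""
        with self.lock:
            if len(self.items) >= self.depth:
                return False
            self.items.append(item)
            return True

    def get(self):
        """Pop the oldest item, or return None if the queue is empty."""
        with self.lock:
            return self.items.pop(0) if self.items else None


def fft_worker(inbox, outbox, flags, process):
    """Core-1 loop: take a frame, process it, publish the result."""
    while not flags[0]:
        frame = inbox.get()
        if frame is None:
            time.sleep(0.001)            # nothing ready: yield briefly
            continue
        result = process(frame)          # results must not be None
        while not outbox.put(result) and not flags[0]:
            time.sleep(0.001)            # display side is behind: wait
    flags[1] = True                      # tell core 0 the worker has exited


def run_pipeline(frames, process, depth=2, timeout_s=5):
    """Core-0 side: hand frames to the worker and collect results in order."""
    inbox = FrameQueue(depth)            # filled buffers, core 0 -> core 1
    outbox = FrameQueue(depth)           # results, core 1 -> core 0
    flags = [False, False]               # [stop requested, worker exited]
    _thread.start_new_thread(fft_worker, (inbox, outbox, flags, process))
    pending = list(frames)
    results = []
    deadline = time.time() + timeout_s
    while len(results) < len(frames) and time.time() < deadline:
        if pending and inbox.put(pending[0]):   # "acquisition": hand off a frame
            pending.pop(0)
        r = outbox.get()                        # "display": consume a result
        if r is None:
            time.sleep(0.001)
        else:
            results.append(r)
    flags[0] = True
    while not flags[1] and time.time() < deadline + 1:
        time.sleep(0.001)                # let the worker finish cleanly
    return results
```

Bounding both queues means a slow consumer pushes back on the producer instead of letting memory grow; a real-time audio version would more likely drop the oldest frame than block the sampler.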

Another community example is a project by Van Hunter Adams at Cornell: Realtime Audio FFT to VGA on RP2040. While it ran on a single core of the RP2040 with heavy use of DMA and PIO, it illustrates continuous ADC capture at 10 kHz via DMA and a 1024-point fixed-point FFT computed on the fly (FFT) (FFT). On the Pico 2, one could adapt this idea and split the work across cores for even better throughput (e.g., Core0 managing continuous DMA ADC sampling, Core1 doing the FFT and sending results to a VGA or LCD driver). The code and write-up for that project are available on Adams's site (FFT) (FFT) and could serve as a starting point for handling real-time data streams.
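A fixed-point FFT like the one in Adams's project typically halves the data at every butterfly stage so that values stay inside a 16-bit range. The sketch below shows that scaling scheme on plain Python integers with Q15 twiddle factors; it is not Adams's code, only an illustration of the technique, and its output approximates DFT(x)/N. Python's `>>` on negative integers floors, like an arithmetic shift on the M33.

```python
import math


def fixed_fft(re, im):
    """In-place radix-2 FFT on integer (Q15-style) samples.

    Each stage halves its outputs, so complex magnitudes never grow from
    stage to stage and 16-bit-range real inputs stay in range. Twiddles
    are Q15 integers; the result approximates DFT(x) / N."""
    n = len(re)
    j = 0
    for i in range(1, n):              # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            re[i], re[j] = re[j], re[i]
            im[i], im[j] = im[j], im[i]
    size = 2
    while size <= n:
        half = size // 2
        for k in range(half):
            ang = -2 * math.pi * k / size
            wr = int(round(math.cos(ang) * 32767))   # Q15 twiddle, real part
            wi = int(round(math.sin(ang) * 32767))   # Q15 twiddle, imag part
            for a in range(k, n, size):
                b = a + half
                # Q15 complex multiply: (wr + j*wi) * (re[b] + j*im[b]) >> 15
                tr = (wr * re[b] - wi * im[b]) >> 15
                ti = (wr * im[b] + wi * re[b]) >> 15
                ar, ai = re[a], im[a]
                re[a] = (ar + tr) >> 1           # scale by 1/2 each stage
                im[a] = (ai + ti) >> 1
                re[b] = (ar - tr) >> 1
                im[b] = (ai - ti) >> 1
        size *= 2
```

The price of per-stage scaling is precision: small signals lose low-order bits at each of the log2(N) stages, which is why fixed-point spectra look noisier than float ones for quiet inputs.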

For a more hobbyist-friendly approach, there's an Arduino-based spectrum analyzer for RP2040 by Bodmer that uses one core but cleverly combines DMA sampling with SPI TFT updates. It samples the ADC at ~14 kHz, computes an FFT (64–512 points), and draws the spectrum on a 320x240 TFT. The code uses the Earle Philhower Arduino core (which has built-in support for the RP2040); check the source to see which FFT routine it relies on (it may be CMSIS-DSP or a simpler FFT library). It achieves about 54 FPS for a 256-point FFT visualization (and up to 81 FPS for 64-point) by using DMA for sampling and fast SPI for drawing (GitHub - Bodmer/ADC_DMA_FFT: RP2040: sample ADC, run FFT, display on TFT) (GitHub - Bodmer/ADC_DMA_FFT: RP2040: sample ADC, run FFT, display on TFT). The source is available on GitHub as "ADC_DMA_FFT" with documentation in the README (GitHub - Bodmer/ADC_DMA_FFT: RP2040: sample ADC, run FFT, display on TFT) (GitHub - Bodmer/ADC_DMA_FFT: RP2040: sample ADC, run FFT, display on TFT). While this was tested on RP2040, the same code on an RP2350 could be tweaked to use both cores or simply enjoy the extra performance headroom (e.g. to increase FFT size or sample rate). It's a good, well-documented reference for implementing an end-to-end audio FFT pipeline on the Pico family.
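A quick way to reason about such frame rates is to compute how long one frame takes to capture. Taking "~14 kHz" as exactly 14,000 Hz (an assumption), a 256-point frame needs about 18.3 ms of samples, which caps the rate at about 54.7 frames/s and is consistent with the ~54 FPS reported; for 64 points the capture bound is about 219 frames/s, so the reported 81 FPS is presumably limited by FFT or drawing time instead. That reading is our inference, not a claim from the project. The helper name and the overlap parameter are hypothetical.

```python
def frame_budget(fs_hz, n_points, overlap=0.0):
    """Timing and resolution figures for an n_points FFT at fs_hz.

    overlap is the fraction of each frame reused from the previous one
    (0.0 means back-to-back, non-overlapping frames)."""
    hop = n_points * (1.0 - overlap)               # new samples needed per frame
    return {
        "frame_ms": 1000.0 * n_points / fs_hz,     # time to capture one full frame
        "max_fps": fs_hz / hop,                    # acquisition-limited frame rate
        "bin_hz": fs_hz / n_points,                # frequency resolution
        "nyquist_hz": fs_hz / 2.0,                 # highest representable frequency
    }
```

Overlapping frames raise the achievable display rate without changing the bin spacing, at the cost of running the FFT more often, which is exactly the kind of extra work the Pico 2's second core could absorb.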

Key takeaways for dual-core design: Use DMA to offload data movement (e.g. ADC to memory), dedicate one core to I/O (sensor reads, drawing to the display) and the other to computation, and use thread-safe queues or interrupts to hand off data between cores. The Pico SDK provides the inter-core FIFO functions (multicore_fifo_push_blocking and multicore_fifo_pop_blocking) and higher-level primitives such as queue_t (pico/util/queue.h) to coordinate the two cores. In FreeRTOS or other RTOS environments, you could assign tasks to cores and use mutexes/queues. The examples above show a Pico handling real-time audio FFT at sample rates around 10–14 kHz while updating a display; the Pico 2's faster cores and FPU should leave headroom for higher audio rates, likely toward the upper end of the 8–44 kHz range, especially when each core's workload is optimized.
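The hand-off queue in such a design can be modelled as a single-producer, single-consumer ring buffer. The class below is a hypothetical sketch of the kind of bounded FIFO that queue_t or the hardware inter-core FIFO provides; it is plain Python, so it does not address the memory-ordering and interrupt-safety details the SDK handles, and on the Pico you would use the SDK primitive (or a lock-protected queue in MicroPython) instead.

```python
class SpscRing:
    """Single-producer / single-consumer ring buffer of fixed capacity.

    One slot is kept empty so 'full' and 'empty' can be told apart without
    a shared counter: only the producer writes head, only the consumer
    writes tail."""

    def __init__(self, capacity):
        self.buf = [None] * (capacity + 1)
        self.head = 0   # next slot to write (producer-owned)
        self.tail = 0   # next slot to read (consumer-owned)

    def push(self, item):
        """Producer side: store item, or return False if the ring is full."""
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:
            return False        # full: caller decides whether to drop or retry
        self.buf[self.head] = item
        self.head = nxt
        return True

    def pop(self):
        """Consumer side: return the oldest item, or None if empty."""
        if self.tail == self.head:
            return None
        item = self.buf[self.tail]
        self.buf[self.tail] = None      # release the reference
        self.tail = (self.tail + 1) % len(self.buf)
        return item

    def __len__(self):
        return (self.head - self.tail) % len(self.buf)
```

Returning False on a full ring, rather than blocking, leaves the drop-or-wait policy to the caller: an audio sampler usually cannot afford to wait, while a display loop usually can.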

MicroPython and Assembly Integration

MicroPython on the Pico 2 lets you write high-level code, but Python by itself is too slow for real-time FFT on microcontrollers. The good news is MicroPython supports calling assembly-optimized routines for performance-critical parts. There are a few ways to do this:
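One lightweight route is MicroPython's native code emitter: decorating a hot loop with @micropython.native compiles it to machine code, while @micropython.viper and @micropython.asm_thumb (the inline assembler Hinch's FFT uses) go further. Below is a minimal sketch of the first option, with a fallback decorator (a hypothetical shim, not a MicroPython feature) so the same file can be tested on desktop Python. Whether the native emitter is enabled for a given RP2350 firmware build is an assumption to check.

```python
import math

try:
    import micropython
    native = micropython.native          # compile to machine code on device
except (ImportError, AttributeError):
    def native(f):                       # host fallback: leave function as-is
        return f


@native
def magnitudes(re, im, out):
    """Fill out[k] with |X[k]| for k < len(out): a typical hot loop worth
    moving out of the bytecode interpreter."""
    for k in range(len(out)):
        out[k] = math.sqrt(re[k] * re[k] + im[k] * im[k])
```

On the device, `out` would typically be a preallocated `array.array('f', ...)`, so the loop writes into existing memory instead of building a new list each frame.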

Conclusion and Further Resources

In summary, the Raspberry Pi Pico 2’s RP2350 chip brings substantial DSP capability through its dual M33 cores with FPU/DSP extensions. To exploit this for fast FFTs:

By combining these resources, you can capture audio in real-time, compute an FFT with minimal latency, and display the spectrum – fully utilizing the Pico 2’s hardware acceleration. Be sure to refer to the official Raspberry Pi Pico C/C++ SDK documentation for details on using the ADC, DMA, PIO, and multicore features, as those will be the backbone for any high-speed real-time processing on this platform. With the right approach, the tiny $5 Pico 2 can perform serious signal processing tasks that were previously possible only on much more expensive or power-hungry systems.

Sources: Documentation and code from Raspberry Pi Foundation and ARM, and community projects and libraries on GitHub: RP2350 hardware specs (Introducing the RP2350 | Stephen Smith's Blog), ARM CMSIS-DSP library API (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico) (GitHub - jptrainor/cmsis-sandbox: Benchmarking CMSIS-DSP on Rasberry Pi Pico), dual-core FFT example (iwatake) (Iwatake Turns the Raspberry Pi Pico Into a Dual-Core, FFT-Calculating Live Spectrum Analyzer - Hackster.io), MicroPython assembly FFT by Peter Hinch (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.) (GitHub - peterhinch/micropython-fourier: Fast Fourier transform in MicroPython's inline ARM assembler.), and others as cited above.