Skip to content

Further Performance Improvements

Prompt

The MicroPython frequency analyzer code in the file 20-fft-range-2600.py is working pretty well on the fast new Raspberry Pi Pico 2. However, it has a noticeable lag between when a sound as a specific frequency is played and when the peak appears on the display. The delay is a few seconds. Can we lower the delay in any way? Just look at the code and give me some options like using DMA or using a version of the FFT written in ARM assembly code.

20-fft-range-2600.py in GitHub src/kits/fft

Claude Sonnet 3.7 Response

I'll analyze the code to suggest ways to reduce the lag between sound input and display updates in your frequency analyzer. Here are several optimization options:

1. Reduce Sample Size and Processing

The current FFT_SIZE of 512 provides good resolution but requires significant processing time. Consider:

1
2
# Try a smaller FFT size
FFT_SIZE = 256  # or even 128

This would cut processing time significantly at the cost of some frequency resolution.

2. Optimize Memory Usage

The code creates new arrays frequently, which triggers garbage collection and slows things down:

  • Use pre-allocated buffers more consistently
  • Eliminate redundant array allocations in capture_audio_samples() and calculate_magnitudes()
  • Use a single shared array for the magnitudes

3. Simplify the Display Logic

The display rendering is relatively complex:

  • Reduce number of bins displayed (try 32 instead of 64)
  • Simplify scaling calculations
  • Only update parts of the display that change, not the entire screen

4. Hardware Acceleration Options

For more significant improvements:

Use PIO for I2S Data Collection

The Pico's PIO (Programmable I/O) can handle I2S sampling in the background while the CPU works on FFT processing:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Example of using PIO for I2S
from machine import Pin
import rp2

@rp2.asm_pio(
    fifo_join=rp2.PIO.JOIN_RX,
    in_shiftdir=rp2.PIO.SHIFT_LEFT,
    autopush=True,
    push_thresh=32
)
def i2s_pio():
    # PIO program for I2S capture
    # ...

Use DMA for Memory Transfers

DMA could handle moving data from I2S to memory and from memory to display without CPU intervention:

1
2
3
4
5
6
# Example DMA setup
from machine import mem32

# Set up DMA channel for data transfer
DMA_BASE = 0x50000000
# Configure DMA registers for transfer

ARM DSP Assembly Optimizations

For the RP2040 chip in the Pico, you could use optimized ARM assembly for FFT:

  1. Create a C module with ARM CMSIS DSP library for FFT
  2. Compile it as a MicroPython native module
  3. Import and use it in your Python code
1
2
3
4
5
6
# Example usage of a hypothetical native FFT module
import armfft

# Then replace your FFT function with:
def process_fft(samples):
    return armfft.compute(samples)

5. Parallel Processing with Dual Cores

The Pico has dual cores - you could dedicate one core to FFT processing and the other to display:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import _thread

# Core 1: Handle audio sampling and FFT
def core1_task():
    while True:
        # Sample audio and compute FFT
        # Place results in shared buffer

# Core 0 (main): Handle display updates
_thread.start_new_thread(core1_task, ())
while True:
    # Read from shared buffer
    # Update display

6. Efficient Algorithm Implementation

Replace the current iterative FFT with a more optimized version like the split-radix FFT or use a fixed-point implementation that avoids floating-point operations.

Implementation Priority

I recommend trying these options in this order:

  1. First, try reducing the FFT size to 256 or 128 - this is the simplest change with significant impact
  2. Next, implement dual-core processing to parallelize computation and display
  3. Then optimize memory usage with pre-allocated buffers
  4. Finally, consider PIO and DMA if further optimization is needed

Would you like more details on implementing any of these specific approaches?

Limited Frequency Analysis

Changing the FFT Size

Prompt

For demos, you can see we are only looking at the frequencies below 2,600 Hz. Will using 256 or 128 sizes still work?

Yes, using FFT_SIZE of 256 or 128 would still work well for your frequency range of interest (below 2,600 Hz).

Let's analyze why:

With your current configuration:

  • Sample rate: 16,000 Hz
  • FFT_SIZE: 512
  • Nyquist frequency: 8,000 Hz
  • Frequency resolution: 31.25 Hz per bin (16,000/512 * 0.5)
  • You're focusing on 0-2,600 Hz (roughly the first third of your spectrum)

If you reduce to FFT_SIZE = 256:

  • Frequency resolution becomes: 62.5 Hz per bin
  • You'd still have about 41-42 bins for your 0-2,600 Hz range (2600/62.5)
  • This is enough resolution for most audio demos showing voice, whistles, or musical tones

If you reduce to FFT_SIZE = 128:

  • Frequency resolution becomes: 125 Hz per bin
  • You'd have about 20-21 bins for your 0-2,600 Hz range
  • Still sufficient for showing major frequency peaks but with less precision

The tradeoffs:

FFT_SIZE = 256:

  • Processing time reduced by approximately 50%
  • Memory usage reduced by 50%
  • Resolution still good enough to distinguish nearby frequencies
  • Suitable for most demo purposes

FFT_SIZE = 128:

  • Processing time reduced by approximately 75%
  • Memory usage reduced by 75%
  • Resolution becomes more coarse but still usable
  • Better for showing only major frequency components

For demo purposes where responsiveness is more important than precision, I'd recommend trying FFT_SIZE = 256 first. This should significantly reduce your lag while still providing good visual feedback for frequency content below 2,600 Hz.

If you're primarily interested in showing distinct tones (like whistling or musical notes) rather than detailed frequency analysis, even FFT_SIZE = 128 should be sufficient and provide very responsive updates.

Lower the Sample Rate

Prompt

Would it help to change the sample rate from 16K down to 8K?

Yes, reducing the sample rate from 16 kHz to 8 kHz would definitely help reduce processing lag, particularly for your specific use case.

Let's analyze why:

  1. Lower computational load:

  2. Half the samples to process means roughly half the processing time

  3. The I2S data collection will require half the bandwidth
  4. Less memory needed for buffers

  5. Still sufficient for your frequency range:

  6. With 8 kHz sample rate, your Nyquist frequency becomes 4 kHz

  7. Since you're only interested in frequencies below 2.6 kHz, this is perfectly adequate
  8. You're not losing any useful frequency information for your application

  9. Better frequency resolution trade-off:

  10. With 8 kHz sample rate and FFT_SIZE = 256:

    • Frequency resolution would be 31.25 Hz (8000/256 * 0.5)
    • This is the same resolution as your current setup with FFT_SIZE = 512 at 16 kHz
  11. With 8 kHz sample rate and FFT_SIZE = 128:
    • Frequency resolution would be 62.5 Hz
    • Still quite reasonable for visualizing frequency components in your range

By combining a lower sample rate (8 kHz) with a smaller FFT size (256 or 128), you could reduce the processing load by 75-87.5% compared to your current implementation, which should significantly reduce the lag between sound input and display updates.

To implement this change, you would just need to update the sample rate in your I2S configuration:

1
SAMPLE_RATE = 8000  # Changed from 16000

And potentially adjust the FFT size:

1
FFT_SIZE = 256  # or 128 for even faster processing

This would be one of the most straightforward optimizations with a substantial impact on responsiveness.

Using Assembly Version of FFT (ARM)

Prompt

I have found a person that claims to have written a single precision FFT written in ARM assembler that is callable from MicroPython. The GitHub site implies that the Pico 2 could use this. Is it possible this would work?

https://github.com/peterhinch/micropython-fourier/blob/master/README.md

This looks very promising for your project. The micropython-fourier library by Peter Hinch is specifically designed for MicroPython and includes ARM assembly optimized FFT implementations. Let's analyze whether this would work for your Raspberry Pi Pico 2:

  1. Compatibility:

  2. The README explicitly states it's compatible with the RP2040 (the chip in both Pico and Pico 2)

  3. It lists specific support for the Pico, and the Pico 2 uses the same architecture with higher clock speeds

  4. Performance benefits:

  5. The library claims significant speed improvements - up to 6x faster than equivalent Python code

  6. Written in ARM assembly specifically optimized for the Cortex-M0+ (RP2040's architecture)
  7. The benchmarks in the documentation show impressive performance gains

  8. Integration ease:

  9. Designed as a MicroPython module, making it straightforward to integrate with your existing code

  10. Includes both FFT and inverse FFT implementations
  11. Supports single-precision floating-point operations

To implement this in your project, you would:

  1. First install the library on your Pico 2:
1
2
3
# Using mip (MicroPython package manager)
import mip
mip.install("github:peterhinch/micropython-fourier")
  1. Then modify your code to use this optimized FFT:
1
2
3
4
5
6
7
from fourier import fft

# Replace your iterative_fft function with this:
def process_fft(real, imag):
    # The library might expect a different data format
    # Check documentation for exact usage
    return fft(real, imag)

The speed improvement from using assembly-optimized code, combined with the other optimizations we discussed (reduced sample rate and smaller FFT size), could dramatically reduce the lag in your frequency analyzer.

Given that Peter Hinch is a well-known contributor to the MicroPython ecosystem and his libraries are generally high quality, this is likely your best option for optimizing FFT performance on the Pico 2.

Challenges

  • Try different window functions (like Hamming or Blackman windows instead of Hanning) for different spectral leakage characteristics
  • Try Adjusting the scaling factor to get better visual representation of quiet sounds
  • Try Creating different display modes (like a waterfall display or peak hold feature)