NOTES

Tip

TLDR: For real-time processing tasks, it is recommended to:

  • (1) enable DMA for data acquisition;
  • (2) separate data production and consumption into independent tasks and decouple them via buffers;
  • (3) use a ring buffer when overlapping window processing is required, otherwise use a dual-buffer (ping-pong) scheme to improve throughput;
  • (4) choose single-core or dual-core processing based on computational requirements.

Overview

The real-time sensing and processing module acquires sensor data and performs computation and feature extraction on it in real time. Unlike the online and offline sensing modules, which focus on data collection and transmission, this module processes data on the device while acquisition is still running.

The module implements three different architectures, each optimized for different performance requirements and use cases. All three architectures follow the same fundamental design pattern: Data Production → Data Buffering → Data Consumption, but differ in their implementation details for each stage.

The architecture is selected at compile time through a macro definition, and the API is the same regardless of which implementation is compiled in.

Relationship with Other Modules

Functional Comparison

| Module | Primary Function | Processing | Use Case |
|---|---|---|---|
| Online Sensing | Real-time data acquisition and transmission | None | Continuous monitoring, live data streaming |
| Offline Sensing | High-frequency batch collection and storage | None (post-processing) | Batch data collection for later analysis |
| RT-Sense-Processing | Real-time acquisition + real-time processing | Real-time computation | On-device feature extraction, edge AI |

When to Use Which Module

Use Online Sensing when:

  • You need to stream raw sensor data in real-time
  • Data processing will be done on a remote server or cloud platform
  • Simple, low-latency data transmission is required
  • Sampling frequency is moderate (0.1 - 300 Hz)

Use Offline Sensing when:

  • You need to collect high-frequency data for later analysis
  • Post-processing and analysis will be performed offline
  • Long-duration data collection is required
  • Sampling frequency is high (up to 4000 Hz)

Use RT-Sense-Processing when:

  • You need real-time computation and feature extraction
  • On-device processing is required (edge AI, real-time detection)
  • Processing must keep up with data acquisition rate
  • You need immediate feedback based on processed data

Technical Architecture Comparison

Online Sensing:

  • Single task with ESP timer callback
  • Direct data formatting and transmission
  • Simple, lightweight implementation
  • Suitable for low to medium frequency

Offline Sensing:

  • Single task with ESP timer callback
  • Memory buffer + SD card storage
  • Two-phase approach: collection then storage
  • Suitable for high-frequency batch collection

RT-Sense-Processing:

  • Multi-task architecture (Producer-Consumer pattern)
  • Three architecture options with different optimizations
  • Parallel data acquisition and processing
  • Suitable for real-time computation requirements

Architecture Components

All three real-time processing architectures follow the same three-stage pipeline:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Data Production│───▶│ Data Buffering  │───▶│ Data Consumption│
│  (Producer)     │    │  (Buffer/Cache) │    │  (Consumer)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Stage 1: Data Production

The Producer stage is responsible for acquiring sensor data at fixed intervals. All three architectures use the same fundamental approach:

  • Timer-Driven: ESP timer triggers data acquisition at configured sampling frequency
  • Task Notification: Timer callback notifies producer task via xTaskNotify (non-blocking)
  • Sensor Reading: Producer task reads ADXL355 accelerometer data (x, y, z) and temperature via SPI
  • Sample Preparation: Packages data into rt_process_sample_t structure with timestamp
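
As a rough illustration of the timer-driven step, the sketch below converts a sampling frequency into the period a periodic timer would be armed with, and packs one reading into a sample record. The field layout of rt_process_sample_t is not given on this page, so the struct (sample_sketch_t) and both helper names are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical sample layout; the real rt_process_sample_t may differ. */
typedef struct {
    uint64_t timestamp_us;  /* acquisition time in microseconds */
    float x_g, y_g, z_g;    /* acceleration in g */
    float temperature_c;    /* temperature in degrees Celsius */
} sample_sketch_t;

/* Timer period in microseconds for a sampling frequency in Hz.
 * Returns 0 for a non-positive frequency. (Hypothetical helper.) */
uint64_t freq_to_period_us(float freq_hz)
{
    if (!(freq_hz > 0.0f)) {
        return 0;
    }
    return (uint64_t)(1000000.0 / (double)freq_hz + 0.5);
}

/* Package one reading together with its timestamp. (Hypothetical helper.) */
sample_sketch_t make_sample(uint64_t now_us, float x, float y, float z, float temp_c)
{
    sample_sketch_t s = { now_us, x, y, z, temp_c };
    return s;
}
```

At 100 Hz this gives a period of 10000 microseconds. ESP-IDF's esp_timer_start_periodic() takes its period in microseconds, so a value of this kind is what would be passed to it.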

Differences between architectures:

| Architecture | SPI Transfer Method | CPU Usage | Notes |
|---|---|---|---|
| Architecture 1: Producer-Consumer | Standard SPI (CPU-driven) | Medium | Direct SPI read, CPU handles transfer |
| Architecture 2: DMA + Double Buffer | DMA-assisted SPI | Low | SPI driver automatically uses DMA for transfers |
| Architecture 3: DMA + Dual Core | DMA-assisted SPI | Lowest | Same as Architecture 2, but producer pinned to Core 0 |

Stage 2: Data Buffering

The Buffering stage decouples data production from consumption, allowing parallel operation. This is where the three architectures differ most significantly:

Architecture 1: Circular Buffer

  • Buffer Type: Single circular buffer (FIFO)
  • Size: 512 samples (configurable)
  • Memory: PSRAM allocation
  • Synchronization: Mutex-protected access
  • Behavior: FIFO overwrite when full (oldest data overwritten)
  • Write Strategy: Producer writes to current write pointer, advances circularly
  • Read Strategy: Consumer reads from read pointer, advances circularly, supports arbitrary position and length data access
  • Overflow Handling: Read pointer advances automatically when buffer full (overwrite)
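
The overwrite-when-full behavior described above can be sketched as follows. This is a minimal single-threaded model, not the module's implementation: float stands in for the sample type, the mutex is omitted, and all names (ring_t, ring_push, ring_pop) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 512  /* default size listed above */

typedef struct {
    float data[RING_CAPACITY];
    size_t head;           /* next write position */
    size_t tail;           /* oldest unread sample */
    size_t count;          /* unread samples currently stored */
    uint32_t overwritten;  /* samples lost to overwrite */
} ring_t;

void ring_init(ring_t *r)
{
    r->head = r->tail = r->count = 0;
    r->overwritten = 0;
}

/* Producer side: always succeeds, discarding the oldest sample when full. */
void ring_push(ring_t *r, float v)
{
    if (r->count == RING_CAPACITY) {
        /* Full: advance the read pointer so the oldest sample is dropped. */
        r->tail = (r->tail + 1) % RING_CAPACITY;
        r->count--;
        r->overwritten++;
    }
    r->data[r->head] = v;
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count++;
}

/* Consumer side: returns false when no unread sample is available. */
bool ring_pop(ring_t *r, float *out)
{
    if (r->count == 0) {
        return false;
    }
    *out = r->data[r->tail];
    r->tail = (r->tail + 1) % RING_CAPACITY;
    r->count--;
    return true;
}
```

In a multi-task setting, each call to ring_push and ring_pop would run with the buffer's mutex held.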

Figure: Circular buffer operation showing producer writing and consumer reading with FIFO overwrite behavior

Key Advantages of Circular Buffer:

The circular buffer is the ideal choice for real-time DSP applications, especially for scenarios requiring overlapping window processing:

  1. Flexible Window Access: Consumer can read data windows of arbitrary length from any position in the buffer, without buffer boundary restrictions
  2. Window Overlap Support: Easily supports sliding windows, overlapping windows, and other DSP algorithms (e.g., FFT, STFT, filter banks)
  3. Continuous Data Stream: Data is stored continuously in the circular buffer, supporting cross-boundary reads without data copying or reorganization
  4. Low Latency Processing: Can process the latest N samples, or extract arbitrary time windows from historical data
  5. Memory Efficiency: Single buffer, minimal memory footprint

Typical Application Scenarios:

  • FFT/STFT Analysis: Requires sliding windows, processing 256 samples each time, but window slides only 128 samples (50% overlap)
  • Filter Banks: Requires multiple overlapping time windows for frequency domain analysis
  • Feature Extraction: Needs to extract features of different time scales from continuous data streams
  • Real-time Spectrum Analysis: Requires continuous, overlapping spectrum calculations
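
To make the 256-sample window with 128-sample hop concrete, the sketch below extracts a window that may cross the end of the ring storage and counts how many complete windows fit in a given number of samples. Both function names are hypothetical. The sketch copies the window into a contiguous scratch array, which many FFT routines expect; whether the module does the same is not stated here.

```c
#include <stddef.h>
#include <string.h>

/* Copy `len` samples starting at ring index `start` into `out`.
 * The copy is split in two when the window crosses the end of the storage.
 * Preconditions: start < capacity and len <= capacity. */
void ring_copy_window(const float *ring, size_t capacity,
                      size_t start, size_t len, float *out)
{
    size_t first = capacity - start;  /* samples available before the wrap */
    if (first > len) {
        first = len;
    }
    memcpy(out, ring + start, first * sizeof(float));
    memcpy(out + first, ring, (len - first) * sizeof(float));
}

/* Number of complete windows of size `window`, advanced by `hop`,
 * that fit in `available` consecutive samples. */
size_t window_count(size_t available, size_t window, size_t hop)
{
    if (window == 0 || hop == 0 || available < window) {
        return 0;
    }
    return (available - window) / hop + 1;
}
```

With 512 buffered samples, a 256-sample window and a 128-sample hop, window_count returns 3: the windows start at offsets 0, 128 and 256.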

Limitations of Double Buffer: The double buffer architecture has significant limitations when processing overlapping windows:

  • Fixed Window Boundaries: Can only process entire buffers (512 samples); window position and size cannot be chosen freely
  • Window Overlap Difficulties: Overlapping windows require additional data copying and reorganization, which increases latency and memory overhead
  • Discontinuous Data Access: Data is distributed across two separate buffers, so cross-buffer access requires complex data management
  • Not Suitable for Sliding Windows: Each processing cycle must wait for the entire buffer to fill, so flexible sliding window processing is not possible

Architecture 2: DMA + Double Buffer (Ping-Pong)

  • Buffer Type: Two separate buffers (A and B) in ping-pong configuration
  • Size: 512 samples per buffer (configurable)
  • Memory: PSRAM allocation for both buffers
  • Synchronization: Mutex per buffer + Semaphore for buffer-ready notification
  • Behavior: Producer fills one buffer while consumer processes the other
  • Write Strategy: Producer writes to active buffer (A or B), switches when full
  • Read Strategy: Consumer waits for buffer-ready semaphore, processes entire buffer
  • Overflow Handling: Producer switches buffers even if previous buffer not fully processed (overwrite detected)
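
The ping-pong switch and its overwrite detection can be modeled as below. This is a single-threaded sketch under stated assumptions: float stands in for the sample type, the return value of pp_push stands in for the buffer-ready semaphore, the per-buffer mutexes are omitted, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PP_SAMPLES 512  /* per-buffer size listed above */

typedef struct {
    float buf[2][PP_SAMPLES];
    int active;         /* buffer the producer is filling: 0 = A, 1 = B */
    size_t fill;        /* samples written to the active buffer */
    bool ready[2];      /* set when a buffer is full and awaiting the consumer */
    uint32_t overruns;  /* full buffers reclaimed before the consumer released them */
} pingpong_t;

void pp_init(pingpong_t *p)
{
    p->active = 0;
    p->fill = 0;
    p->ready[0] = p->ready[1] = false;
    p->overruns = 0;
}

/* Producer side. Returns the index of the buffer that just became ready,
 * or -1 while the active buffer is still filling. */
int pp_push(pingpong_t *p, float v)
{
    p->buf[p->active][p->fill++] = v;
    if (p->fill < PP_SAMPLES) {
        return -1;
    }

    int full = p->active;
    int next = 1 - full;
    if (p->ready[next]) {
        /* The consumer has not released the other buffer; it gets overwritten. */
        p->ready[next] = false;
        p->overruns++;
    }
    p->ready[full] = true;
    p->active = next;
    p->fill = 0;
    return full;
}

/* Consumer side: call after the whole buffer `idx` has been processed. */
void pp_release(pingpong_t *p, int idx)
{
    p->ready[idx] = false;
}
```

A non-zero overruns count means the consumer is not finishing a buffer within the time the producer needs to fill the other one.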

Figure: Double buffering (ping-pong) operation showing parallel acquisition and processing

Architecture 3: DMA + Dual Core + Double Buffer

  • Buffer Type: Two separate buffers (A and B) in ping-pong configuration
  • Size: 256 samples per buffer (configurable, smaller for dual-core efficiency)
  • Memory: PSRAM allocation for both buffers
  • Synchronization: Mutex per buffer + Semaphore for buffer-ready notification
  • Behavior: Same as Architecture 2, but with core pinning
  • Write Strategy: Producer (Core 0) writes to active buffer, switches when full
  • Read Strategy: Consumer (Core 1) waits for buffer-ready semaphore, processes entire buffer
  • Overflow Handling: Same as Architecture 2
  • Core Assignment: Producer on Core 0, Consumer on Core 1 (true parallelism)

Figure: Double buffering with dual-core division (Core 0: Producer, Core 1: Consumer)

Stage 3: Data Consumption

The Consumer stage processes the buffered data and performs real-time computation. All three architectures use the same processing logic but differ in how they access the buffer (see Architecture-Specific Consumption Details).

Processing Example: Acceleration Detection

For demonstration purposes, the module implements acceleration detection as the data consumption example:

Detection Conditions:

  • |x| > 0.5g OR |y| > 0.5g OR |z| < 0.5g

Output:

  • LCD Visual Feedback: RED when conditions met, WHITE otherwise
  • Color Persistence: 0.3 seconds hold duration
  • Serial Output: Optional processed data output
  • MQTT Output: Optional processed results publishing
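
The detection condition and the color hold can be written as two small functions. This is a sketch only: the names are hypothetical, and it assumes that "hold" means the display stays RED until 0.3 seconds have passed since the most recent sample that met the condition.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACCEL_THRESHOLD_G 0.5f
#define COLOR_HOLD_US     300000ULL  /* 0.3 seconds */

typedef enum { LCD_WHITE = 0, LCD_RED = 1 } lcd_color_t;

/* Absolute value without depending on libm. */
static float abs_f(float v)
{
    return v < 0.0f ? -v : v;
}

/* |x| > 0.5g OR |y| > 0.5g OR |z| < 0.5g */
bool accel_condition_met(float x_g, float y_g, float z_g)
{
    return abs_f(x_g) > ACCEL_THRESHOLD_G ||
           abs_f(y_g) > ACCEL_THRESHOLD_G ||
           abs_f(z_g) < ACCEL_THRESHOLD_G;
}

typedef struct {
    bool ever_triggered;       /* false until the first detection */
    uint64_t last_trigger_us;  /* time of the most recent detection */
} color_hold_t;

/* Returns the color to show at time `now_us` (assumed hold semantics). */
lcd_color_t color_update(color_hold_t *h, bool triggered, uint64_t now_us)
{
    if (triggered) {
        h->ever_triggered = true;
        h->last_trigger_us = now_us;
    }
    if (h->ever_triggered && now_us - h->last_trigger_us < COLOR_HOLD_US) {
        return LCD_RED;
    }
    return LCD_WHITE;
}
```

A sensor lying flat and at rest reads roughly (0, 0, 1) g, which does not meet the condition; tilting it far enough that |z| drops below 0.5 g, or shaking it along x or y, does.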

Architecture-Specific Consumption Details

| Architecture | Buffer Access | Processing Strategy | Sample Count |
|---|---|---|---|
| Architecture 1: Producer-Consumer | Mutex-protected circular buffer read | Flexible processing: last N samples, arbitrary windows, supports overlapping windows | Up to 10 samples per cycle (configurable), supports arbitrary window size and position |
| Architecture 2: DMA + Double Buffer | Semaphore-wait for buffer-ready, then process entire buffer | Fixed processing of entire buffer, no window overlap support | 512 samples per cycle (fixed) |
| Architecture 3: DMA + Dual Core | Semaphore-wait for buffer-ready, then process entire buffer | Fixed processing of entire buffer, no window overlap support | 256 samples per cycle (fixed) |
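
Because the double-buffer architectures hand over a whole buffer at a time, the time needed to fill one buffer is also the time the consumer has to finish the previous one before it is overwritten. The helper below (a hypothetical name) works out that budget from the buffer size and sampling frequency.

```c
/* Time in milliseconds for the producer to fill one buffer of `samples`
 * at `freq_hz`. Returns 0 for a non-positive frequency. */
double buffer_fill_time_ms(unsigned samples, double freq_hz)
{
    if (freq_hz <= 0.0) {
        return 0.0;
    }
    return 1000.0 * (double)samples / freq_hz;
}
```

For example, 512 samples at 1 kHz leave the consumer 512 ms per buffer, while 256 samples at 10 kHz leave about 25.6 ms.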

Common Processing Flow:

  1. Wait for data availability (mutex/semaphore)
  2. Read samples from buffer
  3. Perform acceleration detection on each sample
  4. Update LCD color based on detection results
  5. Output results (serial, MQTT if enabled)
  6. Update processing statistics

Architecture Comparison

The following table summarizes the key differences between the three architectures across all three stages:

| Feature | Architecture 1: Producer-Consumer | Architecture 2: DMA + Double Buffer | Architecture 3: DMA + Dual Core |
|---|---|---|---|
| Data Production | | | |
| SPI Transfer | Standard SPI (CPU-driven) | DMA-assisted SPI | DMA-assisted SPI |
| Data Buffering | | | |
| Buffer Type | Circular Buffer | Double Buffer (Ping-Pong) | Double Buffer (Ping-Pong) |
| Buffer Size | 512 samples | 512 samples × 2 | 256 samples × 2 |
| Synchronization | Mutex | Mutex + Semaphore | Mutex + Semaphore |
| Data Consumption | | | |
| Processing Strategy | Flexible: last N samples, arbitrary windows, supports overlapping windows | Fixed: entire buffer only, no window overlap | Fixed: entire buffer only, no window overlap |
| Core Assignment | Automatic (FreeRTOS) | Automatic (FreeRTOS) | Core 0: Producer, Core 1: Consumer |
| Overall Performance | | | |
| Frequency Range | 0.1 - 1000 Hz | 0.1 - 10000 Hz | 0.1 - 10000 Hz |
| Suitable Frequency | 100 Hz - 1 kHz (recommended) | 1 kHz - 10 kHz (recommended) | Highest performance (recommended) |
| CPU Usage | Medium | Low (DMA handles transfers) | Lowest (dual-core utilization) |
| Memory Usage | Circular Buffer (512 samples) | Two buffers (512 samples each) | Two buffers (256 samples each) |
| Advantages | Simple, decoupled, supports window overlap, flexible data access | High throughput, parallel processing | Maximum performance, true parallelism |
| Limitations | Buffer may become bottleneck | Higher memory usage, no window overlap support, fixed window size | Requires dual-core support, no window overlap support, fixed window size |

Architecture Selection Guide

Tip

Data production and data consumption can be implemented as independent tasks. For data acquisition, it is recommended to enable DMA-assisted SPI to improve efficiency. Data consumption can be executed on a single core or dual cores depending on actual requirements. Regarding buffer management, if real-time processing is required and data windows overlap, a ring buffer is recommended; if there is no overlap between data windows and batch-oriented processing is preferred, a dual-buffer (ping-pong) scheme can be used to improve throughput.

Choose Architecture 1 (Producer-Consumer) when:

  • Window overlap processing is required (e.g., FFT, STFT, filter banks, and other real-time DSP algorithms)
  • Flexible data window access is needed (sliding windows, arbitrary position and length data extraction)
  • Sampling frequency is moderate (0.1 - 1000 Hz, recommended: 100 Hz - 1 kHz)
  • Simple implementation is preferred
  • Memory resources are limited
  • You need a straightforward, well-understood pattern
  • Processing can work with small batches (10 samples) or arbitrary window sizes

Choose Architecture 2 (DMA + Double Buffer) when:

  • Window overlap processing is NOT required (processing algorithm does not need sliding windows or overlapping windows)
  • Sampling frequency is high (0.1 - 10000 Hz, recommended: 1 kHz - 10 kHz)
  • You need high throughput
  • CPU usage should be minimized
  • Parallel acquisition and processing is beneficial
  • Processing benefits from larger batches (512 samples), and fixed window size is acceptable
  • Note: If your DSP algorithm requires window overlap (e.g., FFT overlap, STFT), choose Architecture 1

Choose Architecture 3 (DMA + Dual Core) when:

  • Window overlap processing is NOT required (processing algorithm does not need sliding windows or overlapping windows)
  • Maximum performance is required
  • You have a dual-core ESP32 platform
  • Processing workload is computationally intensive
  • True parallelism between acquisition and processing is needed
  • Processing benefits from dedicated core assignment, and fixed window size is acceptable
  • Note: If your DSP algorithm requires window overlap (e.g., FFT overlap, STFT), choose Architecture 1
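
The selection rules above can be condensed into a small decision function. This is only a simplification of the guide, not part of the module: the enum and function names are hypothetical, and secondary criteria such as memory limits and the recommended frequency ranges are left out.

```c
#include <stdbool.h>

typedef enum {
    ARCH_PRODUCER_CONSUMER = 1,  /* Architecture 1 */
    ARCH_DMA_DOUBLE_BUFFER = 2,  /* Architecture 2 */
    ARCH_DMA_DUAL_CORE     = 3   /* Architecture 3 */
} arch_choice_t;

/* Condensed form of the selection guide (hypothetical helper). */
arch_choice_t choose_architecture(bool needs_window_overlap,
                                  bool has_dual_core,
                                  bool compute_intensive)
{
    if (needs_window_overlap) {
        /* Only the circular buffer supports overlapping windows. */
        return ARCH_PRODUCER_CONSUMER;
    }
    if (has_dual_core && compute_intensive) {
        return ARCH_DMA_DUAL_CORE;
    }
    return ARCH_DMA_DOUBLE_BUFFER;
}
```

The overlap check comes first because it is the one requirement that the double-buffer architectures cannot meet.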

Unified API

All three architectures share the same unified API interface, making it easy to switch between implementations without changing application code. Architecture selection is done at compile time via the RT_PROCESS_ARCH_TYPE macro.

Compile-Time Architecture Selection

// Define RT_PROCESS_ARCH_TYPE before including the header,
// and use exactly one of the options below per build.

// Option 1 (default): no macro defined, Producer-Consumer is used
#include "real-time-process-arch.h"

// Option 2: explicitly select Producer-Consumer
// #define RT_PROCESS_ARCH_TYPE RT_ARCH_PRODUCER_CONSUMER
// #include "real-time-process-arch.h"

// Option 3: select DMA + Double Buffer
// #define RT_PROCESS_ARCH_TYPE RT_ARCH_DMA_DOUBLE_BUFFER
// #include "real-time-process-arch.h"

// Option 4: select DMA + Dual Core
// #define RT_PROCESS_ARCH_TYPE RT_ARCH_DMA_DUAL_CORE
// #include "real-time-process-arch.h"

API Functions

All architectures provide the following unified interface:

  • rt_process_init() - Initialize the architecture
  • rt_process_set_sensor_handle() - Set ADXL355 sensor handle
  • rt_process_start() - Start real-time processing
  • rt_process_stop() - Stop real-time processing
  • rt_process_deinit() - Deinitialize the architecture
  • rt_process_get_status() - Get current status
  • rt_process_get_stats() - Get performance statistics
  • rt_process_set_frequency() - Update sampling frequency
  • rt_process_is_running() - Check if processing is running
  • rt_process_set_accel_detection() - Enable/disable acceleration detection

Configuration

Configuration Structure

typedef struct
{
    float sampling_frequency_hz;     // Sampling frequency in Hz (default: 100.0)
    uint32_t queue_size;             // Queue size for producer-consumer (default: 50)
    bool enable_mqtt;                // Enable MQTT output (default: false)
    bool enable_serial;              // Enable serial output (default: true)
    bool enable_accel_detection;     // Enable acceleration detection with LCD feedback (default: false)
    const char *mqtt_topic;         // MQTT topic (default: NULL uses default topic)
} rt_process_config_t;

Default Configuration

  • Sampling Frequency: 100.0 Hz
  • Queue Size (Architecture 1): 50 samples
  • Buffer Size (Architectures 2 and 3): 512 samples (Architecture 2), 256 samples (Architecture 3)
  • MQTT Output: Disabled
  • Serial Output: Enabled
  • Acceleration Detection: Disabled
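
The defaults listed above can be captured in an initializer, which is convenient when only one or two fields need to change. The struct is repeated from the Configuration Structure section so the sketch is self-contained; the helper rt_process_default_config_sketch() is hypothetical and is not part of the documented API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    float sampling_frequency_hz;
    uint32_t queue_size;
    bool enable_mqtt;
    bool enable_serial;
    bool enable_accel_detection;
    const char *mqtt_topic;
} rt_process_config_t;

/* Returns a configuration filled with the documented defaults.
 * (Hypothetical helper, not part of the module.) */
rt_process_config_t rt_process_default_config_sketch(void)
{
    rt_process_config_t cfg = {
        .sampling_frequency_hz = 100.0f,
        .queue_size = 50,
        .enable_mqtt = false,
        .enable_serial = true,
        .enable_accel_detection = false,
        .mqtt_topic = NULL,  /* NULL selects the default topic */
    };
    return cfg;
}
```

A caller could then set, say, cfg.sampling_frequency_hz = 1000.0f and cfg.enable_accel_detection = true before handing the structure to rt_process_init(). The exact parameter type of rt_process_init() is not shown on this page.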

Performance Statistics

The module collects comprehensive performance statistics for monitoring and optimization:

typedef struct
{
    uint32_t total_samples;         // Total number of samples acquired
    uint32_t processed_samples;     // Number of samples processed
    uint32_t dropped_samples;       // Number of samples dropped (queue/buffer full)
    float avg_acquisition_time_us;  // Average acquisition time per sample
    float avg_process_time_us;      // Average processing time per sample
    float cpu_usage_core0;          // CPU usage on Core 0 (%)
    float cpu_usage_core1;          // CPU usage on Core 1 (%) (for dual-core arch)
    uint64_t last_sample_time_us;   // Timestamp of last sample
} rt_process_stats_t;
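
Two derived figures are often useful when reading these statistics: the share of samples dropped, and whether the measured per-sample cost fits inside one sampling period. The helpers below are hypothetical, and the second is a conservative single-core estimate that adds acquisition and processing time together. The struct is repeated so the sketch is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t total_samples;
    uint32_t processed_samples;
    uint32_t dropped_samples;
    float avg_acquisition_time_us;
    float avg_process_time_us;
    float cpu_usage_core0;
    float cpu_usage_core1;
    uint64_t last_sample_time_us;
} rt_process_stats_t;

/* Percentage of acquired samples that were dropped. */
float drop_rate_percent(const rt_process_stats_t *s)
{
    if (s->total_samples == 0) {
        return 0.0f;
    }
    return 100.0f * (float)s->dropped_samples / (float)s->total_samples;
}

/* True when acquisition plus processing of one sample fits in one period.
 * Assumes both run on the same core, which is the pessimistic case. */
bool pipeline_keeps_up(const rt_process_stats_t *s, float sampling_frequency_hz)
{
    if (!(sampling_frequency_hz > 0.0f)) {
        return false;
    }
    float period_us = 1000000.0f / sampling_frequency_hz;
    return s->avg_acquisition_time_us + s->avg_process_time_us < period_us;
}
```

With 50 microseconds of acquisition and 200 microseconds of processing per sample, the pipeline fits comfortably at 1 kHz (1000 microsecond period) but not at 10 kHz (100 microsecond period).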

Usage Workflow

  1. Initialize: Call rt_process_init() with configuration (or NULL for defaults)
  2. Set Sensor Handle: Call rt_process_set_sensor_handle() with ADXL355 handle
  3. Start Processing: Call rt_process_start() to begin real-time acquisition and processing
  4. Monitor (optional): Use rt_process_get_status() and rt_process_get_stats() to monitor performance
  5. Stop Processing: Call rt_process_stop() to stop acquisition and processing
  6. Deinitialize: Call rt_process_deinit() to clean up resources
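
The call order above can be pictured as a small state machine. The sketch below is a host-side mock, not the module's code: the function signatures of the real rt_process_* API are not given on this page, so every name, type and return code here is an assumption, including the choice to reject out-of-order calls with an invalid-state error.

```c
#include <stddef.h>

typedef enum { MOCK_UNINIT = 0, MOCK_READY, MOCK_RUNNING } mock_state_t;
typedef enum { MOCK_OK = 0, MOCK_ERR_INVALID_ARG, MOCK_ERR_INVALID_STATE } mock_err_t;

typedef struct {
    mock_state_t state;
    const void *sensor;  /* opaque stand-in for the ADXL355 handle */
} mock_rt_t;

mock_err_t mock_init(mock_rt_t *m)
{
    if (m->state != MOCK_UNINIT) return MOCK_ERR_INVALID_STATE;
    m->state = MOCK_READY;
    m->sensor = NULL;
    return MOCK_OK;
}

mock_err_t mock_set_sensor_handle(mock_rt_t *m, const void *handle)
{
    if (handle == NULL) return MOCK_ERR_INVALID_ARG;
    if (m->state != MOCK_READY) return MOCK_ERR_INVALID_STATE;
    m->sensor = handle;
    return MOCK_OK;
}

mock_err_t mock_start(mock_rt_t *m)
{
    /* The sensor handle must be set before processing starts. */
    if (m->state != MOCK_READY || m->sensor == NULL) return MOCK_ERR_INVALID_STATE;
    m->state = MOCK_RUNNING;
    return MOCK_OK;
}

mock_err_t mock_stop(mock_rt_t *m)
{
    if (m->state != MOCK_RUNNING) return MOCK_ERR_INVALID_STATE;
    m->state = MOCK_READY;
    return MOCK_OK;
}

mock_err_t mock_deinit(mock_rt_t *m)
{
    /* Assumed: deinit is only accepted once processing has stopped. */
    if (m->state != MOCK_READY) return MOCK_ERR_INVALID_STATE;
    m->state = MOCK_UNINIT;
    m->sensor = NULL;
    return MOCK_OK;
}
```

Calling mock_start() before mock_init(), or before a sensor handle is set, returns the invalid-state code, which mirrors the role ESP_ERR_INVALID_STATE plays in the error list of this page.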

Features

Real-Time Processing

  • Real-time feature extraction from sensor data
  • Configurable processing algorithms
  • Low-latency processing pipeline

Multiple Output Channels

  • MQTT: Publish processed results to MQTT broker
  • Serial: Output processed data via serial port
  • LCD: Visual feedback for acceleration detection (optional)

Acceleration Detection

When enabled, the module can detect specific acceleration conditions and provide visual feedback via LCD:

  • Conditions: |x| > 0.5g OR |y| > 0.5g OR |z| < 0.5g
  • LCD shows RED when conditions are met, WHITE otherwise
  • Color persistence: 0.3 seconds hold duration

Thread Safety

  • Architecture 1 protects circular buffer access with a mutex
  • Architectures 2 and 3 use one mutex per buffer plus a semaphore for buffer-ready notification
  • Producer and Consumer run as separate tasks and exchange data only through the buffer

Error Handling

Common error codes:

  • ESP_ERR_INVALID_ARG: Invalid configuration parameters
  • ESP_ERR_INVALID_STATE: Operation not allowed in current state
  • ESP_ERR_NO_MEM: Memory allocation failed
  • ESP_FAIL: General failure

Integration Notes

Sensor Integration

  • Requires ADXL355 sensor handle (must be initialized before use)
  • Sensor handle must be set before starting processing
  • Compatible with existing ADXL355 driver interface

MQTT Integration

  • Requires MQTT client to be connected
  • Uses default topic if mqtt_topic is NULL
  • Non-blocking publish operations

LCD Integration

  • Optional LCD support for acceleration detection feedback
  • Requires LCD driver to be initialized
  • Uses color display for visual feedback
