NOTES¶

Tip

TLDR: For real-time processing tasks, it is recommended to:

(1) enable DMA for data acquisition;
(2) separate data production and consumption into independent tasks and decouple them via buffers;
(3) use a ring buffer when overlapping window processing is required, otherwise use a dual-buffer (ping-pong) scheme to improve throughput;
(4) choose single-core or dual-core processing based on computational requirements.

Overview¶

The real-time sensing and processing module combines sensor data acquisition with on-device computation and feature extraction. Unlike the online and offline sensing modules, which focus on data collection and transmission, this module processes samples while acquisition is still running.

The module implements three different architectures, each optimized for different performance requirements and use cases. All three architectures follow the same fundamental design pattern: Data Production → Data Buffering → Data Consumption, but differ in their implementation details for each stage.

Architecture selection is done at compile time via a macro definition, providing a unified API interface regardless of the underlying implementation.

Relationship with Other Modules¶

Functional Comparison¶

Module	Primary Function	Processing	Use Case
Online Sensing	Real-time data acquisition and transmission	None	Continuous monitoring, live data streaming
Offline Sensing	High-frequency batch collection and storage	None (post-processing)	Batch data collection for later analysis
RT-Sense-Processing	Real-time acquisition + real-time processing	Real-time computation	On-device feature extraction, edge AI

When to Use Which Module¶

Use Online Sensing when:

You need to stream raw sensor data in real-time
Data processing will be done on a remote server or cloud platform
Simple, low-latency data transmission is required
Sampling frequency is moderate (0.1 - 300 Hz)

Use Offline Sensing when:

You need to collect high-frequency data for later analysis
Post-processing and analysis will be performed offline
Long-duration data collection is required
Sampling frequency is high (up to 4000 Hz)

Use RT-Sense-Processing when:

You need real-time computation and feature extraction
On-device processing is required (edge AI, real-time detection)
Processing must keep up with data acquisition rate
You need immediate feedback based on processed data

Technical Architecture Comparison¶

Online Sensing:

Single task with ESP timer callback
Direct data formatting and transmission
Simple, lightweight implementation
Suitable for low to medium frequency

Offline Sensing:

Single task with ESP timer callback
Memory buffer + SD card storage
Two-phase approach: collection then storage
Suitable for high-frequency batch collection

RT-Sense-Processing:

Multi-task architecture (Producer-Consumer pattern)
Three architecture options with different optimizations
Parallel data acquisition and processing
Suitable for real-time computation requirements

Architecture Components¶

All three real-time processing architectures follow the same three-stage pipeline:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Data Production│───▶│ Data Buffering  │───▶│ Data Consumption│
│  (Producer)     │    │  (Buffer/Cache) │    │  (Consumer)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Stage 1: Data Production¶

The Producer stage is responsible for acquiring sensor data at fixed intervals. All three architectures use the same fundamental approach:

Timer-Driven: ESP timer triggers data acquisition at configured sampling frequency
Task Notification: Timer callback notifies producer task via xTaskNotify (non-blocking)
Sensor Reading: Producer task reads ADXL355 accelerometer data (x, y, z) and temperature via SPI
Sample Preparation: Packages data into rt_process_sample_t structure with timestamp
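
As a rough illustration of the sample-preparation step, the sketch below converts a sampling frequency into a timer period in microseconds (the unit ESP timers use) and packs one reading into a sample record. The field layout of rt_process_sample_t shown here is an assumption, since the actual definition is not reproduced on this page, and both helper functions are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: the real rt_process_sample_t may differ. */
typedef struct {
    float x_g;              /* acceleration on X, in g */
    float y_g;              /* acceleration on Y, in g */
    float z_g;              /* acceleration on Z, in g */
    float temperature_c;    /* sensor temperature, in degrees Celsius */
    uint64_t timestamp_us;  /* acquisition time, in microseconds */
} rt_process_sample_t;

/* Hypothetical helper: timer period in microseconds for a sampling rate.
 * Returns 0 for a non-positive frequency so the caller can reject it. */
static uint64_t sample_period_us(float frequency_hz)
{
    if (!(frequency_hz > 0.0f)) {
        return 0;
    }
    return (uint64_t)(1000000.0 / (double)frequency_hz + 0.5);
}

/* Hypothetical helper: package one reading together with its timestamp. */
static rt_process_sample_t make_sample(float x_g, float y_g, float z_g,
                                       float temperature_c, uint64_t now_us)
{
    rt_process_sample_t s = { x_g, y_g, z_g, temperature_c, now_us };
    return s;
}
```

At the default 100 Hz this gives a 10000 µs period; at 4000 Hz it gives 250 µs, which is the time budget the producer has to finish one SPI read.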

Differences between architectures:

Architecture	SPI Transfer Method	CPU Usage	Notes
Architecture 1: Producer-Consumer	Standard SPI (CPU-driven)	Medium	Direct SPI read, CPU handles transfer
Architecture 2: DMA + Double Buffer	DMA-assisted SPI	Low	SPI driver automatically uses DMA for transfers
Architecture 3: DMA + Dual Core	DMA-assisted SPI	Lowest	Same as Architecture 2, but producer pinned to Core 0

Stage 2: Data Buffering¶

The Buffering stage decouples data production from consumption, allowing parallel operation. This is where the three architectures differ most significantly:

Architecture 1: Circular Buffer¶

Buffer Type: Single circular buffer (FIFO)
Size: 512 samples (configurable)
Memory: PSRAM allocation
Synchronization: Mutex-protected access
Behavior: FIFO overwrite when full (oldest data overwritten)
Write Strategy: Producer writes to current write pointer, advances circularly
Read Strategy: Consumer reads from read pointer, advances circularly, supports arbitrary position and length data access
Overflow Handling: Read pointer advances automatically when buffer full (overwrite)
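
The write, read, and overwrite rules listed above can be sketched in portable C as follows. This is a simplified single-threaded model, not the module's implementation: it stores plain float values instead of rt_process_sample_t, uses static storage instead of PSRAM, and leaves out the mutex that would wrap each call. The type and function names are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 512   /* default size quoted above */

typedef struct {
    float data[RING_CAPACITY];
    size_t write_idx;    /* next slot the producer writes */
    size_t read_idx;     /* oldest unread sample */
    size_t count;        /* number of unread samples */
    size_t overwritten;  /* samples lost because the buffer was full */
} ring_buffer_t;

/* Producer: store one sample. When the buffer is full, the oldest
 * sample is discarded by advancing the read index first. */
static void ring_push(ring_buffer_t *rb, float sample)
{
    if (rb->count == RING_CAPACITY) {
        rb->read_idx = (rb->read_idx + 1) % RING_CAPACITY;
        rb->count--;
        rb->overwritten++;
    }
    rb->data[rb->write_idx] = sample;
    rb->write_idx = (rb->write_idx + 1) % RING_CAPACITY;
    rb->count++;
}

/* Consumer: fetch the oldest sample. Returns false when empty. */
static bool ring_pop(ring_buffer_t *rb, float *out)
{
    if (rb->count == 0) {
        return false;
    }
    *out = rb->data[rb->read_idx];
    rb->read_idx = (rb->read_idx + 1) % RING_CAPACITY;
    rb->count--;
    return true;
}
```

Counting overwritten samples is a design choice of this sketch; it makes it easy to see when the consumer is falling behind.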

Figure: Circular buffer operation showing producer writing and consumer reading with FIFO overwrite behavior

Key Advantages of Circular Buffer:

The circular buffer is well suited to real-time DSP applications, especially those that require overlapping window processing:

Flexible Window Access: Consumer can read data windows of arbitrary length from any position in the buffer, without buffer boundary restrictions
Window Overlap Support: Easily supports sliding windows, overlapping windows, and other DSP algorithms (e.g., FFT, STFT, filter banks)
Continuous Data Stream: Data is stored continuously in the circular buffer, supporting cross-boundary reads without data copying or reorganization
Low Latency Processing: Can process the latest N samples, or extract arbitrary time windows from historical data
Memory Efficiency: Single buffer, minimal memory footprint

Typical Application Scenarios:

FFT/STFT Analysis: Requires sliding windows, processing 256 samples each time, but window slides only 128 samples (50% overlap)
Filter Banks: Requires multiple overlapping time windows for frequency domain analysis
Feature Extraction: Needs to extract features of different time scales from continuous data streams
Real-time Spectrum Analysis: Requires continuous, overlapping spectrum calculations
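
To make the overlapping-window case concrete, the sketch below counts how many windows fit in a block of samples for a given hop size and copies one window out of ring storage, wrapping at the end of the array. Copying into a contiguous scratch array is only one simple option, chosen here because many FFT routines expect contiguous input. Both function names are hypothetical.

```c
#include <stddef.h>

/* Copy `len` samples starting at index `start` out of a ring with
 * `capacity` slots, wrapping around at the end of the storage. */
static void ring_copy_window(const float *ring, size_t capacity,
                             size_t start, size_t len, float *out)
{
    for (size_t i = 0; i < len; i++) {
        out[i] = ring[(start + i) % capacity];
    }
}

/* Number of complete windows of `window` samples that fit in
 * `available` samples when each window starts `hop` samples
 * after the previous one. */
static size_t window_count(size_t available, size_t window, size_t hop)
{
    if (window == 0 || hop == 0 || available < window) {
        return 0;
    }
    return (available - window) / hop + 1;
}
```

With 512 buffered samples, a 256-sample window, and a 128-sample hop (50% overlap), window_count() yields three windows starting at offsets 0, 128, and 256.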

Limitations of Double Buffer: The double buffer architecture has significant limitations when processing overlapping windows:

Fixed Window Boundaries: Can only process entire buffers (512 samples), cannot flexibly choose window position and size
Window Overlap Difficulties: To achieve overlapping windows, additional data copying and reorganization is required, increasing latency and memory overhead
Discontinuous Data Access: Data is distributed across two separate buffers, cross-buffer access requires complex data management
Not Suitable for Sliding Windows: Each processing cycle must wait for the entire buffer to fill, cannot achieve flexible sliding window processing

Architecture 2: DMA + Double Buffer (Ping-Pong)¶

Buffer Type: Two separate buffers (A and B) in ping-pong configuration
Size: 512 samples per buffer (configurable)
Memory: PSRAM allocation for both buffers
Synchronization: Mutex per buffer + Semaphore for buffer-ready notification
Behavior: Producer fills one buffer while consumer processes the other
Write Strategy: Producer writes to active buffer (A or B), switches when full
Read Strategy: Consumer waits for buffer-ready semaphore, processes entire buffer
Overflow Handling: Producer switches buffers even if previous buffer not fully processed (overwrite detected)
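
The ping-pong behavior described above can be modelled in a few lines of portable C. This sketch is a single-threaded approximation under stated assumptions: boolean ready flags stand in for the buffer-ready semaphore, the per-buffer mutexes are omitted, and samples are plain float values. All names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define PP_BUFFER_SAMPLES 512   /* per-buffer size quoted above */

typedef struct {
    float buf[2][PP_BUFFER_SAMPLES];
    size_t fill;      /* samples written to the active buffer so far */
    int active;       /* buffer the producer is filling: 0 (A) or 1 (B) */
    bool ready[2];    /* buffer is full and waiting for the consumer */
    size_t overruns;  /* times an unprocessed buffer had to be reused */
} pingpong_t;

/* Producer: store one sample. Returns true when a buffer has just been
 * completed, which is where real code would give the semaphore. */
static bool pingpong_write(pingpong_t *pp, float sample)
{
    pp->buf[pp->active][pp->fill++] = sample;
    if (pp->fill < PP_BUFFER_SAMPLES) {
        return false;
    }
    pp->ready[pp->active] = true;
    pp->active ^= 1;            /* switch A <-> B */
    pp->fill = 0;
    if (pp->ready[pp->active]) {
        /* Consumer has not released this buffer: it will be overwritten. */
        pp->ready[pp->active] = false;
        pp->overruns++;
    }
    return true;
}

/* Consumer: index of the full buffer to process, or -1 if none. */
static int pingpong_acquire(const pingpong_t *pp)
{
    int other = pp->active ^ 1;
    return pp->ready[other] ? other : -1;
}

/* Consumer: mark a processed buffer as free again. */
static void pingpong_release(pingpong_t *pp, int index)
{
    pp->ready[index] = false;
}
```

The overruns counter corresponds to the "overwrite detected" case: the producer switches anyway, and the event is recorded so it can be reported as dropped data.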

Figure: Double buffering (ping-pong) operation showing parallel acquisition and processing

Architecture 3: DMA + Dual Core + Double Buffer¶

Buffer Type: Two separate buffers (A and B) in ping-pong configuration
Size: 256 samples per buffer (configurable, smaller for dual-core efficiency)
Memory: PSRAM allocation for both buffers
Synchronization: Mutex per buffer + Semaphore for buffer-ready notification
Behavior: Same as Architecture 2, but with core pinning
Write Strategy: Producer (Core 0) writes to active buffer, switches when full
Read Strategy: Consumer (Core 1) waits for buffer-ready semaphore, processes entire buffer
Overflow Handling: Same as Architecture 2
Core Assignment: Producer on Core 0, Consumer on Core 1 (true parallelism)

Figure: Double buffering with dual-core division (Core 0: Producer, Core 1: Consumer)

Stage 3: Data Consumption¶

The Consumer stage processes the buffered data and performs real-time computation. All three architectures use the same processing logic, but differ in how they access the buffer:

Processing Example: Acceleration Detection¶

For demonstration purposes, the module implements acceleration detection as the data consumption example:

Detection Conditions:

|x| > 0.5g OR |y| > 0.5g OR |z| < 0.5g

Output:

LCD Visual Feedback: RED when conditions met, WHITE otherwise
Color Persistence: 0.3 seconds hold duration
Serial Output: Optional processed data output
MQTT Output: Optional processed results publishing
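
A minimal sketch of the detection condition and the 0.3-second color hold is shown below. The thresholds come from the conditions above; the lcd_color_t enum, the color_hold_t state, and the function names are assumptions made for illustration, and the actual LCD call is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACCEL_THRESHOLD_G 0.5f
#define COLOR_HOLD_US     300000ULL   /* 0.3 s expressed in microseconds */

typedef enum { LCD_WHITE, LCD_RED } lcd_color_t;   /* hypothetical */

static float abs_g(float v)
{
    return v < 0.0f ? -v : v;
}

/* True when |x| > 0.5g OR |y| > 0.5g OR |z| < 0.5g. */
static bool accel_condition_met(float x_g, float y_g, float z_g)
{
    return abs_g(x_g) > ACCEL_THRESHOLD_G ||
           abs_g(y_g) > ACCEL_THRESHOLD_G ||
           abs_g(z_g) < ACCEL_THRESHOLD_G;
}

typedef struct {
    bool triggered;            /* at least one detection has occurred */
    uint64_t last_trigger_us;  /* time of the most recent detection */
} color_hold_t;

/* Returns the color to show: RED on detection and for the hold time
 * after the last detection, WHITE otherwise. */
static lcd_color_t color_update(color_hold_t *h, bool detected, uint64_t now_us)
{
    if (detected) {
        h->triggered = true;
        h->last_trigger_us = now_us;
    }
    if (h->triggered && now_us - h->last_trigger_us < COLOR_HOLD_US) {
        return LCD_RED;
    }
    return LCD_WHITE;
}
```

A sensor lying flat and at rest reads roughly (0, 0, 1g), so none of the three conditions holds and the display stays WHITE.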

Architecture-Specific Consumption Details¶

Architecture	Buffer Access	Processing Strategy	Sample Count
Architecture 1: Producer-Consumer	Mutex-protected circular buffer read	Flexible processing: last N samples, arbitrary windows, supports overlapping windows	Up to 10 samples per cycle (configurable), supports arbitrary window size and position
Architecture 2: DMA + Double Buffer	Semaphore-wait for buffer-ready, then process entire buffer	Fixed processing of entire buffer, no window overlap support	512 samples per cycle (fixed)
Architecture 3: DMA + Dual Core	Semaphore-wait for buffer-ready, then process entire buffer	Fixed processing of entire buffer, no window overlap support	256 samples per cycle (fixed)

Common Processing Flow:

Wait for data availability (mutex/semaphore)
Read samples from buffer
Perform acceleration detection on each sample
Update LCD color based on detection results
Output results (serial, MQTT if enabled)
Update processing statistics

Architecture Comparison¶

The following table summarizes the key differences between the three architectures across all three stages:

Feature	Architecture 1: Producer-Consumer	Architecture 2: DMA + Double Buffer	Architecture 3: DMA + Dual Core
Data Production
SPI Transfer	Standard SPI (CPU-driven)	DMA-assisted SPI	DMA-assisted SPI
Data Buffering
Buffer Type	Circular Buffer	Double Buffer (Ping-Pong)	Double Buffer (Ping-Pong)
Buffer Size	512 samples	512 samples × 2	256 samples × 2
Synchronization	Mutex	Mutex + Semaphore	Mutex + Semaphore
Data Consumption
Processing Strategy	Flexible: last N samples, arbitrary windows, supports overlapping windows	Fixed: entire buffer only, no window overlap	Fixed: entire buffer only, no window overlap
Core Assignment	Automatic (FreeRTOS)	Automatic (FreeRTOS)	Core 0: Producer, Core 1: Consumer
Overall Performance
Frequency Range	0.1 - 1000 Hz	0.1 - 10000 Hz	0.1 - 10000 Hz
Suitable Frequency	100 Hz - 1 kHz (recommended)	1 kHz - 10 kHz (recommended)	Highest performance (recommended)
CPU Usage	Medium	Low (DMA handles transfers)	Lowest (dual-core utilization)
Memory Usage	Circular Buffer (512 samples)	Two buffers (512 samples each)	Two buffers (256 samples each)
Advantages	Simple, decoupled, supports window overlap, flexible data access	High throughput, parallel processing	Maximum performance, true parallelism
Limitations	Buffer may become bottleneck	Higher memory usage, no window overlap support, fixed window size	Requires dual-core support, no window overlap support, fixed window size

Architecture Selection Guide¶

Tip

Data production and data consumption can be implemented as independent tasks. For data acquisition, it is recommended to enable DMA-assisted SPI to improve efficiency. Data consumption can be executed on a single core or dual cores depending on actual requirements. Regarding buffer management, if real-time processing is required and data windows overlap, a ring buffer is recommended; if there is no overlap between data windows and batch-oriented processing is preferred, a dual-buffer (ping-pong) scheme can be used to improve throughput.

Choose Architecture 1 (Producer-Consumer) when:

Window overlap processing is required (e.g., FFT, STFT, filter banks, and other real-time DSP algorithms)
Flexible data window access is needed (sliding windows, arbitrary position and length data extraction)
Sampling frequency is moderate (0.1 - 1000 Hz, recommended: 100 Hz - 1 kHz)
Simple implementation is preferred
Memory resources are limited
You need a straightforward, well-understood pattern
Processing can work with small batches (10 samples) or arbitrary window sizes

Choose Architecture 2 (DMA + Double Buffer) when:

Window overlap processing is NOT required (processing algorithm does not need sliding windows or overlapping windows)
Sampling frequency is high (0.1 - 10000 Hz, recommended: 1 kHz - 10 kHz)
You need high throughput
CPU usage should be minimized
Parallel acquisition and processing is beneficial
Processing benefits from larger batches (512 samples), and fixed window size is acceptable
Note: If your DSP algorithm requires window overlap (e.g., FFT overlap, STFT), choose Architecture 1

Choose Architecture 3 (DMA + Dual Core) when:

Window overlap processing is NOT required (processing algorithm does not need sliding windows or overlapping windows)
Maximum performance is required
You have a dual-core ESP32 platform
Processing workload is computationally intensive
True parallelism between acquisition and processing is needed
Processing benefits from dedicated core assignment, and fixed window size is acceptable
Note: If your DSP algorithm requires window overlap (e.g., FFT overlap, STFT), choose Architecture 1
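
The selection criteria above can be condensed into a small decision function. This is only one reading of the guide: the 1000 Hz cutoff is taken from the upper end of Architecture 1's frequency range, and the enum and function names are hypothetical rather than part of the module.

```c
#include <stdbool.h>

typedef enum {
    ARCH_PRODUCER_CONSUMER = 1,
    ARCH_DMA_DOUBLE_BUFFER = 2,
    ARCH_DMA_DUAL_CORE     = 3
} arch_choice_t;   /* hypothetical, for illustration only */

/* Suggest an architecture from the main criteria in the guide. */
static arch_choice_t suggest_architecture(bool needs_window_overlap,
                                          float sampling_hz,
                                          bool dual_core_available,
                                          bool compute_intensive)
{
    /* Overlapping windows are only supported by the circular buffer. */
    if (needs_window_overlap) {
        return ARCH_PRODUCER_CONSUMER;
    }
    /* Heavy processing on a dual-core chip benefits from core pinning. */
    if (dual_core_available && compute_intensive) {
        return ARCH_DMA_DUAL_CORE;
    }
    /* Above Architecture 1's range, use DMA with ping-pong buffers. */
    if (sampling_hz > 1000.0f) {
        return ARCH_DMA_DOUBLE_BUFFER;
    }
    return ARCH_PRODUCER_CONSUMER;
}
```

Checking the overlap requirement first mirrors the notes above: an algorithm that needs overlapping windows is directed to Architecture 1 regardless of the other criteria.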

Unified API¶

All three architectures share the same unified API interface, making it easy to switch between implementations without changing application code. Architecture selection is done at compile time via the RT_PROCESS_ARCH_TYPE macro.

Compile-Time Architecture Selection¶

// Use exactly one of the options below per build.
// Default (RT_PROCESS_ARCH_TYPE not defined): Producer-Consumer
#include "real-time-process-arch.h"

// Or explicitly select architecture:
#define RT_PROCESS_ARCH_TYPE RT_ARCH_PRODUCER_CONSUMER
#include "real-time-process-arch.h"

// Or select DMA + Double Buffer:
#define RT_PROCESS_ARCH_TYPE RT_ARCH_DMA_DOUBLE_BUFFER
#include "real-time-process-arch.h"

// Or select DMA + Dual Core:
#define RT_PROCESS_ARCH_TYPE RT_ARCH_DMA_DUAL_CORE
#include "real-time-process-arch.h"
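
A header can implement this kind of selection with a default value plus preprocessor branches. The sketch below shows one possible arrangement and is not the contents of real-time-process-arch.h: the numeric macro values, RT_PROCESS_ARCH_NAME, and rt_process_arch_name() are assumptions made for illustration.

```c
/* Assumed numeric values; the real header may define these differently. */
#define RT_ARCH_PRODUCER_CONSUMER 1
#define RT_ARCH_DMA_DOUBLE_BUFFER 2
#define RT_ARCH_DMA_DUAL_CORE     3

/* Fall back to Producer-Consumer when the application does not choose. */
#ifndef RT_PROCESS_ARCH_TYPE
#define RT_PROCESS_ARCH_TYPE RT_ARCH_PRODUCER_CONSUMER
#endif

/* Pick the implementation at compile time; reject unknown values early. */
#if RT_PROCESS_ARCH_TYPE == RT_ARCH_PRODUCER_CONSUMER
#define RT_PROCESS_ARCH_NAME "producer-consumer"
#elif RT_PROCESS_ARCH_TYPE == RT_ARCH_DMA_DOUBLE_BUFFER
#define RT_PROCESS_ARCH_NAME "dma-double-buffer"
#elif RT_PROCESS_ARCH_TYPE == RT_ARCH_DMA_DUAL_CORE
#define RT_PROCESS_ARCH_NAME "dma-dual-core"
#else
#error "Unknown RT_PROCESS_ARCH_TYPE"
#endif

/* Hypothetical accessor so application code can log the selected build. */
static const char *rt_process_arch_name(void)
{
    return RT_PROCESS_ARCH_NAME;
}
```

Because the macro is evaluated by the preprocessor, any #define of RT_PROCESS_ARCH_TYPE has to appear before the header is included, or be passed as a compiler flag.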

API Functions¶

All architectures provide the following unified interface:

rt_process_init() - Initialize the architecture
rt_process_set_sensor_handle() - Set ADXL355 sensor handle
rt_process_start() - Start real-time processing
rt_process_stop() - Stop real-time processing
rt_process_deinit() - Deinitialize the architecture
rt_process_get_status() - Get current status
rt_process_get_stats() - Get performance statistics
rt_process_set_frequency() - Update sampling frequency
rt_process_is_running() - Check if processing is running
rt_process_set_accel_detection() - Enable/disable acceleration detection

Configuration¶

Configuration Structure¶

typedef struct
{
    float sampling_frequency_hz;     // Sampling frequency in Hz (default: 100.0)
    uint32_t queue_size;             // Queue size for producer-consumer (default: 50)
    bool enable_mqtt;                // Enable MQTT output (default: false)
    bool enable_serial;              // Enable serial output (default: true)
    bool enable_accel_detection;     // Enable acceleration detection with LCD feedback (default: false)
    const char *mqtt_topic;          // MQTT topic (default: NULL selects the default topic)
} rt_process_config_t;

Default Configuration¶

Sampling Frequency: 100.0 Hz
Queue Size (Architecture 1): 50 samples
Buffer Size (Architecture 2/3): 512 samples (Architecture 2), 256 samples (Architecture 3)
MQTT Output: Disabled
Serial Output: Enabled
Acceleration Detection: Disabled
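
One common way to work with such defaults is to start from a fully populated structure and override only the fields that differ. The sketch below repeats the configuration structure so that it compiles on its own; rt_process_default_config() is a hypothetical helper, not a documented function of the module.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    float sampling_frequency_hz;
    uint32_t queue_size;
    bool enable_mqtt;
    bool enable_serial;
    bool enable_accel_detection;
    const char *mqtt_topic;
} rt_process_config_t;

/* Hypothetical helper that returns the documented default values. */
static rt_process_config_t rt_process_default_config(void)
{
    rt_process_config_t cfg = {
        .sampling_frequency_hz = 100.0f,
        .queue_size = 50,
        .enable_mqtt = false,
        .enable_serial = true,
        .enable_accel_detection = false,
        .mqtt_topic = NULL,   /* NULL selects the default topic */
    };
    return cfg;
}
```

An application would then change individual fields, for example setting sampling_frequency_hz to 1000.0f and enable_accel_detection to true, before passing the structure to rt_process_init().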

Performance Statistics¶

The module collects comprehensive performance statistics for monitoring and optimization:

typedef struct
{
    uint32_t total_samples;         // Total number of samples acquired
    uint32_t processed_samples;     // Number of samples processed
    uint32_t dropped_samples;       // Number of samples dropped (queue/buffer full)
    float avg_acquisition_time_us;  // Average acquisition time per sample
    float avg_process_time_us;      // Average processing time per sample
    float cpu_usage_core0;          // CPU usage on Core 0 (%)
    float cpu_usage_core1;          // CPU usage on Core 1 (%) (for dual-core arch)
    uint64_t last_sample_time_us;   // Timestamp of last sample
} rt_process_stats_t;
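
Two quick health checks can be derived from these counters: the fraction of samples that were dropped, and whether the average processing time fits inside one sample period. The sketch below repeats the statistics structure so that it compiles on its own; both helper functions are hypothetical and not part of the module's API.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t total_samples;
    uint32_t processed_samples;
    uint32_t dropped_samples;
    float avg_acquisition_time_us;
    float avg_process_time_us;
    float cpu_usage_core0;
    float cpu_usage_core1;
    uint64_t last_sample_time_us;
} rt_process_stats_t;

/* Fraction of acquired samples that were dropped (0.0 if none acquired). */
static float stats_drop_ratio(const rt_process_stats_t *s)
{
    if (s->total_samples == 0) {
        return 0.0f;
    }
    return (float)s->dropped_samples / (float)s->total_samples;
}

/* True when the average per-sample processing time is shorter than the
 * sample period at the given rate. */
static bool stats_consumer_keeps_up(const rt_process_stats_t *s, float sampling_hz)
{
    if (!(sampling_hz > 0.0f)) {
        return false;
    }
    float period_us = 1000000.0f / sampling_hz;
    return s->avg_process_time_us < period_us;
}
```

This check is deliberately coarse. On a single core the acquisition time competes for the same budget, so a stricter test would compare the sum of both averages against the period.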

Usage Workflow¶

Initialize: Call rt_process_init() with configuration (or NULL for defaults)
Set Sensor Handle: Call rt_process_set_sensor_handle() with ADXL355 handle
Start Processing: Call rt_process_start() to begin real-time acquisition and processing
Monitor (optional): Use rt_process_get_status() and rt_process_get_stats() to monitor performance
Stop Processing: Call rt_process_stop() to stop acquisition and processing
Deinitialize: Call rt_process_deinit() to clean up resources

Features¶

Real-Time Processing¶

Real-time feature extraction from sensor data
Configurable processing algorithms
Low-latency processing pipeline

Multiple Output Channels¶

MQTT: Publish processed results to MQTT broker
Serial: Output processed data via serial port
LCD: Visual feedback for acceleration detection (optional)

Acceleration Detection¶

When enabled, the module can detect specific acceleration conditions and provide visual feedback via LCD:

Conditions: |x| > 0.5g OR |y| > 0.5g OR |z| < 0.5g
LCD shows RED when conditions are met, WHITE otherwise
Color persistence: 0.3 seconds hold duration

Thread Safety¶

Architecture 1 protects the circular buffer with a mutex
Architectures 2 and 3 use one mutex per buffer plus a semaphore for buffer-ready notification
Producer and Consumer run as separate tasks and share data only through these buffers

Error Handling¶

Common error codes:

ESP_ERR_INVALID_ARG: Invalid configuration parameters
ESP_ERR_INVALID_STATE: Operation not allowed in current state
ESP_ERR_NO_MEM: Memory allocation failed
ESP_FAIL: General failure

Integration Notes¶

Sensor Integration¶

Requires ADXL355 sensor handle (must be initialized before use)
Sensor handle must be set before starting processing
Compatible with existing ADXL355 driver interface

MQTT Integration¶

Requires MQTT client to be connected
Uses default topic if mqtt_topic is NULL
Non-blocking publish operations

LCD Integration¶

Optional LCD support for acceleration detection feedback
Requires LCD driver to be initialized
Uses color display for visual feedback

Documentation¶

Unified Interface Code: Complete code for unified interface
Architecture 1: Producer-Consumer: Detailed documentation
Architecture 2: DMA + Double Buffer: Detailed documentation
Architecture 3: DMA + Dual Core: Detailed documentation