NOTES¶
Tip
TLDR: For real-time processing tasks, it is recommended to:
- (1) enable DMA for data acquisition;
- (2) separate data production and consumption into independent tasks and decouple them via buffers;
- (3) use a ring buffer when overlapping window processing is required, otherwise use a dual-buffer (ping-pong) scheme to improve throughput;
- (4) choose single-core or dual-core processing based on computational requirements.
Overview¶
The real-time sensing and processing module provides sensor data acquisition with real-time computation and feature extraction. Unlike the online and offline sensing modules, which focus on data collection and transmission, this module processes data on the device while acquisition is still running.
The module implements three different architectures, each optimized for different performance requirements and use cases. All three architectures follow the same fundamental design pattern: Data Production → Data Buffering → Data Consumption, but differ in their implementation details for each stage.
The architecture is selected at compile time via a macro definition, and the API stays the same regardless of the underlying implementation.
Relationship with Other Modules¶
Functional Comparison¶
| Module | Primary Function | Processing | Use Case |
|---|---|---|---|
| Online Sensing | Real-time data acquisition and transmission | None | Continuous monitoring, live data streaming |
| Offline Sensing | High-frequency batch collection and storage | None (post-processing) | Batch data collection for later analysis |
| RT-Sense-Processing | Real-time acquisition + real-time processing | Real-time computation | On-device feature extraction, edge AI |
When to Use Which Module¶
Use Online Sensing when:
- You need to stream raw sensor data in real-time
- Data processing will be done on a remote server or cloud platform
- Simple, low-latency data transmission is required
- Sampling frequency is moderate (0.1 - 300 Hz)
Use Offline Sensing when:
- You need to collect high-frequency data for later analysis
- Post-processing and analysis will be performed offline
- Long-duration data collection is required
- Sampling frequency is high (up to 4000 Hz)
Use RT-Sense-Processing when:
- You need real-time computation and feature extraction
- On-device processing is required (edge AI, real-time detection)
- Processing must keep up with data acquisition rate
- You need immediate feedback based on processed data
Technical Architecture Comparison¶
Online Sensing:
- Single task with ESP timer callback
- Direct data formatting and transmission
- Simple, lightweight implementation
- Suitable for low to medium frequency
Offline Sensing:
- Single task with ESP timer callback
- Memory buffer + SD card storage
- Two-phase approach: collection then storage
- Suitable for high-frequency batch collection
RT-Sense-Processing:
- Multi-task architecture (Producer-Consumer pattern)
- Three architecture options with different optimizations
- Parallel data acquisition and processing
- Suitable for real-time computation requirements
Architecture Components¶
All three real-time processing architectures follow the same three-stage pipeline:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Data Production│───▶│ Data Buffering │───▶│ Data Consumption│
│ (Producer) │ │ (Buffer/Cache) │ │ (Consumer) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Stage 1: Data Production¶
The Producer stage is responsible for acquiring sensor data at fixed intervals. All three architectures use the same fundamental approach:
- Timer-Driven: ESP timer triggers data acquisition at configured sampling frequency
- Task Notification: Timer callback notifies producer task via xTaskNotify (non-blocking)
- Sensor Reading: Producer task reads ADXL355 accelerometer data (x, y, z) and temperature via SPI
- Sample Preparation: Packages data into rt_process_sample_t structure with timestamp
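The timer period follows directly from the configured sampling frequency. The helper below is a minimal host-side sketch of that conversion; sampling_period_us is a hypothetical name and not part of the module's API. The result is in microseconds because ESP-IDF's esp_timer_start_periodic() takes its period in that unit.

```c
#include <stdint.h>

/* Hypothetical helper: convert a sampling frequency in Hz to a timer
 * period in microseconds, rounded to the nearest microsecond.
 * Returns 0 for a non-positive frequency so the caller can reject it. */
static uint64_t sampling_period_us(double frequency_hz)
{
    if (frequency_hz <= 0.0) {
        return 0;
    }
    return (uint64_t)(1000000.0 / frequency_hz + 0.5);
}
```

At the default 100 Hz this gives a 10000 us period; at 4000 Hz it gives 250 us, which is the time the producer has to finish one SPI read before the next notification arrives.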
Differences between architectures:
| Architecture | SPI Transfer Method | CPU Usage | Notes |
|---|---|---|---|
| Architecture 1: Producer-Consumer | Standard SPI (CPU-driven) | Medium | Direct SPI read, CPU handles transfer |
| Architecture 2: DMA + Double Buffer | DMA-assisted SPI | Low | SPI driver automatically uses DMA for transfers |
| Architecture 3: DMA + Dual Core | DMA-assisted SPI | Lowest | Same as Architecture 2, but producer pinned to Core 0 |
Stage 2: Data Buffering¶
The Buffering stage decouples data production from consumption, allowing parallel operation. This is where the three architectures differ most significantly:
Architecture 1: Circular Buffer¶
- Buffer Type: Single circular buffer (FIFO)
- Size: 512 samples (configurable)
- Memory: PSRAM allocation
- Synchronization: Mutex-protected access
- Behavior: FIFO overwrite when full (oldest data overwritten)
- Write Strategy: Producer writes to current write pointer, advances circularly
- Read Strategy: Consumer reads from read pointer, advances circularly, supports arbitrary position and length data access
- Overflow Handling: Read pointer advances automatically when buffer full (overwrite)

Figure: Circular buffer operation showing producer writing and consumer reading with FIFO overwrite behavior
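The overwrite-when-full behavior described above can be sketched as follows. This is a simplified, single-threaded illustration: the sample layout, the names ring_t, ring_push and ring_pop, and the tiny capacity are all assumptions, and the mutex that protects the real buffer is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 4   /* tiny on purpose; the module's default is 512 */

typedef struct {
    float x, y, z;            /* acceleration in g */
    uint64_t timestamp_us;
} sample_t;                   /* hypothetical layout, for illustration only */

typedef struct {
    sample_t data[RING_CAPACITY];
    size_t head;              /* next write position */
    size_t tail;              /* oldest unread sample */
    size_t count;             /* number of unread samples */
    uint32_t overwritten;     /* samples lost to overwrite */
} ring_t;

/* Producer side: always succeeds; drops the oldest sample when full. */
static void ring_push(ring_t *r, sample_t s)
{
    if (r->count == RING_CAPACITY) {
        r->tail = (r->tail + 1) % RING_CAPACITY;  /* read pointer advances */
        r->count--;
        r->overwritten++;
    }
    r->data[r->head] = s;
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count++;
}

/* Consumer side: returns false when no unread sample is available. */
static bool ring_pop(ring_t *r, sample_t *out)
{
    if (r->count == 0) {
        return false;
    }
    *out = r->data[r->tail];
    r->tail = (r->tail + 1) % RING_CAPACITY;
    r->count--;
    return true;
}
```

Pushing six samples into this four-slot ring leaves the last four readable and records two overwritten samples, so a slow consumer always sees the most recent data rather than blocking the producer.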
Key Advantages of Circular Buffer:
The circular buffer suits real-time DSP applications, especially those requiring overlapping window processing:
- Flexible Window Access: Consumer can read data windows of arbitrary length from any position in the buffer, without buffer boundary restrictions
- Window Overlap Support: Easily supports sliding windows, overlapping windows, and other DSP algorithms (e.g., FFT, STFT, filter banks)
- Continuous Data Stream: Data is stored continuously in the circular buffer, supporting cross-boundary reads without data copying or reorganization
- Low Latency Processing: Can process the latest N samples, or extract arbitrary time windows from historical data
- Memory Efficiency: Single buffer, minimal memory footprint
Typical Application Scenarios:
- FFT/STFT Analysis: Requires sliding windows, processing 256 samples each time, but window slides only 128 samples (50% overlap)
- Filter Banks: Requires multiple overlapping time windows for frequency domain analysis
- Feature Extraction: Needs to extract features of different time scales from continuous data streams
- Real-time Spectrum Analysis: Requires continuous, overlapping spectrum calculations
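An overlapping window read can be sketched as below. The function names and the flat float array are assumptions made for illustration; for simplicity the sketch copies the window into a linear array, which an implementation that reads in place would avoid.

```c
#include <stddef.h>

/* Copy the most recent n values from a circular array into out, oldest
 * first. head is the index where the NEXT sample will be written.
 * Assumes n <= capacity and that at least n samples have been written. */
static void ring_copy_latest(const float *ring, size_t capacity, size_t head,
                             float *out, size_t n)
{
    size_t start = (head + capacity - n) % capacity;
    for (size_t i = 0; i < n; i++) {
        out[i] = ring[(start + i) % capacity];  /* wraps across the boundary */
    }
}

/* With window length W and hop H, a new window is due every H samples,
 * so consecutive windows share W - H samples (50% overlap when H = W / 2). */
static int window_due(size_t samples_since_last_window, size_t hop)
{
    return samples_since_last_window >= hop;
}
```

For the 256-sample window with a 128-sample hop mentioned above, the consumer would call window_due with hop 128 and, when it returns non-zero, copy the latest 256 samples; half of them were already part of the previous window.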
Limitations of Double Buffer: The double buffer architecture has significant limitations when processing overlapping windows:
- Fixed Window Boundaries: Can only process entire buffers (512 samples), cannot flexibly choose window position and size
- Window Overlap Difficulties: To achieve overlapping windows, additional data copying and reorganization is required, increasing latency and memory overhead
- Discontinuous Data Access: Data is distributed across two separate buffers, cross-buffer access requires complex data management
- Not Suitable for Sliding Windows: Each processing cycle must wait for the entire buffer to fill, cannot achieve flexible sliding window processing
Architecture 2: DMA + Double Buffer (Ping-Pong)¶
- Buffer Type: Two separate buffers (A and B) in ping-pong configuration
- Size: 512 samples per buffer (configurable)
- Memory: PSRAM allocation for both buffers
- Synchronization: Mutex per buffer + Semaphore for buffer-ready notification
- Behavior: Producer fills one buffer while consumer processes the other
- Write Strategy: Producer writes to active buffer (A or B), switches when full
- Read Strategy: Consumer waits for buffer-ready semaphore, processes entire buffer
- Overflow Handling: Producer switches buffers even if previous buffer not fully processed (overwrite detected)

Figure: Double buffering (ping-pong) operation showing parallel acquisition and processing
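The buffer swap and overwrite detection can be sketched as follows. This is a single-threaded illustration under assumptions: the names pingpong_t, pp_write, pp_acquire and pp_release are hypothetical, a boolean flag stands in for the buffer-ready semaphore, and the per-buffer mutexes are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PP_SAMPLES 4   /* tiny on purpose; the module's default is 512 */

typedef struct {
    float buf[2][PP_SAMPLES];
    size_t fill;         /* samples written to the active buffer */
    int active;          /* buffer the producer is filling: 0 (A) or 1 (B) */
    bool ready[2];       /* set when a buffer is full and awaits the consumer */
    uint32_t overruns;   /* full buffers reclaimed before being released */
} pingpong_t;

/* Producer side: append one sample; swap buffers when the active one fills. */
static void pp_write(pingpong_t *p, float v)
{
    p->buf[p->active][p->fill++] = v;
    if (p->fill == PP_SAMPLES) {
        p->ready[p->active] = true;   /* stands in for giving the semaphore */
        p->active ^= 1;
        p->fill = 0;
        if (p->ready[p->active]) {    /* consumer has not released it yet */
            p->overruns++;
            p->ready[p->active] = false;
        }
    }
}

/* Consumer side: return the full buffer, or NULL if none is ready. */
static const float *pp_acquire(pingpong_t *p, int *index)
{
    int candidate = p->active ^ 1;    /* the buffer not being filled */
    if (!p->ready[candidate]) {
        return NULL;
    }
    *index = candidate;
    return p->buf[candidate];
}

/* Consumer side: mark the buffer as processed. */
static void pp_release(pingpong_t *p, int index)
{
    p->ready[index] = false;
}
```

The overruns counter is the point of interest: it increments only when the producer has to reuse a buffer the consumer never released, which is the condition the Overflow Handling entry describes.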
Architecture 3: DMA + Dual Core + Double Buffer¶
- Buffer Type: Two separate buffers (A and B) in ping-pong configuration
- Size: 256 samples per buffer (configurable, smaller for dual-core efficiency)
- Memory: PSRAM allocation for both buffers
- Synchronization: Mutex per buffer + Semaphore for buffer-ready notification
- Behavior: Same as Architecture 2, but with core pinning
- Write Strategy: Producer (Core 0) writes to active buffer, switches when full
- Read Strategy: Consumer (Core 1) waits for buffer-ready semaphore, processes entire buffer
- Overflow Handling: Same as Architecture 2
- Core Assignment: Producer on Core 0, Consumer on Core 1 (true parallelism)

Figure: Double buffering with dual-core division (Core 0: Producer, Core 1: Consumer)
Stage 3: Data Consumption¶
The Consumer stage processes the buffered data and performs real-time computation. All three architectures use the same processing logic, but differ in how they access the buffer:
Processing Example: Acceleration Detection¶
For demonstration purposes, the module implements acceleration detection as the data consumption example:
Detection Conditions:
|x| > 0.5g OR |y| > 0.5g OR |z| < 0.5g
Output:
- LCD Visual Feedback: RED when conditions met, WHITE otherwise
- Color Persistence: 0.3 seconds hold duration
- Serial Output: Optional processed data output
- MQTT Output: Optional processed results publishing
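The detection condition and the 0.3 second color hold can be sketched as below. The threshold and hold duration come from the text above; the function names, the lcd_hold_t structure, and the exact hold semantics (RED until 0.3 s after the most recent detection) are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACCEL_THRESHOLD_G 0.5f
#define COLOR_HOLD_US     300000ULL   /* 0.3 s hold duration */

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* True when |x| > 0.5 g OR |y| > 0.5 g OR |z| < 0.5 g. */
static bool accel_condition_met(float x_g, float y_g, float z_g)
{
    return abs_f(x_g) > ACCEL_THRESHOLD_G ||
           abs_f(y_g) > ACCEL_THRESHOLD_G ||
           abs_f(z_g) < ACCEL_THRESHOLD_G;
}

/* Hypothetical hold state: remembers when the condition last fired. */
typedef struct {
    bool triggered_once;
    uint64_t last_trigger_us;
} lcd_hold_t;

/* Returns true while the LCD should stay RED, false for WHITE. */
static bool lcd_should_be_red(lcd_hold_t *h, bool detected, uint64_t now_us)
{
    if (detected) {
        h->triggered_once = true;
        h->last_trigger_us = now_us;
    }
    return h->triggered_once &&
           (now_us - h->last_trigger_us) < COLOR_HOLD_US;
}
```

A board lying flat at rest reads roughly (0, 0, 1) g, so none of the three terms fires; tilting it far enough or shaking it sideways trips the condition.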
Architecture-Specific Consumption Details¶
| Architecture | Buffer Access | Processing Strategy | Sample Count |
|---|---|---|---|
| Architecture 1: Producer-Consumer | Mutex-protected circular buffer read | Flexible processing: last N samples, arbitrary windows, supports overlapping windows | Up to 10 samples per cycle (configurable), supports arbitrary window size and position |
| Architecture 2: DMA + Double Buffer | Semaphore-wait for buffer-ready, then process entire buffer | Fixed processing of entire buffer, no window overlap support | 512 samples per cycle (fixed) |
| Architecture 3: DMA + Dual Core | Semaphore-wait for buffer-ready, then process entire buffer | Fixed processing of entire buffer, no window overlap support | 256 samples per cycle (fixed) |
Common Processing Flow:
- Wait for data availability (mutex/semaphore)
- Read samples from buffer
- Perform acceleration detection on each sample
- Update LCD color based on detection results
- Output results (serial, MQTT if enabled)
- Update processing statistics
Architecture Comparison¶
The following table summarizes the key differences between the three architectures across all three stages:
| Feature | Architecture 1: Producer-Consumer | Architecture 2: DMA + Double Buffer | Architecture 3: DMA + Dual Core |
|---|---|---|---|
| Data Production | | | |
| SPI Transfer | Standard SPI (CPU-driven) | DMA-assisted SPI | DMA-assisted SPI |
| Data Buffering | | | |
| Buffer Type | Circular Buffer | Double Buffer (Ping-Pong) | Double Buffer (Ping-Pong) |
| Buffer Size | 512 samples | 512 samples × 2 | 256 samples × 2 |
| Synchronization | Mutex | Mutex + Semaphore | Mutex + Semaphore |
| Data Consumption | | | |
| Processing Strategy | Flexible: last N samples, arbitrary windows, supports overlapping windows | Fixed: entire buffer only, no window overlap | Fixed: entire buffer only, no window overlap |
| Core Assignment | Automatic (FreeRTOS) | Automatic (FreeRTOS) | Core 0: Producer, Core 1: Consumer |
| Overall Performance | | | |
| Frequency Range | 0.1 - 1000 Hz | 0.1 - 10000 Hz | 0.1 - 10000 Hz |
| Suitable Frequency | 100 Hz - 1 kHz (recommended) | 1 kHz - 10 kHz (recommended) | Up to 10 kHz (recommended for highest performance) |
| CPU Usage | Medium | Low (DMA handles transfers) | Lowest (dual-core utilization) |
| Memory Usage | Circular Buffer (512 samples) | Two buffers (512 samples each) | Two buffers (256 samples each) |
| Advantages | Simple, decoupled, supports window overlap, flexible data access | High throughput, parallel processing | Maximum performance, true parallelism |
| Limitations | Buffer may become bottleneck | Higher memory usage, no window overlap support, fixed window size | Requires dual-core support, no window overlap support, fixed window size |
Architecture Selection Guide¶
Tip
Data production and data consumption can be implemented as independent tasks. For data acquisition, it is recommended to enable DMA-assisted SPI to improve efficiency. Data consumption can be executed on a single core or dual cores depending on actual requirements. Regarding buffer management, if real-time processing is required and data windows overlap, a ring buffer is recommended; if there is no overlap between data windows and batch-oriented processing is preferred, a dual-buffer (ping-pong) scheme can be used to improve throughput.
Choose Architecture 1 (Producer-Consumer) when:
- Window overlap processing is required (e.g., FFT, STFT, filter banks, and other real-time DSP algorithms)
- Flexible data window access is needed (sliding windows, arbitrary position and length data extraction)
- Sampling frequency is moderate (0.1 - 1000 Hz, recommended: 100 Hz - 1 kHz)
- Simple implementation is preferred
- Memory resources are limited
- You need a straightforward, well-understood pattern
- Processing can work with small batches (10 samples) or arbitrary window sizes
Choose Architecture 2 (DMA + Double Buffer) when:
- Window overlap processing is NOT required (processing algorithm does not need sliding windows or overlapping windows)
- Sampling frequency is high (0.1 - 10000 Hz, recommended: 1 kHz - 10 kHz)
- You need high throughput
- CPU usage should be minimized
- Parallel acquisition and processing is beneficial
- Processing benefits from larger batches (512 samples), and fixed window size is acceptable
- Note: If your DSP algorithm requires window overlap (e.g., FFT overlap, STFT), choose Architecture 1
Choose Architecture 3 (DMA + Dual Core) when:
- Window overlap processing is NOT required (processing algorithm does not need sliding windows or overlapping windows)
- Maximum performance is required
- You have a dual-core ESP32 platform
- Processing workload is computationally intensive
- True parallelism between acquisition and processing is needed
- Processing benefits from dedicated core assignment, and fixed window size is acceptable
- Note: If your DSP algorithm requires window overlap (e.g., FFT overlap, STFT), choose Architecture 1
Unified API¶
All three architectures share the same unified API interface, making it easy to switch between implementations without changing application code. Architecture selection is done at compile time via the RT_PROCESS_ARCH_TYPE macro.
Compile-Time Architecture Selection¶
// Default: Producer-Consumer
#include "real-time-process-arch.h"
// Or explicitly select architecture:
#define RT_PROCESS_ARCH_TYPE RT_ARCH_PRODUCER_CONSUMER
#include "real-time-process-arch.h"
// Or select DMA + Double Buffer:
#define RT_PROCESS_ARCH_TYPE RT_ARCH_DMA_DOUBLE_BUFFER
#include "real-time-process-arch.h"
// Or select DMA + Dual Core:
#define RT_PROCESS_ARCH_TYPE RT_ARCH_DMA_DUAL_CORE
#include "real-time-process-arch.h"
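The macro has to be defined before the header is included, because the preprocessor evaluates it at the point of inclusion. How the header applies the default is not shown in this document; the sketch below is one plausible arrangement, with hypothetical numeric values for the RT_ARCH_* constants and a hypothetical rt_arch_name() helper.

```c
/* Hypothetical numeric values; the real header may define these differently. */
#define RT_ARCH_PRODUCER_CONSUMER 1
#define RT_ARCH_DMA_DOUBLE_BUFFER 2
#define RT_ARCH_DMA_DUAL_CORE     3

/* Fall back to Producer-Consumer when the application defines nothing. */
#ifndef RT_PROCESS_ARCH_TYPE
#define RT_PROCESS_ARCH_TYPE RT_ARCH_PRODUCER_CONSUMER
#endif

/* Hypothetical helper reporting which implementation was compiled in. */
static const char *rt_arch_name(void)
{
#if RT_PROCESS_ARCH_TYPE == RT_ARCH_PRODUCER_CONSUMER
    return "producer-consumer";
#elif RT_PROCESS_ARCH_TYPE == RT_ARCH_DMA_DOUBLE_BUFFER
    return "dma-double-buffer";
#elif RT_PROCESS_ARCH_TYPE == RT_ARCH_DMA_DUAL_CORE
    return "dma-dual-core";
#else
#error "Unknown RT_PROCESS_ARCH_TYPE"
#endif
}
```

Because the choice is resolved by the preprocessor, only one implementation is compiled into the firmware, and an unknown value fails the build instead of failing at run time.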
API Functions¶
All architectures provide the following unified interface:
- rt_process_init() - Initialize the architecture
- rt_process_set_sensor_handle() - Set ADXL355 sensor handle
- rt_process_start() - Start real-time processing
- rt_process_stop() - Stop real-time processing
- rt_process_deinit() - Deinitialize the architecture
- rt_process_get_status() - Get current status
- rt_process_get_stats() - Get performance statistics
- rt_process_set_frequency() - Update sampling frequency
- rt_process_is_running() - Check if processing is running
- rt_process_set_accel_detection() - Enable/disable acceleration detection
Configuration¶
Configuration Structure¶
typedef struct
{
    float sampling_frequency_hz;   // Sampling frequency in Hz (default: 100.0)
    uint32_t queue_size;           // Queue size for producer-consumer (default: 50)
    bool enable_mqtt;              // Enable MQTT output (default: false)
    bool enable_serial;            // Enable serial output (default: true)
    bool enable_accel_detection;   // Enable acceleration detection with LCD feedback (default: false)
    const char *mqtt_topic;        // MQTT topic (default: NULL uses default topic)
} rt_process_config_t;
Default Configuration¶
- Sampling Frequency: 100.0 Hz
- Queue Size (Architecture 1): 50 samples
- Buffer Size (Architectures 2 and 3): 512 samples (Architecture 2), 256 samples (Architecture 3)
- MQTT Output: Disabled
- Serial Output: Enabled
- Acceleration Detection: Disabled
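The documented defaults can be written out as an explicit initializer, which is a convenient starting point when only one or two fields need to change. The structure is repeated from above so the sketch compiles on its own; example_default_config and example_config_is_valid are hypothetical helpers, and the validity rules are assumptions rather than the module's actual checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    float sampling_frequency_hz;
    uint32_t queue_size;
    bool enable_mqtt;
    bool enable_serial;
    bool enable_accel_detection;
    const char *mqtt_topic;
} rt_process_config_t;   /* repeated from the structure above */

/* Hypothetical helper: build a config holding the documented defaults. */
static rt_process_config_t example_default_config(void)
{
    rt_process_config_t cfg = {
        .sampling_frequency_hz = 100.0f,
        .queue_size = 50,
        .enable_mqtt = false,
        .enable_serial = true,
        .enable_accel_detection = false,
        .mqtt_topic = NULL,   /* NULL selects the default topic */
    };
    return cfg;
}

/* Hypothetical sanity check (assumed rules, not the module's own). */
static bool example_config_is_valid(const rt_process_config_t *cfg)
{
    return cfg != NULL &&
           cfg->sampling_frequency_hz > 0.0f &&
           cfg->queue_size > 0;
}
```

A caller would typically take the defaults, override a field such as sampling_frequency_hz, and pass a pointer to the result into the initialization call.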
Performance Statistics¶
The module collects comprehensive performance statistics for monitoring and optimization:
typedef struct
{
    uint32_t total_samples;          // Total number of samples acquired
    uint32_t processed_samples;      // Number of samples processed
    uint32_t dropped_samples;        // Number of samples dropped (queue/buffer full)
    float avg_acquisition_time_us;   // Average acquisition time per sample
    float avg_process_time_us;       // Average processing time per sample
    float cpu_usage_core0;           // CPU usage on Core 0 (%)
    float cpu_usage_core1;           // CPU usage on Core 1 (%) (for dual-core arch)
    uint64_t last_sample_time_us;    // Timestamp of last sample
} rt_process_stats_t;
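Two quantities are easy to derive from these fields: the share of samples that were dropped, and whether the consumer's average per-sample time fits inside one sampling period. The sketch below repeats the structure so it compiles on its own; the example_* helpers are hypothetical and not part of the module.

```c
#include <stdint.h>

typedef struct
{
    uint32_t total_samples;
    uint32_t processed_samples;
    uint32_t dropped_samples;
    float avg_acquisition_time_us;
    float avg_process_time_us;
    float cpu_usage_core0;
    float cpu_usage_core1;
    uint64_t last_sample_time_us;
} rt_process_stats_t;   /* repeated from the structure above */

/* Percentage of acquired samples that were dropped. */
static float example_drop_rate_percent(const rt_process_stats_t *s)
{
    if (s->total_samples == 0) {
        return 0.0f;
    }
    return 100.0f * (float)s->dropped_samples / (float)s->total_samples;
}

/* Non-zero when the average processing time per sample is shorter than
 * the sampling period, i.e. the consumer can keep pace on average. */
static int example_consumer_keeps_pace(const rt_process_stats_t *s,
                                       float sampling_hz)
{
    if (sampling_hz <= 0.0f) {
        return 0;
    }
    float period_us = 1000000.0f / sampling_hz;
    return s->avg_process_time_us < period_us;
}
```

This is an average-case check only. On a single core the acquisition time competes for the same CPU, so a non-zero dropped_samples count remains the more direct sign that the pipeline is falling behind.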
Usage Workflow¶
- Initialize: Call rt_process_init() with configuration (or NULL for defaults)
- Set Sensor Handle: Call rt_process_set_sensor_handle() with ADXL355 handle
- Start Processing: Call rt_process_start() to begin real-time acquisition and processing
- Monitor (optional): Use rt_process_get_status() and rt_process_get_stats() to monitor performance
- Stop Processing: Call rt_process_stop() to stop acquisition and processing
- Deinitialize: Call rt_process_deinit() to clean up resources
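The call order above amounts to a small state machine. The sketch below models only that ordering on a host machine; every demo_* name is hypothetical, the real functions' signatures and return values are not shown in this document, and which out-of-order calls the real module rejects is an assumption.

```c
#include <stddef.h>

typedef enum { DEMO_OK = 0, DEMO_ERR_INVALID_STATE = 1 } demo_err_t;
typedef enum { DEMO_UNINIT = 0, DEMO_READY, DEMO_RUNNING } demo_state_t;

typedef struct {
    demo_state_t state;
    const void *sensor;   /* stands in for the ADXL355 handle */
} demo_ctx_t;

static demo_err_t demo_init(demo_ctx_t *c)
{
    if (c->state != DEMO_UNINIT) return DEMO_ERR_INVALID_STATE;
    c->state = DEMO_READY;
    c->sensor = NULL;
    return DEMO_OK;
}

static demo_err_t demo_set_sensor(demo_ctx_t *c, const void *sensor)
{
    if (c->state != DEMO_READY) return DEMO_ERR_INVALID_STATE;
    c->sensor = sensor;
    return DEMO_OK;
}

static demo_err_t demo_start(demo_ctx_t *c)
{
    /* Starting needs an initialized module and a sensor handle. */
    if (c->state != DEMO_READY || c->sensor == NULL) {
        return DEMO_ERR_INVALID_STATE;
    }
    c->state = DEMO_RUNNING;
    return DEMO_OK;
}

static demo_err_t demo_stop(demo_ctx_t *c)
{
    if (c->state != DEMO_RUNNING) return DEMO_ERR_INVALID_STATE;
    c->state = DEMO_READY;
    return DEMO_OK;
}

static demo_err_t demo_deinit(demo_ctx_t *c)
{
    /* Resources are released only after processing has stopped. */
    if (c->state != DEMO_READY) return DEMO_ERR_INVALID_STATE;
    c->state = DEMO_UNINIT;
    c->sensor = NULL;
    return DEMO_OK;
}
```

In application code, checking each return value in this order makes a skipped step (for example, starting before the sensor handle is set) show up immediately rather than as missing data later.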
Features¶
Real-Time Processing¶
- Real-time feature extraction from sensor data
- Configurable processing algorithms
- Low-latency processing pipeline
Multiple Output Channels¶
- MQTT: Publish processed results to MQTT broker
- Serial: Output processed data via serial port
- LCD: Visual feedback for acceleration detection (optional)
Acceleration Detection¶
When enabled, the module can detect specific acceleration conditions and provide visual feedback via LCD:
- Conditions: |x| > 0.5g OR |y| > 0.5g OR |z| < 0.5g
- LCD shows RED when conditions are met, WHITE otherwise
- Color persistence: 0.3 seconds hold duration
Thread Safety¶
- All architectures use proper synchronization mechanisms
- Producer and Consumer tasks are properly isolated
- Buffer/queue access is protected by mutexes or semaphores
Error Handling¶
Common error codes:
- ESP_ERR_INVALID_ARG: Invalid configuration parameters
- ESP_ERR_INVALID_STATE: Operation not allowed in current state
- ESP_ERR_NO_MEM: Memory allocation failed
- ESP_FAIL: General failure
Integration Notes¶
Sensor Integration¶
- Requires ADXL355 sensor handle (must be initialized before use)
- Sensor handle must be set before starting processing
- Compatible with existing ADXL355 driver interface
MQTT Integration¶
- Requires MQTT client to be connected
- Uses default topic if mqtt_topic is NULL
- Non-blocking publish operations
LCD Integration¶
- Optional LCD support for acceleration detection feedback
- Requires LCD driver to be initialized
- Uses color display for visual feedback
Documentation¶
- Unified Interface Code: Complete code for unified interface
- Architecture 1: Producer-Consumer: Detailed documentation
- Architecture 2: DMA + Double Buffer: Detailed documentation
- Architecture 3: DMA + Dual Core: Detailed documentation