NOTES¶
Overview¶
Architecture 3 implements a DMA + Dual Core Division of Labor pattern, in which data acquisition and processing are assigned to different CPU cores. This makes full use of a dual-core ESP32 platform: acquisition and processing run in true parallel instead of time-sharing a single core.
Architecture Principles¶
Dual Core Division of Labor¶
The architecture divides work between two CPU cores:
- Core 0: Producer task (data acquisition with DMA)
- Core 1: Consumer task (data processing)
- Double Buffer: Ping-pong buffer mechanism for parallel operation
- Core Pinning: Tasks are explicitly pinned to specific cores
- True Parallelism: Acquisition and processing run simultaneously on different cores
Key Design Decisions¶
- Core Assignment: Producer on Core 0, Consumer on Core 1
- Task Pinning: Uses xTaskCreatePinnedToCore() to ensure tasks run on specific cores
- Double Buffer: Same ping-pong mechanism as Architecture 2
- DMA Support: SPI transfers use DMA automatically
- Isolated Cores: Each core handles its dedicated task without interference
Implementation Details¶
Timer-Based Sampling¶
- ESP Timer: Creates a periodic timer with period = 1,000,000 / sampling_frequency_hz microseconds
- Timer Callback: Executes in timer context, notifies producer task (Core 0) via xTaskNotify
- Immediate First Sample: Performs one sample immediately before starting periodic timer
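The period formula above, combined with the validated 0.1 - 10000 Hz range, can be sketched in portable C. This is a minimal illustration, not the project's code: the function name `sampling_period_us`, the two range macros, and the choice to return 0 for an invalid frequency are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical names; the limits mirror the documented 0.1 - 10000 Hz range. */
#define SAMPLING_HZ_MIN 0.1
#define SAMPLING_HZ_MAX 10000.0

/* Returns the timer period in microseconds, or 0 if the requested
 * frequency is outside the validated range (this also rejects NaN). */
uint64_t sampling_period_us(double sampling_frequency_hz)
{
    if (!(sampling_frequency_hz >= SAMPLING_HZ_MIN &&
          sampling_frequency_hz <= SAMPLING_HZ_MAX)) {
        return 0;
    }
    /* 1,000,000 us per second divided by samples per second,
     * rounded to the nearest microsecond. */
    return (uint64_t)(1000000.0 / sampling_frequency_hz + 0.5);
}
```

For example, 1 kHz gives a 1000 µs period and 0.1 Hz gives a 10-second period. The result is the kind of value that would be passed as the period when starting the periodic timer.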
Producer Task (Core 0)¶
Priority: 10 (High priority for timely acquisition)
Core Assignment: Pinned to Core 0 via xTaskCreatePinnedToCore()
Functionality:
- Runs exclusively on Core 0
- Waits for timer notification via xTaskNotifyWait
- Reads sensor data (DMA handles SPI transfer automatically)
- Prepares sample structure with timestamp
- Writes to active buffer (A or B) with mutex protection
- When buffer is full:
  - Marks buffer as ready
  - Switches to other buffer
  - Signals consumer (Core 1) via semaphore
  - Resets write index
- Updates acquisition statistics
Core Isolation: Producer task runs only on Core 0, ensuring dedicated CPU time for acquisition
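The write-and-switch steps of the producer can be modelled on a host machine without FreeRTOS. The sketch below is a single-threaded model under stated assumptions: `sample_t`, `pingpong_t`, `pingpong_push`, and `PP_BUFFER_SIZE` are hypothetical names, the sample layout is invented, and the rule for incrementing `overwrite_count` (the producer switches onto a buffer the consumer has not yet cleared) is one plausible interpretation, not confirmed by the source. The mutex and semaphore calls are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PP_BUFFER_SIZE 256  /* hypothetical; mirrors RT_DMA_DC_BUFFER_SIZE */

typedef struct {
    int64_t timestamp_us;
    float   value;
} sample_t;  /* hypothetical sample layout */

typedef struct {
    sample_t buf[2][PP_BUFFER_SIZE];
    bool     ready[2];         /* set by producer, cleared by consumer */
    int      active;           /* buffer being filled: 0 = A, 1 = B */
    size_t   write_index;
    uint32_t overwrite_count;  /* assumed meaning: unprocessed buffer reused */
} pingpong_t;

/* Stores one sample in the active buffer. Returns the index of the buffer
 * that just became full (the real task would then signal the consumer),
 * or -1 if the active buffer still has room. */
int pingpong_push(pingpong_t *pp, sample_t s)
{
    pp->buf[pp->active][pp->write_index++] = s;
    if (pp->write_index < PP_BUFFER_SIZE) {
        return -1;
    }
    int full = pp->active;
    pp->ready[full] = true;    /* mark buffer as ready */
    pp->active = 1 - full;     /* switch to the other buffer */
    pp->write_index = 0;       /* reset write index */
    if (pp->ready[pp->active]) {
        pp->overwrite_count++; /* consumer has not released this buffer yet */
    }
    return full;
}
```

In the real producer task, the body of `pingpong_push` would run with the buffer mutex held, and a non-negative return value would trigger the semaphore give toward Core 1.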
Consumer Task (Core 1)¶
Priority: 10 (High priority for processing)
Core Assignment: Pinned to Core 1 via xTaskCreatePinnedToCore()
Functionality:
- Runs exclusively on Core 1
- Waits for buffer ready semaphore (either A or B)
- Takes mutex for the ready buffer
- Processes all samples in the buffer
- Marks buffer as processed and resets ready flag
- Updates processing statistics
- Outputs results (serial, MQTT, LCD)
Core Isolation: Consumer task runs only on Core 1, ensuring dedicated CPU time for processing
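The consumer's "process, then release" sequence can be illustrated with a small host-side function. This is a sketch only: `consume_buffer` is a hypothetical name, and averaging the samples stands in for whatever processing the real task performs. Waiting on the semaphore and taking the mutex are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Processes every sample in a ready buffer (here: computes the mean as a
 * placeholder for real processing), then clears the ready flag so the
 * producer may reuse the buffer. Returns false if there was nothing to do. */
bool consume_buffer(const float *samples, size_t count,
                    volatile bool *ready, float *mean_out)
{
    if (!*ready || count == 0) {
        return false;
    }
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += samples[i];
    }
    *mean_out = (float)(sum / (double)count);
    *ready = false;  /* mark buffer as processed */
    return true;
}
```

Clearing the flag only after the loop finishes matters: if it were cleared first, the producer could not tell that the buffer is still being read.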
Double Buffer Mechanism¶
Buffer Size: 256 samples per buffer (configurable via RT_DMA_DC_BUFFER_SIZE)
Memory: Both buffers allocated from PSRAM
Synchronization:
- Mutex for each buffer (protects cross-core access)
- Binary semaphore for each buffer (signals when ready)
- Volatile flags for buffer ready state
Operation Flow:
- Producer (Core 0) fills buffer A
- When buffer A is full, producer switches to buffer B and signals consumer (Core 1)
- Consumer (Core 1) processes buffer A while producer (Core 0) fills buffer B
- When buffer B is full, producer switches to buffer A and signals consumer
- Consumer processes buffer B while producer fills buffer A
- Cycle repeats with true parallelism

Figure: Double buffer (ping-pong) mechanism with dual-core division - Producer on Core 0, Consumer on Core 1
Core Pinning¶
Tasks are explicitly pinned to cores using xTaskCreatePinnedToCore():
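// Producer on Core 0
xTaskCreatePinnedToCore(
    producer_task,                  // task function
    "rt_dma_dc_producer",           // task name
    RT_DMA_DC_PRODUCER_STACK_SIZE,  // stack size in bytes
    NULL,                           // task parameter (unused)
    RT_DMA_DC_PRODUCER_PRIORITY,    // priority
    &s_producer_task_handle,        // receives the task handle
    RT_DMA_DC_PRODUCER_CORE         // Core 0
);
// Consumer on Core 1
xTaskCreatePinnedToCore(
    consumer_task,                  // task function
    "rt_dma_dc_consumer",           // task name
    RT_DMA_DC_CONSUMER_STACK_SIZE,  // stack size in bytes
    NULL,                           // task parameter (unused)
    RT_DMA_DC_CONSUMER_PRIORITY,    // priority
    &s_consumer_task_handle,        // receives the task handle
    RT_DMA_DC_CONSUMER_CORE         // Core 1
);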
DMA Support¶
- Automatic DMA: ESP-IDF SPI driver automatically uses DMA for transfers
- CPU Reduction: DMA handles data transfer, freeing Core 0 for other tasks
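- Non-blocking for the CPU: With DMA, the CPU does not copy data during a transfer; whether the calling task itself blocks depends on the SPI driver call used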
Data Flow¶
ESP Timer → Timer Callback → xTaskNotify → Producer Task (Core 0)
                                                      ↓
                                           Read Sensor (SPI + DMA)
                                                      ↓
                                           Active Buffer (A or B)
                                                      ↓
                                         Buffer Full → Switch Buffer
                                                      ↓
                                  Semaphore Signal → Consumer Task (Core 1)
                                                      ↓
                                           Process Buffer (Core 1)
                                                      ↓
                                  ┌───────────────────┴───────────────────┐
                                  ↓                   ↓                   ↓
                               Serial               MQTT                 LCD
Configuration¶
Default Configuration¶
#define RT_DMA_DC_BUFFER_SIZE 256
#define RT_DMA_DC_PRODUCER_PRIORITY 10
#define RT_DMA_DC_CONSUMER_PRIORITY 10
#define RT_DMA_DC_PRODUCER_STACK_SIZE 4096
#define RT_DMA_DC_CONSUMER_STACK_SIZE 8192
#define RT_DMA_DC_PRODUCER_CORE 0
#define RT_DMA_DC_CONSUMER_CORE 1
Configuration Parameters¶
- Sampling Frequency: 0.1 - 10000 Hz (validated at initialization)
- Buffer Size: 256 samples per buffer by default (set at compile time via RT_DMA_DC_BUFFER_SIZE)
- Total Memory: 256 samples × 2 buffers × 32 bytes per sample = 16,384 bytes (16 KB) from PSRAM
- Producer Core: Core 0 (fixed)
- Consumer Core: Core 1 (fixed)
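The memory figure can be checked with a few lines of C. The sketch assumes a 32-byte sample, as implied by the arithmetic above. The field layout of `sample_t`, the `NUM_BUFFERS` macro, and the function name are hypothetical and chosen only so that the structure is 32 bytes on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

#define RT_DMA_DC_BUFFER_SIZE 256
#define NUM_BUFFERS 2  /* hypothetical name: ping-pong pair A and B */

/* Hypothetical 32-byte sample: one timestamp plus six float channels. */
typedef struct {
    int64_t timestamp_us;  /*  8 bytes */
    float   channels[6];   /* 24 bytes */
} sample_t;

/* Fails the build if the layout assumption does not hold. */
_Static_assert(sizeof(sample_t) == 32, "sample_t is expected to be 32 bytes");

/* Total bytes needed for both buffers. */
size_t double_buffer_bytes(void)
{
    return (size_t)RT_DMA_DC_BUFFER_SIZE * NUM_BUFFERS * sizeof(sample_t);
}
```

With the defaults this evaluates to 16,384 bytes. Doubling `RT_DMA_DC_BUFFER_SIZE` doubles the PSRAM requirement and, at a fixed sampling rate, also doubles the time needed to fill one buffer.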
Features¶
Advantages¶
- Maximum Performance: Utilizes both CPU cores for true parallelism
- Core Isolation: Each task has dedicated CPU core
- Low CPU Usage: DMA handles transfers, so Core 0 has more free time
- Highest Throughput: Best performance for computationally intensive processing
- No Core Contention: Producer and consumer never compete for the same core
Limitations¶
- Requires Dual-Core: Only works on dual-core ESP32 platforms
- Higher Memory Usage: Requires two buffers (double the memory)
- Fixed Core Assignment: Core assignment is fixed at compile time
- Buffer Overwrite Risk: If consumer cannot keep up, buffers may be overwritten
Performance Characteristics¶
Suitable Frequency Range¶
- Valid Range: 0.1 - 10000 Hz (validated at initialization)
- Recommended: Applications with the highest performance requirements (typically 1 kHz - 10 kHz)
- Maximum: Up to 10 kHz (validated limit)
- Minimum: 0.1 Hz (practical limit)
Resource Usage¶
- Memory: Two buffers (256 samples × 2 × 32 bytes = 16 KB from PSRAM)
- CPU Core 0: Producer task (acquisition)
- CPU Core 1: Consumer task (processing)
- Synchronization: Two mutexes and two binary semaphores (cross-core)
Usage Notes¶
- Dual-Core Requirement: This architecture requires a dual-core ESP32 platform
- Initialization Order: Must call arch_dma_dc_init() before arch_dma_dc_set_sensor_handle()
- Sensor Handle: Must be set before starting
- Buffer Monitoring: Monitor overwrite_count in statistics to detect if consumer is falling behind
- Core Verification: Tasks log their core ID on startup for verification
- Processing Latency: Processing latency depends on buffer fill time (buffer_size / sampling_frequency)
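The buffer fill time sets both the processing latency and the deadline the consumer must meet to avoid overwrites. A minimal sketch of the calculation follows. The function name `buffer_fill_time_us` is hypothetical, and returning 0 for a non-positive frequency is a choice made for this example.

```c
#include <stdint.h>

/* Time needed to fill one buffer, in microseconds:
 * buffer_size / sampling_frequency, scaled from seconds to microseconds.
 * Returns 0 for a non-positive (or NaN) frequency. */
uint64_t buffer_fill_time_us(uint32_t buffer_size, double sampling_frequency_hz)
{
    if (!(sampling_frequency_hz > 0.0)) {
        return 0;
    }
    return (uint64_t)((double)buffer_size * 1000000.0 /
                      sampling_frequency_hz + 0.5);
}
```

With the default 256-sample buffer, a sample waits up to 256 ms before processing starts at 1 kHz, and up to 25.6 ms at 10 kHz. The consumer has the same amount of time to finish a buffer before the producer needs it again.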
Error Handling¶
- Mutex Timeout: Producer drops the sample if the mutex cannot be acquired within 10 ms
- Buffer Not Initialized: Sample dropped if buffers not ready
- Sensor Read Failure: Logged but does not stop processing
- Task Creation Failure: Returns error, cleans up resources
Thread Safety¶
- Cross-Core Synchronization: Mutexes and semaphores work across cores
- Task Isolation: Producer (Core 0) and consumer (Core 1) are properly isolated
- Semaphore Signaling: Binary semaphores provide safe inter-core communication
- Statistics: Updated atomically within mutex-protected sections
Core Assignment Verification¶
Tasks verify their core assignment on startup:
ESP_LOGI(TAG, "Producer task started on Core %d (DMA + Dual Core mode)",
         xPortGetCoreID());
ESP_LOGI(TAG, "Consumer task started on Core %d (DMA + Dual Core mode)",
         xPortGetCoreID());
This allows verification that tasks are running on the correct cores.