NOTES¶

Overview¶

Architecture 3 implements a DMA + Dual Core Division of Labor pattern: data acquisition and processing are assigned to different CPU cores of a dual-core ESP32 platform. Because each stage runs on its own core, acquisition and processing proceed in parallel instead of time-sharing a single core.

Architecture Principles¶

Dual Core Division of Labor¶

The architecture divides work between two CPU cores:

Core 0: Producer task (data acquisition with DMA)
Core 1: Consumer task (data processing)
Double Buffer: Ping-pong buffer mechanism for parallel operation
Core Pinning: Tasks are explicitly pinned to specific cores
True Parallelism: Acquisition and processing run simultaneously on different cores

Key Design Decisions¶

Core Assignment: Producer on Core 0, Consumer on Core 1
Task Pinning: Uses xTaskCreatePinnedToCore() to ensure tasks run on specific cores
Double Buffer: Same ping-pong mechanism as Architecture 2
DMA Support: SPI transfers use DMA automatically
Isolated Cores: Producer and consumer each run on their own core, so neither can preempt the other

Implementation Details¶

Timer-Based Sampling¶

ESP Timer: Creates a periodic timer with period = 1,000,000 / sampling_frequency_hz microseconds
Timer Callback: Executes in timer context, notifies producer task (Core 0) via xTaskNotify
Immediate First Sample: Performs one sample immediately before starting periodic timer
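
The period calculation above can be sketched as a small helper. This is a minimal illustration, not the project's code: the function name rt_timer_period_us and the two range macros are hypothetical, and the range check simply mirrors the 0.1 - 10000 Hz limits stated in this document.

```c
#include <assert.h>
#include <stdint.h>

/* Validated range taken from this document (macro names are hypothetical). */
#define RT_MIN_SAMPLING_HZ 0.1
#define RT_MAX_SAMPLING_HZ 10000.0

/* Hypothetical helper: returns the timer period in microseconds,
 * or 0 if the frequency is outside the validated range (NaN included). */
uint64_t rt_timer_period_us(double sampling_frequency_hz)
{
    if (!(sampling_frequency_hz >= RT_MIN_SAMPLING_HZ &&
          sampling_frequency_hz <= RT_MAX_SAMPLING_HZ)) {
        return 0;
    }
    /* period = 1,000,000 / f, rounded to the nearest microsecond */
    return (uint64_t)(1000000.0 / sampling_frequency_hz + 0.5);
}
```

On the target, the returned value would be the period passed to the periodic ESP timer; a return of 0 signals that initialization should fail.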

Producer Task (Core 0)¶

Priority: 10 (High priority for timely acquisition)

Core Assignment: Pinned to Core 0 via xTaskCreatePinnedToCore()

Functionality:

Runs exclusively on Core 0
Waits for timer notification via xTaskNotifyWait
Reads sensor data (DMA handles SPI transfer automatically)
Prepares sample structure with timestamp
Writes to active buffer (A or B) with mutex protection
When buffer is full:
    Marks buffer as ready
    Switches to other buffer
    Signals consumer (Core 1) via semaphore
    Resets write index
Updates acquisition statistics

Core Isolation: The producer task runs only on Core 0, so acquisition never competes with processing for CPU time

Consumer Task (Core 1)¶

Priority: 10 (High priority for processing)

Core Assignment: Pinned to Core 1 via xTaskCreatePinnedToCore()

Functionality:

Runs exclusively on Core 1
Waits for buffer ready semaphore (either A or B)
Takes mutex for the ready buffer
Processes all samples in the buffer
Marks buffer as processed and resets ready flag
Updates processing statistics
Outputs results (serial, MQTT, LCD)

Core Isolation: The consumer task runs only on Core 1, so processing never competes with acquisition for CPU time

Double Buffer Mechanism¶

Buffer Size: 256 samples per buffer (configurable via RT_DMA_DC_BUFFER_SIZE)

Memory: Both buffers allocated from PSRAM

Synchronization:

Mutex for each buffer (protects cross-core access)
Binary semaphore for each buffer (signals when ready)
Volatile flags for buffer ready state

Operation Flow:

Producer (Core 0) fills buffer A
When buffer A is full, producer switches to buffer B and signals consumer (Core 1)
Consumer (Core 1) processes buffer A while producer (Core 0) fills buffer B
When buffer B is full, producer switches to buffer A and signals consumer
Consumer processes buffer B while producer fills buffer A
Cycle repeats with true parallelism

Figure: Double buffer (ping-pong) mechanism with dual-core division - Producer on Core 0, Consumer on Core 1
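
The operation flow can be modeled on a host machine as a small state machine. The sketch below is an illustration under assumptions, not the project's implementation: sample_t, pingpong_t, pingpong_push() and pingpong_release() are hypothetical names, the per-buffer mutexes are left out, and the place where the real producer would give the buffer's semaphore is only marked in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RT_DMA_DC_BUFFER_SIZE 256

/* Hypothetical sample layout, used only for this model. */
typedef struct {
    int64_t timestamp_us;
    float   value;
} sample_t;

/* Hypothetical ping-pong state: two buffers, one ready flag each. */
typedef struct {
    sample_t      data[2][RT_DMA_DC_BUFFER_SIZE];
    volatile bool ready[2];        /* set by producer, cleared by consumer */
    int           active;          /* buffer the producer is filling (0 = A, 1 = B) */
    size_t        write_index;
    uint32_t      overwrite_count; /* consumer was still holding the next buffer */
} pingpong_t;

/* Producer side: store one sample. Returns the index of the buffer that
 * just became ready, or -1 if the active buffer is not full yet. */
int pingpong_push(pingpong_t *pp, sample_t s)
{
    pp->data[pp->active][pp->write_index] = s;
    pp->write_index++;
    if (pp->write_index < RT_DMA_DC_BUFFER_SIZE) {
        return -1;
    }

    int full = pp->active;
    pp->ready[full] = true;        /* mark buffer as ready */
    pp->active = 1 - full;         /* switch to the other buffer */
    pp->write_index = 0;           /* reset write index */
    if (pp->ready[pp->active]) {
        pp->overwrite_count++;     /* unprocessed data is about to be overwritten */
    }
    return full;                   /* on target: signal this buffer's semaphore */
}

/* Consumer side: call after processing buffer idx. */
void pingpong_release(pingpong_t *pp, int idx)
{
    pp->ready[idx] = false;
}
```

In this model, overwrite_count rises whenever the producer switches to a buffer the consumer has not yet released, which is the condition the statistics counter of the same name is meant to expose.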

Core Pinning¶

Tasks are explicitly pinned to cores using xTaskCreatePinnedToCore():

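// Producer on Core 0
// xTaskCreatePinnedToCore() returns pdPASS on success; check the result before continuing.
xTaskCreatePinnedToCore(
    producer_task,                  // task entry function
    "rt_dma_dc_producer",           // task name, shown in debug output
    RT_DMA_DC_PRODUCER_STACK_SIZE,  // stack size
    NULL,                           // no argument passed to the task
    RT_DMA_DC_PRODUCER_PRIORITY,    // task priority
    &s_producer_task_handle,        // receives the created task handle
    RT_DMA_DC_PRODUCER_CORE         // Core 0
);

// Consumer on Core 1
xTaskCreatePinnedToCore(
    consumer_task,                  // task entry function
    "rt_dma_dc_consumer",           // task name, shown in debug output
    RT_DMA_DC_CONSUMER_STACK_SIZE,  // stack size
    NULL,                           // no argument passed to the task
    RT_DMA_DC_CONSUMER_PRIORITY,    // task priority
    &s_consumer_task_handle,        // receives the created task handle
    RT_DMA_DC_CONSUMER_CORE         // Core 1
);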

DMA Support¶

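Automatic DMA: The ESP-IDF SPI driver performs transfers via DMA when the bus is initialized with a DMA channel
CPU Reduction: DMA moves the data, so Core 0 does not spend cycles copying bytes during a transfer
Non-blocking: While an interrupt-driven DMA transfer is in flight, Core 0 is free to run other tasks until the transfer completes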

Data Flow¶

ESP Timer → Timer Callback → xTaskNotify → Producer Task (Core 0)
                                              ↓
                                    Read Sensor (SPI + DMA)
                                              ↓
                                    Active Buffer (A or B)
                                              ↓
                                    Buffer Full → Switch Buffer
                                              ↓
                                    Semaphore Signal → Consumer Task (Core 1)
                                              ↓
                                    Process Buffer (Core 1)
                                              ↓
                          ┌───────────────────┴───────────────────┐
                          ↓                   ↓                    ↓
                       Serial              MQTT                  LCD

Configuration¶

Default Configuration¶

#define RT_DMA_DC_BUFFER_SIZE 256
#define RT_DMA_DC_PRODUCER_PRIORITY 10
#define RT_DMA_DC_CONSUMER_PRIORITY 10
#define RT_DMA_DC_PRODUCER_STACK_SIZE 4096
#define RT_DMA_DC_CONSUMER_STACK_SIZE 8192
#define RT_DMA_DC_PRODUCER_CORE 0
#define RT_DMA_DC_CONSUMER_CORE 1

Configuration Parameters¶

Sampling Frequency: 0.1 - 10000 Hz (validated at initialization)
Buffer Size: 256 samples per buffer by default (set at compile time via RT_DMA_DC_BUFFER_SIZE)
Total Memory: 256 samples × 2 buffers × 32 bytes per sample = 16,384 bytes (16 KB) from PSRAM
Producer Core: Core 0 (fixed)
Consumer Core: Core 1 (fixed)
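
The memory figure can be checked with a short calculation. The sketch below assumes a 32-byte sample; the field layout of rt_sample_t, the macro RT_DMA_DC_BUFFER_COUNT and the helper rt_total_buffer_bytes() are hypothetical and chosen only so that the structure matches the 32 bytes per sample stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RT_DMA_DC_BUFFER_SIZE  256
#define RT_DMA_DC_BUFFER_COUNT 2    /* hypothetical name: buffers A and B */

/* Hypothetical 32-byte sample layout (the real fields are not documented here). */
typedef struct {
    int64_t  timestamp_us;  /*  8 bytes: acquisition timestamp */
    float    channels[5];   /* 20 bytes: sensor readings */
    uint32_t sequence;      /*  4 bytes: running sample counter */
} rt_sample_t;

/* Fail the build if the layout drifts from the documented 32 bytes. */
_Static_assert(sizeof(rt_sample_t) == 32, "sample must stay at 32 bytes");

/* Total bytes needed for both buffers. */
size_t rt_total_buffer_bytes(void)
{
    return (size_t)RT_DMA_DC_BUFFER_SIZE * RT_DMA_DC_BUFFER_COUNT
           * sizeof(rt_sample_t);
}
```

With the defaults this yields 16,384 bytes. A compile-time size check of this kind keeps the documented memory figure honest if fields are later added to the sample.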

Features¶

Advantages¶

Parallel Execution: Uses both CPU cores, so acquisition and processing run at the same time
Core Isolation: Each task is pinned to its own CPU core
Low CPU Usage: DMA handles transfers, leaving Core 0 more free time
Highest Throughput: Best suited to computationally intensive processing, since processing no longer shares a core with acquisition
No Core Contention: Producer and consumer never compete for the same core

Limitations¶

Requires Dual-Core: Only works on dual-core ESP32 platforms
Higher Memory Usage: Requires two buffers (double the memory of a single-buffer design)
Fixed Core Assignment: Core assignment is fixed at compile time
Buffer Overwrite Risk: If the consumer cannot keep up, the producer overwrites buffers that have not been processed yet

Performance Characteristics¶

Suitable Frequency Range¶

Valid Range: 0.1 - 10000 Hz (validated at initialization)
Recommended: Applications with the highest performance requirements (typically 1 kHz - 10 kHz)
Maximum: Up to 10 kHz (validated limit)
Minimum: 0.1 Hz (practical limit)

Resource Usage¶

Memory: Two buffers (256 samples × 2 × 32 bytes = 16 KB from PSRAM)
CPU Core 0: Producer task (acquisition)
CPU Core 1: Consumer task (processing)
Synchronization: Two mutexes and two binary semaphores (cross-core)

Usage Notes¶

Dual-Core Requirement: This architecture requires a dual-core ESP32 platform
Initialization Order: Call arch_dma_dc_init() before arch_dma_dc_set_sensor_handle()
Sensor Handle: The sensor handle must be set before the architecture is started
Buffer Monitoring: Monitor overwrite_count in statistics to detect if consumer is falling behind
Core Verification: Tasks log their core ID on startup for verification
Processing Latency: A sample can wait up to one buffer fill time (buffer_size / sampling_frequency) before the consumer sees it
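
To make the latency note concrete, the buffer fill time can be computed directly. The helper below is a hypothetical sketch (rt_buffer_fill_time_us is not a name from the project); it only evaluates buffer_size / sampling_frequency in microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time for the producer to fill one buffer,
 * in microseconds, rounded to the nearest microsecond. */
uint64_t rt_buffer_fill_time_us(uint32_t buffer_size, double sampling_frequency_hz)
{
    assert(sampling_frequency_hz > 0.0);
    return (uint64_t)((double)buffer_size * 1000000.0 / sampling_frequency_hz + 0.5);
}
```

With the default 256-sample buffer this gives 256 ms at 1 kHz, 25.6 ms at 10 kHz, and 2560 s at 0.1 Hz. The same figure is the consumer's deadline: if processing one buffer takes longer than the fill time, the producer catches up and overwrite_count starts to rise.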

Error Handling¶

Mutex Timeout: The producer drops the sample if the mutex cannot be acquired within 10 ms
Buffer Not Initialized: The sample is dropped if the buffers have not been initialized
Sensor Read Failure: Logged but does not stop processing
Task Creation Failure: Returns error, cleans up resources

Thread Safety¶

Cross-Core Synchronization: Mutexes and semaphores work across cores
Task Isolation: Producer (Core 0) and consumer (Core 1) are properly isolated
Semaphore Signaling: Binary semaphores provide safe inter-core communication
Statistics: Updated atomically within mutex-protected sections

Core Assignment Verification¶

Tasks verify their core assignment on startup:

ESP_LOGI(TAG, "Producer task started on Core %d (DMA + Dual Core mode)", 
         xPortGetCoreID());

ESP_LOGI(TAG, "Consumer task started on Core %d (DMA + Dual Core mode)", 
         xPortGetCoreID());

This allows verification that tasks are running on the correct cores.