Overview

Architecture 3 implements a DMA + Dual Core Division of Labor pattern: data acquisition and data processing are assigned to different CPU cores. By using both cores of a dual-core ESP32 platform, acquisition and processing run in true parallel instead of sharing time on one core, which makes this the highest-performance option of the architectures described.

Architecture Principles

Dual Core Division of Labor

The architecture divides work between two CPU cores:

  • Core 0: Producer task (data acquisition with DMA)
  • Core 1: Consumer task (data processing)
  • Double Buffer: Ping-pong buffer mechanism for parallel operation
  • Core Pinning: Tasks are explicitly pinned to specific cores
  • True Parallelism: Acquisition and processing run simultaneously on different cores

Key Design Decisions

  1. Core Assignment: Producer on Core 0, Consumer on Core 1
  2. Task Pinning: Uses xTaskCreatePinnedToCore() to ensure tasks run on specific cores
  3. Double Buffer: Same ping-pong mechanism as Architecture 2
  4. DMA Support: SPI transfers use DMA automatically
  5. Isolated Cores: Each core runs one stage of the pipeline, so the producer and consumer never preempt each other

Implementation Details

Timer-Based Sampling

  • ESP Timer: Creates a periodic timer with period = 1,000,000 / sampling_frequency_hz microseconds
  • Timer Callback: Executes in timer context, notifies producer task (Core 0) via xTaskNotify
  • Immediate First Sample: Performs one sample immediately before starting periodic timer
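
The period formula above can be sketched as a small helper. This is a minimal sketch rather than the project's code: the name sampling_period_us is hypothetical, and rounding to the nearest microsecond is an assumption. The result is in the unit that esp_timer_start_periodic() expects, since that call takes its period in microseconds.

```c
#include <stdint.h>

/* Convert a sampling frequency in Hz to a timer period in microseconds.
 * Rounds to the nearest microsecond (an assumption of this sketch).
 * Returns 0 for a non-positive frequency so the caller can reject it. */
uint64_t sampling_period_us(double sampling_frequency_hz)
{
    if (sampling_frequency_hz <= 0.0) {
        return 0;
    }
    return (uint64_t)(1000000.0 / sampling_frequency_hz + 0.5);
}
```

At the documented limits, 10000 Hz gives a period of 100 µs and 0.1 Hz gives 10,000,000 µs (10 s).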

Producer Task (Core 0)

Priority: 10 (High priority for timely acquisition)

Core Assignment: Pinned to Core 0 via xTaskCreatePinnedToCore()

Functionality:

  1. Runs exclusively on Core 0
  2. Waits for timer notification via xTaskNotifyWait
  3. Reads sensor data (DMA handles SPI transfer automatically)
  4. Prepares sample structure with timestamp
  5. Writes to active buffer (A or B) with mutex protection
  6. When buffer is full:
       • Marks buffer as ready
       • Switches to other buffer
       • Signals consumer (Core 1) via semaphore
       • Resets write index
  7. Updates acquisition statistics

Core Isolation: Producer task runs only on Core 0, ensuring dedicated CPU time for acquisition
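
The buffer-handling part of the producer steps above can be modelled in plain C without any RTOS calls. This is a sketch under stated assumptions: sample_t, pingpong_t and producer_push are hypothetical names, the buffer is shrunk to 4 samples for readability, and the mutex and semaphore operations are left to the caller. The overwrite_count field mirrors the statistic mentioned under Usage Notes, but how the real code updates it is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 4  /* small for illustration; the documented default is 256 */

typedef struct {
    int64_t timestamp_us;
    float   value;
} sample_t;

typedef struct {
    sample_t      samples[BUFFER_SIZE];
    volatile bool ready;           /* set by producer, cleared by consumer */
} buffer_t;

typedef struct {
    buffer_t buf[2];               /* buf[0] = A, buf[1] = B */
    int      active;               /* index of the buffer being filled */
    size_t   write_index;
    uint32_t samples_acquired;
    uint32_t overwrite_count;
} pingpong_t;

/* Store one sample in the active buffer.
 * Returns the index of the buffer that just became full (the caller would
 * then signal the consumer for it), or -1 if the buffer is not full yet. */
int producer_push(pingpong_t *pp, sample_t s)
{
    pp->buf[pp->active].samples[pp->write_index++] = s;
    pp->samples_acquired++;

    if (pp->write_index < BUFFER_SIZE) {
        return -1;
    }

    int full = pp->active;
    pp->buf[full].ready = true;    /* mark buffer as ready */
    pp->active = 1 - full;         /* switch to the other buffer */
    pp->write_index = 0;           /* reset write index */

    if (pp->buf[pp->active].ready) {
        pp->overwrite_count++;     /* consumer has not released this buffer yet */
    }
    return full;
}
```

In a task-based version, the return value would decide which buffer's semaphore to give, and the write itself would sit inside the per-buffer mutex.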

Consumer Task (Core 1)

Priority: 10 (High priority for processing)

Core Assignment: Pinned to Core 1 via xTaskCreatePinnedToCore()

Functionality:

  1. Runs exclusively on Core 1
  2. Waits for buffer ready semaphore (either A or B)
  3. Takes mutex for the ready buffer
  4. Processes all samples in the buffer
  5. Marks buffer as processed and resets ready flag
  6. Updates processing statistics
  7. Outputs results (serial, MQTT, LCD)

Core Isolation: Consumer task runs only on Core 1, ensuring dedicated CPU time for processing
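
The consumer steps above can be modelled the same way. The sketch below is not the project's processing code: the document does not say what "processing" computes, so the min/max/mean summary, the type names, and consumer_process are all hypothetical, and the semaphore wait and mutex are assumed to be handled by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 4  /* small for illustration; the documented default is 256 */

typedef struct {
    int64_t timestamp_us;
    float   value;
} sample_t;

typedef struct {
    sample_t      samples[BUFFER_SIZE];
    volatile bool ready;           /* set by producer, cleared by consumer */
} buffer_t;

typedef struct {
    float min, max, mean;          /* hypothetical processing result */
} result_t;

typedef struct {
    uint32_t buffers_processed;
    uint32_t samples_processed;
} consumer_stats_t;

/* Process one ready buffer: summarize all samples, clear the ready flag,
 * and update statistics. Returns false if the buffer was not ready. */
bool consumer_process(buffer_t *b, consumer_stats_t *stats, result_t *out)
{
    if (!b->ready) {
        return false;
    }

    float min = b->samples[0].value;
    float max = min;
    float sum = 0.0f;
    for (size_t i = 0; i < BUFFER_SIZE; i++) {
        float v = b->samples[i].value;
        if (v < min) min = v;
        if (v > max) max = v;
        sum += v;
    }
    out->min = min;
    out->max = max;
    out->mean = sum / BUFFER_SIZE;

    b->ready = false;              /* hand the buffer back to the producer */
    stats->buffers_processed++;
    stats->samples_processed += BUFFER_SIZE;
    return true;
}
```

Clearing the ready flag only after the loop finishes is what lets the producer detect that a buffer is still in use.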

Double Buffer Mechanism

Buffer Size: 256 samples per buffer (configurable via RT_DMA_DC_BUFFER_SIZE)

Memory: Both buffers allocated from PSRAM

Synchronization:

  • Mutex for each buffer (protects cross-core access)
  • Binary semaphore for each buffer (signals when ready)
  • Volatile flags for buffer ready state

Operation Flow:

  1. Producer (Core 0) fills buffer A
  2. When buffer A is full, producer switches to buffer B and signals consumer (Core 1)
  3. Consumer (Core 1) processes buffer A while producer (Core 0) fills buffer B
  4. When buffer B is full, producer switches to buffer A and signals consumer
  5. Consumer processes buffer B while producer fills buffer A
  6. Cycle repeats with true parallelism

Figure: Double buffer (ping-pong) mechanism with dual-core division - Producer on Core 0, Consumer on Core 1

Core Pinning

Tasks are explicitly pinned to cores using xTaskCreatePinnedToCore():

// Producer on Core 0
xTaskCreatePinnedToCore(
    producer_task,
    "rt_dma_dc_producer",
    RT_DMA_DC_PRODUCER_STACK_SIZE,
    NULL,
    RT_DMA_DC_PRODUCER_PRIORITY,
    &s_producer_task_handle,
    RT_DMA_DC_PRODUCER_CORE  // Core 0
);

// Consumer on Core 1
xTaskCreatePinnedToCore(
    consumer_task,
    "rt_dma_dc_consumer",
    RT_DMA_DC_CONSUMER_STACK_SIZE,
    NULL,
    RT_DMA_DC_CONSUMER_PRIORITY,
    &s_consumer_task_handle,
    RT_DMA_DC_CONSUMER_CORE  // Core 1
);

DMA Support

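  • Automatic DMA: The ESP-IDF SPI master driver moves data by DMA when the bus is initialized with a DMA channel
  • CPU Reduction: DMA handles the data transfer, freeing Core 0 for other work during each SPI transaction
  • Non-blocking for the CPU: The core does not copy data during a DMA transfer, although the producer task itself may still wait for the transaction to complete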

Data Flow

ESP Timer → Timer Callback → xTaskNotify → Producer Task (Core 0)
                                              ↓
                                    Read Sensor (SPI + DMA)
                                              ↓
                                    Active Buffer (A or B)
                                              ↓
                                    Buffer Full → Switch Buffer
                                              ↓
                                    Semaphore Signal → Consumer Task (Core 1)
                                              ↓
                                    Process Buffer (Core 1)
                          ┌───────────────────┴───────────────────┐
                          ↓                   ↓                    ↓
                       Serial              MQTT                  LCD

Configuration

Default Configuration

#define RT_DMA_DC_BUFFER_SIZE 256
#define RT_DMA_DC_PRODUCER_PRIORITY 10
#define RT_DMA_DC_CONSUMER_PRIORITY 10
#define RT_DMA_DC_PRODUCER_STACK_SIZE 4096
#define RT_DMA_DC_CONSUMER_STACK_SIZE 8192
#define RT_DMA_DC_PRODUCER_CORE 0
#define RT_DMA_DC_CONSUMER_CORE 1

Configuration Parameters

  • Sampling Frequency: 0.1 - 10000 Hz (validated at initialization)
  • Buffer Size: 256 samples per buffer by default (set at compile time via RT_DMA_DC_BUFFER_SIZE)
  • Total Memory: 256 samples × 2 buffers × 32 bytes per sample = 16,384 bytes (16 KB) from PSRAM
  • Producer Core: Core 0 (fixed)
  • Consumer Core: Core 1 (fixed)
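
The memory figure above can be checked with a short calculation. The sample layout below is hypothetical: the document only gives the 32-byte size, so the fields are invented to reach that size, and NUM_BUFFERS and total_buffer_bytes are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

#define RT_DMA_DC_BUFFER_SIZE 256
#define NUM_BUFFERS 2              /* ping-pong pair: buffer A and buffer B */

/* Hypothetical 32-byte sample layout; only the total size comes from the text. */
typedef struct {
    int64_t  timestamp_us;         /* 8 bytes */
    float    channels[5];          /* 20 bytes */
    uint32_t sequence;             /* 4 bytes */
} sample_t;

/* Fail the build if the layout drifts away from the documented 32 bytes. */
_Static_assert(sizeof(sample_t) == 32, "sample_t must stay 32 bytes");

/* Total bytes needed for both buffers. */
size_t total_buffer_bytes(void)
{
    return (size_t)RT_DMA_DC_BUFFER_SIZE * NUM_BUFFERS * sizeof(sample_t);
}
```

With the defaults this evaluates to 16,384 bytes, which matches the 16 KB stated above. A compile-time size check of this kind keeps the figure honest if the sample structure changes.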

Features

Advantages

  • Maximum Performance: Utilizes both CPU cores for true parallelism
  • Core Isolation: Each task has dedicated CPU core
  • Low CPU Usage: DMA handles transfers, Core 0 has more free time
  • Highest Throughput: Best performance for computationally intensive processing
  • No Core Contention: Producer and consumer never compete for the same core

Limitations

  • Requires Dual-Core: Only works on dual-core ESP32 platforms
  • Higher Memory Usage: Requires two buffers (double the memory)
  • Fixed Core Assignment: Core assignment is fixed at compile time
  • Buffer Overwrite Risk: If consumer cannot keep up, buffers may be overwritten

Performance Characteristics

Suitable Frequency Range

  • Valid Range: 0.1 - 10000 Hz (validated at initialization)
  • Recommended: High-rate workloads with demanding processing, typically 1 kHz - 10 kHz
  • Maximum: Up to 10 kHz (validated limit)
  • Minimum: 0.1 Hz (practical limit)
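
The range check described above could look like the following. This is a minimal sketch: the macro names and sampling_frequency_is_valid are hypothetical, and the real initialization code may report the error differently.

```c
#include <stdbool.h>

#define MIN_SAMPLING_HZ 0.1
#define MAX_SAMPLING_HZ 10000.0

/* Return true if the requested frequency lies inside the documented range.
 * Written as two positive comparisons so that NaN is rejected as well,
 * because every ordered comparison with NaN is false. */
bool sampling_frequency_is_valid(double hz)
{
    return hz >= MIN_SAMPLING_HZ && hz <= MAX_SAMPLING_HZ;
}
```

An initialization routine would typically call such a check first and return an error before creating the timer or the tasks.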

Resource Usage

  • Memory: Two buffers (256 samples × 2 × 32 bytes = 16 KB from PSRAM)
  • CPU Core 0: Producer task (acquisition)
  • CPU Core 1: Consumer task (processing)
  • Synchronization: Two mutexes and two binary semaphores (cross-core)

Usage Notes

  1. Dual-Core Requirement: This architecture requires a dual-core ESP32 platform
  2. Initialization Order: Must call arch_dma_dc_init() before arch_dma_dc_set_sensor_handle()
  3. Sensor Handle: Must be set before starting
  4. Buffer Monitoring: Monitor overwrite_count in statistics to detect if consumer is falling behind
  5. Core Verification: Tasks log their core ID on startup for verification
  6. Processing Latency: A sample can wait up to one buffer fill time (buffer_size / sampling_frequency, in seconds) before the consumer starts processing it
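
As a worked example of the latency relation in the last note, the helper below computes the buffer fill time. The function name buffer_fill_time_s is hypothetical, and returning -1.0 for an invalid frequency is a convention of this sketch.

```c
/* Time in seconds needed to fill one buffer at a given sampling frequency.
 * Returns -1.0 if the frequency is not positive. */
double buffer_fill_time_s(unsigned buffer_size, double sampling_frequency_hz)
{
    if (sampling_frequency_hz <= 0.0) {
        return -1.0;
    }
    return (double)buffer_size / sampling_frequency_hz;
}
```

With the default 256-sample buffer, the fill time is about 25.6 ms at 10 kHz, 256 ms at 1 kHz, and 2560 s (roughly 43 minutes) at 0.1 Hz, so a smaller buffer may be worth considering at low sampling rates.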

Error Handling

  • Mutex Timeout: Producer drops the sample if the buffer mutex cannot be acquired within 10 ms
  • Buffer Not Initialized: Sample dropped if buffers not ready
  • Sensor Read Failure: Logged but does not stop processing
  • Task Creation Failure: Returns error, cleans up resources

Thread Safety

  • Cross-Core Synchronization: Mutexes and semaphores work across cores
  • Task Isolation: Producer (Core 0) and consumer (Core 1) are properly isolated
  • Semaphore Signaling: Binary semaphores provide safe inter-core communication
  • Statistics: Updated atomically within mutex-protected sections

Core Assignment Verification

Each task logs the core it is running on at startup:

ESP_LOGI(TAG, "Producer task started on Core %d (DMA + Dual Core mode)", 
         xPortGetCoreID());

ESP_LOGI(TAG, "Consumer task started on Core %d (DMA + Dual Core mode)", 
         xPortGetCoreID());

This allows verification that tasks are running on the correct cores.