# MiniProfiler Communication Protocol

## Overview

MiniProfiler uses a binary command-response protocol over UART/Serial communication at 115200 baud (configurable).

- **Command packets**: Host → Embedded device
- **Response packets**: Embedded device → Host

## Command Packet Format

Commands are sent from the host to the embedded device.

### Structure

```
┌────────┬─────────┬─────────────┬──────────┬──────────┐
│ Header │ Command │ Payload Len │ Payload  │ Checksum │
│ (1B)   │ (1B)    │ (1B)        │ (8B)     │ (1B)     │
└────────┴─────────┴─────────────┴──────────┴──────────┘
Total: 12 bytes
```

### Fields

| Field | Size | Value | Description |
|-------|------|-------|-------------|
| Header | 1 byte | `0x55` | Packet start marker |
| Command | 1 byte | See table below | Command code |
| Payload Length | 1 byte | 0-8 | Actual payload size |
| Payload | 8 bytes | Variable | Command parameters (padded with 0x00) |
| Checksum | 1 byte | Sum of all preceding bytes & 0xFF | Simple additive checksum |

### Command Codes

| Command | Code | Description | Payload |
|---------|------|-------------|---------|
| START_PROFILING | `0x01` | Start profiling | None |
| STOP_PROFILING | `0x02` | Stop profiling | None |
| GET_STATUS | `0x03` | Request status | None |
| RESET_BUFFERS | `0x04` | Clear profiling buffers | None |
| GET_METADATA | `0x05` | Request device metadata | None |
| SET_CONFIG | `0x06` | Configure profiler | Config bytes (reserved) |

### Example

Start profiling command:

```
55 01 00 00 00 00 00 00 00 00 00 56
│  │  │  └─────────────────────┘ │
│  │  │       Payload (8B)       │
│  │  └── Payload Length (0)     │
│  └── Command (START_PROFILING) │
└── Header (0x55)                └── Checksum
```

## Response Packet Format

Responses are sent from the embedded device to the host.
### Structure

```
┌─────────┬──────┬──────────┬──────────┬────────┬─────┐
│ Header  │ Type │ Length   │ Payload  │ CRC    │ End │
│ (2B)    │ (1B) │ (2B)     │ (N bytes)│ (2B)   │(1B) │
└─────────┴──────┴──────────┴──────────┴────────┴─────┘
Total: 8 + N bytes
```

### Fields

| Field | Size | Value | Description |
|-------|------|-------|-------------|
| Header | 2 bytes | `0xAA55` | Packet start marker; appears on the wire as `AA 55` in the example packet dumps (high byte first, unlike the little-endian integer fields) |
| Type | 1 byte | See table below | Response type |
| Length | 2 bytes | 0-65535 | Payload size (little-endian) |
| Payload | Variable | Depends on type | Response data |
| CRC16 | 2 bytes | CRC16-CCITT | Checksum of header+type+length+payload |
| End | 1 byte | `0x0A` | Packet end marker (newline) |

### Response Types

| Type | Code | Description | Payload Format |
|------|------|-------------|----------------|
| ACK | `0x01` | Command acknowledged | None |
| NACK | `0x02` | Command failed | None |
| METADATA | `0x03` | Device metadata | See Metadata Payload |
| STATUS | `0x04` | Device status | See Status Payload |
| PROFILE_DATA | `0x05` | Profiling records | See Profile Data Payload |

## Payload Formats

### Metadata Payload (28 bytes)

Sent in response to a `GET_METADATA` command, or automatically on startup.

```c
struct MetadataPayload {
    uint32_t mcu_clock_hz;   // MCU clock frequency in Hz
    uint32_t timer_freq;     // Profiling timer frequency in Hz
    uint32_t elf_build_id;   // CRC32 of .text section for version matching
    char     fw_version[16]; // Firmware version string (null-terminated)
} __attribute__((packed));
```

**Example:**

- MCU Clock: 168,000,000 Hz (168 MHz STM32F4)
- Timer Freq: 1,000,000 Hz (1 MHz for microsecond precision)
- Build ID: 0xDEADBEEF
- FW Version: "v1.0.0"

### Status Payload (10 bytes)

Sent in response to a `GET_STATUS` command.
```c
struct StatusPayload {
    uint8_t  is_profiling;         // 1 if profiling active, 0 otherwise
    uint32_t buffer_overflows;     // Number of buffer overflow events
    uint32_t records_captured;     // Total records captured
    uint8_t  buffer_usage_percent; // Current buffer usage (0-100)
} __attribute__((packed));
```

### Profile Data Payload (Variable)

Sent automatically during profiling or in response to data requests.

```c
struct ProfileDataPayload {
    uint8_t       version;      // Protocol version (0x01)
    uint16_t      record_count; // Number of records in this packet
    ProfileRecord records[];    // Array of profile records
} __attribute__((packed));
```

Each `ProfileRecord` is 14 bytes:

```c
struct ProfileRecord {
    uint32_t func_addr;   // Function address (from instrumentation)
    uint32_t entry_time;  // Entry timestamp in microseconds
    uint32_t duration_us; // Function duration in microseconds
    uint16_t depth;       // Call stack depth (0 = root)
} __attribute__((packed));
```

**Field Details:**

- `func_addr`: Return address from `__builtin_return_address(0)` in instrumentation hook
- `entry_time`: Microsecond timestamp when function was entered (32-bit, so it wraps at ~71 minutes)
- `duration_us`: Time spent in function including children
- `depth`: Call stack depth (0 for main, 1 for functions called by main, etc.)

## Communication Flow

### Initial Connection

```
Host                     Device
|                        |
|--- GET_METADATA ------>|
|<---- METADATA ---------|
|                        |
|--- START_PROFILING --->|
|<---- ACK --------------|
|                        |
|<---- PROFILE_DATA -----| (continuous stream)
|<---- PROFILE_DATA -----|
|<---- PROFILE_DATA -----|
|          ...           |
```

### Typical Session

```
1. Host connects to serial port
2. Host sends GET_METADATA
3. Device responds with METADATA packet
4. Host sends START_PROFILING
5. Device responds with ACK
6. Device begins streaming PROFILE_DATA packets
7. Host processes and visualizes data in real-time
8. Host sends STOP_PROFILING when done
9. Device responds with ACK and stops streaming
```

## Error Handling

### CRC Mismatch

If the host detects a CRC mismatch:

- Log the error
- Discard the packet
- Continue listening for next packet
- No retransmission (real-time streaming)

### Packet Loss

- Sequence numbers are not implemented (keeps the protocol simple)
- Missing data will create gaps in visualization
- Not critical for the profiling use case

### Buffer Overflow

- Device increments the `buffer_overflows` counter reported in the status payload
- Host should warn the user
- Options: increase baud rate, reduce instrumentation, or use sampling

## Performance Considerations

### Bandwidth Calculation

At 115200 baud:

- Effective throughput: ~11.5 KB/s (10 bits on the wire per byte, assuming 8N1 framing)
- Profile record size: 14 bytes
- Packet overhead: 8 bytes of framing plus the 3-byte `ProfileDataPayload` header
- Records per packet (typical): 20
- Packet size: 8 + 3 + 280 = 291 bytes
- Packets per second: ~39
- Records per second: ~780

**Recommendation:** If profiling >780 function calls/sec, increase baud rate to 460800 or 921600.

### Timing Overhead

Instrumentation overhead per function:

- Entry hook: ~0.5-1 μs
- Exit hook: ~0.5-1 μs
- Total: ~1-2 μs per function call

Target: <5% overhead for typical applications.

## Protocol Versioning

Current version: **0x01**

The `version` field in `ProfileDataPayload` allows for future protocol extensions:

- v0x01: Current format (entry_time + duration)
- v0x02: Future - could add ISR markers, task IDs, etc.
- v0x03: Future - compressed format, delta encoding

The host should check the version and either handle it accordingly or reject unsupported versions.

## Example Packet Dumps

### GET_METADATA Command

```
55 05 00 00 00 00 00 00 00 00 00 5A
```

### METADATA Response

```
AA 55 03 1C 00              // Header, Type=METADATA, Length=28
00 7A 03 0A                 // mcu_clock_hz = 168000000
40 42 0F 00                 // timer_freq = 1000000
EF BE AD DE                 // build_id = 0xDEADBEEF
76 31 2E 30 2E 30 00 ...    // fw_version = "v1.0.0\0..."
XX XX                       // CRC16
0A                          // End marker
```

### PROFILE_DATA Response (2 records)

```
AA 55 05 1F 00              // Header, Type=PROFILE_DATA, Length=31
01                          // Version = 1
02 00                       // Record count = 2
// Record 1
00 01 00 08                 // func_addr = 0x08000100
E8 03 00 00                 // entry_time = 1000 μs
D0 07 00 00                 // duration = 2000 μs
00 00                       // depth = 0
// Record 2
20 02 00 08                 // func_addr = 0x08000220
F4 01 00 00                 // entry_time = 500 μs
2C 01 00 00                 // duration = 300 μs
01 00                       // depth = 1
XX XX                       // CRC16
0A                          // End marker
```

## Implementation Notes

### Embedded Side

- Use DMA for UART transmission to minimize CPU overhead
- Implement ring buffer with power-of-2 size for efficient modulo operations
- Send packets in background task or idle hook
- Consider double-buffering: one buffer for capturing, one for transmitting

### Host Side

- Use state machine for packet parsing (don't assume atomicity)
- Handle partial packets gracefully
- Verify CRC before processing payload
- Use background thread for serial reading to not block UI

## References

- CRC16-CCITT: Polynomial 0x1021, initial value 0xFFFF
- Little-endian byte order for multi-byte integers
- GCC instrumentation: `__cyg_profile_func_enter/exit`
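
## Reference Sketch: Checksum and CRC Helpers

The framing rules in this document can be illustrated with a small C sketch. It is not taken from the MiniProfiler sources: the function names `crc16_ccitt`, `build_command`, and `check_response` are hypothetical. Several details are assumptions: the CRC is computed MSB-first with no bit reflection and no final XOR (the document only gives the polynomial and initial value), the CRC is stored low byte first (following the little-endian rule), and the response header is the byte sequence `AA 55` as shown in the example packet dumps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_PACKET_SIZE 12u /* header + command + length + 8B payload + checksum */
#define CMD_PAYLOAD_MAX 8u
#define RSP_OVERHEAD    8u  /* header(2) + type(1) + length(2) + crc(2) + end(1) */

/* CRC16-CCITT with polynomial 0x1021 and initial value 0xFFFF.
 * Assumed: MSB-first, no reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Fills out[12] with a command packet. Returns 0 on success, or -1 if the
 * payload is longer than 8 bytes or missing. */
int build_command(uint8_t cmd, const uint8_t *payload, uint8_t len,
                  uint8_t out[CMD_PACKET_SIZE])
{
    assert(out != NULL);
    if (len > CMD_PAYLOAD_MAX || (len > 0 && payload == NULL)) {
        return -1;
    }
    memset(out, 0, CMD_PACKET_SIZE);      /* unused payload bytes stay 0x00 */
    out[0] = 0x55;                        /* header */
    out[1] = cmd;
    out[2] = len;
    if (len > 0) {
        memcpy(&out[3], payload, len);
    }
    uint8_t sum = 0;                      /* 8-bit wraparound == "& 0xFF" */
    for (size_t i = 0; i < CMD_PACKET_SIZE - 1; i++) {
        sum = (uint8_t)(sum + out[i]);
    }
    out[CMD_PACKET_SIZE - 1] = sum;
    return 0;
}

/* Validates one complete response frame held in buf. On success returns the
 * payload length and stores the type and payload pointer; returns -1 if the
 * header, length, end marker, or CRC does not match. */
int check_response(const uint8_t *buf, size_t buf_len,
                   uint8_t *type, const uint8_t **payload)
{
    if (buf_len < RSP_OVERHEAD || buf[0] != 0xAA || buf[1] != 0x55) {
        return -1;
    }
    size_t n = (size_t)buf[3] | ((size_t)buf[4] << 8);   /* little-endian */
    if (buf_len != RSP_OVERHEAD + n || buf[buf_len - 1] != 0x0A) {
        return -1;
    }
    uint16_t want = (uint16_t)(buf[5 + n] | (buf[6 + n] << 8)); /* assumed LE */
    if (crc16_ccitt(buf, 5 + n) != want) {
        return -1;
    }
    *type = buf[2];
    *payload = &buf[5];
    return (int)n;
}
```

Calling `build_command(0x01, NULL, 0, pkt)` reproduces the START_PROFILING example, ending in checksum `0x56`. `check_response` expects a whole frame, so a streaming host would still need the byte-by-byte state machine described under Host Side to find frame boundaries before calling it.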