- Contains the host code with a protocol implementation, data analyser and web-based visualiser
MiniProfiler Communication Protocol
Overview
MiniProfiler uses a binary command-response protocol over a UART serial link. The default baud rate is 115200 (configurable).
- Command packets: Host → Embedded device
- Response packets: Embedded device → Host
Command Packet Format
Commands are sent from the host to the embedded device.
Structure
┌────────┬─────────┬─────────────┬──────────┬──────────┐
│ Header │ Command │ Payload Len │ Payload  │ Checksum │
│  (1B)  │  (1B)   │    (1B)     │   (8B)   │   (1B)   │
└────────┴─────────┴─────────────┴──────────┴──────────┘
Total: 12 bytes
Fields
| Field | Size | Value | Description |
|---|---|---|---|
| Header | 1 byte | 0x55 | Packet start marker |
| Command | 1 byte | See table below | Command code |
| Payload Length | 1 byte | 0-8 | Actual payload size |
| Payload | 8 bytes | Variable | Command parameters (padded with 0x00) |
| Checksum | 1 byte | Sum of the 11 preceding bytes & 0xFF | Simple additive checksum |
Command Codes
| Command | Code | Description | Payload |
|---|---|---|---|
| START_PROFILING | 0x01 | Start profiling | None |
| STOP_PROFILING | 0x02 | Stop profiling | None |
| GET_STATUS | 0x03 | Request status | None |
| RESET_BUFFERS | 0x04 | Clear profiling buffers | None |
| GET_METADATA | 0x05 | Request device metadata | None |
| SET_CONFIG | 0x06 | Configure profiler | Config bytes (reserved) |
Example
Start profiling command:
55 01 00 00 00 00 00 00 00 00 00 56
│  │  │  └─────────────────────┘ │
│  │  │       Payload (8B)       │
│  │  └── Payload Length (0)     │
│  └── Command (START_PROFILING) │
└── Header (0x55)                └── Checksum
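The packet layout above can be sketched as a small builder. This is a minimal illustration, not MiniProfiler's actual host code: the name `build_command` and its signature are hypothetical, and the checksum is assumed to cover the 11 bytes that precede it, which is consistent with the example (0x55 + 0x01 = 0x56).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_HEADER      0x55u
#define CMD_PAYLOAD_MAX 8u
#define CMD_PACKET_SIZE 12u

/* Builds one 12-byte command packet into out[].
 * Hypothetical helper: name and signature are illustrative only. */
static void build_command(uint8_t cmd, const uint8_t *payload,
                          uint8_t payload_len, uint8_t out[CMD_PACKET_SIZE])
{
    assert(payload_len <= CMD_PAYLOAD_MAX);

    memset(out, 0, CMD_PACKET_SIZE);   /* unused payload bytes stay 0x00 */
    out[0] = CMD_HEADER;
    out[1] = cmd;
    out[2] = payload_len;
    if (payload_len > 0) {
        memcpy(&out[3], payload, payload_len);
    }

    /* Assumed checksum rule: sum of the 11 preceding bytes, modulo 256. */
    uint8_t sum = 0;
    for (size_t i = 0; i < CMD_PACKET_SIZE - 1; i++) {
        sum = (uint8_t)(sum + out[i]);
    }
    out[CMD_PACKET_SIZE - 1] = sum;
}
```

Calling `build_command(0x01, NULL, 0, pkt)` fills `pkt` with the START_PROFILING bytes shown in the example.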
Response Packet Format
Responses are sent from the embedded device to the host.
Structure
┌─────────┬──────┬──────────┬──────────┬────────┬─────┐
│ Header  │ Type │  Length  │ Payload  │  CRC   │ End │
│  (2B)   │ (1B) │   (2B)   │ (N bytes)│  (2B)  │(1B) │
└─────────┴──────┴──────────┴──────────┴────────┴─────┘
Total: 8 + N bytes
Fields
| Field | Size | Value | Description |
|---|---|---|---|
| Header | 2 bytes | 0xAA55 | Packet start marker (little-endian) |
| Type | 1 byte | See table below | Response type |
| Length | 2 bytes | 0-65535 | Payload size (little-endian) |
| Payload | Variable | Depends on type | Response data |
| CRC16 | 2 bytes | CRC16-CCITT | Checksum of header+type+length+payload |
| End | 1 byte | 0x0A | Packet end marker (newline) |
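As an illustration of the CRC field, here is a bitwise CRC16 sketch using polynomial 0x1021 and initial value 0xFFFF. Several details are assumptions, because the protocol text does not state them: bits are processed MSB-first with no reflection and no final XOR (the variant often called CRC-16/CCITT-FALSE), and the CRC is stored low byte first like the other multi-byte fields. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC16 with polynomial 0x1021 and initial value 0xFFFF.
 * Assumed variant: MSB-first, no reflection, no final XOR. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Checks one complete response frame of 8 + N bytes.
 * Returns 1 if the length field, CRC and end marker all agree. */
static int response_frame_ok(const uint8_t *frame, size_t frame_len)
{
    assert(frame != NULL);
    if (frame_len < 8) {
        return 0;
    }
    size_t n = (size_t)frame[3] | ((size_t)frame[4] << 8);  /* Length, LE */
    if (frame_len != 8 + n) {
        return 0;
    }
    /* CRC covers header + type + length + payload (the first 5 + N bytes). */
    uint16_t rx = (uint16_t)(frame[5 + n] | (frame[6 + n] << 8));
    return frame[7 + n] == 0x0A && crc16_ccitt(frame, 5 + n) == rx;
}
```

A quick sanity check for this variant is the ASCII string "123456789", whose CRC is 0x29B1. If the device firmware uses a different CRC16 flavour, only `crc16_ccitt` needs to change.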
Response Types
| Type | Code | Description | Payload Format |
|---|---|---|---|
| ACK | 0x01 | Command acknowledged | None |
| NACK | 0x02 | Command failed | None |
| METADATA | 0x03 | Device metadata | See Metadata Payload |
| STATUS | 0x04 | Device status | See Status Payload |
| PROFILE_DATA | 0x05 | Profiling records | See Profile Data Payload |
Payload Formats
Metadata Payload (28 bytes)
Sent in response to GET_METADATA command or automatically on startup.
struct MetadataPayload {
    uint32_t mcu_clock_hz;   // MCU clock frequency in Hz
    uint32_t timer_freq;     // Profiling timer frequency in Hz
    uint32_t elf_build_id;   // CRC32 of .text section for version matching
    char fw_version[16];     // Firmware version string (null-terminated)
} __attribute__((packed));
Example:
- MCU Clock: 168,000,000 Hz (168 MHz STM32F4)
- Timer Freq: 1,000,000 Hz (1 MHz for microsecond precision)
- Build ID: 0xDEADBEEF
- FW Version: "v1.0.0"
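On the host, the 28-byte payload can be decoded field by field rather than by casting the buffer to the struct, which avoids depending on host endianness or struct padding. The sketch below is one way to do that; `Metadata`, `read_u32_le` and `parse_metadata` are hypothetical names, and the extra terminator byte guards against a firmware string that fills all 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define METADATA_PAYLOAD_SIZE 28u

/* Host-side view of the metadata (hypothetical type). */
typedef struct {
    uint32_t mcu_clock_hz;
    uint32_t timer_freq;
    uint32_t elf_build_id;
    char     fw_version[17];  /* 16 wire bytes plus a forced terminator */
} Metadata;

/* Reads a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on success, 0 if the payload length is not 28 bytes. */
static int parse_metadata(const uint8_t *payload, size_t len, Metadata *out)
{
    assert(payload != NULL && out != NULL);
    if (len != METADATA_PAYLOAD_SIZE) {
        return 0;
    }
    out->mcu_clock_hz = read_u32_le(payload);
    out->timer_freq   = read_u32_le(payload + 4);
    out->elf_build_id = read_u32_le(payload + 8);
    memcpy(out->fw_version, payload + 12, 16);
    out->fw_version[16] = '\0';
    return 1;
}
```

With the example values above, the decoded struct holds 168000000, 1000000, 0xDEADBEEF and the string "v1.0.0".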
Status Payload (10 bytes)
Sent in response to GET_STATUS command.
struct StatusPayload {
    uint8_t is_profiling;          // 1 if profiling active, 0 otherwise
    uint32_t buffer_overflows;     // Number of buffer overflow events
    uint32_t records_captured;     // Total records captured
    uint8_t buffer_usage_percent;  // Current buffer usage (0-100)
} __attribute__((packed));
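Because the struct is packed, the two 32-bit counters sit at byte offsets 1 and 5, which are unaligned. A host-side decoder can read them byte by byte, as in this sketch (the `Status` type and `parse_status` function are hypothetical names, not part of the protocol):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_PAYLOAD_SIZE 10u

/* Host-side view of the status (hypothetical type, natural alignment). */
typedef struct {
    uint8_t  is_profiling;
    uint32_t buffer_overflows;
    uint32_t records_captured;
    uint8_t  buffer_usage_percent;
} Status;

/* Reads a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on success, 0 if the payload length is not 10 bytes. */
static int parse_status(const uint8_t *payload, size_t len, Status *out)
{
    assert(payload != NULL && out != NULL);
    if (len != STATUS_PAYLOAD_SIZE) {
        return 0;
    }
    out->is_profiling         = payload[0];
    out->buffer_overflows     = read_u32_le(payload + 1);  /* offset 1 */
    out->records_captured     = read_u32_le(payload + 5);  /* offset 5 */
    out->buffer_usage_percent = payload[9];
    return 1;
}
```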
Profile Data Payload (Variable)
Sent automatically during profiling or in response to data requests.
struct ProfileDataPayload {
    uint8_t version;           // Protocol version (0x01)
    uint16_t record_count;     // Number of records in this packet
    ProfileRecord records[];   // Array of profile records
} __attribute__((packed));
Each ProfileRecord is 14 bytes:
struct ProfileRecord {
    uint32_t func_addr;    // Function address (from instrumentation)
    uint32_t entry_time;   // Entry timestamp in microseconds
    uint32_t duration_us;  // Function duration in microseconds
    uint16_t depth;        // Call stack depth (0 = root)
} __attribute__((packed));
Field Details:
- func_addr: Return address from __builtin_return_address(0) in the instrumentation hook
- entry_time: Microsecond timestamp when the function was entered (wraps at ~71 minutes)
- duration_us: Time spent in the function, including children
- depth: Call stack depth (0 for main, 1 for functions called by main, etc.)
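A host-side decoder for this payload might look like the sketch below. It validates the version and checks that the payload length equals 3 + 14 × record_count before reading any record. All function names are hypothetical. The `elapsed_us` helper shows one way to cope with the 32-bit timestamp wrap: unsigned subtraction is modulo 2^32, so the difference stays correct across a single wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROFILE_HEADER_SIZE 3u
#define PROFILE_RECORD_SIZE 14u
#define PROTOCOL_VERSION    0x01u

typedef struct {
    uint32_t func_addr;
    uint32_t entry_time;
    uint32_t duration_us;
    uint16_t depth;
} ProfileRecord;

static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes up to out_cap records. Returns the record count, or -1 if the
 * payload is malformed, has an unsupported version, or does not fit. */
static int parse_profile_data(const uint8_t *payload, size_t len,
                              ProfileRecord *out, size_t out_cap)
{
    assert(payload != NULL && out != NULL);
    if (len < PROFILE_HEADER_SIZE || payload[0] != PROTOCOL_VERSION) {
        return -1;
    }
    size_t count = read_u16_le(payload + 1);
    if (len != PROFILE_HEADER_SIZE + count * PROFILE_RECORD_SIZE ||
        count > out_cap) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        const uint8_t *r = payload + PROFILE_HEADER_SIZE
                         + i * PROFILE_RECORD_SIZE;
        out[i].func_addr   = read_u32_le(r);
        out[i].entry_time  = read_u32_le(r + 4);
        out[i].duration_us = read_u32_le(r + 8);
        out[i].depth       = read_u16_le(r + 12);
    }
    return (int)count;
}

/* Microseconds from earlier to later, valid across one 32-bit wrap. */
static uint32_t elapsed_us(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```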
Communication Flow
Initial Connection
Host                     Device
|                        |
|--- GET_METADATA ------>|
|<---- METADATA ---------|
|                        |
|--- START_PROFILING --->|
|<---- ACK --------------|
|                        |
|<---- PROFILE_DATA -----| (continuous stream)
|<---- PROFILE_DATA -----|
|<---- PROFILE_DATA -----|
|          ...           |
Typical Session
1. Host connects to serial port
2. Host sends GET_METADATA
3. Device responds with METADATA packet
4. Host sends START_PROFILING
5. Device responds with ACK
6. Device begins streaming PROFILE_DATA packets
7. Host processes and visualizes data in real-time
8. Host sends STOP_PROFILING when done
9. Device responds with ACK and stops streaming
Error Handling
CRC Mismatch
If the host detects a CRC mismatch:
- Log the error
- Discard the packet
- Continue listening for next packet
- No retransmission (real-time streaming)
Packet Loss
- Sequence numbers not implemented (keeps protocol simple)
- Missing data will create gaps in visualization
- Not critical for profiling use case
Buffer Overflow
- Device sets the buffer_overflows counter in status
- Host should warn user
- Options: increase baud rate, reduce instrumentation, or use sampling
Performance Considerations
Bandwidth Calculation
At 115200 baud:
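- Effective throughput: ~11.5 KB/s (assuming 8N1 framing, i.e. 10 bits on the wire per byte)
- Profile record size: 14 bytes
- Packet overhead: 8 bytes of framing per packet, plus the 3-byte ProfileDataPayload header
- Records per packet (typical): 20
- Packet size: 8 + 3 + (20 × 14) = 291 bytes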
- Packets per second: ~39
- Records per second: ~780
Recommendation: If the target executes more than ~780 instrumented function calls per second, increase the baud rate to 460800 or 921600.
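The same arithmetic can be parameterised to estimate the sustainable record rate at other baud rates or packet sizes. The sketch below assumes 8N1 framing (10 bits per byte), which matches the ~11.5 KB/s figure; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated profile records per second that the link can carry.
 * Assumes 8N1 framing, 8 bytes of packet framing, a 3-byte payload
 * header and 14 bytes per record. */
static double records_per_second(uint32_t baud, uint32_t records_per_packet)
{
    assert(records_per_packet > 0);
    const double bytes_per_sec = baud / 10.0;
    const double packet_bytes  = 8.0 + 3.0 + 14.0 * records_per_packet;
    return bytes_per_sec / packet_bytes * records_per_packet;
}
```

For 115200 baud and 20 records per packet this returns about 792. The ~780 figure in the list comes from rounding the packet rate down to 39 before multiplying. At 921600 baud the same formula gives roughly 6300 records per second.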
Timing Overhead
Instrumentation overhead per function:
- Entry hook: ~0.5-1 μs
- Exit hook: ~0.5-1 μs
- Total: ~1-2 μs per function call
Target: <5% overhead for typical applications.
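To relate the per-call cost to the 5% target, the overhead fraction is simply the call rate multiplied by the hook cost. A small sketch, with a hypothetical function name:

```c
#include <assert.h>

/* CPU time spent in instrumentation hooks, as a percentage of wall time.
 * calls_per_sec: instrumented function calls per second
 * hook_us_per_call: combined entry + exit hook cost in microseconds */
static double overhead_percent(double calls_per_sec, double hook_us_per_call)
{
    assert(calls_per_sec >= 0.0 && hook_us_per_call >= 0.0);
    return calls_per_sec * hook_us_per_call / 1e6 * 100.0;
}
```

At the upper estimate of 2 μs per call, the 5% budget is reached at about 25,000 instrumented calls per second.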
Protocol Versioning
Current version: 0x01
The version field in ProfileDataPayload allows for future protocol extensions:
- v0x01: Current format (entry_time + duration)
- v0x02: Future - could add ISR markers, task IDs, etc.
- v0x03: Future - compressed format, delta encoding
The host should check the version field and either handle that version or reject the packet as unsupported.
Example Packet Dumps
GET_METADATA Command
55 05 00 00 00 00 00 00 00 00 00 5A
METADATA Response
AA 55 03 1C 00 // Header, Type=METADATA, Length=28
00 7A 03 0A // mcu_clock_hz = 168000000 (0x0A037A00)
40 42 0F 00 // timer_freq = 1000000
EF BE AD DE // build_id = 0xDEADBEEF
76 31 2E 30 2E 30 00 ... // fw_version = "v1.0.0\0..."
XX XX // CRC16
0A // End marker
PROFILE_DATA Response (2 records)
AA 55 05 1F 00 // Header, Type=PROFILE_DATA, Length=31
01 // Version = 1
02 00 // Record count = 2
// Record 1
00 01 00 08 // func_addr = 0x08000100
E8 03 00 00 // entry_time = 1000 μs
D0 07 00 00 // duration = 2000 μs
00 00 // depth = 0
// Record 2
20 02 00 08 // func_addr = 0x08000220
F4 01 00 00 // entry_time = 500 μs
2C 01 00 00 // duration = 300 μs
01 00 // depth = 1
XX XX // CRC16
0A // End marker
Implementation Notes
Embedded Side
- Use DMA for UART transmission to minimize CPU overhead
- Implement ring buffer with power-of-2 size for efficient modulo operations
- Send packets in background task or idle hook
- Consider double-buffering: one buffer for capturing, one for transmitting
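The power-of-2 ring buffer suggestion can be sketched as follows. With a power-of-2 size, the slot index is `counter & (size - 1)` instead of a modulo, and free-running unsigned counters make full and empty states easy to tell apart. The size of 256, the type names and the function names are illustrative assumptions. The sketch assumes a single producer and a single consumer and leaves out the interrupt masking or memory barriers a real target may need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u                /* illustrative; must be a power of 2 */
#define RING_MASK (RING_SIZE - 1u)

/* Compile-time check that RING_SIZE is a power of two. */
typedef char ring_size_is_pow2[((RING_SIZE & RING_MASK) == 0u) ? 1 : -1];

typedef struct {
    uint32_t func_addr;
    uint32_t entry_time;
    uint32_t duration_us;
    uint16_t depth;
} Rec;

typedef struct {
    Rec               slots[RING_SIZE];
    volatile uint32_t head;       /* advanced by the producer only */
    volatile uint32_t tail;       /* advanced by the consumer only */
    uint32_t          overflows;  /* records dropped because the ring was full */
} Ring;

/* Returns 1 if stored, 0 if the ring was full (the record is dropped). */
static int ring_push(Ring *r, const Rec *rec)
{
    assert(r != NULL && rec != NULL);
    if (r->head - r->tail == RING_SIZE) {
        r->overflows++;
        return 0;
    }
    r->slots[r->head & RING_MASK] = *rec;   /* mask replaces modulo */
    r->head++;
    return 1;
}

/* Returns 1 and fills *out if a record was available, 0 if empty. */
static int ring_pop(Ring *r, Rec *out)
{
    assert(r != NULL && out != NULL);
    if (r->head == r->tail) {
        return 0;
    }
    *out = r->slots[r->tail & RING_MASK];
    r->tail++;
    return 1;
}
```

The `overflows` counter here is the kind of value a device could report as `buffer_overflows` in the status payload.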
Host Side
- Use state machine for packet parsing (don't assume atomicity)
- Handle partial packets gracefully
- Verify CRC before processing payload
- Use background thread for serial reading to not block UI
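A byte-at-a-time state machine is one way to follow the host-side notes above: it tolerates partial reads, resynchronises after garbage, and only reports a packet once the CRC matches. The sketch below makes several assumptions that the specification does not settle: the header arrives on the wire as AA then 55 (as in the example dumps), the CRC is sent low byte first, the CRC variant is MSB-first with no final XOR, and payloads larger than `MAX_PAYLOAD` are rejected. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 1024u  /* assumed host-side cap, not part of the protocol */

typedef enum { ST_HDR0, ST_HDR1, ST_TYPE, ST_LEN0, ST_LEN1,
               ST_PAYLOAD, ST_CRC0, ST_CRC1, ST_END } ParseState;

typedef struct {
    ParseState state;
    uint8_t    type;
    uint16_t   length;
    uint16_t   received;
    uint16_t   crc_calc;
    uint16_t   crc_rx;
    uint8_t    payload[MAX_PAYLOAD];
} Parser;

static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    }
    return crc;
}

static void parser_init(Parser *p)
{
    assert(p != NULL);
    p->state = ST_HDR0;
    p->type = 0;
    p->length = p->received = p->crc_calc = p->crc_rx = 0;
}

/* Feeds one received byte. Returns 1 when a complete packet with a valid
 * CRC and end marker is available in p->type / p->length / p->payload. */
static int parser_feed(Parser *p, uint8_t b)
{
    assert(p != NULL);
    switch (p->state) {
    case ST_HDR0:
        if (b == 0xAA) { p->crc_calc = crc16_update(0xFFFFu, b); p->state = ST_HDR1; }
        break;
    case ST_HDR1:
        if (b == 0x55) { p->crc_calc = crc16_update(p->crc_calc, b); p->state = ST_TYPE; }
        else if (b == 0xAA) { p->crc_calc = crc16_update(0xFFFFu, b); }  /* stay */
        else { p->state = ST_HDR0; }
        break;
    case ST_TYPE:
        p->type = b;
        p->crc_calc = crc16_update(p->crc_calc, b);
        p->state = ST_LEN0;
        break;
    case ST_LEN0:
        p->length = b;
        p->crc_calc = crc16_update(p->crc_calc, b);
        p->state = ST_LEN1;
        break;
    case ST_LEN1:
        p->length = (uint16_t)(p->length | (uint16_t)(b << 8));
        p->crc_calc = crc16_update(p->crc_calc, b);
        p->received = 0;
        if (p->length > MAX_PAYLOAD) { p->state = ST_HDR0; }  /* resync */
        else { p->state = (p->length > 0) ? ST_PAYLOAD : ST_CRC0; }
        break;
    case ST_PAYLOAD:
        p->payload[p->received++] = b;
        p->crc_calc = crc16_update(p->crc_calc, b);
        if (p->received == p->length) { p->state = ST_CRC0; }
        break;
    case ST_CRC0:
        p->crc_rx = b;
        p->state = ST_CRC1;
        break;
    case ST_CRC1:
        p->crc_rx = (uint16_t)(p->crc_rx | (uint16_t)(b << 8));
        p->state = ST_END;
        break;
    case ST_END:
        p->state = ST_HDR0;  /* always restart, whether valid or not */
        return b == 0x0A && p->crc_rx == p->crc_calc;
    }
    return 0;
}
```

A reader thread would call `parser_feed` for every byte returned by the serial port and dispatch on `p->type` whenever it returns 1. A packet that fails the CRC is silently dropped, which matches the no-retransmission policy described under Error Handling.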
References
- CRC16-CCITT: Polynomial 0x1021, initial value 0xFFFF
- Little-endian byte order for multi-byte integers
- GCC instrumentation: __cyg_profile_func_enter/exit