MiniProfiler/docs/PROTOCOL.md
Atharva Sawant 852957a7de Initialized MiniProfiler project
- Contains the host code with a protocol implementation, data analyser and web-based visualiser
2025-11-27 20:34:41 +05:30

# MiniProfiler Communication Protocol
## Overview
MiniProfiler uses a binary command-response protocol over UART/Serial communication at 115200 baud (configurable).
- **Command packets**: Host → Embedded device
- **Response packets**: Embedded device → Host
## Command Packet Format
Commands are sent from the host to the embedded device.
### Structure
```
┌────────┬─────────┬─────────────┬──────────┬──────────┐
│ Header │ Command │ Payload Len │ Payload  │ Checksum │
│  (1B)  │  (1B)   │    (1B)     │   (8B)   │   (1B)   │
└────────┴─────────┴─────────────┴──────────┴──────────┘
Total: 12 bytes
```
### Fields
| Field | Size | Value | Description |
|-------|------|-------|-------------|
| Header | 1 byte | `0x55` | Packet start marker |
| Command | 1 byte | See table below | Command code |
| Payload Length | 1 byte | 0-8 | Actual payload size |
| Payload | 8 bytes | Variable | Command parameters (padded with 0x00) |
| Checksum | 1 byte | Sum of all preceding bytes (header through payload) & 0xFF | Simple additive checksum |
### Command Codes
| Command | Code | Description | Payload |
|---------|------|-------------|---------|
| START_PROFILING | `0x01` | Start profiling | None |
| STOP_PROFILING | `0x02` | Stop profiling | None |
| GET_STATUS | `0x03` | Request status | None |
| RESET_BUFFERS | `0x04` | Clear profiling buffers | None |
| GET_METADATA | `0x05` | Request device metadata | None |
| SET_CONFIG | `0x06` | Configure profiler | Config bytes (reserved) |
### Example
Start profiling command:
```
55 01 00 00 00 00 00 00 00 00 00 56
│  │  │  └─────────────────────┘ │
│  │  │       Payload (8B)       │
│  │  └── Payload Length (0)     │
│  └── Command (START_PROFILING) │
└── Header (0x55)                └── Checksum
```
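The command layout above can be assembled in a few lines of C. This is a minimal sketch, not the project's implementation: `mp_build_command` and the macro names are hypothetical, and it assumes the checksum covers the 11 bytes that precede it, which matches the `56` in the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_CMD_HEADER      0x55u
#define MP_CMD_PAYLOAD_MAX 8u
#define MP_CMD_PACKET_SIZE 12u

/* Fill `out` with a 12-byte command packet. Returns 0 on success, or -1 if
 * the payload is longer than 8 bytes. Unused payload bytes are left 0x00. */
int mp_build_command(uint8_t cmd, const uint8_t *payload, size_t len,
                     uint8_t out[MP_CMD_PACKET_SIZE])
{
    if (len > MP_CMD_PAYLOAD_MAX || (len > 0 && payload == NULL)) {
        return -1;
    }
    memset(out, 0, MP_CMD_PACKET_SIZE);
    out[0] = MP_CMD_HEADER;
    out[1] = cmd;
    out[2] = (uint8_t)len;
    if (len > 0) {
        memcpy(&out[3], payload, len);
    }
    /* Assumed: checksum is the low 8 bits of the sum of bytes 0..10. */
    unsigned sum = 0;
    for (size_t i = 0; i < MP_CMD_PACKET_SIZE - 1u; i++) {
        sum += out[i];
    }
    out[MP_CMD_PACKET_SIZE - 1u] = (uint8_t)(sum & 0xFFu);
    return 0;
}
```

Calling `mp_build_command(0x01, NULL, 0, pkt)` produces the START_PROFILING bytes shown in the example.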
## Response Packet Format
Responses are sent from the embedded device to the host.
### Structure
```
┌─────────┬──────┬──────────┬──────────┬────────┬─────┐
│ Header  │ Type │  Length  │ Payload  │  CRC   │ End │
│  (2B)   │ (1B) │   (2B)   │ (N bytes)│  (2B)  │(1B) │
└─────────┴──────┴──────────┴──────────┴────────┴─────┘
Total: 8 + N bytes
```
### Fields
| Field | Size | Value | Description |
|-------|------|-------|-------------|
| Header | 2 bytes | `0xAA55` | Packet start marker (sent as `AA 55` in the example dumps) |
| Type | 1 byte | See table below | Response type |
| Length | 2 bytes | 0-65535 | Payload size (little-endian) |
| Payload | Variable | Depends on type | Response data |
| CRC16 | 2 bytes | CRC16-CCITT | Checksum of header+type+length+payload |
| End | 1 byte | `0x0A` | Packet end marker (newline) |
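A host can validate a fully received response along the lines of the sketch below. It is illustrative only: the function names are hypothetical, the CRC is assumed to be the MSB-first, non-reflected variant with no final XOR (polynomial 0x1021, initial value 0xFFFF), and the CRC field is assumed to be little-endian like the other multi-byte fields.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC16-CCITT with polynomial 0x1021 and initial value 0xFFFF.
 * Assumed: bits processed MSB-first, no reflection, no final XOR. */
uint16_t mp_crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Returns 1 if `frame` is one complete response packet whose length field,
 * end marker, and CRC are consistent, and 0 otherwise. The header bytes
 * themselves are not checked here. */
int mp_frame_check(const uint8_t *frame, size_t frame_len)
{
    if (frame_len < 8u) {
        return 0;
    }
    size_t payload_len = (size_t)frame[3] | ((size_t)frame[4] << 8);
    if (frame_len != 8u + payload_len || frame[frame_len - 1u] != 0x0Au) {
        return 0;
    }
    size_t covered = 5u + payload_len;  /* header + type + length + payload */
    /* Assumed: CRC is transmitted low byte first. */
    uint16_t received = (uint16_t)(frame[covered] | (frame[covered + 1u] << 8));
    return mp_crc16_ccitt(frame, covered) == received;
}
```

With these parameters the standard check string `"123456789"` yields `0x29B1`, which is a quick way to confirm that the host and the firmware agree on the CRC variant.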
### Response Types
| Type | Code | Description | Payload Format |
|------|------|-------------|----------------|
| ACK | `0x01` | Command acknowledged | None |
| NACK | `0x02` | Command failed | None |
| METADATA | `0x03` | Device metadata | See Metadata Payload |
| STATUS | `0x04` | Device status | See Status Payload |
| PROFILE_DATA | `0x05` | Profiling records | See Profile Data Payload |
## Payload Formats
### Metadata Payload (28 bytes)
Sent in response to `GET_METADATA` command or automatically on startup.
```c
struct MetadataPayload {
    uint32_t mcu_clock_hz;  // MCU clock frequency in Hz
    uint32_t timer_freq;    // Profiling timer frequency in Hz
    uint32_t elf_build_id;  // CRC32 of .text section for version matching
    char fw_version[16];    // Firmware version string (null-terminated)
} __attribute__((packed));
```
**Example:**
- MCU Clock: 168,000,000 Hz (168 MHz STM32F4)
- Timer Freq: 1,000,000 Hz (1 MHz for microsecond precision)
- Build ID: 0xDEADBEEF
- FW Version: "v1.0.0"
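On the host, the metadata payload can be decoded byte by byte rather than by casting to the packed struct, which keeps the code independent of host endianness and alignment. The sketch below is one way to do that; `struct mp_metadata`, `mp_read_u32_le`, and `mp_decode_metadata` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_METADATA_SIZE 28u

/* Host-side copy of the metadata; fw_version has one extra byte so it is
 * always terminated even if the device fills all 16 characters. */
struct mp_metadata {
    uint32_t mcu_clock_hz;
    uint32_t timer_freq;
    uint32_t elf_build_id;
    char     fw_version[17];
};

/* Read a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t mp_read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the payload is not exactly 28 bytes. */
int mp_decode_metadata(const uint8_t *payload, size_t len,
                       struct mp_metadata *out)
{
    if (len != MP_METADATA_SIZE) {
        return -1;
    }
    out->mcu_clock_hz = mp_read_u32_le(payload);
    out->timer_freq   = mp_read_u32_le(payload + 4);
    out->elf_build_id = mp_read_u32_le(payload + 8);
    memcpy(out->fw_version, payload + 12, 16);
    out->fw_version[16] = '\0';
    return 0;
}
```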
### Status Payload (10 bytes)
Sent in response to `GET_STATUS` command.
```c
struct StatusPayload {
    uint8_t is_profiling;          // 1 if profiling active, 0 otherwise
    uint32_t buffer_overflows;     // Number of buffer overflow events
    uint32_t records_captured;     // Total records captured
    uint8_t buffer_usage_percent;  // Current buffer usage (0-100)
} __attribute__((packed));
```
### Profile Data Payload (Variable)
Sent automatically during profiling or in response to data requests.
```c
struct ProfileDataPayload {
    uint8_t version;          // Protocol version (0x01)
    uint16_t record_count;    // Number of records in this packet
    ProfileRecord records[];  // Array of profile records
} __attribute__((packed));
```
Each `ProfileRecord` is 14 bytes:
```c
struct ProfileRecord {
    uint32_t func_addr;    // Function address (from instrumentation)
    uint32_t entry_time;   // Entry timestamp in microseconds
    uint32_t duration_us;  // Function duration in microseconds
    uint16_t depth;        // Call stack depth (0 = root)
} __attribute__((packed));
```
**Field Details:**
- `func_addr`: Address obtained from `__builtin_return_address(0)` in the instrumentation hook, i.e. a return address inside the instrumented function rather than its entry point
- `entry_time`: Microsecond timestamp when function was entered (wraps at ~71 minutes)
- `duration_us`: Time spent in function including children
- `depth`: Call stack depth (0 for main, 1 for functions called by main, etc.)
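The record layout and the timestamp wrap-around can be handled on the host roughly as follows. This is a sketch under stated assumptions: all names are hypothetical, only version `0x01` is accepted, and a payload whose length does not equal 3 + 14 × `record_count` is rejected.

```c
#include <stddef.h>
#include <stdint.h>

#define MP_PROFILE_VERSION     0x01u
#define MP_PROFILE_HEADER_SIZE 3u
#define MP_RECORD_SIZE         14u

struct mp_record {
    uint32_t func_addr;
    uint32_t entry_time;
    uint32_t duration_us;
    uint16_t depth;
};

static uint16_t mp_rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t mp_rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a PROFILE_DATA payload into `out`. Returns the record count, or -1
 * if the version is unsupported, the length does not match record_count,
 * or `out` has room for fewer than record_count entries. */
long mp_decode_profile_data(const uint8_t *payload, size_t len,
                            struct mp_record *out, size_t max_records)
{
    if (len < MP_PROFILE_HEADER_SIZE || payload[0] != MP_PROFILE_VERSION) {
        return -1;
    }
    size_t count = mp_rd16(payload + 1);
    if (len != MP_PROFILE_HEADER_SIZE + count * MP_RECORD_SIZE ||
        count > max_records) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        const uint8_t *r = payload + MP_PROFILE_HEADER_SIZE + i * MP_RECORD_SIZE;
        out[i].func_addr   = mp_rd32(r);
        out[i].entry_time  = mp_rd32(r + 4);
        out[i].duration_us = mp_rd32(r + 8);
        out[i].depth       = mp_rd16(r + 12);
    }
    return (long)count;
}

/* Microseconds from `earlier` to `later`. Unsigned 32-bit subtraction gives
 * the right answer across one wrap of the timestamp counter. */
uint32_t mp_elapsed_us(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

Using `mp_elapsed_us` instead of comparing raw timestamps keeps ordering correct when a session runs past the wrap point, provided the two timestamps are less than one full wrap apart.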
## Communication Flow
### Initial Connection
```
Host                  Device
|                        |
|--- GET_METADATA ------>|
|<---- METADATA ---------|
|                        |
|--- START_PROFILING --->|
|<---- ACK --------------|
|                        |
|<---- PROFILE_DATA -----|  (continuous stream)
|<---- PROFILE_DATA -----|
|<---- PROFILE_DATA -----|
|          ...           |
```
### Typical Session
```
1. Host connects to serial port
2. Host sends GET_METADATA
3. Device responds with METADATA packet
4. Host sends START_PROFILING
5. Device responds with ACK
6. Device begins streaming PROFILE_DATA packets
7. Host processes and visualizes data in real-time
8. Host sends STOP_PROFILING when done
9. Device responds with ACK and stops streaming
```
## Error Handling
### CRC Mismatch
If the host detects a CRC mismatch:
- Log the error
- Discard the packet
- Continue listening for next packet
- No retransmission (real-time streaming)
### Packet Loss
- Sequence numbers not implemented (keeps protocol simple)
- Missing data will create gaps in visualization
- Not critical for profiling use case
### Buffer Overflow
- Device increments the `buffer_overflows` counter reported in the status payload
- Host should warn user
- Options: increase baud rate, reduce instrumentation, or use sampling
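To decide when to warn the user, the host can poll `GET_STATUS` and compare successive `buffer_overflows` values. The sketch below decodes the 10-byte status payload by offset and computes the number of new overflows; the names are hypothetical and the polling itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MP_STATUS_SIZE 10u

struct mp_status {
    uint8_t  is_profiling;
    uint32_t buffer_overflows;
    uint32_t records_captured;
    uint8_t  buffer_usage_percent;
};

static uint32_t mp_status_rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the packed status payload. Returns 0 on success, -1 on bad length. */
int mp_decode_status(const uint8_t *payload, size_t len, struct mp_status *out)
{
    if (len != MP_STATUS_SIZE) {
        return -1;
    }
    out->is_profiling         = payload[0];
    out->buffer_overflows     = mp_status_rd32(payload + 1);
    out->records_captured     = mp_status_rd32(payload + 5);
    out->buffer_usage_percent = payload[9];
    return 0;
}

/* Overflow events since the previous poll. Unsigned subtraction stays
 * correct even if the 32-bit counter wraps between polls. */
uint32_t mp_new_overflows(const struct mp_status *prev,
                          const struct mp_status *cur)
{
    return cur->buffer_overflows - prev->buffer_overflows;
}
```

A non-zero result from `mp_new_overflows` is the point at which the host would show the warning and suggest one of the options listed above.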
## Performance Considerations
### Bandwidth Calculation
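At 115200 baud:
- Effective throughput: ~11.5 KB/s (115200 bits/s at 10 bits on the wire per byte, assuming 8N1 framing)
- Profile record size: 14 bytes
- Packet overhead: 8 bytes of framing per packet, plus the 3-byte `ProfileDataPayload` header
- Records per packet (typical): 20
- Packet size: 8 + 3 + 20 × 14 = 291 bytes
- Packets per second: ~39 (11520 / 291 ≈ 39.6)
- Records per second: ~780 (39 × 20)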
**Recommendation:** If profiling >780 function calls/sec, increase baud rate to 460800 or 921600.
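The same arithmetic can be parameterised to compare baud rates and packet sizes. This is a rough estimate only: the function names are hypothetical, and it assumes 10 bits on the wire per byte (8N1) and a link that carries nothing but back-to-back `PROFILE_DATA` packets.

```c
#include <stdint.h>

#define MP_FRAME_OVERHEAD      8u   /* header, type, length, CRC, end marker */
#define MP_PROFILE_HEADER_SIZE 3u   /* version + record_count */
#define MP_RECORD_SIZE         14u

/* Size in bytes of one PROFILE_DATA packet carrying `records` records. */
uint32_t mp_packet_size(uint32_t records)
{
    return MP_FRAME_OVERHEAD + MP_PROFILE_HEADER_SIZE + records * MP_RECORD_SIZE;
}

/* Sustained records per second, rounded down. Assumes 10 bits per byte on
 * the wire and a fully utilised link. */
uint32_t mp_records_per_second(uint32_t baud, uint32_t records_per_packet)
{
    uint64_t bytes_per_second = baud / 10u;
    return (uint32_t)(bytes_per_second * records_per_packet /
                      mp_packet_size(records_per_packet));
}
```

For 115200 baud and 20 records per packet this returns 791, slightly above the ~780 quoted above because that figure rounds packets per second down to 39 before multiplying. At 921600 baud with the same packet size it returns 6334.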
### Timing Overhead
Instrumentation overhead per function:
- Entry hook: ~0.5-1 μs
- Exit hook: ~0.5-1 μs
- Total: ~1-2 μs per function call
Target: <5% overhead for typical applications.
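Whether a workload meets the 5% target depends on how often instrumented functions are called. The helper functions below (hypothetical names) express that relation, assuming the hooks are the only profiling cost and that the per-call overhead is constant.

```c
/* Percentage of CPU time spent in the entry and exit hooks. */
double mp_overhead_percent(double calls_per_second, double overhead_us_per_call)
{
    return calls_per_second * overhead_us_per_call / 1e6 * 100.0;
}

/* Highest call rate that stays within `budget_percent` of CPU time. */
double mp_max_calls_per_second(double budget_percent, double overhead_us_per_call)
{
    return budget_percent / 100.0 * 1e6 / overhead_us_per_call;
}
```

At 2 μs per call, the 5% target corresponds to roughly 25,000 instrumented calls per second. That is far more than the serial link can carry at 115200 baud, so under these assumptions bandwidth is likely to become the limit before hook overhead does.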
## Protocol Versioning
Current version: **0x01**
The `version` field in `ProfileDataPayload` allows for future protocol extensions:
- v0x01: Current format (entry_time + duration)
- v0x02: Future - could add ISR markers, task IDs, etc.
- v0x03: Future - compressed format, delta encoding
Host should check version and handle accordingly or reject unsupported versions.
## Example Packet Dumps
### GET_METADATA Command
```
55 05 00 00 00 00 00 00 00 00 00 5A
```
### METADATA Response
```
AA 55 03 1C 00 // Header, Type=METADATA, Length=28
00 7A 03 0A // mcu_clock_hz = 168000000
40 42 0F 00 // timer_freq = 1000000
EF BE AD DE // build_id = 0xDEADBEEF
76 31 2E 30 2E 30 00 ... // fw_version = "v1.0.0\0..."
XX XX // CRC16
0A // End marker
```
### PROFILE_DATA Response (2 records)
```
AA 55 05 1F 00 // Header, Type=PROFILE_DATA, Length=31
01 // Version = 1
02 00 // Record count = 2
// Record 1
00 01 00 08 // func_addr = 0x08000100
E8 03 00 00 // entry_time = 1000 μs
D0 07 00 00 // duration = 2000 μs
00 00 // depth = 0
// Record 2
20 02 00 08 // func_addr = 0x08000220
F4 01 00 00 // entry_time = 500 μs
2C 01 00 00 // duration = 300 μs
01 00 // depth = 1
XX XX // CRC16
0A // End marker
```
## Implementation Notes
### Embedded Side
- Use DMA for UART transmission to minimize CPU overhead
- Implement ring buffer with power-of-2 size for efficient modulo operations
- Send packets in background task or idle hook
- Consider double-buffering: one buffer for capturing, one for transmitting
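The power-of-2 ring buffer mentioned above can look like the sketch below. It is a generic byte queue with hypothetical names, not the firmware's actual buffer: the capacity of 256 is an arbitrary choice, and the `volatile` qualifiers or memory barriers needed when an interrupt handler and the main loop share the buffer are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MP_RING_CAPACITY 256u                     /* must be a power of two */
#define MP_RING_MASK     (MP_RING_CAPACITY - 1u)

_Static_assert((MP_RING_CAPACITY & MP_RING_MASK) == 0u,
               "ring capacity must be a power of two");

struct mp_ring {
    uint8_t  data[MP_RING_CAPACITY];
    uint32_t head;  /* total bytes written (free-running counter) */
    uint32_t tail;  /* total bytes read (free-running counter) */
};

/* Bytes currently stored. Stays correct when the counters wrap because the
 * capacity divides 2^32. */
uint32_t mp_ring_used(const struct mp_ring *rb)
{
    return rb->head - rb->tail;
}

/* Returns false when the buffer is full; the caller can count an overflow. */
bool mp_ring_push(struct mp_ring *rb, uint8_t byte)
{
    if (mp_ring_used(rb) == MP_RING_CAPACITY) {
        return false;
    }
    rb->data[rb->head & MP_RING_MASK] = byte;  /* mask replaces `% capacity` */
    rb->head++;
    return true;
}

/* Returns false when the buffer is empty. */
bool mp_ring_pop(struct mp_ring *rb, uint8_t *byte)
{
    if (mp_ring_used(rb) == 0u) {
        return false;
    }
    *byte = rb->data[rb->tail & MP_RING_MASK];
    rb->tail++;
    return true;
}
```

Keeping `head` and `tail` as free-running counters lets the buffer use all 256 slots and distinguishes full from empty without a separate flag.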
### Host Side
- Use state machine for packet parsing (don't assume atomicity)
- Handle partial packets gracefully
- Verify CRC before processing payload
- Use background thread for serial reading to not block UI
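A byte-at-a-time state machine covers the first three points: it tolerates partial reads, resynchronises on the header, and only reports a packet after the CRC matches. The sketch below is one possible shape, with several assumptions: the header arrives as `AA` then `55` as in the example dumps, the CRC is sent low byte first, the CRC variant is MSB-first without reflection or final XOR, and `MP_MAX_PAYLOAD` is a hypothetical buffer cap. All names are illustrative.

```c
#include <stdint.h>

#define MP_MAX_PAYLOAD 512u  /* hypothetical cap; larger frames are skipped */

enum mp_state {
    MP_HDR0, MP_HDR1, MP_TYPE, MP_LEN_LO, MP_LEN_HI,
    MP_PAYLOAD, MP_CRC_LO, MP_CRC_HI, MP_END
};

struct mp_parser {
    enum mp_state state;
    uint8_t  type;
    uint16_t length;      /* payload length taken from the frame */
    uint16_t received;    /* payload bytes stored so far */
    uint16_t crc;         /* running CRC over header..payload */
    uint16_t crc_rx;      /* CRC read from the frame */
    uint32_t bad_frames;  /* frames dropped for CRC or end-marker errors */
    uint8_t  payload[MP_MAX_PAYLOAD];
};

/* One CRC16-CCITT step (polynomial 0x1021, MSB-first, no reflection). */
static uint16_t mp_crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)(byte << 8);
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Feed one received byte. Returns 1 when a complete frame with a valid CRC
 * is available in p->type, p->length and p->payload; otherwise returns 0. */
int mp_parser_feed(struct mp_parser *p, uint8_t b)
{
    switch (p->state) {
    case MP_HDR0:
        if (b == 0xAAu) {
            p->crc = mp_crc16_update(0xFFFFu, b);
            p->state = MP_HDR1;
        }
        return 0;
    case MP_HDR1:
        if (b == 0x55u) {
            p->crc = mp_crc16_update(p->crc, b);
            p->state = MP_TYPE;
        } else if (b != 0xAAu) {
            p->state = MP_HDR0;  /* not a header; resynchronise */
        }
        return 0;
    case MP_TYPE:
        p->type = b;
        p->state = MP_LEN_LO;
        break;
    case MP_LEN_LO:
        p->length = b;
        p->state = MP_LEN_HI;
        break;
    case MP_LEN_HI:
        p->length |= (uint16_t)(b << 8);
        p->received = 0;
        if (p->length > MP_MAX_PAYLOAD) {
            p->state = MP_HDR0;  /* too large for this buffer; skip it */
            return 0;
        }
        p->state = (p->length > 0) ? MP_PAYLOAD : MP_CRC_LO;
        break;
    case MP_PAYLOAD:
        p->payload[p->received++] = b;
        if (p->received == p->length) {
            p->state = MP_CRC_LO;
        }
        break;
    case MP_CRC_LO:
        p->crc_rx = b;
        p->state = MP_CRC_HI;
        return 0;
    case MP_CRC_HI:
        p->crc_rx |= (uint16_t)(b << 8);
        p->state = MP_END;
        return 0;
    case MP_END:
        p->state = MP_HDR0;
        if (b == 0x0Au && p->crc_rx == p->crc) {
            return 1;
        }
        p->bad_frames++;
        return 0;
    }
    /* Reached via `break`: this byte is covered by the CRC. */
    p->crc = mp_crc16_update(p->crc, b);
    return 0;
}
```

A zero-initialised `struct mp_parser` starts in `MP_HDR0`. The reader thread can pass every byte it receives to `mp_parser_feed` and hand the payload to the decoder whenever the function returns 1; the payload buffer is only valid until the next byte is fed.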
## References
- CRC16-CCITT: Polynomial 0x1021, initial value 0xFFFF
- Little-endian byte order for multi-byte integers
- GCC instrumentation: `__cyg_profile_func_enter/exit`