Initialized MiniProfiler project
- Contains the host code with a protocol implementation, data analyser and web-based visualiser
300
docs/PROTOCOL.md
Normal file
@@ -0,0 +1,300 @@
# MiniProfiler Communication Protocol

## Overview

MiniProfiler uses a binary command-response protocol over a UART/serial link at 115200 baud (configurable).

- **Command packets**: Host → Embedded device
- **Response packets**: Embedded device → Host

## Command Packet Format

Commands are sent from the host to the embedded device.

### Structure

```
┌────────┬─────────┬─────────────┬──────────┬──────────┐
│ Header │ Command │ Payload Len │ Payload  │ Checksum │
│  (1B)  │  (1B)   │    (1B)     │   (8B)   │   (1B)   │
└────────┴─────────┴─────────────┴──────────┴──────────┘
Total: 12 bytes
```

### Fields

| Field | Size | Value | Description |
|-------|------|-------|-------------|
| Header | 1 byte | `0x55` | Packet start marker |
| Command | 1 byte | See table below | Command code |
| Payload Length | 1 byte | 0-8 | Actual payload size |
| Payload | 8 bytes | Variable | Command parameters (padded with 0x00) |
| Checksum | 1 byte | Sum of all preceding bytes & 0xFF | Simple checksum over header, command, payload length and payload |

### Command Codes

| Command | Code | Description | Payload |
|---------|------|-------------|---------|
| START_PROFILING | `0x01` | Start profiling | None |
| STOP_PROFILING | `0x02` | Stop profiling | None |
| GET_STATUS | `0x03` | Request status | None |
| RESET_BUFFERS | `0x04` | Clear profiling buffers | None |
| GET_METADATA | `0x05` | Request device metadata | None |
| SET_CONFIG | `0x06` | Configure profiler | Config bytes (reserved) |

### Example

Start profiling command:
```
55 01 00 00 00 00 00 00 00 00 00 56
│  │  │  └─────────────────────┘ │
│  │  │        Payload (8B)      │
│  │  └── Payload Length (0)     │
│  └── Command (START_PROFILING) │
└── Header (0x55)                └── Checksum
```

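The layout above can be sketched in C as a small packet builder. This is only an illustration: `build_command` and the `CMD_*` macros are hypothetical names, not taken from the MiniProfiler sources, and the checksum is assumed to cover the 11 bytes that precede it, which matches the `0x56` in the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_HEADER      0x55u
#define CMD_PACKET_SIZE 12u
#define CMD_MAX_PAYLOAD 8u

/* Hypothetical helper: fills out[12] with a command packet.
 * Returns 0 on success, -1 if the payload does not fit in 8 bytes. */
static int build_command(uint8_t out[CMD_PACKET_SIZE], uint8_t command,
                         const uint8_t *payload, size_t payload_len)
{
    if (payload_len > CMD_MAX_PAYLOAD || (payload_len > 0 && payload == NULL)) {
        return -1;
    }

    memset(out, 0, CMD_PACKET_SIZE);   /* unused payload bytes stay 0x00 */
    out[0] = CMD_HEADER;
    out[1] = command;
    out[2] = (uint8_t)payload_len;
    if (payload_len > 0) {
        memcpy(&out[3], payload, payload_len);
    }

    /* Assumption: checksum = sum of bytes 0..10, truncated to 8 bits. */
    uint8_t sum = 0;
    for (size_t i = 0; i < CMD_PACKET_SIZE - 1; i++) {
        sum = (uint8_t)(sum + out[i]);
    }
    out[CMD_PACKET_SIZE - 1] = sum;
    return 0;
}
```

Calling `build_command(pkt, 0x01, NULL, 0)` reproduces the START_PROFILING bytes shown in the example.
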
## Response Packet Format

Responses are sent from the embedded device to the host.

### Structure

```
┌─────────┬──────┬──────────┬──────────┬────────┬─────┐
│ Header  │ Type │ Length   │ Payload  │ CRC    │ End │
│ (2B)    │ (1B) │ (2B)     │ (N bytes)│ (2B)   │(1B) │
└─────────┴──────┴──────────┴──────────┴────────┴─────┘
Total: 8 + N bytes
```

### Fields

| Field | Size | Value | Description |
|-------|------|-------|-------------|
| Header | 2 bytes | `0xAA55` | Packet start marker (little-endian) |
| Type | 1 byte | See table below | Response type |
| Length | 2 bytes | 0-65535 | Payload size (little-endian) |
| Payload | Variable | Depends on type | Response data |
| CRC16 | 2 bytes | CRC16-CCITT | Checksum of header+type+length+payload |
| End | 1 byte | `0x0A` | Packet end marker (newline) |

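For the CRC16 field, a bitwise CRC16-CCITT routine could look like the sketch below. It uses the polynomial `0x1021` and initial value `0xFFFF` listed under References; processing bits MSB-first with no final XOR is an assumption, since the document does not state those details. `crc16_ccitt` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC16-CCITT, polynomial 0x1021, initial value 0xFFFF.
 * Assumptions: MSB-first bit order, no reflection, no final XOR. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);   /* fold next byte into the top */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

With these parameters the standard check string `"123456789"` yields `0x29B1`, which is a quick way to confirm that host and device agree on the variant.
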
### Response Types

| Type | Code | Description | Payload Format |
|------|------|-------------|----------------|
| ACK | `0x01` | Command acknowledged | None |
| NACK | `0x02` | Command failed | None |
| METADATA | `0x03` | Device metadata | See Metadata Payload |
| STATUS | `0x04` | Device status | See Status Payload |
| PROFILE_DATA | `0x05` | Profiling records | See Profile Data Payload |

## Payload Formats

### Metadata Payload (28 bytes)

Sent in response to `GET_METADATA` command or automatically on startup.

```c
struct MetadataPayload {
    uint32_t mcu_clock_hz;      // MCU clock frequency in Hz
    uint32_t timer_freq;        // Profiling timer frequency in Hz
    uint32_t elf_build_id;      // CRC32 of .text section for version matching
    char     fw_version[16];    // Firmware version string (null-terminated)
} __attribute__((packed));
```

**Example:**
- MCU Clock: 168,000,000 Hz (168 MHz STM32F4)
- Timer Freq: 1,000,000 Hz (1 MHz for microsecond precision)
- Build ID: 0xDEADBEEF
- FW Version: "v1.0.0"

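On the host, the 28 payload bytes can be decoded field by field instead of casting the buffer to the struct, which avoids depending on host endianness and alignment. The sketch below assumes little-endian integers as stated under References; `Metadata`, `read_u32_le` and `decode_metadata` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define METADATA_PAYLOAD_SIZE 28u

/* Host-side copy of the metadata (hypothetical type). */
typedef struct {
    uint32_t mcu_clock_hz;
    uint32_t timer_freq;
    uint32_t elf_build_id;
    char     fw_version[17];   /* 16 wire bytes plus a guaranteed terminator */
} Metadata;

/* Reads a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the payload is not exactly 28 bytes. */
static int decode_metadata(const uint8_t *payload, size_t len, Metadata *out)
{
    if (len != METADATA_PAYLOAD_SIZE) {
        return -1;
    }
    out->mcu_clock_hz = read_u32_le(payload);
    out->timer_freq   = read_u32_le(payload + 4);
    out->elf_build_id = read_u32_le(payload + 8);
    memcpy(out->fw_version, payload + 12, 16);
    out->fw_version[16] = '\0';   /* protects against an unterminated string */
    return 0;
}
```
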
### Status Payload (10 bytes)

Sent in response to `GET_STATUS` command.

```c
struct StatusPayload {
    uint8_t  is_profiling;          // 1 if profiling active, 0 otherwise
    uint32_t buffer_overflows;      // Number of buffer overflow events
    uint32_t records_captured;      // Total records captured
    uint8_t  buffer_usage_percent;  // Current buffer usage (0-100)
} __attribute__((packed));
```

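Because the struct is packed, the two 32-bit counters sit at byte offsets 1 and 5 and are not naturally aligned. A host-side decoder can read them byte by byte, as in this sketch (little-endian assumed; `Status` and `decode_status` are hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_PAYLOAD_SIZE 10u

/* Host-side copy of the status (hypothetical type, not packed). */
typedef struct {
    uint8_t  is_profiling;
    uint32_t buffer_overflows;
    uint32_t records_captured;
    uint8_t  buffer_usage_percent;
} Status;

/* Reads a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t status_read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the payload is not exactly 10 bytes. */
static int decode_status(const uint8_t *payload, size_t len, Status *out)
{
    if (len != STATUS_PAYLOAD_SIZE) {
        return -1;
    }
    out->is_profiling         = payload[0];                       /* offset 0 */
    out->buffer_overflows     = status_read_u32_le(payload + 1);  /* offset 1 */
    out->records_captured     = status_read_u32_le(payload + 5);  /* offset 5 */
    out->buffer_usage_percent = payload[9];                       /* offset 9 */
    return 0;
}
```
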
### Profile Data Payload (Variable)

Sent automatically during profiling or in response to data requests.

```c
struct ProfileDataPayload {
    uint8_t  version;           // Protocol version (0x01)
    uint16_t record_count;      // Number of records in this packet
    ProfileRecord records[];    // Array of profile records
} __attribute__((packed));
```

Each `ProfileRecord` is 14 bytes:

```c
struct ProfileRecord {
    uint32_t func_addr;         // Function address (from instrumentation)
    uint32_t entry_time;        // Entry timestamp in microseconds
    uint32_t duration_us;       // Function duration in microseconds
    uint16_t depth;             // Call stack depth (0 = root)
} __attribute__((packed));
```

**Field Details:**
- `func_addr`: Return address from `__builtin_return_address(0)` in instrumentation hook
- `entry_time`: Microsecond timestamp when function was entered (wraps at ~71 minutes)
- `duration_us`: Time spent in function including children
- `depth`: Call stack depth (0 for main, 1 for functions called by main, etc.)

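A host-side decoder for this payload might check the version, confirm that the length matches `3 + 14 * record_count`, and then unpack each record. The sketch below also shows why the ~71 minute wrap of `entry_time` is harmless for interval arithmetic: unsigned 32-bit subtraction gives the right difference across a single wrap. All names (`decode_profile_data`, `elapsed_us`, the host-side `ProfileRecord` typedef) are hypothetical, and little-endian fields are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define PROFILE_VERSION     0x01u
#define PROFILE_HEADER_SIZE 3u    /* version (1) + record_count (2) */
#define PROFILE_RECORD_SIZE 14u

/* Host-side, unpacked copy of one record (hypothetical type). */
typedef struct {
    uint32_t func_addr;
    uint32_t entry_time;
    uint32_t duration_us;
    uint16_t depth;
} ProfileRecord;

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of records decoded, or -1 on an unsupported version,
 * a length that does not match record_count, or too many records for out[]. */
static int decode_profile_data(const uint8_t *payload, size_t len,
                               ProfileRecord *out, size_t max_records)
{
    if (len < PROFILE_HEADER_SIZE || payload[0] != PROFILE_VERSION) {
        return -1;
    }
    size_t count = rd16(payload + 1);
    if (len != PROFILE_HEADER_SIZE + count * PROFILE_RECORD_SIZE ||
        count > max_records) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        const uint8_t *r = payload + PROFILE_HEADER_SIZE + i * PROFILE_RECORD_SIZE;
        out[i].func_addr   = rd32(r);
        out[i].entry_time  = rd32(r + 4);
        out[i].duration_us = rd32(r + 8);
        out[i].depth       = rd16(r + 12);
    }
    return (int)count;
}

/* Microseconds from `earlier` to `later`; modulo-2^32 arithmetic keeps the
 * result correct even if the counter wrapped once in between. */
static uint32_t elapsed_us(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```
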
## Communication Flow

### Initial Connection

```
Host                    Device
  |                        |
  |--- GET_METADATA ------>|
  |<---- METADATA ---------|
  |                        |
  |--- START_PROFILING --->|
  |<---- ACK --------------|
  |                        |
  |<---- PROFILE_DATA -----|  (continuous stream)
  |<---- PROFILE_DATA -----|
  |<---- PROFILE_DATA -----|
  |          ...           |
```

### Typical Session

```
1. Host connects to serial port
2. Host sends GET_METADATA
3. Device responds with METADATA packet
4. Host sends START_PROFILING
5. Device responds with ACK
6. Device begins streaming PROFILE_DATA packets
7. Host processes and visualizes data in real-time
8. Host sends STOP_PROFILING when done
9. Device responds with ACK and stops streaming
```

## Error Handling

### CRC Mismatch
If the host detects a CRC mismatch:
- Log the error
- Discard the packet
- Continue listening for next packet
- No retransmission (real-time streaming)

### Packet Loss
- Sequence numbers are not implemented (keeps the protocol simple)
- Missing data will create gaps in the visualization
- Not critical for the profiling use case

### Buffer Overflow
- Device increments the `buffer_overflows` counter reported in the status payload
- Host should warn the user
- Options: increase baud rate, reduce instrumentation, or use sampling

## Performance Considerations

### Bandwidth Calculation

At 115200 baud:
- Effective throughput: ~11.5 KB/s (10 bits on the wire per byte, assuming 8N1 framing)
- Profile record size: 14 bytes
- Packet overhead: 8 bytes of framing per packet, plus the 3-byte `ProfileDataPayload` header
- Records per packet (typical): 20
- Packet size: 8 + 3 + (20 × 14) = 291 bytes
- Packets per second: ~39
- Records per second: ~780

**Recommendation:** If profiling >780 function calls/sec, increase baud rate to 460800 or 921600.

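The same arithmetic can be wrapped in a small helper to estimate the sustainable record rate at other baud rates or packet sizes. This sketch assumes 10 bits on the wire per byte (8N1) and a link that is otherwise idle; `records_per_second` is a hypothetical name.

```c
#include <stdint.h>

/* Estimated PROFILE_DATA records per second the serial link can carry.
 * Assumptions: 8N1 framing (10 bits per byte) and no other traffic. */
static double records_per_second(uint32_t baud, unsigned records_per_packet)
{
    const double bytes_per_second = (double)baud / 10.0;
    const double packet_bytes = 8.0                          /* response framing   */
                              + 3.0                          /* version + count    */
                              + 14.0 * records_per_packet;   /* ProfileRecord data */

    if (records_per_packet == 0) {
        return 0.0;
    }
    return bytes_per_second / packet_bytes * (double)records_per_packet;
}
```

For 115200 baud and 20 records per packet this gives about 790 records per second, in line with the ~780 figure above (which rounds down to whole packets). At 460800 and 921600 baud the estimate scales to roughly 3,100 and 6,300 records per second.
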
### Timing Overhead

Instrumentation overhead per function:
- Entry hook: ~0.5-1 μs
- Exit hook: ~0.5-1 μs
- Total: ~1-2 μs per function call

Target: <5% overhead for typical applications.

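To relate the per-call cost to the 5% target, the budget can be turned into a maximum call rate. The helper below is a back-of-the-envelope sketch that assumes a fixed overhead per call and ignores cache and interrupt effects; `max_calls_per_second` is a hypothetical name.

```c
/* Highest instrumented call rate (calls per second) that keeps the hook
 * cost within budget_percent of CPU time, given a fixed per-call overhead. */
static double max_calls_per_second(double overhead_us_per_call, double budget_percent)
{
    if (overhead_us_per_call <= 0.0) {
        return 0.0;
    }
    /* budget fraction of one second, in microseconds, divided by cost per call */
    return (budget_percent / 100.0) * 1e6 / overhead_us_per_call;
}
```

With the 1-2 μs range above, a 5% budget corresponds to roughly 25,000 to 50,000 instrumented calls per second.
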
## Protocol Versioning

Current version: **0x01**

The `version` field in `ProfileDataPayload` allows for future protocol extensions:
- v0x01: Current format (entry_time + duration)
- v0x02: Future - could add ISR markers, task IDs, etc.
- v0x03: Future - compressed format, delta encoding

Host should check version and handle accordingly or reject unsupported versions.

## Example Packet Dumps

### GET_METADATA Command
```
55 05 00 00 00 00 00 00 00 00 00 5A
```

### METADATA Response
```
AA 55 03 1C 00              // Header, Type=METADATA, Length=28
00 7A 03 0A                 // mcu_clock_hz = 168000000
40 42 0F 00                 // timer_freq = 1000000
EF BE AD DE                 // build_id = 0xDEADBEEF
76 31 2E 30 2E 30 00 ...    // fw_version = "v1.0.0\0..."
XX XX                       // CRC16
0A                          // End marker
```

### PROFILE_DATA Response (2 records)
```
AA 55 05 1F 00              // Header, Type=PROFILE_DATA, Length=31
01                          // Version = 1
02 00                       // Record count = 2

// Record 1
00 01 00 08                 // func_addr = 0x08000100
E8 03 00 00                 // entry_time = 1000 μs
D0 07 00 00                 // duration = 2000 μs
00 00                       // depth = 0

// Record 2
20 02 00 08                 // func_addr = 0x08000220
F4 01 00 00                 // entry_time = 500 μs
2C 01 00 00                 // duration = 300 μs
01 00                       // depth = 1

XX XX                       // CRC16
0A                          // End marker
```

## Implementation Notes

### Embedded Side
- Use DMA for UART transmission to minimize CPU overhead
- Implement ring buffer with power-of-2 size for efficient modulo operations
- Send packets in background task or idle hook
- Consider double-buffering: one buffer for capturing, one for transmitting

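The power-of-2 ring buffer mentioned above replaces the modulo with a bit mask. A minimal byte-oriented sketch follows; `Ring`, `RING_SIZE` and the function names are hypothetical, and the code does not address synchronization between an interrupt-context producer and a background consumer, which a real implementation would need to consider.

```c
#include <stdint.h>

#define RING_SIZE 256u                 /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

typedef struct {
    uint8_t  data[RING_SIZE];
    uint32_t head;                     /* total bytes written (free-running) */
    uint32_t tail;                     /* total bytes read (free-running)    */
} Ring;

/* Bytes currently stored; unsigned subtraction stays correct after the
 * counters wrap because 2^32 is a multiple of RING_SIZE. */
static uint32_t ring_count(const Ring *r)
{
    return r->head - r->tail;
}

/* Returns 0 on success, -1 if the buffer is full (an overflow event). */
static int ring_push(Ring *r, uint8_t byte)
{
    if (ring_count(r) == RING_SIZE) {
        return -1;
    }
    r->data[r->head & RING_MASK] = byte;   /* mask instead of % RING_SIZE */
    r->head++;
    return 0;
}

/* Returns 0 on success, -1 if the buffer is empty. */
static int ring_pop(Ring *r, uint8_t *byte)
{
    if (ring_count(r) == 0) {
        return -1;
    }
    *byte = r->data[r->tail & RING_MASK];
    r->tail++;
    return 0;
}
```

A failed `ring_push` is the natural place to bump a counter such as `buffer_overflows`.
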
### Host Side
- Use state machine for packet parsing (don't assume atomicity)
- Handle partial packets gracefully
- Verify CRC before processing payload
- Use background thread for serial reading to not block UI

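The host-side points above can be illustrated with a byte-at-a-time state machine, shown here in C to match the struct definitions in this document. Several details are assumptions rather than facts from the source: the start marker is taken to arrive as `AA` then `55` (as in the example packet dumps; swap the two constants if the device emits the header little-endian as the fields table states), the CRC is taken to be sent low byte first, and `MAX_PAYLOAD` is an arbitrary host-side cap. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 512u   /* assumed host-side cap, not part of the protocol */

typedef enum {
    WAIT_HDR0, WAIT_HDR1, READ_TYPE, READ_LEN_LO, READ_LEN_HI,
    READ_PAYLOAD, READ_CRC_LO, READ_CRC_HI, READ_END
} ParseState;

typedef struct {
    ParseState state;
    uint8_t    type;
    uint16_t   length;
    uint16_t   received;
    uint16_t   crc_rx;
    uint8_t    payload[MAX_PAYLOAD];
} Parser;

/* One CRC16-CCITT step (poly 0x1021, MSB-first; bit order is an assumption). */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    }
    return crc;
}

/* CRC over header + type + length + payload of the packet held in p. */
static uint16_t parser_crc(const Parser *p)
{
    const uint8_t hdr[5] = { 0xAA, 0x55, p->type,
                             (uint8_t)(p->length & 0xFFu),
                             (uint8_t)(p->length >> 8) };
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < sizeof hdr; i++) crc = crc16_update(crc, hdr[i]);
    for (size_t i = 0; i < p->length; i++)  crc = crc16_update(crc, p->payload[i]);
    return crc;
}

/* Feed one received byte. Returns 1 when a complete, CRC-valid packet is in
 * p (type, length, payload), 0 if more bytes are needed, and -1 if a packet
 * was discarded (oversized, bad end marker, or CRC mismatch). */
static int parser_feed(Parser *p, uint8_t byte)
{
    switch (p->state) {
    case WAIT_HDR0:
        if (byte == 0xAA) p->state = WAIT_HDR1;
        return 0;
    case WAIT_HDR1:
        if (byte == 0x55)      p->state = READ_TYPE;
        else if (byte != 0xAA) p->state = WAIT_HDR0;   /* resynchronize */
        return 0;
    case READ_TYPE:
        p->type = byte;   p->state = READ_LEN_LO;  return 0;
    case READ_LEN_LO:
        p->length = byte; p->state = READ_LEN_HI;  return 0;
    case READ_LEN_HI:
        p->length |= (uint16_t)((uint16_t)byte << 8);
        p->received = 0;
        if (p->length > MAX_PAYLOAD) { p->state = WAIT_HDR0; return -1; }
        p->state = (p->length > 0) ? READ_PAYLOAD : READ_CRC_LO;
        return 0;
    case READ_PAYLOAD:
        p->payload[p->received++] = byte;
        if (p->received == p->length) p->state = READ_CRC_LO;
        return 0;
    case READ_CRC_LO:
        p->crc_rx = byte; p->state = READ_CRC_HI;  return 0;
    case READ_CRC_HI:
        p->crc_rx |= (uint16_t)((uint16_t)byte << 8);
        p->state = READ_END;
        return 0;
    case READ_END:
        p->state = WAIT_HDR0;                      /* ready for the next packet */
        if (byte != 0x0A) return -1;
        return (parser_crc(p) == p->crc_rx) ? 1 : -1;
    }
    return 0;
}
```

A zero-initialized `Parser` starts in `WAIT_HDR0`. The serial reader thread would call `parser_feed` for every byte it receives and hand the packet to the decoder only when the function returns 1, so partial reads and line noise never reach the payload handlers.
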
## References

- CRC16-CCITT: Polynomial 0x1021, initial value 0xFFFF
- Little-endian byte order for multi-byte integers
- GCC instrumentation: `__cyg_profile_func_enter/exit`