MiniProfiler/docs/PROTOCOL.md
Atharva Sawant 852957a7de Initialized MiniProfiler project
- Contains the host code with a protocol implementation, data analyser and web-based visualiser
2025-11-27 20:34:41 +05:30

MiniProfiler Communication Protocol

Overview

MiniProfiler uses a binary command-response protocol over a UART serial link. The default baud rate is 115200 and is configurable.

  • Command packets: Host → Embedded device
  • Response packets: Embedded device → Host

Command Packet Format

Commands are sent from the host to the embedded device.

Structure

┌────────┬─────────┬─────────────┬──────────┬──────────┐
│ Header │ Command │ Payload Len │ Payload  │ Checksum │
│  (1B)  │  (1B)   │    (1B)     │  (8B)    │   (1B)   │
└────────┴─────────┴─────────────┴──────────┴──────────┘
Total: 12 bytes

Fields

| Field | Size | Value | Description |
|---|---|---|---|
| Header | 1 byte | 0x55 | Packet start marker |
| Command | 1 byte | See table below | Command code |
| Payload Length | 1 byte | 0-8 | Actual payload size |
| Payload | 8 bytes | Variable | Command parameters (padded with 0x00) |
| Checksum | 1 byte | Sum of all preceding bytes & 0xFF | Simple 8-bit additive checksum |

Command Codes

| Command | Code | Description | Payload |
|---|---|---|---|
| START_PROFILING | 0x01 | Start profiling | None |
| STOP_PROFILING | 0x02 | Stop profiling | None |
| GET_STATUS | 0x03 | Request status | None |
| RESET_BUFFERS | 0x04 | Clear profiling buffers | None |
| GET_METADATA | 0x05 | Request device metadata | None |
| SET_CONFIG | 0x06 | Configure profiler | Config bytes (reserved) |

Example

Start profiling command:

55 01 00 00 00 00 00 00 00 00 00 56
│  │  │  └─────────────────────┘ │
│  │  │         Payload (8B)      │
│  │  └── Payload Length (0)      │
│  └── Command (START_PROFILING)  │
└── Header (0x55)                 └── Checksum
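
The layout above can be sketched as a small packet builder. This is a minimal illustration rather than the project's actual implementation: the names `build_command`, `CMD_HEADER`, `CMD_PAYLOAD_MAX`, and `CMD_PACKET_SIZE` are hypothetical, and the checksum is assumed to cover the 11 bytes that precede it, which matches the `0x56` in the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_HEADER      0x55u
#define CMD_PAYLOAD_MAX 8u
#define CMD_PACKET_SIZE 12u

/* Fill `out` (CMD_PACKET_SIZE bytes) with one command packet.
 * Returns 0 on success, -1 if the payload is longer than 8 bytes. */
int build_command(uint8_t command, const uint8_t *payload, size_t payload_len,
                  uint8_t *out)
{
    if (payload_len > CMD_PAYLOAD_MAX) {
        return -1;
    }
    memset(out, 0, CMD_PACKET_SIZE);   /* also zero-pads the payload area */
    out[0] = CMD_HEADER;
    out[1] = command;
    out[2] = (uint8_t)payload_len;
    if (payload_len > 0) {
        memcpy(&out[3], payload, payload_len);
    }

    /* Checksum: sum of the 11 preceding bytes, truncated to 8 bits. */
    uint8_t sum = 0;
    for (size_t i = 0; i < CMD_PACKET_SIZE - 1u; i++) {
        sum = (uint8_t)(sum + out[i]);
    }
    out[CMD_PACKET_SIZE - 1u] = sum;
    return 0;
}
```

Calling `build_command(0x01, NULL, 0, pkt)` reproduces the START_PROFILING bytes shown in the example, ending in checksum `0x56`.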

Response Packet Format

Responses are sent from the embedded device to the host.

Structure

┌─────────┬──────┬──────────┬──────────┬────────┬─────┐
│ Header  │ Type │  Length  │ Payload  │  CRC   │ End │
│  (2B)   │ (1B) │   (2B)   │ (N bytes)│  (2B)  │(1B) │
└─────────┴──────┴──────────┴──────────┴────────┴─────┘
Total: 8 + N bytes

Fields

| Field | Size | Value | Description |
|---|---|---|---|
| Header | 2 bytes | 0xAA 0x55 | Packet start marker; bytes appear on the wire in this order (0x55AA if read as a little-endian uint16) |
| Type | 1 byte | See table below | Response type |
| Length | 2 bytes | 0-65535 | Payload size (little-endian) |
| Payload | Variable | Depends on type | Response data |
| CRC16 | 2 bytes | CRC16-CCITT | Checksum of header+type+length+payload |
| End | 1 byte | 0x0A | Packet end marker (newline) |
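
For reference, the CRC16 field can be computed with a bitwise routine like the one below. It is a sketch under assumptions: the polynomial (0x1021) and initial value (0xFFFF) come from the References section of this document, while MSB-first processing with no reflection and no final XOR is assumed here, not confirmed. The function name `crc16_ccitt` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC16-CCITT: polynomial 0x1021, initial value 0xFFFF.
 * Assumption: MSB first, no input/output reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

With these parameters the standard check string "123456789" yields 0x29B1, which is a quick way to confirm that host and firmware agree on the variant.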

Response Types

| Type | Code | Description | Payload Format |
|---|---|---|---|
| ACK | 0x01 | Command acknowledged | None |
| NACK | 0x02 | Command failed | None |
| METADATA | 0x03 | Device metadata | See Metadata Payload |
| STATUS | 0x04 | Device status | See Status Payload |
| PROFILE_DATA | 0x05 | Profiling records | See Profile Data Payload |

Payload Formats

Metadata Payload (28 bytes)

Sent in response to GET_METADATA command or automatically on startup.

struct MetadataPayload {
    uint32_t mcu_clock_hz;    // MCU clock frequency in Hz
    uint32_t timer_freq;      // Profiling timer frequency in Hz
    uint32_t elf_build_id;    // CRC32 of .text section for version matching
    char fw_version[16];      // Firmware version string (null-terminated)
} __attribute__((packed));

Example:

  • MCU Clock: 168,000,000 Hz (168 MHz STM32F4)
  • Timer Freq: 1,000,000 Hz (1 MHz for microsecond precision)
  • Build ID: 0xDEADBEEF
  • FW Version: "v1.0.0"
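
On a host whose compiler or byte order differs from the target, casting received bytes to the packed struct is not portable, so a field-by-field decode is one option. The sketch below assumes little-endian fields as stated in the References section; `Metadata`, `decode_metadata`, and `read_u32_le` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define METADATA_PAYLOAD_SIZE 28u

typedef struct {
    uint32_t mcu_clock_hz;
    uint32_t timer_freq;
    uint32_t elf_build_id;
    char     fw_version[17];   /* 16 wire bytes plus a forced terminator */
} Metadata;

/* Read a 32-bit little-endian value from an unaligned byte pointer. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a 28-byte METADATA payload. Returns 0 on success, -1 on bad length. */
int decode_metadata(const uint8_t *payload, size_t len, Metadata *out)
{
    if (len != METADATA_PAYLOAD_SIZE) {
        return -1;
    }
    out->mcu_clock_hz = read_u32_le(payload);
    out->timer_freq   = read_u32_le(payload + 4);
    out->elf_build_id = read_u32_le(payload + 8);
    memcpy(out->fw_version, payload + 12, 16);
    out->fw_version[16] = '\0';   /* guard against a missing terminator */
    return 0;
}
```

The extra byte in `fw_version` keeps the string safe to print even if the device fills all 16 bytes without a terminator.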

Status Payload (10 bytes)

Sent in response to GET_STATUS command.

struct StatusPayload {
    uint8_t is_profiling;           // 1 if profiling active, 0 otherwise
    uint32_t buffer_overflows;      // Number of buffer overflow events
    uint32_t records_captured;      // Total records captured
    uint8_t buffer_usage_percent;   // Current buffer usage (0-100)
} __attribute__((packed));
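
Because the struct is packed, the two 32-bit counters sit at odd offsets (1 and 5), so reading them byte by byte avoids unaligned accesses on the host. A minimal sketch, assuming little-endian fields; `Status`, `decode_status`, and `read_u32_le` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_PAYLOAD_SIZE 10u

typedef struct {
    uint8_t  is_profiling;
    uint32_t buffer_overflows;
    uint32_t records_captured;
    uint8_t  buffer_usage_percent;
} Status;

/* Read a 32-bit little-endian value from an unaligned byte pointer. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a 10-byte STATUS payload. Returns 0 on success, -1 on bad length. */
int decode_status(const uint8_t *payload, size_t len, Status *out)
{
    if (len != STATUS_PAYLOAD_SIZE) {
        return -1;
    }
    out->is_profiling         = payload[0];
    out->buffer_overflows     = read_u32_le(payload + 1);
    out->records_captured     = read_u32_le(payload + 5);
    out->buffer_usage_percent = payload[9];
    return 0;
}
```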

Profile Data Payload (Variable)

Sent automatically while profiling is active, or in response to data requests (the command table above does not currently define a data-request command).

struct ProfileDataPayload {
    uint8_t version;              // Protocol version (0x01)
    uint16_t record_count;        // Number of records in this packet
    ProfileRecord records[];      // Array of profile records
} __attribute__((packed));

Each ProfileRecord is 14 bytes:

struct ProfileRecord {
    uint32_t func_addr;      // Function address (from instrumentation)
    uint32_t entry_time;     // Entry timestamp in microseconds
    uint32_t duration_us;    // Function duration in microseconds
    uint16_t depth;          // Call stack depth (0 = root)
} __attribute__((packed));

Field Details:

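  • func_addr: Address obtained from __builtin_return_address(0) inside the instrumentation hook. This is a return address within the instrumented function's body rather than its entry point, so the host should map it to the enclosing symbol instead of expecting an exact match
  • entry_time: Microsecond timestamp when the function was entered. As a 32-bit counter it wraps after 2^32 μs, about 71.6 minutes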
  • duration_us: Time spent in function including children
  • depth: Call stack depth (0 for main, 1 for functions called by main, etc.)
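
The payload and record layouts above can be decoded as follows. This is a sketch assuming little-endian fields and protocol version 0x01; `decode_profile_data`, `record_exit_time`, and `read_u32_le` are hypothetical names. The exit-time helper relies on unsigned 32-bit arithmetic, so it stays correct when entry_time is close to the wrap point.

```c
#include <stddef.h>
#include <stdint.h>

#define PROFILE_VERSION     0x01u
#define PROFILE_HEADER_SIZE 3u
#define PROFILE_RECORD_SIZE 14u

typedef struct {
    uint32_t func_addr;
    uint32_t entry_time;
    uint32_t duration_us;
    uint16_t depth;
} ProfileRecord;

/* Read a 32-bit little-endian value from an unaligned byte pointer. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode a PROFILE_DATA payload into `out` (capacity `max_records`).
 * Returns the number of records decoded, or -1 if the payload is malformed,
 * has an unsupported version, or holds more records than `out` can take. */
int decode_profile_data(const uint8_t *payload, size_t len,
                        ProfileRecord *out, size_t max_records)
{
    if (len < PROFILE_HEADER_SIZE || payload[0] != PROFILE_VERSION) {
        return -1;
    }
    size_t count = (size_t)payload[1] | ((size_t)payload[2] << 8);
    if (count > max_records ||
        len != PROFILE_HEADER_SIZE + count * PROFILE_RECORD_SIZE) {
        return -1;
    }
    const uint8_t *p = payload + PROFILE_HEADER_SIZE;
    for (size_t i = 0; i < count; i++, p += PROFILE_RECORD_SIZE) {
        out[i].func_addr   = read_u32_le(p);
        out[i].entry_time  = read_u32_le(p + 4);
        out[i].duration_us = read_u32_le(p + 8);
        out[i].depth       = (uint16_t)(p[12] | (p[13] << 8));
    }
    return (int)count;
}

/* Exit timestamp; unsigned addition wraps modulo 2^32 like the device timer. */
uint32_t record_exit_time(const ProfileRecord *r)
{
    return r->entry_time + r->duration_us;
}
```

Checking the announced record count against the actual payload length before reading any record prevents a truncated packet from being read past its end.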

Communication Flow

Initial Connection

Host                    Device
  |                        |
  |--- GET_METADATA ------>|
  |<---- METADATA ---------|
  |                        |
  |--- START_PROFILING --->|
  |<---- ACK --------------|
  |                        |
  |<---- PROFILE_DATA -----|  (continuous stream)
  |<---- PROFILE_DATA -----|
  |<---- PROFILE_DATA -----|
  |         ...            |

Typical Session

1. Host connects to serial port
2. Host sends GET_METADATA
3. Device responds with METADATA packet
4. Host sends START_PROFILING
5. Device responds with ACK
6. Device begins streaming PROFILE_DATA packets
7. Host processes and visualizes data in real-time
8. Host sends STOP_PROFILING when done
9. Device responds with ACK and stops streaming
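
The session steps above imply a mapping from each command to the response the host waits for. The sketch below encodes that mapping. The replies for GET_METADATA, GET_STATUS, START_PROFILING, and STOP_PROFILING follow this document; the ACK replies for RESET_BUFFERS and SET_CONFIG are assumptions, as is treating interleaved PROFILE_DATA as "keep waiting". All identifiers are hypothetical.

```c
#include <stdint.h>

enum {
    CMD_START_PROFILING = 0x01, CMD_STOP_PROFILING = 0x02,
    CMD_GET_STATUS      = 0x03, CMD_RESET_BUFFERS  = 0x04,
    CMD_GET_METADATA    = 0x05, CMD_SET_CONFIG     = 0x06
};

enum {
    RSP_ACK = 0x01, RSP_NACK = 0x02, RSP_METADATA = 0x03,
    RSP_STATUS = 0x04, RSP_PROFILE_DATA = 0x05
};

typedef enum { REPLY_FAILED = -1, REPLY_PENDING = 0, REPLY_DONE = 1 } ReplyResult;

/* Response type that completes `command`, or 0 for an unknown command. */
uint8_t expected_response(uint8_t command)
{
    switch (command) {
    case CMD_GET_METADATA:    return RSP_METADATA;
    case CMD_GET_STATUS:      return RSP_STATUS;
    case CMD_START_PROFILING:
    case CMD_STOP_PROFILING:  return RSP_ACK;
    case CMD_RESET_BUFFERS:   /* assumption: not stated in the document */
    case CMD_SET_CONFIG:      /* assumption: not stated in the document */
        return RSP_ACK;
    default:
        return 0;
    }
}

/* Classify a received response while `command` is outstanding. */
ReplyResult classify_reply(uint8_t command, uint8_t response_type)
{
    uint8_t expected = expected_response(command);
    if (expected == 0) {
        return REPLY_FAILED;          /* unknown command */
    }
    if (response_type == expected) {
        return REPLY_DONE;
    }
    if (response_type == RSP_NACK) {
        return REPLY_FAILED;
    }
    return REPLY_PENDING;             /* e.g. PROFILE_DATA still streaming */
}
```

For example, after sending STOP_PROFILING the host may still receive a few PROFILE_DATA packets before the ACK; `classify_reply` reports those as `REPLY_PENDING`.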

Error Handling

CRC Mismatch

If the host detects a CRC mismatch:

  • Log the error
  • Discard the packet
  • Continue listening for next packet
  • No retransmission (real-time streaming)
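
"Continue listening" in practice means resynchronizing on the next header after a bad packet. One possible approach is to scan the receive buffer for the next 0xAA 0x55 pair, the byte order used in the packet dumps in this document. The function name `find_next_header` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the next 0xAA 0x55 header at or after `start`.
 * If none is found, return `len`, or `len - 1` when the final byte is 0xAA,
 * since that byte may be the first half of a header still in flight.
 * Bytes before the returned index can be discarded. */
size_t find_next_header(const uint8_t *buf, size_t len, size_t start)
{
    for (size_t i = start; i + 1 < len; i++) {
        if (buf[i] == 0xAA && buf[i + 1] == 0x55) {
            return i;
        }
    }
    if (len > start && buf[len - 1] == 0xAA) {
        return len - 1;
    }
    return len;
}
```

After a CRC failure, calling this with `start` set one byte past the rejected packet's header avoids matching the same header again.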

Packet Loss

  • Sequence numbers not implemented (keeps protocol simple)
  • Missing data will create gaps in visualization
  • Not critical for profiling use case

Buffer Overflow

  • Device increments the buffer_overflows counter reported in the STATUS payload
  • Host should warn user
  • Options: increase baud rate, reduce instrumentation, or use sampling

Performance Considerations

Bandwidth Calculation

At 115200 baud:

  • Effective throughput: ~11.5 KB/s (assuming 8N1 framing, i.e. 10 bits on the wire per byte)
  • Profile record size: 14 bytes
  • Packet overhead: 8 bytes of framing plus a 3-byte PROFILE_DATA header
  • Records per packet (typical): 20
  • Packet size: 8 + 3 + 20 × 14 = 291 bytes
  • Packets per second: ~39
  • Records per second: ~780

Recommendation: If profiling >780 function calls/sec, increase baud rate to 460800 or 921600.
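
The same arithmetic can be wrapped in a helper to evaluate other baud rates. This sketch assumes 8N1 framing (10 line bits per byte) and rounds down to whole packets per second, as the figures above do; `max_records_per_sec` is a hypothetical name.

```c
#include <stdint.h>

#define FRAME_OVERHEAD_BYTES 8u   /* header, type, length, CRC16, end marker */
#define PROFILE_HEADER_BYTES 3u   /* version + record_count */
#define RECORD_BYTES         14u

/* Approximate sustainable records per second at a given baud rate.
 * Assumption: 8N1 framing, so each byte costs 10 bits on the wire. */
uint32_t max_records_per_sec(uint32_t baud, uint32_t records_per_packet)
{
    uint32_t bytes_per_sec = baud / 10u;
    uint32_t packet_size = FRAME_OVERHEAD_BYTES + PROFILE_HEADER_BYTES +
                           RECORD_BYTES * records_per_packet;
    uint32_t packets_per_sec = bytes_per_sec / packet_size;
    return packets_per_sec * records_per_packet;
}
```

With 20 records per packet this gives 780 records/s at 115200 baud, 3160 at 460800, and 6320 at 921600.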

Timing Overhead

Instrumentation overhead per function:

  • Entry hook: ~0.5-1 μs
  • Exit hook: ~0.5-1 μs
  • Total: ~1-2 μs per function call

Target: <5% overhead for typical applications.
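
To relate the per-call cost to the 5% target, the overhead fraction is simply calls per second multiplied by the per-call cost. The helpers below are an illustrative sketch with hypothetical names; they model only hook execution time and ignore transmission cost.

```c
/* Percentage of CPU time spent inside the entry and exit hooks. */
double overhead_percent(double calls_per_sec, double overhead_us_per_call)
{
    return calls_per_sec * overhead_us_per_call / 1e6 * 100.0;
}

/* Highest call rate that stays within `budget_percent` of CPU time. */
double max_calls_per_sec(double budget_percent, double overhead_us_per_call)
{
    return budget_percent / 100.0 * 1e6 / overhead_us_per_call;
}
```

At the upper estimate of 2 μs per call, a 5% budget corresponds to roughly 25,000 instrumented calls per second, well above what the default baud rate can stream.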

Protocol Versioning

Current version: 0x01

The version field in ProfileDataPayload allows for future protocol extensions:

  • v0x01: Current format (entry_time + duration)
  • v0x02: Future - could add ISR markers, task IDs, etc.
  • v0x03: Future - compressed format, delta encoding

Host should check version and handle accordingly or reject unsupported versions.
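
A version gate on the host can be as small as the sketch below, which accepts only version 0x01 and reports anything else so the caller can skip the packet. The names `VersionCheck` and `check_profile_version` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PROFILE_VERSION_SUPPORTED 0x01u

typedef enum {
    VERSION_OK = 0,
    VERSION_UNSUPPORTED = -1,
    VERSION_MISSING = -2      /* payload too short to carry a version byte */
} VersionCheck;

/* Inspect the first byte of a PROFILE_DATA payload before decoding it. */
VersionCheck check_profile_version(const uint8_t *payload, size_t len)
{
    if (len < 1) {
        return VERSION_MISSING;
    }
    return payload[0] == PROFILE_VERSION_SUPPORTED ? VERSION_OK
                                                   : VERSION_UNSUPPORTED;
}
```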

Example Packet Dumps

GET_METADATA Command

55 05 00 00 00 00 00 00 00 00 00 5A

METADATA Response

AA 55 03 1C 00                    // Header, Type=METADATA, Length=28
00 7A 03 0A                        // mcu_clock_hz = 168000000 (0x0A037A00)
40 42 0F 00                        // timer_freq = 1000000
EF BE AD DE                        // build_id = 0xDEADBEEF
76 31 2E 30 2E 30 00 ...          // fw_version = "v1.0.0\0..."
XX XX                              // CRC16
0A                                 // End marker

PROFILE_DATA Response (2 records)

AA 55 05 1F 00                    // Header, Type=PROFILE_DATA, Length=31
01                                 // Version = 1
02 00                              // Record count = 2

// Record 1
00 01 00 08                        // func_addr = 0x08000100
E8 03 00 00                        // entry_time = 1000 μs
D0 07 00 00                        // duration = 2000 μs
00 00                              // depth = 0

// Record 2
20 02 00 08                        // func_addr = 0x08000220
F4 01 00 00                        // entry_time = 500 μs
2C 01 00 00                        // duration = 300 μs
01 00                              // depth = 1

XX XX                              // CRC16
0A                                 // End marker

Implementation Notes

Embedded Side

  • Use DMA for UART transmission to minimize CPU overhead
  • Implement ring buffer with power-of-2 size for efficient modulo operations
  • Send packets in background task or idle hook
  • Consider double-buffering: one buffer for capturing, one for transmitting
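
The power-of-2 ring buffer mentioned above lets the index wrap with a bit mask instead of a modulo. The sketch below uses free-running head and tail counters. `RING_SIZE` and all names are hypothetical, and the code is written for a single context: sharing it between an ISR and a task would additionally need volatile accesses or barriers.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u               /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

typedef struct {
    uint8_t  data[RING_SIZE];
    uint32_t head;                   /* total bytes written (free-running) */
    uint32_t tail;                   /* total bytes read (free-running) */
} Ring;

/* Number of bytes currently stored; correct across counter wrap-around. */
static inline uint32_t ring_count(const Ring *r)
{
    return r->head - r->tail;
}

/* Store one byte. Returns false when full so the caller can count an overflow. */
bool ring_push(Ring *r, uint8_t byte)
{
    if (ring_count(r) == RING_SIZE) {
        return false;
    }
    r->data[r->head & RING_MASK] = byte;   /* mask replaces "% RING_SIZE" */
    r->head++;
    return true;
}

/* Remove the oldest byte. Returns false when the buffer is empty. */
bool ring_pop(Ring *r, uint8_t *byte)
{
    if (ring_count(r) == 0) {
        return false;
    }
    *byte = r->data[r->tail & RING_MASK];
    r->tail++;
    return true;
}
```

Keeping the counters free-running means all `RING_SIZE` slots are usable and "full" is distinguishable from "empty" without a separate flag.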

Host Side

  • Use a state machine for packet parsing (a single serial read may return a partial packet or several packets)
  • Handle partial packets gracefully
  • Verify CRC before processing payload
  • Use background thread for serial reading to not block UI
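
A byte-at-a-time state machine for the response format could look like the sketch below. The assumptions: header bytes arrive as 0xAA 0x55 (as in the packet dumps), the CRC16 field is sent little-endian and covers header through payload, and the CRC variant is MSB-first with initial value 0xFFFF. `MAX_PAYLOAD` is a hypothetical host-side cap, and every identifier here is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PAYLOAD 1024u   /* hypothetical host-side cap, not part of the protocol */

typedef enum {
    ST_HEADER0, ST_HEADER1, ST_TYPE, ST_LEN0, ST_LEN1,
    ST_PAYLOAD, ST_CRC0, ST_CRC1, ST_END
} ParseState;

typedef struct {
    ParseState state;
    uint8_t  type;
    uint16_t length;      /* payload length announced by the frame */
    uint16_t received;    /* payload bytes stored so far */
    uint16_t crc_calc;    /* CRC computed over header..payload */
    uint16_t crc_rx;      /* CRC carried by the frame */
    uint8_t  payload[MAX_PAYLOAD];
} Parser;

/* One CRC16-CCITT step (polynomial 0x1021, MSB first). */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    }
    return crc;
}

void parser_reset(Parser *p) { p->state = ST_HEADER0; }

/* Feed one received byte. Returns true when a complete frame with a valid
 * CRC and end marker is available in p->type, p->length and p->payload. */
bool parser_feed(Parser *p, uint8_t byte)
{
    switch (p->state) {
    case ST_HEADER0:
        if (byte == 0xAA) {
            p->crc_calc = crc16_update(0xFFFFu, byte);
            p->state = ST_HEADER1;
        }
        return false;
    case ST_HEADER1:
        if (byte == 0x55) {
            p->crc_calc = crc16_update(p->crc_calc, byte);
            p->state = ST_TYPE;
        } else if (byte != 0xAA) {   /* a repeated 0xAA may still start a header */
            p->state = ST_HEADER0;
        }
        return false;
    case ST_TYPE:
        p->type = byte;
        p->crc_calc = crc16_update(p->crc_calc, byte);
        p->state = ST_LEN0;
        return false;
    case ST_LEN0:
        p->length = byte;
        p->crc_calc = crc16_update(p->crc_calc, byte);
        p->state = ST_LEN1;
        return false;
    case ST_LEN1:
        p->length |= (uint16_t)((uint16_t)byte << 8);
        p->crc_calc = crc16_update(p->crc_calc, byte);
        p->received = 0;
        if (p->length > MAX_PAYLOAD) {
            p->state = ST_HEADER0;   /* implausible length: resynchronize */
        } else {
            p->state = p->length ? ST_PAYLOAD : ST_CRC0;
        }
        return false;
    case ST_PAYLOAD:
        p->payload[p->received++] = byte;
        p->crc_calc = crc16_update(p->crc_calc, byte);
        if (p->received == p->length) {
            p->state = ST_CRC0;
        }
        return false;
    case ST_CRC0:
        p->crc_rx = byte;
        p->state = ST_CRC1;
        return false;
    case ST_CRC1:
        p->crc_rx |= (uint16_t)((uint16_t)byte << 8);
        p->state = ST_END;
        return false;
    case ST_END:
        p->state = ST_HEADER0;       /* ready for the next frame either way */
        return byte == 0x0A && p->crc_rx == p->crc_calc;
    }
    return false;
}
```

Because the parser keeps its state between calls, the serial reader thread can pass it whatever chunk sizes the OS delivers, one byte at a time, and act only when `parser_feed` returns true.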

References

  • CRC16-CCITT: Polynomial 0x1021, initial value 0xFFFF
  • Little-endian byte order for multi-byte integers
  • GCC instrumentation: __cyg_profile_func_enter/exit