Initialized MiniProfiler project

- Contains the host code with a protocol implementation, data analyser and web-based visualiser
Atharva Sawant
2025-11-27 20:34:41 +05:30
commit 852957a7de
20 changed files with 3845 additions and 0 deletions

349
docs/GETTING_STARTED.md Normal file

@@ -0,0 +1,349 @@
# Getting Started with MiniProfiler
This guide will help you get started with MiniProfiler for profiling your embedded STM32 applications.
## Prerequisites
### Host System
- Python 3.8 or higher
- pip package manager
- Modern web browser (Chrome, Firefox, Edge)
- Serial port access (USB-to-Serial adapter or built-in UART)
### Embedded Target (for Phase 2)
- STM32 microcontroller (STM32F4/F7/H7 recommended)
- GCC ARM toolchain
- UART or USB-CDC peripheral configured
- ST-Link or similar programmer/debugger
## Installation
### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/miniprofiler.git
cd miniprofiler
```
### 2. Install Python Dependencies
```bash
cd host
pip install -r requirements.txt
```
Or install as a package:
```bash
pip install -e .
```
### 3. Verify Installation
```bash
miniprofiler --help
```
You should see the help message with available options.
## Testing Without Hardware
Before connecting to real hardware, you can test the visualization with sample data.
### Generate Sample Data
```bash
cd host/tests
python sample_data_generator.py
```
This creates several sample data files:
- `sample_flamegraph.json` - Flame graph visualization data
- `sample_statistics.json` - Function statistics
- `sample_timeline.json` - Timeline data
- `sample_profile_data.bin` - Binary protocol data
### View Sample Visualizations
You can view the sample JSON files by loading them in the web interface or by opening them directly:
```bash
# View flame graph data
python -m json.tool sample_flamegraph.json
# View statistics
python -m json.tool sample_statistics.json
```
## Running the Host Application
### Start the Web Server
```bash
# From the host directory
python run.py
# Or using the installed CLI
miniprofiler
```
The server will start on `http://localhost:5000` by default.
### Custom Host/Port
```bash
miniprofiler --host 0.0.0.0 --port 8080
```
### Enable Debug Mode
```bash
miniprofiler --debug --verbose
```
## Using the Web Interface
### 1. Open the Browser
Navigate to `http://localhost:5000`
You should see the MiniProfiler dashboard with:
- Connection controls
- Profiling controls (disabled until connected)
- Status display
- Three visualization tabs
### 2. Configure Connection
Enter your serial port details:
- **Serial Port**: `/dev/ttyUSB0` (Linux), `/dev/tty.usb*` (macOS), or `COM3` (Windows)
- **Baud Rate**: `115200` (default)
- **ELF Path**: Path to your `.elf` file (optional, for symbol resolution)
**Finding Your Serial Port:**
Linux/Mac:
```bash
ls /dev/tty* | grep -i usb
# macOS: USB serial adapters usually show up as /dev/tty.usb*
ls /dev/tty.usb*
```
Windows:
- Open Device Manager
- Look under "Ports (COM & LPT)"
- Note the COM port number (e.g., COM3)
### 3. Connect to Device
Click the **Connect** button.
If successful, you'll see:
- Status indicator turns green
- "Connected to /dev/ttyUSB0" message
- Metadata panel appears with device information
- Profiling controls become enabled
### 4. Start Profiling
Click **Start Profiling**.
The device will begin sending profiling data, and you'll see:
- Real-time updates in all three visualization tabs
- Record count incrementing
- Summary statistics updating
### 5. Explore Visualizations
#### Flame Graph Tab
- Shows aggregate CPU time by function
- Wider bars = more time spent
- Click to zoom into specific call stacks
- Search for functions by name
- Hover for details
#### Timeline Tab
- Shows function execution over time
- X-axis = time in microseconds
- Y-axis = call stack depth
- Color = duration (darker = longer)
- Useful for finding timing issues
#### Statistics Tab
- Sortable table of function statistics
- Columns: Function, Address, Calls, Total/Avg/Min/Max Time
- Click column headers to sort
- Find hot spots and outliers
### 6. Control Profiling
- **Stop Profiling**: Pause data collection
- **Clear Data**: Reset all visualizations
- **Reset Buffers**: Clear device-side buffers
### 7. Disconnect
Click **Disconnect** when done to close the serial connection.
## Understanding the Visualizations
### Flame Graph
The flame graph shows **aggregated** profiling data:
```
┌─────────────────────────────────┐
│ main (10s) │ ← Root function
├──────────────┬──────────────────┤
│ app_loop │ process_data │ ← Called by main
│ (6s) │ (4s) │
├──────┬───────┼──────────────────┤
│ read │ write │ calculate │ ← Nested calls
│ (3s) │ (3s) │ (4s) │
└──────┴───────┴──────────────────┘
```
**Interpretation:**
- Width = total time (including children)
- Read from the root (`main`, drawn at the top in the diagram above) towards the leaves
- Widest bars are hotspots to optimize
### Timeline
The timeline shows **chronological** execution:
```
Time ───────────────────────►
│ ████ func_a
│ ██ func_b (called by func_a)
│ ████ func_c
│ ██ func_d
```
**Interpretation:**
- X-axis = time progression
- Y-axis = call depth
- Gaps = idle time or excluded functions
- Useful for timing analysis and debugging
### Statistics Table
| Function | Calls | Total Time | Avg Time |
|----------|-------|------------|----------|
| main | 1 | 10000 μs | 10000 μs |
| app_loop | 100 | 6000 μs | 60 μs |
| calculate | 100 | 4000 μs | 40 μs |
**Interpretation:**
- Calls = number of times function was called
- Total = cumulative time across all calls
- Avg = total / calls
- Min/Max = shortest/longest single execution
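The columns above can be reproduced from raw durations with a few lines of Python. This is a minimal sketch, not the code in `analyzer.py`: the function name `function_stats` and the input shape (pairs of function name and duration in microseconds) are assumptions made for illustration.

```python
from collections import defaultdict


def function_stats(samples):
    """Aggregate (function_name, duration_us) pairs into per-function statistics."""
    durations = defaultdict(list)
    for name, duration_us in samples:
        durations[name].append(duration_us)

    table = {}
    for name, values in durations.items():
        total = sum(values)
        table[name] = {
            "calls": len(values),          # number of times the function ran
            "total_us": total,             # cumulative time across all calls
            "avg_us": total / len(values), # total / calls
            "min_us": min(values),         # shortest single execution
            "max_us": max(values),         # longest single execution
        }
    return table


stats = function_stats([("calculate", 30), ("calculate", 50), ("main", 10000)])
print(stats["calculate"])
```

A large gap between `avg_us` and `max_us` for the same function is usually the first hint of an outlier worth inspecting in the timeline.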
## Troubleshooting
### "Failed to connect to /dev/ttyUSB0"
**Possible causes:**
- Wrong port name
- Port in use by another application
- Insufficient permissions
**Solutions:**
```bash
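# Linux: check who owns the port and which group may access it
ls -l /dev/ttyUSB0
# Quick fix: world-writable, and lost again when the adapter is re-plugged
sudo chmod 666 /dev/ttyUSB0
# Persistent fix: add your user to the dialout group, then log out and back in
sudo usermod -a -G dialout $USER
# Check whether another process has the port open
lsof /dev/ttyUSB0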
```
### No Data Appearing
**Check:**
1. Is profiling started? (Click "Start Profiling")
2. Is the embedded device actually profiling?
3. Is the UART configured correctly on the embedded side?
4. Does the baud rate match on both sides?
5. Are there errors in the browser console (F12)?
### CRC Errors in Console
**Possible causes:**
- Baud rate mismatch
- Electrical noise on UART lines
- Cable issues
**Solutions:**
- Verify baud rate configuration
- Use shielded cable
- Add delays in embedded UART transmission
- Reduce baud rate to 57600
### Buffer Overflows
**Symptoms:**
- `buffer_overflows` counter > 0 in device status
- Missing profiling data
**Solutions:**
- Increase baud rate (460800 or 921600)
- Increase embedded ring buffer size
- Reduce instrumentation (exclude more files)
- Use sampling mode (future feature)
### Symbols Not Resolved
**Symptoms:**
- Function names show as `func_0x08000XXX` or `unknown_0x08000XXX`
**Solutions:**
- Provide path to `.elf` file in connection settings
- Ensure `.elf` file has debug symbols (`-g` flag)
- Verify `.elf` file matches firmware on device
- Check build ID in metadata matches
### Web Interface Not Loading
**Check:**
1. Is the server running? Look for the "Starting web server..." message
2. Is the URL correct? It should be `http://localhost:5000`
3. Is the port already in use? Try a different one: `miniprofiler --port 8080`
4. Is a firewall blocking it? Add an exception for Python/Flask
## Next Steps
### For Development
1. Read [PROTOCOL.md](PROTOCOL.md) to understand the communication protocol
2. Review the code in `host/miniprofiler/` to customize behavior
3. Modify visualizations in `host/web/`
### For Embedded Integration
1. Wait for Phase 2 implementation of embedded module
2. Or start implementing based on protocol specification
3. See examples in `embedded/` directory (coming soon)
### For Testing
1. Create custom sample data with `sample_data_generator.py`
2. Test with Renode emulation (Phase 4)
3. Benchmark overhead on real hardware
## Support
- **Documentation**: See `docs/` directory
- **Issues**: Open an issue on GitHub
- **Examples**: Check `examples/` directory (coming soon)
## What's Next?
After getting familiar with the host application:
1. **Phase 2**: Implement embedded module for STM32
2. **Phase 3**: Test on real hardware
3. **Phase 4**: Set up Renode emulation for automated testing
Stay tuned for updates!

422
docs/PROJECT_STRUCTURE.md Normal file

@@ -0,0 +1,422 @@
# MiniProfiler Project Structure
## Directory Layout
```
MiniProfiler/
├── docs/ # Documentation
│ ├── GETTING_STARTED.md # Quick start guide
│ ├── PROTOCOL.md # Communication protocol specification
│ └── PROJECT_STRUCTURE.md # This file
├── host/ # Host application (Python)
│ ├── miniprofiler/ # Main package
│ │ ├── __init__.py # Package initialization
│ │ ├── analyzer.py # Data analysis and visualization data generation
│ │ ├── cli.py # Command-line interface
│ │ ├── protocol.py # Binary protocol implementation
│ │ ├── serial_reader.py # Serial communication
│ │ ├── symbolizer.py # ELF/DWARF symbol resolution
│ │ └── web_server.py # Flask web server with SocketIO
│ │
│ ├── web/ # Web interface assets
│ │ ├── static/
│ │ │ ├── css/
│ │ │ │ └── style.css # Stylesheet
│ │ │ └── js/
│ │ │ └── app.js # JavaScript application logic
│ │ └── templates/
│ │ └── index.html # Main HTML template
│ │
│ ├── tests/ # Tests and utilities
│ │ ├── __init__.py
│ │ └── sample_data_generator.py # Generate mock profiling data
│ │
│ ├── requirements.txt # Python dependencies
│ ├── setup.py # Package setup
│ └── run.py # Quick start script
├── embedded/ # Embedded module (Phase 2 - TODO)
│ ├── src/
│ ├── inc/
│ └── examples/
├── .gitignore # Git ignore rules
├── CLAUDE.md # Project overview for Claude
└── README.md # Main project README
```
## Module Descriptions
### Host Application (`host/miniprofiler/`)
#### `protocol.py`
**Purpose:** Binary protocol implementation for serial communication
**Key Components:**
- `ProfileRecord`: Data class for profiling records (14 bytes)
- `Metadata`: Device metadata (MCU clock, timer freq, etc.)
- `StatusInfo`: Device status information
- `CommandPacket`: Commands sent to device
- `ResponsePacket`: Responses from device
- CRC16 calculation and validation
**Used by:** `serial_reader.py`, `analyzer.py`, `sample_data_generator.py`
---
#### `serial_reader.py`
**Purpose:** Serial port communication and packet parsing
**Key Components:**
- `SerialReader`: Main class for serial I/O
- Background thread for continuous reading
- State machine for packet parsing
- Callback-based event handling
- Command sending (START, STOP, GET_STATUS, etc.)
**Callbacks:**
- `on_profile_data`: Profiling records received
- `on_metadata`: Device metadata received
- `on_status`: Status update received
- `on_error`: Error occurred
**Used by:** `web_server.py`
---
#### `symbolizer.py`
**Purpose:** Resolve function addresses to names using ELF/DWARF debug info
**Key Components:**
- `Symbolizer`: ELF file parser
- Loads symbol table from `.elf` file
- Parses DWARF debug info for file/line mappings
- Address-to-name resolution
- Handles function address ranges
**Dependencies:** `pyelftools`
**Used by:** `analyzer.py`, `web_server.py`
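Address-to-name resolution over function address ranges can be sketched with a sorted table and a binary search. The class below is hypothetical and deliberately skips ELF parsing: it assumes the `(start_address, size, name)` tuples have already been extracted from the symbol table, and it borrows the `unknown_0x...` fallback naming from the troubleshooting guide.

```python
import bisect


class SymbolTable:
    """Resolve addresses to function names using sorted address ranges."""

    def __init__(self, symbols):
        # symbols: iterable of (start_address, size_in_bytes, name)
        self._symbols = sorted(symbols)
        self._starts = [start for start, _, _ in self._symbols]

    def resolve(self, address: int) -> str:
        # Find the last symbol whose start address is <= address.
        index = bisect.bisect_right(self._starts, address) - 1
        if index >= 0:
            start, size, name = self._symbols[index]
            if start <= address < start + size:
                return name
        # Address falls outside every known function range.
        return f"unknown_0x{address:08X}"


table = SymbolTable([
    (0x08000100, 0x40, "main"),
    (0x08000220, 0x20, "calculate"),
])
print(table.resolve(0x08000130))  # an address inside main's range
```

Looking up ranges rather than exact start addresses matters when the recorded address points somewhere inside the function body instead of at its first instruction.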
---
#### `analyzer.py`
**Purpose:** Analyze profiling data and generate visualization data structures
**Key Components:**
- `ProfileAnalyzer`: Main analysis engine
- Build call tree from flat records
- Compute statistics (call counts, durations)
- Generate flame graph data (d3-flame-graph format)
- Generate timeline data (Plotly format)
- Generate statistics table data
**Data Structures:**
- `CallTreeNode`: Hierarchical call tree
- `FunctionStats`: Per-function statistics
**Used by:** `web_server.py`
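Building a call tree from flat records can be sketched as follows. This is an illustrative approach, not necessarily the one `ProfileAnalyzer` uses: it assumes records are `(func_addr, entry_time, duration_us, depth)` tuples with timestamps that have not wrapped, and it emits the nested `name`/`value`/`children` dictionaries that d3-flame-graph accepts, with `value` including time spent in children. The function name `build_flamegraph` and the `names` mapping are hypothetical.

```python
def build_flamegraph(records, names=None):
    """Fold flat profile records into a nested flame graph dictionary."""
    names = names or {}
    root = {"name": "root", "value": 0, "children": []}
    stack = [root]  # stack[d + 1] holds the most recent node seen at depth d

    # Visit parents before their children: earliest entry first, shallowest first.
    for addr, entry, duration, depth in sorted(records, key=lambda r: (r[1], r[3])):
        del stack[depth + 1:]  # unwind to the parent of this depth
        parent = stack[-1]
        name = names.get(addr, f"func_0x{addr:08X}")

        # Merge repeated calls to the same function under the same parent.
        node = next((c for c in parent["children"] if c["name"] == name), None)
        if node is None:
            node = {"name": name, "value": 0, "children": []}
            parent["children"].append(node)

        node["value"] += duration
        if depth == 0:
            root["value"] += duration
        stack.append(node)
    return root


records = [
    (0x100, 0, 100, 0),   # main
    (0x200, 10, 30, 1),   # first call to calculate
    (0x200, 50, 20, 1),   # second call to calculate
    (0x300, 55, 5, 2),    # read, called from the second calculate
]
tree = build_flamegraph(records, {0x100: "main", 0x200: "calculate", 0x300: "read"})
print(tree)
```

If a parent record was lost in transit, the orphaned child is attached to the deepest node still on the stack, so a dropped packet distorts the tree locally rather than aborting the analysis.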
---
#### `web_server.py`
**Purpose:** Flask web server with SocketIO for real-time updates
**Key Components:**
- `ProfilerWebServer`: Main server class
- Flask HTTP routes (`/`, `/api/status`, `/api/flamegraph`, etc.)
- SocketIO event handlers (connect, start_profiling, etc.)
- Integrates `SerialReader`, `Symbolizer`, and `ProfileAnalyzer`
- Real-time data streaming to web clients
**Routes:**
- `GET /`: Main web interface
- `GET /api/status`: Server status JSON
- `GET /api/flamegraph`: Flame graph data JSON
- `GET /api/timeline`: Timeline data JSON
- `GET /api/statistics`: Statistics table JSON
**SocketIO Events:**
- `connect_serial`: Connect to device
- `start_profiling`: Start profiling
- `stop_profiling`: Stop profiling
- `clear_data`: Clear all data
- Emits: `flamegraph_update`, `statistics_update`, etc.
**Used by:** `cli.py`
---
#### `cli.py`
**Purpose:** Command-line interface entry point
**Key Components:**
- Argument parsing (--host, --port, --debug, --verbose)
- Logging configuration
- Server initialization and startup
**Entry point:** `miniprofiler` command
---
### Web Interface (`host/web/`)
#### `templates/index.html`
**Purpose:** Main HTML page structure
**Features:**
- Connection controls (serial port, baud rate, ELF path)
- Profiling controls (start, stop, clear, reset)
- Status display
- Metadata panel
- Summary panel
- Three-tab interface (Flame Graph, Timeline, Statistics)
**Dependencies:**
- Socket.IO client
- D3.js
- d3-flame-graph
- Plotly.js
---
#### `static/css/style.css`
**Purpose:** Styling and layout
**Features:**
- Dark theme (VSCode-inspired)
- Responsive design
- Flexbox layouts
- Custom button styles
- Table styling
- Status indicators with animations
---
#### `static/js/app.js`
**Purpose:** Client-side application logic
**Key Functions:**
- `initializeSocket()`: Set up SocketIO connection
- `toggleConnection()`: Connect/disconnect from device
- `startProfiling()`, `stopProfiling()`: Control profiling
- `updateFlameGraph()`: Render flame graph with d3-flame-graph
- `updateTimeline()`: Render timeline with Plotly.js
- `updateStatistics()`: Update statistics table
- `showTab()`: Tab switching
**Event Handlers:**
- Socket events (connect, disconnect, data updates)
- Button clicks
- Window resize
---
### Tests (`host/tests/`)
#### `sample_data_generator.py`
**Purpose:** Generate realistic mock profiling data for testing
**Features:**
- Simulates typical embedded application (main, init, loop, sensors, etc.)
- Generates nested function calls with realistic timing
- Creates binary protocol packets
- Exports JSON files for visualization testing
**Outputs:**
- `sample_profile_data.bin`: Binary protocol data
- `sample_flamegraph.json`: Flame graph data
- `sample_statistics.json`: Statistics data
- `sample_timeline.json`: Timeline data
**Usage:**
```bash
cd host/tests
python sample_data_generator.py
```
---
## Data Flow
### Connection and Initialization
```
User Web UI Web Server Serial Reader Device
│ │ │ │ │
│─── Open Browser ──►│ │ │ │
│ │ │ │ │
│─── Enter Port ────►│ │ │ │
│─── Click Connect ─►│─── connect_serial ──►│─── connect() ─────►│ │
│ │ │ │─── Open ─────►│
│ │ │ │ │
│ │ │─── get_metadata() ►│─── CMD ──────►│
│ │ │ │◄── METADATA ──│
│ │◄── metadata ─────────│◄── on_metadata() ──│ │
│◄── Display Info ───│ │ │ │
```
### Profiling Session
```
User Web UI Web Server Analyzer Device
│ │ │ │ │
│─── Start ─────────►│─── start_profiling ─►│─── start() ─────►│ │
│ │ │ │─── CMD ────────►│
│ │ │ │ │
│ │ │ │◄── DATA ────────│
│ │ │◄── on_profile ───│ │
│ │ │ │ │
│ │ │── add_records() ►│ │
│ │ │ │─ Analyze │
│ │ │ │─ Build Tree │
│ │ │ │─ Compute Stats │
│ │ │◄── JSON ─────────│ │
│ │◄─ flamegraph_update ─│ │ │
│◄── Update Viz ─────│ │ │ │
```
## Technology Stack
### Backend
- **Python 3.8+**: Main language
- **Flask 3.0+**: Web framework
- **Flask-SocketIO 5.3+**: Real-time WebSocket communication
- **pyserial 3.5+**: Serial port communication
- **pyelftools 0.29+**: ELF/DWARF parsing
- **crc 6.1+**: CRC16 calculation
- **eventlet**: Async I/O for SocketIO
### Frontend
- **HTML5/CSS3**: Structure and styling
- **JavaScript (ES6)**: Application logic
- **Socket.IO Client**: Real-time communication
- **D3.js v7**: Visualization library
- **d3-flame-graph 4.1**: Flame graph component
- **Plotly.js 2.27**: Timeline/chart visualization
### Development Tools
- **setuptools**: Package management
- **pip**: Dependency management
- **git**: Version control
## Configuration Files
### `requirements.txt`
Python package dependencies with minimum versions
### `setup.py`
Package metadata and installation configuration
- Entry point: `miniprofiler` CLI command
- Package data includes web assets
### `.gitignore`
Excludes:
- Python bytecode and caches
- Virtual environments
- IDE configs
- Build artifacts
- Generated test data
## Key Design Decisions
### Why Command-Response Protocol?
- Allows host to control profiling (start/stop)
- Can request status and metadata
- More flexible than auto-start mode
- Small overhead acceptable at 115200 baud
### Why Entry Time + Duration?
- Enables both flame graphs (aggregate) and timelines (chronological)
- Only 40% more data than duration-only (14 bytes per record instead of 10)
- Essential for debugging timing-sensitive embedded systems
### Why d3-flame-graph?
- Industry standard for flame graph visualization
- Interactive (zoom, search, tooltips)
- Customizable colors and layout
- Handles large datasets efficiently
### Why Separate Analyzer Module?
- Decouples data processing from I/O
- Easier to test in isolation
- Can swap visualization formats without changing protocol
- Allows offline analysis of captured data
## Extension Points
### Adding New Commands
1. Add to `Command` enum in `protocol.py`
2. Implement in `SerialReader.send_command()`
3. Add handler in `web_server.py` SocketIO events
4. Update embedded firmware to handle command
### Adding New Visualizations
1. Add route in `web_server.py` (e.g., `/api/callgraph`)
2. Implement data generation in `analyzer.py`
3. Add HTML tab in `index.html`
4. Add JavaScript rendering in `app.js`
5. Update CSS as needed
### Supporting More Microcontrollers
1. Ensure GCC toolchain supports `-finstrument-functions`
2. Implement timing mechanism (DWT, SysTick, or custom timer)
3. Port ring buffer and UART code to new MCU
4. Test and document
### Adding Compression
1. Update protocol version to 0x02
2. Implement compression in embedded module (e.g., delta encoding)
3. Add decompression in `protocol.py`
4. Update `ProfileDataPayload` parsing
## Future Enhancements
### Phase 2: Embedded Module
- [ ] STM32 HAL/LL implementation
- [ ] FreeRTOS integration
- [ ] Example projects for STM32F4/F7/H7
- [ ] CMake build system
### Phase 3: Advanced Features
- [ ] Statistical sampling mode
- [ ] ISR profiling
- [ ] Multi-core support (dual-core STM32H7)
- [ ] Task/thread tracking for RTOS
- [ ] Filtering and search
### Phase 4: Renode Integration
- [ ] Renode platform description
- [ ] Virtual UART setup
- [ ] CI/CD integration
- [ ] Automated regression tests
### Phase 5: Analysis Tools
- [ ] Differential profiling (compare two runs)
- [ ] Export to Chrome Trace Format
- [ ] Call graph visualization
- [ ] Performance regression detection
- [ ] Integration with debuggers (GDB)
## Performance Targets
### Embedded Overhead
- **Target**: <5% CPU overhead
- **Memory**: 2-10 KB RAM for buffers
- **Instrumentation**: 1-2 μs per function call
### Host Performance
- **Latency**: <100ms from device to visualization
- **Throughput**: Handle 500-1000 records/sec
- **Memory**: Scale to 100K+ records in browser
### Bandwidth
- **115200 baud**: ~780 records/sec
- **460800 baud**: ~3100 records/sec
- **921600 baud**: ~6200 records/sec
## Contributing
See individual module docstrings for implementation details.
Follow existing code style and structure when adding features.

300
docs/PROTOCOL.md Normal file

@@ -0,0 +1,300 @@
# MiniProfiler Communication Protocol
## Overview
MiniProfiler uses a binary command-response protocol over a UART/serial link, at 115200 baud by default (configurable).
- **Command packets**: Host → Embedded device
- **Response packets**: Embedded device → Host
## Command Packet Format
Commands are sent from the host to the embedded device.
### Structure
```
┌────────┬─────────┬─────────────┬──────────┬──────────┐
│ Header │ Command │ Payload Len │ Payload │ Checksum │
│ (1B) │ (1B) │ (1B) │ (8B) │ (1B) │
└────────┴─────────┴─────────────┴──────────┴──────────┘
Total: 12 bytes
```
### Fields
| Field | Size | Value | Description |
|-------|------|-------|-------------|
| Header | 1 byte | `0x55` | Packet start marker |
| Command | 1 byte | See table below | Command code |
| Payload Length | 1 byte | 0-8 | Actual payload size |
| Payload | 8 bytes | Variable | Command parameters (padded with 0x00) |
| Checksum | 1 byte | Sum of all preceding bytes & 0xFF | Simple additive checksum over header, command, payload length, and payload |
### Command Codes
| Command | Code | Description | Payload |
|---------|------|-------------|---------|
| START_PROFILING | `0x01` | Start profiling | None |
| STOP_PROFILING | `0x02` | Stop profiling | None |
| GET_STATUS | `0x03` | Request status | None |
| RESET_BUFFERS | `0x04` | Clear profiling buffers | None |
| GET_METADATA | `0x05` | Request device metadata | None |
| SET_CONFIG | `0x06` | Configure profiler | Config bytes (reserved) |
### Example
Start profiling command:
```
55 01 00 00 00 00 00 00 00 00 00 56
│ │ │ └─────────────────────┘ │
│ │ │ Payload (8B) │
│ │ └── Payload Length (0) │
│ └── Command (START_PROFILING) │
└── Header (0x55) └── Checksum
```
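A host-side encoder for this packet format might look like the sketch below. The names `Command` and `build_command` are chosen for illustration and may not match `protocol.py`; the checksum is assumed to cover every byte before it, which is consistent with the example above (`0x55 + 0x01 = 0x56`).

```python
from enum import IntEnum

HEADER = 0x55      # packet start marker
PAYLOAD_SIZE = 8   # payload field is always padded to 8 bytes


class Command(IntEnum):
    START_PROFILING = 0x01
    STOP_PROFILING = 0x02
    GET_STATUS = 0x03
    RESET_BUFFERS = 0x04
    GET_METADATA = 0x05
    SET_CONFIG = 0x06


def build_command(command: Command, payload: bytes = b"") -> bytes:
    """Return the 12-byte command packet for `command`."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload is limited to 8 bytes")
    padded = payload.ljust(PAYLOAD_SIZE, b"\x00")
    body = bytes([HEADER, int(command), len(payload)]) + padded
    # Assumption: the checksum covers every byte that precedes it.
    checksum = sum(body) & 0xFF
    return body + bytes([checksum])


packet = build_command(Command.START_PROFILING)
print(packet.hex(" "))
```

Running it prints the same twelve bytes as the START_PROFILING example, which makes the function a convenient fixture for unit tests on the embedded parser.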
## Response Packet Format
Responses are sent from the embedded device to the host.
### Structure
```
┌─────────┬──────┬──────────┬──────────┬────────┬─────┐
│ Header │ Type │ Length │ Payload │ CRC │ End │
│ (2B) │ (1B) │ (2B) │ (N bytes)│ (2B) │(1B) │
└─────────┴──────┴──────────┴──────────┴────────┴─────┘
Total: 8 + N bytes
```
### Fields
| Field | Size | Value | Description |
|-------|------|-------|-------------|
| Header | 2 bytes | `0xAA55` | Packet start marker (sent as `AA 55` in the packet dumps below) |
| Type | 1 byte | See table below | Response type |
| Length | 2 bytes | 0-65535 | Payload size (little-endian) |
| Payload | Variable | Depends on type | Response data |
| CRC16 | 2 bytes | CRC16-CCITT | Checksum of header+type+length+payload |
| End | 1 byte | `0x0A` | Packet end marker (newline) |
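The framing rules in this table can be exercised with a small encoder and validator. Treat this as a sketch under stated assumptions rather than the reference implementation: the header is taken to be the byte pair `AA 55`, the CRC is assumed to be transmitted little-endian like the other multi-byte fields, and the CRC itself is the plain bitwise form with polynomial 0x1021 and initial value 0xFFFF, without bit reflection or a final XOR. The function names are hypothetical.

```python
import struct

HEADER = bytes([0xAA, 0x55])  # assumption: wire order of the start marker
END_MARKER = 0x0A


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC16-CCITT: polynomial 0x1021, initial value 0xFFFF."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_response(resp_type: int, payload: bytes = b"") -> bytes:
    """Frame a payload the way a device would (handy for host-side tests)."""
    covered = HEADER + struct.pack("<BH", resp_type, len(payload)) + payload
    # Assumption: the CRC is sent little-endian, followed by the end marker.
    return covered + struct.pack("<HB", crc16_ccitt(covered), END_MARKER)


def parse_response(frame: bytes):
    """Return (type, payload), or raise ValueError if the frame is invalid."""
    if len(frame) < 8 or frame[:2] != HEADER or frame[-1] != END_MARKER:
        raise ValueError("bad framing")
    resp_type, length = struct.unpack_from("<BH", frame, 2)
    if len(frame) != 8 + length:
        raise ValueError("length mismatch")
    (received_crc,) = struct.unpack_from("<H", frame, 5 + length)
    # The CRC covers header + type + length + payload.
    if received_crc != crc16_ccitt(frame[:5 + length]):
        raise ValueError("CRC mismatch")
    return resp_type, frame[5:5 + length]


ack = build_response(0x01)
print(ack.hex(" "), parse_response(ack))
```

Keeping the encoder next to the decoder lets the host test suite round-trip every response type without hardware attached.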
### Response Types
| Type | Code | Description | Payload Format |
|------|------|-------------|----------------|
| ACK | `0x01` | Command acknowledged | None |
| NACK | `0x02` | Command failed | None |
| METADATA | `0x03` | Device metadata | See Metadata Payload |
| STATUS | `0x04` | Device status | See Status Payload |
| PROFILE_DATA | `0x05` | Profiling records | See Profile Data Payload |
## Payload Formats
### Metadata Payload (28 bytes)
Sent in response to `GET_METADATA` command or automatically on startup.
```c
struct MetadataPayload {
uint32_t mcu_clock_hz; // MCU clock frequency in Hz
uint32_t timer_freq; // Profiling timer frequency in Hz
uint32_t elf_build_id; // CRC32 of .text section for version matching
char fw_version[16]; // Firmware version string (null-terminated)
} __attribute__((packed));
```
**Example:**
- MCU Clock: 168,000,000 Hz (168 MHz STM32F4)
- Timer Freq: 1,000,000 Hz (1 MHz for microsecond precision)
- Build ID: 0xDEADBEEF
- FW Version: "v1.0.0"
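On the host, the packed struct maps directly onto a `struct` format string. The sketch below encodes the example values and decodes them again; the `Metadata` tuple and `parse_metadata` function are illustrative names and are not taken from `protocol.py`.

```python
import struct
from typing import NamedTuple

# Little-endian, packed: three uint32 fields followed by a 16-byte string.
METADATA_STRUCT = struct.Struct("<III16s")


class Metadata(NamedTuple):
    mcu_clock_hz: int
    timer_freq: int
    elf_build_id: int
    fw_version: str


def parse_metadata(payload: bytes) -> Metadata:
    clock, timer, build_id, raw_version = METADATA_STRUCT.unpack(payload)
    # Keep only the bytes before the first null terminator.
    version = raw_version.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return Metadata(clock, timer, build_id, version)


# Encode the example values, then decode them again.
payload = METADATA_STRUCT.pack(168_000_000, 1_000_000, 0xDEADBEEF, b"v1.0.0")
meta = parse_metadata(payload)
print(len(payload), payload[:4].hex(" "), meta)
```

`struct.Struct.unpack` raises `struct.error` when the payload is not exactly 28 bytes, which gives a cheap sanity check before the values are used.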
### Status Payload (10 bytes)
Sent in response to `GET_STATUS` command.
```c
struct StatusPayload {
uint8_t is_profiling; // 1 if profiling active, 0 otherwise
uint32_t buffer_overflows; // Number of buffer overflow events
uint32_t records_captured; // Total records captured
uint8_t buffer_usage_percent; // Current buffer usage (0-100)
} __attribute__((packed));
```
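A host-side decoder for this payload is a single `struct` call. The following sketch uses hypothetical names and invented sample values; it also shows the overflow warning that the error-handling section asks the host to surface.

```python
import struct

# Packed little-endian layout: uint8, uint32, uint32, uint8 (10 bytes, no padding).
STATUS_STRUCT = struct.Struct("<BIIB")


def parse_status(payload: bytes) -> dict:
    is_profiling, overflows, captured, usage = STATUS_STRUCT.unpack(payload)
    return {
        "is_profiling": bool(is_profiling),
        "buffer_overflows": overflows,
        "records_captured": captured,
        "buffer_usage_percent": usage,
    }


# Sample values invented for illustration.
status = parse_status(STATUS_STRUCT.pack(1, 2, 1500, 37))
if status["buffer_overflows"] > 0:
    print("warning: device dropped records,", status["buffer_overflows"], "overflow(s)")
```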
### Profile Data Payload (Variable)
Sent automatically during profiling or in response to data requests.
```c
struct ProfileDataPayload {
uint8_t version; // Protocol version (0x01)
uint16_t record_count; // Number of records in this packet
ProfileRecord records[]; // Array of profile records
} __attribute__((packed));
```
Each `ProfileRecord` is 14 bytes:
```c
struct ProfileRecord {
uint32_t func_addr; // Function address (from instrumentation)
uint32_t entry_time; // Entry timestamp in microseconds
uint32_t duration_us; // Function duration in microseconds
uint16_t depth; // Call stack depth (0 = root)
} __attribute__((packed));
```
**Field Details:**
- `func_addr`: Return address from `__builtin_return_address(0)` in instrumentation hook
- `entry_time`: Microsecond timestamp when function was entered (a 32-bit counter wraps after 2^32 μs, about 71.6 minutes)
- `duration_us`: Time spent in function including children
- `depth`: Call stack depth (0 for main, 1 for functions called by main, etc.)
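Because `entry_time` is a 32-bit counter, long sessions need the host to extend it before sorting or plotting. One possible approach, sketched here with a hypothetical helper, is to accumulate signed differences between consecutive raw values. It assumes consecutive records are never more than 2^31 μs (about 35.8 minutes) apart, and it tolerates small backward steps, which occur when a child record arrives before its parent.

```python
def unwrap_timestamps(raw_times):
    """Extend wrapping 32-bit microsecond timestamps to unbounded integers."""
    extended = []
    last_raw = None
    last_ext = 0
    for raw in raw_times:
        if last_raw is None:
            last_ext = raw
        else:
            # Interpret the difference as a signed 32-bit step.
            delta = (raw - last_raw) & 0xFFFFFFFF
            if delta >= 0x80000000:
                delta -= 0x100000000
            last_ext += delta
        last_raw = raw
        extended.append(last_ext)
    return extended


# The counter wraps between the first and second sample, then steps back slightly.
print(unwrap_timestamps([0xFFFFFF00, 0x00000010, 0xFFFFFFF0]))
```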
## Communication Flow
### Initial Connection
```
Host Device
| |
|--- GET_METADATA ------>|
|<---- METADATA ---------|
| |
|--- START_PROFILING --->|
|<---- ACK --------------|
| |
|<---- PROFILE_DATA -----| (continuous stream)
|<---- PROFILE_DATA -----|
|<---- PROFILE_DATA -----|
| ... |
```
### Typical Session
```
1. Host connects to serial port
2. Host sends GET_METADATA
3. Device responds with METADATA packet
4. Host sends START_PROFILING
5. Device responds with ACK
6. Device begins streaming PROFILE_DATA packets
7. Host processes and visualizes data in real-time
8. Host sends STOP_PROFILING when done
9. Device responds with ACK and stops streaming
```
## Error Handling
### CRC Mismatch
If the host detects a CRC mismatch:
- Log the error
- Discard the packet
- Continue listening for next packet
- No retransmission (real-time streaming)
### Packet Loss
- Sequence numbers not implemented (keeps protocol simple)
- Missing data will create gaps in visualization
- Not critical for profiling use case
### Buffer Overflow
- Device increments the `buffer_overflows` counter reported in the status payload
- Host should warn user
- Options: increase baud rate, reduce instrumentation, or use sampling
## Performance Considerations
### Bandwidth Calculation
At 115200 baud:
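- Effective throughput: ~11.5 KB/s (assuming 8N1 framing, 10 bits on the wire per byte)
- Profile record size: 14 bytes
- Packet overhead: 8 bytes of framing plus a 3-byte payload header (version and record count)
- Records per packet (typical): 20
- Packet size: 8 + 3 + 20 × 14 = 291 bytes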
- Packets per second: ~39
- Records per second: ~780
**Recommendation:** If profiling >780 function calls/sec, increase baud rate to 460800 or 921600.
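The same arithmetic can be scripted to evaluate other baud rates or packet sizes. This is a back-of-the-envelope sketch: it assumes 10 bits on the wire per byte (8N1 framing), a link that is never idle, and the hypothetical function name `records_per_second`.

```python
def records_per_second(baud, records_per_packet=20, bits_per_byte=10):
    """Estimate sustained record throughput for a given baud rate."""
    bytes_per_second = baud / bits_per_byte
    # 8 bytes of framing + 3-byte payload header + 14 bytes per record.
    packet_size = 8 + 3 + 14 * records_per_packet
    packets_per_second = bytes_per_second / packet_size
    return packets_per_second * records_per_packet


for baud in (115200, 460800, 921600):
    print(baud, round(records_per_second(baud)))
```

The results land slightly above the rounded figures quoted in this document, which truncate to 39 whole packets per second before multiplying. Larger packets amortise the fixed 11-byte overhead, at the cost of higher latency before a record reaches the host.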
### Timing Overhead
Instrumentation overhead per function:
- Entry hook: ~0.5-1 μs
- Exit hook: ~0.5-1 μs
- Total: ~1-2 μs per function call
Target: <5% overhead for typical applications.
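To see how the per-call cost relates to the 5% target, the overhead can be estimated from the call rate. The helpers below are hypothetical and use the pessimistic 2 μs figure by default; real overhead depends on the clock speed and on how the hooks are implemented.

```python
def overhead_percent(calls_per_second, hook_cost_us=2.0):
    """CPU time spent in instrumentation hooks, as a percentage."""
    return calls_per_second * hook_cost_us / 1_000_000 * 100


def max_calls_per_second(budget_percent=5.0, hook_cost_us=2.0):
    """Highest call rate that stays within the overhead budget."""
    return budget_percent / 100 * 1_000_000 / hook_cost_us


print(overhead_percent(780))      # call rate a 115200 baud link can carry
print(max_calls_per_second())     # call rate that reaches the 5% budget
```

Under these assumptions the serial link saturates long before the CPU budget does, so bandwidth rather than hook cost tends to be the first limit reached.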
## Protocol Versioning
Current version: **0x01**
The `version` field in `ProfileDataPayload` allows for future protocol extensions:
- v0x01: Current format (entry_time + duration)
- v0x02: Future - could add ISR markers, task IDs, etc.
- v0x03: Future - compressed format, delta encoding
Host should check version and handle accordingly or reject unsupported versions.
## Example Packet Dumps
### GET_METADATA Command
```
55 05 00 00 00 00 00 00 00 00 00 5A
```
### METADATA Response
```
AA 55 03 1C 00 // Header, Type=METADATA, Length=28
00 7A 03 0A // mcu_clock_hz = 168000000
40 42 0F 00 // timer_freq = 1000000
EF BE AD DE // build_id = 0xDEADBEEF
76 31 2E 30 2E 30 00 ... // fw_version = "v1.0.0\0..."
XX XX // CRC16
0A // End marker
```
### PROFILE_DATA Response (2 records)
```
AA 55 05 1F 00 // Header, Type=PROFILE_DATA, Length=31
01 // Version = 1
02 00 // Record count = 2
// Record 1
00 01 00 08 // func_addr = 0x08000100
E8 03 00 00 // entry_time = 1000 μs
D0 07 00 00 // duration = 2000 μs
00 00 // depth = 0
// Record 2
20 02 00 08 // func_addr = 0x08000220
F4 01 00 00 // entry_time = 500 μs
2C 01 00 00 // duration = 300 μs
01 00 // depth = 1
XX XX // CRC16
0A // End marker
```
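The payload portion of this dump (everything between the 5-byte frame header and the CRC) can be decoded with two `struct` formats. The sketch below uses hypothetical names and rejects unknown protocol versions, as the versioning section recommends.

```python
import struct

HEADER_STRUCT = struct.Struct("<BH")    # version, record_count
RECORD_STRUCT = struct.Struct("<IIIH")  # func_addr, entry_time, duration_us, depth
SUPPORTED_VERSION = 0x01


def parse_profile_data(payload: bytes):
    """Return a list of (func_addr, entry_time, duration_us, depth) tuples."""
    version, count = HEADER_STRUCT.unpack_from(payload, 0)
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported protocol version 0x{version:02X}")
    expected = HEADER_STRUCT.size + count * RECORD_STRUCT.size
    if len(payload) != expected:
        raise ValueError(f"expected {expected} payload bytes, got {len(payload)}")
    return [
        RECORD_STRUCT.unpack_from(payload, HEADER_STRUCT.size + i * RECORD_STRUCT.size)
        for i in range(count)
    ]


# Payload bytes copied from the dump: version, record count, then two records.
payload = bytes.fromhex(
    "01 02 00 "
    "00 01 00 08 E8 03 00 00 D0 07 00 00 00 00 "
    "20 02 00 08 F4 01 00 00 2C 01 00 00 01 00"
)
for addr, entry, duration, depth in parse_profile_data(payload):
    print(f"0x{addr:08X} entry={entry}us duration={duration}us depth={depth}")
```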
## Implementation Notes
### Embedded Side
- Use DMA for UART transmission to minimize CPU overhead
- Implement ring buffer with power-of-2 size for efficient modulo operations
- Send packets in background task or idle hook
- Consider double-buffering: one buffer for capturing, one for transmitting
### Host Side
- Use state machine for packet parsing (don't assume atomicity)
- Handle partial packets gracefully
- Verify CRC before processing payload
- Use background thread for serial reading to not block UI
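The host-side points above can be combined into an incremental parser that accepts arbitrary chunks from the serial port. This is a sketch and not the state machine in `serial_reader.py`. It assumes the header arrives as `AA 55` and that the CRC is little-endian, and it uses the standard library's `binascii.crc_hqx` (polynomial 0x1021) seeded with 0xFFFF in place of the `crc` package the host depends on. The class name `FrameParser` is hypothetical.

```python
import binascii
import struct

HEADER = b"\xAA\x55"  # assumption: wire order of the start marker
END_MARKER = 0x0A
OVERHEAD = 8          # header(2) + type(1) + length(2) + crc(2) + end(1)


class FrameParser:
    """Incremental parser: feed it whatever bytes the serial port returned."""

    def __init__(self):
        self._buffer = bytearray()
        self.crc_errors = 0

    def feed(self, chunk: bytes):
        """Append `chunk`; return a list of (type, payload) for complete frames."""
        self._buffer += chunk
        frames = []
        while True:
            start = self._buffer.find(HEADER)
            if start < 0:
                # Keep one trailing byte in case it is the first half of a header.
                del self._buffer[:-1]
                break
            del self._buffer[:start]  # drop noise before the header
            if len(self._buffer) < 5:
                break  # type and length not received yet
            resp_type, length = struct.unpack_from("<BH", self._buffer, 2)
            total = OVERHEAD + length
            if len(self._buffer) < total:
                break  # partial packet: wait for the rest
            (crc,) = struct.unpack_from("<H", self._buffer, 5 + length)
            body = bytes(self._buffer[:5 + length])
            if self._buffer[total - 1] == END_MARKER and crc == binascii.crc_hqx(body, 0xFFFF):
                frames.append((resp_type, body[5:]))
                del self._buffer[:total]
            else:
                # Corrupt frame: count it, skip this header, and resynchronise.
                self.crc_errors += 1
                del self._buffer[:2]
        return frames
```

One limitation of this sketch is that a corrupted length field can make the parser wait for up to 65535 bytes before it notices the error; a production parser would likely cap the accepted payload length or add a timeout.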
## References
- CRC16-CCITT: Polynomial 0x1021, initial value 0xFFFF
- Little-endian byte order for multi-byte integers
- GCC instrumentation: `__cyg_profile_func_enter/exit`