MiniProfiler
Real-time profiling tool for embedded STM32 applications using GCC's -finstrument-functions feature.
Features
- Embedded Profiling: Automatic function instrumentation using GCC hooks
- Real-time Visualization: Live flame graphs, timelines, and statistics
- Low Overhead: DMA-based UART transmission, <5% performance impact
- Symbol Resolution: Automatic function name resolution from ELF/DWARF debug info
- Web Interface: Modern, responsive web UI with multiple visualization modes
Architecture
MiniProfiler consists of two main components:
1. Embedded Module (STM32)
- Uses __cyg_profile_func_enter/exit hooks to capture function calls
- Lock-free ring buffer for storing profiling data
- UART/Serial communication with host
- Minimal memory footprint (~2-10KB)
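A lock-free ring buffer between the hooks and the UART path is usually a single-producer/single-consumer queue. The sketch below assumes that split (hooks produce, the UART/DMA transmit code consumes) and uses C11 atomics; the type and function names (rb_record_t, rb_push, rb_pop, RB_CAPACITY) are hypothetical, not the project's API. With 256 slots of 16 bytes each (the unpacked record plus padding on typical 32-bit targets), it occupies about 4 KB, inside the footprint quoted above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPSC ring buffer: one producer (instrumentation hooks),
 * one consumer (UART/DMA transmit path). */
#define RB_CAPACITY 256u /* must be a power of two */

typedef struct {
    uint32_t func_addr;
    uint32_t entry_time;
    uint32_t duration_us;
    uint16_t depth;
} rb_record_t;

typedef struct {
    rb_record_t slots[RB_CAPACITY];
    atomic_size_t head; /* free-running write index, producer-owned */
    atomic_size_t tail; /* free-running read index, consumer-owned */
} ring_buffer_t;

static void rb_init(ring_buffer_t *rb) {
    atomic_init(&rb->head, 0);
    atomic_init(&rb->tail, 0);
}

/* Producer: returns false (record dropped) when the buffer is full. */
static bool rb_push(ring_buffer_t *rb, const rb_record_t *rec) {
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    if (head - tail == RB_CAPACITY) {
        return false;
    }
    rb->slots[head & (RB_CAPACITY - 1u)] = *rec;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer: returns false when the buffer is empty. */
static bool rb_pop(ring_buffer_t *rb, rb_record_t *out) {
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = rb->slots[tail & (RB_CAPACITY - 1u)];
    atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
    return true;
}
```

The single-producer assumption breaks if interrupt handlers are instrumented too, since an ISR can preempt a hook mid-push; in that case the push would need a short critical section (for example, masking interrupts around it).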
2. Host Application (Python)
- Serial communication and protocol parsing
- ELF/DWARF symbol resolution
- Web server with real-time updates (Flask + SocketIO)
- Three visualization modes:
  - Flame Graph: Aggregate CPU time by function
  - Timeline: Execution over time (flame chart)
  - Statistics: Call counts, min/max/avg durations
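The Statistics view reduces the record stream to per-function call counts and min/max/avg durations. The host does this in Python; the arithmetic is sketched here in C only to stay consistent with the other examples in this README. The record_t/func_stats_t types and compute_stats() are hypothetical, and the fixed-size table with linear lookup is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the fields of ProfileRecord (hypothetical host-side copy). */
typedef struct {
    uint32_t func_addr;
    uint32_t entry_time;
    uint32_t duration_us;
    uint16_t depth;
} record_t;

typedef struct {
    uint32_t func_addr;
    uint32_t calls;
    uint32_t min_us;
    uint32_t max_us;
    uint64_t total_us; /* 64-bit so long captures do not overflow */
} func_stats_t;

/* Aggregates records per function address. Returns the number of distinct
 * functions written to `stats` (functions beyond `max_funcs` are ignored). */
static size_t compute_stats(const record_t *recs, size_t n,
                            func_stats_t *stats, size_t max_funcs) {
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        func_stats_t *s = NULL;
        for (size_t j = 0; j < used; j++) {
            if (stats[j].func_addr == recs[i].func_addr) { s = &stats[j]; break; }
        }
        if (s == NULL) {
            if (used == max_funcs) continue;
            s = &stats[used++];
            *s = (func_stats_t){recs[i].func_addr, 0, UINT32_MAX, 0, 0};
        }
        uint32_t d = recs[i].duration_us;
        s->calls++;
        s->total_us += d;
        if (d < s->min_us) s->min_us = d;
        if (d > s->max_us) s->max_us = d;
    }
    return used;
}

/* Average duration in microseconds (0 if the function was never called). */
static double avg_us(const func_stats_t *s) {
    return s->calls ? (double)s->total_us / s->calls : 0.0;
}
```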
Quick Start
Installation with uv (Recommended - 10x faster)
cd host
# Create virtual environment and install
uv venv
source .venv/bin/activate # Linux/macOS (.venv\Scripts\activate on Windows)
uv pip install -e .
Installation with pip
cd host
pip install -e .
Using Makefile (easiest)
# From project root
make install # Install with uv
make run # Run the server
make sample # Generate sample data
Running the Host Application
# Using the installed CLI
miniprofiler
# Or directly with Python
python -m miniprofiler.cli
# With custom host/port
miniprofiler --host localhost --port 8080
# With verbose logging
miniprofiler --verbose
Testing Without Hardware
Generate sample profiling data to test the visualization:
cd host/tests
python sample_data_generator.py
This creates:
- sample_profile_data.bin - Binary protocol data
- sample_flamegraph.json - Flame graph data
- sample_statistics.json - Statistics data
- sample_timeline.json - Timeline data
Using the Web Interface
1. Start the host application: miniprofiler
2. Open a browser to http://localhost:5000
3. Enter the serial port (e.g., /dev/ttyUSB0 or COM3)
4. Optionally, provide the path to your .elf file for symbol resolution
5. Click Connect
6. Click Start Profiling
7. View real-time profiling data in the three visualization tabs
Protocol
Command-Response Structure
Commands (Host → Embedded)
- START_PROFILING (0x01)
- STOP_PROFILING (0x02)
- GET_STATUS (0x03)
- RESET_BUFFERS (0x04)
- GET_METADATA (0x05)
Responses (Embedded → Host)
- ACK / NACK (0x01 / 0x02)
- METADATA (0x03)
- STATUS (0x04)
- PROFILE_DATA (0x05)
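The code lists above map directly onto C enums. Note that the two sets overlap numerically (0x01 is START_PROFILING from the host but ACK from the device), so the direction of travel determines the meaning. The enum names follow the README; the command-to-response mapping in response_for_command() is a hypothetical firmware policy, not documented behavior.

```c
#include <stdint.h>

/* Host -> embedded command codes. */
typedef enum {
    CMD_START_PROFILING = 0x01,
    CMD_STOP_PROFILING  = 0x02,
    CMD_GET_STATUS      = 0x03,
    CMD_RESET_BUFFERS   = 0x04,
    CMD_GET_METADATA    = 0x05,
} command_t;

/* Embedded -> host response codes. */
typedef enum {
    RSP_ACK          = 0x01,
    RSP_NACK         = 0x02,
    RSP_METADATA     = 0x03,
    RSP_STATUS       = 0x04,
    RSP_PROFILE_DATA = 0x05,
} response_t;

/* Hypothetical dispatch: which response a handler might send back for a
 * received command byte. Unknown codes are rejected with NACK. */
static response_t response_for_command(uint8_t cmd) {
    switch (cmd) {
    case CMD_START_PROFILING:
    case CMD_STOP_PROFILING:
    case CMD_RESET_BUFFERS:
        return RSP_ACK;
    case CMD_GET_STATUS:
        return RSP_STATUS;
    case CMD_GET_METADATA:
        return RSP_METADATA;
    default:
        return RSP_NACK;
    }
}
```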
Profile Record Format
Each profiling record is 14 bytes:
#include <stdint.h>
struct ProfileRecord {
uint32_t func_addr; // Function address
uint32_t entry_time; // Entry timestamp (μs)
uint32_t duration_us; // Duration (μs)
uint16_t depth; // Call stack depth
} __attribute__((packed));
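The 14-byte size is 4 + 4 + 4 + 2 with no padding, which only holds because of the packed attribute; a compile-time check makes that explicit. The sketch below also serializes field by field rather than copying the packed struct, which avoids unaligned accesses and does not depend on the CPU's byte order. Little-endian wire order is an assumption here (it matches Cortex-M's native order). Keep in mind that a 32-bit microsecond timestamp wraps after 2^32 us, roughly 71.6 minutes.

```c
#include <stddef.h>
#include <stdint.h>

struct ProfileRecord {
    uint32_t func_addr;   /* Function address */
    uint32_t entry_time;  /* Entry timestamp (us) */
    uint32_t duration_us; /* Duration (us) */
    uint16_t depth;       /* Call stack depth */
} __attribute__((packed));

/* Packed layout: 4 + 4 + 4 + 2 = 14 bytes, no padding. */
_Static_assert(sizeof(struct ProfileRecord) == 14, "ProfileRecord must be 14 bytes");

#define PROFILE_RECORD_SIZE 14u

static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Writes one record in (assumed) little-endian wire order; returns 14. */
static size_t encode_record(const struct ProfileRecord *r,
                            uint8_t out[PROFILE_RECORD_SIZE]) {
    put_le32(&out[0], r->func_addr);
    put_le32(&out[4], r->entry_time);
    put_le32(&out[8], r->duration_us);
    out[12] = (uint8_t)r->depth;
    out[13] = (uint8_t)(r->depth >> 8);
    return PROFILE_RECORD_SIZE;
}
```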
Packet Format
┌──────────┬──────────┬───────────────┬─────────┬────────┐
│ Header   │ Length   │ Payload       │ CRC     │ End    │
│ (0xAA55) │ (2B)     │ (N bytes)     │ (2B)    │ (0x0A) │
└──────────┴──────────┴───────────────┴─────────┴────────┘
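The diagram fixes the field order but not the byte order, what Length counts, or which CRC is used. The sketch below frames a payload under these assumptions, all of them unconfirmed by this README: the header goes out as 0xAA then 0x55, Length is the payload size as little-endian 16-bit, and the checksum is CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF) over the payload only, sent little-endian. build_frame() and crc16_ccitt() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR.
 * ASSUMPTION: the README does not name the CRC variant. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Frames `payload` as: AA 55 | len (LE16) | payload | crc (LE16) | 0A.
 * ASSUMPTIONS: header byte order, little-endian fields, CRC over payload.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t build_frame(const uint8_t *payload, uint16_t len,
                          uint8_t *out, size_t out_size) {
    size_t total = 2u + 2u + (size_t)len + 2u + 1u;
    if (out_size < total) return 0;
    size_t i = 0;
    out[i++] = 0xAA;
    out[i++] = 0x55;
    out[i++] = (uint8_t)(len & 0xFFu);
    out[i++] = (uint8_t)(len >> 8);
    for (uint16_t k = 0; k < len; k++) out[i++] = payload[k];
    uint16_t crc = crc16_ccitt(payload, len);
    out[i++] = (uint8_t)(crc & 0xFFu);
    out[i++] = (uint8_t)(crc >> 8);
    out[i++] = 0x0A;
    return i;
}
```

Because the payload is binary, 0x0A can also appear inside it, so a receiver should use the Length field (and then check the CRC and end byte) rather than scanning for 0x0A to find the end of a packet.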
Development Roadmap
Phase 1: Host Application ✓
- Protocol implementation
- Serial communication
- Symbol resolution (ELF/DWARF)
- Data analysis and statistics
- Web interface with Flask + SocketIO
- Flame graph visualization (d3-flame-graph)
- Timeline visualization (Plotly.js)
- Sample data generator
Phase 2: Embedded Module (Next)
- Instrumentation hooks (__cyg_profile_func_enter/exit)
- DWT/SysTick timing implementation
- Ring buffer implementation
- UART communication with DMA
- Command handling
- STM32 example project
Phase 3: Integration & Testing
- End-to-end testing with real hardware
- Performance overhead measurement
- Buffer overflow handling
- Symbol resolution verification
Phase 4: Renode Emulation
- Renode platform description
- Virtual UART setup
- CI/CD integration
- Automated testing
Configuration
GCC Compilation Flags
To enable instrumentation in your embedded project:
CFLAGS += -finstrument-functions
CFLAGS += -finstrument-functions-exclude-file-list=drivers/,lib/
Excluding Functions
// Exclude specific functions
void __attribute__((no_instrument_function)) driver_function(void);
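// Exclude entire files: compile them without -finstrument-functions, or match
// their paths with -finstrument-functions-exclude-file-list (shown above).
// -finstrument-functions is not an optimization option, so
// #pragma GCC optimize ("no-instrument-functions") is not a reliable substitute.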
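The hooks themselves, and anything they call, must carry no_instrument_function; otherwise every hook invocation would be instrumented and re-enter the hooks. The sketch below shows one way the enter/exit pair could turn calls into ProfileRecord-style entries with a small shadow stack of entry timestamps. profiler_now_us() (a stub counter here, DWT or SysTick on the target) and emit_record() (standing in for a ring-buffer push) are hypothetical, as is treating the outermost instrumented call as depth 0.

```c
#include <stdint.h>

#define MAX_DEPTH 32 /* hypothetical shadow-stack limit */

typedef struct {
    uint32_t func_addr;
    uint32_t entry_time;
    uint32_t duration_us;
    uint16_t depth;
} hook_record_t;

static uint32_t fake_clock_us;          /* stub time source for this sketch */
static uint32_t entry_stack[MAX_DEPTH]; /* entry timestamps per call level */
static uint16_t call_depth;             /* current nesting level */
static hook_record_t last_record;       /* stand-in for the ring buffer */
static uint32_t records_emitted;

__attribute__((no_instrument_function))
static uint32_t profiler_now_us(void) { return fake_clock_us; }

__attribute__((no_instrument_function))
static void emit_record(const hook_record_t *r) {
    last_record = *r;
    records_emitted++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)this_fn;
    (void)call_site;
    if (call_depth < MAX_DEPTH) entry_stack[call_depth] = profiler_now_us();
    call_depth++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)call_site;
    if (call_depth == 0) return;          /* unbalanced exit: ignore */
    call_depth--;
    if (call_depth >= MAX_DEPTH) return;  /* level was too deep to be timed */
    uint32_t start = entry_stack[call_depth];
    hook_record_t r = {
        /* Exact on a 32-bit MCU; truncated on a 64-bit host. */
        .func_addr = (uint32_t)(uintptr_t)this_fn,
        .entry_time = start,
        /* Unsigned subtraction stays correct across one timer wrap. */
        .duration_us = profiler_now_us() - start,
        .depth = call_depth,
    };
    emit_record(&r);
}
```

Like the ring buffer, this shared state is not safe if instrumented interrupt handlers can preempt a hook; those handlers would need to be excluded or the hooks guarded.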
Requirements
Host Application
- Python 3.8+
- Flask 3.0+
- pyserial 3.5+
- pyelftools 0.29+
- Modern web browser with JavaScript enabled
Embedded Target
- STM32 MCU (STM32F4/F7/H7 recommended)
- GCC ARM toolchain with -finstrument-functions support
- UART/USB-CDC peripheral
- ~2-10KB RAM for profiling buffer
License
MIT License - See LICENSE file for details
Contributing
Contributions welcome! Please open an issue or submit a pull request.
Acknowledgments
- Inspired by Brendan Gregg's FlameGraphs
- Uses d3-flame-graph for visualization
- Built with Flask, SocketIO, and Plotly.js