# MiniProfiler

Real-time profiling tool for embedded STM32 applications using GCC's `-finstrument-functions` feature.

## Features

- **Embedded Profiling**: Automatic function instrumentation using GCC hooks
- **Real-time Visualization**: Live flame graphs, timelines, and statistics
- **Low Overhead**: DMA-based UART transmission, targeting <5% performance impact (not yet measured on hardware; see the roadmap)
- **Symbol Resolution**: Automatic function name resolution from ELF/DWARF debug info
- **Web Interface**: Modern, responsive web UI with multiple visualization modes

## Architecture

MiniProfiler consists of two main components:

### 1. Embedded Module (STM32)

- Uses `__cyg_profile_func_enter/exit` hooks to capture function calls
- Lock-free ring buffer for storing profiling data
- UART/Serial communication with host
- Minimal memory footprint (~2-10KB)

### 2. Host Application (Python)

- Serial communication and protocol parsing
- ELF/DWARF symbol resolution
- Web server with real-time updates (Flask + SocketIO)
- Three visualization modes:
  - **Flame Graph**: Aggregate CPU time by function
  - **Timeline**: Execution over time (flame chart)
  - **Statistics**: Call counts, min/max/avg durations

## Quick Start

### Installation with uv (Recommended - 10x faster)

```bash
cd host

# Create virtual environment and install
uv venv
source .venv/bin/activate  # Linux/macOS (.venv\Scripts\activate on Windows)
uv pip install -e .
```

### Installation with pip

```bash
cd host
pip install -e .
```

### Using Makefile (easiest)

```bash
# From project root
make install  # Install with uv
make run      # Run the server
make sample   # Generate sample data
```

### Running the Host Application

```bash
# Using the installed CLI
miniprofiler

# Or directly with Python
python -m miniprofiler.cli

# With custom host/port
miniprofiler --host localhost --port 8080

# With verbose logging
miniprofiler --verbose
```

### Testing Without Hardware

Generate sample profiling data to test the visualization:

```bash
cd host/tests
python sample_data_generator.py
```

This creates:

- `sample_profile_data.bin` - Binary protocol data
- `sample_flamegraph.json` - Flame graph data
- `sample_statistics.json` - Statistics data
- `sample_timeline.json` - Timeline data

### Using the Web Interface

1. Start the host application: `miniprofiler`
2. Open a browser at `http://localhost:5000`
3. Enter the serial port (e.g., `/dev/ttyUSB0` or `COM3`)
4. Optionally provide the path to the `.elf` file for symbol resolution
5. Click **Connect**
6. Click **Start Profiling**
7. View real-time profiling data in the three visualization tabs

## Protocol

### Command-Response Structure

**Commands (Host → Embedded)**

- `START_PROFILING` (0x01)
- `STOP_PROFILING` (0x02)
- `GET_STATUS` (0x03)
- `RESET_BUFFERS` (0x04)
- `GET_METADATA` (0x05)

**Responses (Embedded → Host)**

- `ACK/NACK` (0x01/0x02)
- `METADATA` (0x03)
- `STATUS` (0x04)
- `PROFILE_DATA` (0x05)

### Profile Record Format

Each profiling record is 14 bytes:

```c
struct ProfileRecord {
    uint32_t func_addr;    // Function address
    uint32_t entry_time;   // Entry timestamp (μs)
    uint32_t duration_us;  // Duration (μs)
    uint16_t depth;        // Call stack depth
} __attribute__((packed));
```

### Packet Format

```
┌──────────┬────────┬───────────┬──────┬────────┐
│ Header   │ Length │ Payload   │ CRC  │ End    │
│ (0xAA55) │ (2B)   │ (N bytes) │ (2B) │ (0x0A) │
└──────────┴────────┴───────────┴──────┴────────┘
```

## Development Roadmap

### Phase 1: Host Application ✓

- [x] Protocol implementation
- [x] Serial communication
- [x] Symbol resolution (ELF/DWARF)
- [x] Data analysis and statistics
- [x] Web interface with Flask + SocketIO
- [x] Flame graph visualization (d3-flame-graph)
- [x] Timeline visualization (Plotly.js)
- [x] Sample data generator

### Phase 2: Embedded Module (Next)

- [ ] Instrumentation hooks (`__cyg_profile_func_enter/exit`)
- [ ] DWT/SysTick timing implementation
- [ ] Ring buffer implementation
- [ ] UART communication with DMA
- [ ] Command handling
- [ ] STM32 example project

### Phase 3: Integration & Testing

- [ ] End-to-end testing with real hardware
- [ ] Performance overhead measurement
- [ ] Buffer overflow handling
- [ ] Symbol resolution verification

### Phase 4: Renode Emulation

- [ ] Renode platform description
- [ ] Virtual UART setup
- [ ] CI/CD integration
- [ ] Automated testing

## Configuration

### GCC Compilation Flags

To enable instrumentation in your embedded project:

```makefile
CFLAGS += -finstrument-functions
CFLAGS += -finstrument-functions-exclude-file-list=drivers/,lib/
```

### Excluding Functions

```c
// Exclude specific functions
void __attribute__((no_instrument_function)) driver_function(void);

// Exclude entire files
#pragma GCC optimize ("no-instrument-functions")
```

## Requirements

### Host Application

- Python 3.8+
- Flask 3.0+
- pyserial 3.5+
- pyelftools 0.29+
- Modern web browser with JavaScript enabled

### Embedded Target

- STM32 MCU (STM32F4/F7/H7 recommended)
- GCC ARM toolchain with `-finstrument-functions` support
- UART/USB-CDC peripheral
- ~2-10KB RAM for profiling buffer

## License

MIT License - See LICENSE file for details

## Contributing

Contributions welcome! Please open an issue or submit a pull request.

## Acknowledgments

- Inspired by [Brendan Gregg's FlameGraphs](https://www.brendangregg.com/flamegraphs.html)
- Uses [d3-flame-graph](https://github.com/spiermar/d3-flame-graph) for visualization
- Built with Flask, SocketIO, and Plotly.js
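
## Example: Decoding Profile Records

The 14-byte `ProfileRecord` layout from the Protocol section can be decoded on the host with Python's standard `struct` module. The following is a minimal sketch, not the project's actual parser: the names `RECORD`, `ProfileRecord` (the Python class), and `parse_records` are hypothetical, and it assumes records arrive back to back in little-endian byte order, as a packed struct would be laid out on a Cortex-M target. Packet framing and CRC checking are left out because the README does not specify the CRC algorithm.

```python
import struct
from typing import List, NamedTuple

# Assumption: little-endian, no padding, mirroring the packed C struct:
# uint32 func_addr, uint32 entry_time, uint32 duration_us, uint16 depth.
RECORD = struct.Struct("<IIIH")


class ProfileRecord(NamedTuple):
    """Hypothetical host-side view of one profiling record."""
    func_addr: int
    entry_time: int
    duration_us: int
    depth: int


def parse_records(payload: bytes) -> List[ProfileRecord]:
    """Split a buffer of back-to-back 14-byte records into tuples."""
    if len(payload) % RECORD.size != 0:
        raise ValueError(
            f"payload length {len(payload)} is not a multiple of {RECORD.size}"
        )
    return [ProfileRecord(*fields) for fields in RECORD.iter_unpack(payload)]


if __name__ == "__main__":
    # Two made-up records, packed the same way the target is assumed to send them.
    sample = RECORD.pack(0x08000100, 1000, 250, 0) + RECORD.pack(0x08000200, 1050, 100, 1)
    for rec in parse_records(sample):
        print(f"0x{rec.func_addr:08X} t={rec.entry_time}us "
              f"dur={rec.duration_us}us depth={rec.depth}")
```

Using the `<` prefix disables alignment padding, so `RECORD.size` is 14 and matches the `__attribute__((packed))` struct on the target.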