Reactor includes a profiler so you can instrument your code once and use it everywhere: locally during optimization, in deployed infrastructure, and across code changes to track regressions. It is a no-op when disabled, so instrumentation can stay in your code permanently at zero cost.

Enabling Profiling

Pass --enable-profiling when starting your model:
reactor run --runtime http --enable-profiling
By default, profiling output is written to ./profiling. Change the directory with --profiling-output-dir:
reactor run --runtime http --enable-profiling --profiling-output-dir ./my-profiling-data
| Flag | Description | Default |
| --- | --- | --- |
| --enable-profiling | Enable file-based profiling output | Off |
| --profiling-output-dir | Directory for profiling output files | ./profiling |
When profiling is not enabled, get_ctx().profiler() returns a NoOpProfiler whose methods are all no-ops. There is zero runtime cost.
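Conceptually, the disabled path behaves like the following sketch (illustrative only, not the actual class definition), where every profiling method is an empty context manager:
from contextlib import contextmanager

class NoOpProfilerSketch:
    # Illustrative stand-in for the disabled profiler: section() yields
    # immediately, so wrapped code runs unchanged and nothing is recorded.
    @contextmanager
    def section(self, key, **kwargs):
        yield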

Instrumenting Your Code

Access the profiler through the session context and wrap code blocks with section():
from reactor_runtime import get_ctx

def start_session(self):
    while not get_ctx().should_stop():
        with get_ctx().profiler().section("inference"):
            with get_ctx().profiler().section("preprocess"):
                latent = self.preprocess(self.current_prompt)
            with get_ctx().profiler().section("model_forward"):
                output = self.model(latent)
            with get_ctx().profiler().section("vae_decode"):
                frames = self.vae.decode(output)

        get_ctx().emit_block(frames)

Nested Sections

Sections can be nested to any depth. Each measurement records the full path through the nesting hierarchy:
with get_ctx().profiler().section("inference"):
    with get_ctx().profiler().section("diffusion"):
        with get_ctx().profiler().section("denoise_step"):
            self.denoise(latent)
For the innermost denoise_step section, this produces the following paths:

| Field | Value |
| --- | --- |
| section_key | denoise_step |
| parent_path | inference/diffusion |
| full_path | inference/diffusion/denoise_step |
The hierarchy makes it easy to see both the flat cost of individual operations and how they nest within your pipeline.

The profile_fn Decorator

To profile a whole function without adding a with block inside its body, use the profile_fn decorator:
from reactor_runtime.profiling import profile_fn, CudaTimingMode

@profile_fn("vae_decode", cuda_timing=CudaTimingMode.EVENT)
def decode_latents(self, latents):
    return self.vae.decode(latents)
This is equivalent to wrapping the function body in with get_ctx().profiler().section("vae_decode", ...). If no session is active (e.g., during offline testing), the decorator falls back to calling the function directly with no overhead.
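The behavior amounts to roughly the following sketch (illustrative only, not the shipped implementation); in particular, detecting a missing session by catching an exception from get_ctx() is an assumption:
import functools
from reactor_runtime import get_ctx

def profile_fn_sketch(key, **section_kwargs):
    # Illustrative decorator: time the call when a session exists,
    # otherwise invoke the function directly.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                profiler = get_ctx().profiler()
            except Exception:
                # No active session (e.g. offline tests): skip profiling.
                return fn(*args, **kwargs)
            with profiler.section(key, **section_kwargs):
                return fn(*args, **kwargs)
        return wrapper
    return decorator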

CUDA Timing Modes

By default, sections are timed with time.perf_counter() (CPU wall-clock time). For GPU-bound operations this can be inaccurate because GPU work is asynchronous: the CPU typically returns from a kernel launch before the GPU has finished executing it, so a section can close while the GPU is still busy. The profiler provides three timing modes:
from reactor_runtime.profiling.profiler import CudaTimingMode

CudaTimingMode.NONE (default)

CPU timing only. Use for operations that do not touch the GPU.
with get_ctx().profiler().section("json_parsing"):
    data = json.loads(response)

CudaTimingMode.EVENT

Records CUDA events at section entry and exit, then computes the elapsed GPU time when the session ends. This avoids synchronization during the section itself, making it the most efficient mode for single-stream GPU work.
with get_ctx().profiler().section("model_forward", cuda_timing=CudaTimingMode.EVENT):
    output = self.model(input)
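The mechanism is the same one PyTorch exposes directly through torch.cuda.Event. The standalone sketch below (model and input_batch are hypothetical) shows how event-based timing measures GPU time without blocking the launch path:
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()                 # enqueue a marker before the GPU work
output = model(input_batch)    # asynchronous GPU work
end.record()                   # enqueue a marker after the GPU work

end.synchronize()              # later: wait for the end marker to complete
elapsed_ms = start.elapsed_time(end)  # GPU time between the markers, in ms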

CudaTimingMode.SYNC

Performs a full torch.cuda.synchronize() at both entry and exit, then measures wall-clock time. Gives accurate results when multiple CUDA streams are in use, at the cost of higher overhead.
with get_ctx().profiler().section("pipeline", cuda_timing=CudaTimingMode.SYNC):
    with torch.cuda.stream(stream1):
        result1 = op1(data)
    with torch.cuda.stream(stream2):
        result2 = op2(data)
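What SYNC measures reduces to the following pattern (a sketch, with run_pipeline standing in for your multi-stream work): drain every stream before and after the block so the wall-clock delta covers all of them.
import time
import torch

torch.cuda.synchronize()       # drain all streams before timing starts
t0 = time.perf_counter()
run_pipeline()                 # work possibly spread across several CUDA streams
torch.cuda.synchronize()       # drain again so every stream has finished
elapsed_s = time.perf_counter() - t0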

Choosing a Mode

| Mode | Overhead | Accuracy | Best For |
| --- | --- | --- | --- |
| NONE | None | CPU only | Data loading, pre/post processing, CPU-bound work |
| EVENT | ~1% | Single-stream GPU | Most inference forward passes |
| SYNC | ~5-20% | Multi-stream GPU | Pipelined or overlapping GPU work |
If CUDA is not available (no GPU or PyTorch not installed), EVENT and SYNC automatically fall back to CPU timing.
EVENT and SYNC modes capture torch.cuda.current_device() at section entry and only instrument that device. If a section distributes work across multiple GPUs (e.g., DataParallel, pipeline parallelism), only the current device’s portion is measured. For DDP, where each process owns a single GPU, the profiler works correctly per-rank.

Output Files

When a session ends, the file backend flushes all accumulated timing data. For each session, the following files are written:
profiling/
├── profiling_<timestamp>.json             # Raw timing data (all samples)
├── profiling_<timestamp>_summary.md       # Human-readable summary with statistics
├── profiling_<timestamp>_sections.csv     # Per-section stats as CSV
├── profiling_<timestamp>_meta.json        # Run metadata (git SHA, GPU info, etc.)
└── run_<timestamp>/                       # Bundle directory with copies of all artifacts
    ├── profiling_<timestamp>.json
    ├── profiling_<timestamp>_summary.md
    ├── profiling_<timestamp>_sections.csv
    └── profiling_<timestamp>_meta.json
The <timestamp> is a Unix timestamp in milliseconds.
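To map a filename back to a human-readable time, divide by 1000 before converting:
from datetime import datetime, timezone

ts_ms = 1700000000000  # timestamp taken from a profiling_<timestamp>.json filename
print(datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).isoformat())
# 2023-11-14T22:13:20+00:00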

JSON Format

The main JSON file contains all raw sample durations so you can compute any statistics you need:
{
  "timestamp": 1700000000.0,
  "timestamp_iso": "2023-11-14T22:13:20+0000",
  "total_sections": 4,
  "total_samples": 1200,
  "sections": {
    "inference": {
      "key": "inference",
      "parent_path": "",
      "samples": [0.0312, 0.0298, 0.0305, ...],
      "count": 300,
      "total_seconds": 9.15,
      "mean_seconds": 0.0305,
      "min_seconds": 0.0280,
      "max_seconds": 0.0412
    },
    "inference/model_forward": {
      "key": "model_forward",
      "parent_path": "inference",
      "samples": [0.0251, 0.0243, ...],
      ...
    }
  }
}
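A few lines of Python are enough to pull custom percentiles out of this file (the path below is the example filename from above):
import json

with open("./profiling/profiling_1700000000000.json") as f:
    report = json.load(f)

for path, section in report["sections"].items():
    samples = sorted(section["samples"])
    p95 = samples[int(0.95 * (len(samples) - 1))]   # simple nearest-rank P95
    print(f"{path}: mean={section['mean_seconds'] * 1000:.1f} ms, p95={p95 * 1000:.1f} ms")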

Summary Markdown

The _summary.md file gives you a quick overview with percentile statistics:
| Section | Count | Mean (ms) | P50 (ms) | P90 (ms) | P99 (ms) | Max (ms) |
| --- | --- | --- | --- | --- | --- | --- |
| inference | 300 | 30.5 | 29.8 | 33.1 | 41.2 | 45.0 |
| inference/model_forward | 300 | 25.1 | 24.3 | 27.5 | 35.0 | 38.2 |
It also includes Top Hot Sections ranked by mean and P99 latency, and FPS Estimates for sections whose names contain frame_pipeline or emit_block.
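To opt a per-frame loop into those FPS estimates, name its top-level section accordingly (render_next_frame here is a hypothetical method):
with get_ctx().profiler().section("frame_pipeline"):
    frames = self.render_next_frame()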

Metadata

The _meta.json file captures the environment so you can reproduce and compare runs:
  • Hostname, Python version
  • Git SHA and branch (if in a git repo)
  • PyTorch version, CUDA version, GPU device name
  • Model name and runtime type (from environment variables)
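A quick way to diff environments between two runs is to load each bundle's _meta.json and print the fields side by side. This is a sketch: the directory names are example timestamps, and the exact field names depend on what the backend writes.
import json
from pathlib import Path

def load_meta(run_dir):
    # Pick up the _meta.json inside a run_<timestamp> bundle directory.
    path = next(Path(run_dir).glob("profiling_*_meta.json"))
    return json.loads(path.read_text())

baseline_meta = load_meta("./profiling/run_1700000000000")
current_meta = load_meta("./profiling/run_1700000100000")
for key in sorted(set(baseline_meta) | set(current_meta)):
    print(f"{key}: {baseline_meta.get(key)} -> {current_meta.get(key)}")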

Visualizing Results

The runtime ships with a plotting script that reads profiling JSON files and produces charts. It requires matplotlib and numpy:
pip install matplotlib numpy

Basic Usage

Plot a single session:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/profiling_1700000000000.json
Plot all sessions in a directory:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/
Save plots to a directory instead of displaying interactively:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ --output ./plots/
This generates two charts per session:
  1. Box plot — shows the distribution (median, quartiles, outliers) of each section
  2. Histogram — shows the frequency distribution for each section with mean, min, max, and sample count

Text Summary Only

If you do not need plots, print a text summary to the terminal:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ --summary-only

Filtering Sections

Focus on specific parts of your pipeline:
# Only sections under inference/
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ --include "inference/"

# Exclude IO sections
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ --exclude "io/"

Output Formats

Plots default to PNG. You can also export as PDF or SVG:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ --output ./plots/ --format svg

Aggregating Multiple Sessions

When you have profiling data from multiple sessions, use --aggregate to merge them into a single combined report:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ --aggregate --output ./plots/
This produces:
  • profiling_aggregate_summary.md — combined statistics across all sessions
  • profiling_aggregate_sections.csv — combined per-section stats as CSV
  • profiling_aggregate_boxplot.png — box plot of combined data
  • profiling_aggregate_histograms.png — histograms of combined data

Skipping Warmup Samples

The first few iterations of a model are often slower due to CUDA kernel compilation, memory allocation, and cache warmup. Use --steady-state-skip to exclude the first N samples per section:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ --aggregate --steady-state-skip 10

Saving a Baseline

Save aggregate results as a baseline for future comparisons:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ \
  --aggregate --save-baseline ./baselines/
This creates a baselines/baseline_<timestamp>/ directory with all aggregate artifacts and metadata.

Comparing Runs

Compare current profiling data against a saved baseline to detect performance regressions:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ \
  --aggregate --compare-to ./baselines/baseline_1700000000000/
This generates a profiling_comparison.md report that shows:
  • Regressions — sections whose mean or P99 latency increased beyond the threshold
  • Full comparison table — baseline vs. current for every section, with delta and percentage

Regression Threshold

By default, sections with >10% increase are flagged as regressions. Adjust with --regression-threshold:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ \
  --aggregate --compare-to ./baselines/ --regression-threshold 5.0

Top Regressions

Show only the N worst regressions:
python -m reactor_runtime.profiling.plotting.plot_profiling ./profiling/ \
  --aggregate --compare-to ./baselines/ --top-regressions 5

Visualization CLI Reference

python -m reactor_runtime.profiling.plotting.plot_profiling <path> [options]
| Flag | Description | Default |
| --- | --- | --- |
| path | Path to a profiling JSON file or directory | Required |
| --output, -o | Output directory for plots | Interactive display |
| --format, -f | Output format: png, pdf, svg | png |
| --no-box | Skip box plot generation | Off |
| --no-hist | Skip histogram generation | Off |
| --summary-only | Print text summary only, no plots | Off |
| --aggregate | Aggregate multiple files into one report | Off |
| --steady-state-skip | Skip first N warmup samples per section | 0 |
| --compare-to | Baseline path for regression comparison | None |
| --regression-threshold | Percentage threshold for regression warnings | 10.0 |
| --top-regressions | Show only top N regressions | All |
| --save-baseline | Save aggregate results as baseline in this directory | None |
| --include | Only include sections matching this prefix | All |
| --exclude | Exclude sections matching this prefix | None |
| --title | Custom title for plots | Auto-generated |
| --tag | Tag to prefix to plot titles (e.g., PR-123) | None |

Thread Safety

The profiler uses a thread-local stack for tracking nested sections, so each thread maintains its own section hierarchy independently. You can safely call profiler().section() from different threads without interference.
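For example, two worker threads can open sections concurrently without their hierarchies mixing (a minimal sketch using the documented section() API):
import threading
from reactor_runtime import get_ctx

def worker(name):
    # Each thread has its own section stack, so sections opened here never
    # become children of sections opened on another thread.
    with get_ctx().profiler().section(name):
        pass  # per-thread work goes here

threads = [threading.Thread(target=worker, args=(f"worker_{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()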