This page covers the technical details of how Reactor Runtime works. You’ll learn about the runtime architecture, model lifecycle, component orchestration, and best practices for implementation.

The Manifest File

The manifest.json file configures your model for the runtime:
  • What class implements your model
  • What arguments to pass during initialization
  • Which weights to download
  • Version compatibility information
Example manifest.json:
{
  "reactor-runtime": "0.0.0",
  "class": "model_longlive:LongLiveVideoModel",
  "model_name": "longlive",
  "model_version": "1.0.0",
  "args": {
    "fps": 30,
    "size": [480, 640]
  },
  "weights": ["LongLive", "Wan2_1_VAE", "WanVideo_comfy"]
}
Breaking it down:
  • reactor-runtime: The runtime version this model requires
  • class: Points to your VideoModel implementation (filename:ClassName)
  • model_name: Unique identifier for your model
  • model_version: Semantic version of your model
  • args: Arguments passed to your model’s __init__ method (see the sketch after this list)
  • weights: List of weight folders to download from Model Registry
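For example, with the manifest above, the runtime imports model_longlive.py, finds LongLiveVideoModel, and passes args as keyword arguments to __init__. A rough sketch of the mapping (not the runtime’s actual instantiation code):
from reactor_runtime import VideoModel

# Defined in model_longlive.py, matching "class": "model_longlive:LongLiveVideoModel"
class LongLiveVideoModel(VideoModel):
    def __init__(self, fps=30, size=(480, 640), **kwargs):
        # Receives the manifest "args": fps=30, size=[480, 640]
        ...

# The runtime effectively performs:
#   model = LongLiveVideoModel(fps=30, size=[480, 640])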

The VideoModel Class

The VideoModel class is an abstract base class that provides the interface between your model and the runtime:
from reactor_runtime import VideoModel, command

class YourModel(VideoModel):
    def __init__(self, fps=30, size=(480, 640), **kwargs):
        # Load YOUR existing model here
        self.my_model = load_my_existing_model()

    def start_session(self):
        # Called when a user connects
        # Run your generation loop here
        self.my_model.generate()
Your model file can import anything - your entire existing codebase, external libraries, custom modules. The VideoModel is just the thin wrapper that makes it work with Reactor.

How the Runtime Works

The Reactor Runtime orchestrates three key components:

1. Runtime Server (Port 8081)

The main FastAPI server that hosts your model. When started:
  • Instantiates your VideoModel class (calls __init__)
  • Loads all weights into GPU memory
  • Exposes API endpoints for session management
  • Keeps your model warm and ready for connections

2. Local Coordinator (Port 8080)

Manages session lifecycle and client connections:
  • Receives connection requests from frontends
  • Starts/stops sessions on the runtime
  • Handles WebSocket communication
  • Mimics production coordinator behavior for local testing

3. LiveKit Server (Port 7880)

Powers real-time video streaming:
  • Creates “rooms” for each session
  • Handles WebRTC connections between client and model
  • Streams video frames with minimal latency
  • Runs entirely locally during development
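All three services listen on localhost during development, so you can sanity-check that they are up by probing their ports. A minimal sketch (ports taken from the sections above; this script is not part of the runtime):
import socket

SERVICES = {
    "Runtime Server": 8081,
    "Local Coordinator": 8080,
    "LiveKit Server": 7880,
}

def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in SERVICES.items():
    print(f"{name} (port {port}): {'up' if is_listening(port) else 'down'}")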

The VideoModel Lifecycle

Understanding when methods are called is crucial for optimal performance:
RUNTIME STARTUP

__init__() called - HEAVY LOADING HERE

[Model stays in memory]

USER CONNECTS → start_session() called

[Session active, frames stream]

USER DISCONNECTS → cleanup in finally block

[Model stays in memory, ready for next user]

NEXT USER → start_session() called again (instant!)

Phase 1: Initialization (__init__)

When: Once at runtime startup, before any users connect
Purpose: Load everything heavy
Duration: Can take minutes (only happens once!)
def __init__(self, fps=30, size=(480, 640), **kwargs):
    """
    Called ONCE when runtime starts.
    Do ALL heavy lifting here:
    - Load model architectures
    - Download/load weights into GPU
    - Compile models
    - Initialize pipelines
    """
    self._device = torch.device("cuda")

    # Load weights from Model Registry
    weights_path = VideoModel.weights("YourModel-Weights")

    # Initialize your model (heavy operation)
    self.model = YourHeavyModel()
    self.model.load_state_dict(torch.load(weights_path / "model.pt"))
    self.model.to(self._device)
Critical: Everything expensive goes here. This instance stays in memory forever.

Phase 2: Session Start (start_session)

When: Every time a user connects
Purpose: Run your generation loop
Duration: Runs until the user disconnects or generation completes
def start_session(self):
    """
    Called when a user connects.
    Should be FAST to start (model already loaded).
    Run your generation loop here.
    """
    try:
        # Light session-specific setup
        noise = torch.randn(self._latent_shape, device=self._device)

        # Your generation loop
        for frame in self.model.generate(noise):
            # Emit frame to user in real-time
            get_ctx().emit_block(frame)

    except Exception:
        # Re-raise so the runtime is notified of the error;
        # the finally block below handles the model reset
        raise
    finally:
        # Cleanup session resources
        self.model.reset()
        torch.cuda.empty_cache()
Important: Keep this method lightweight. Heavy loading should be in __init__.

Phase 3: Session End

When: User disconnects or generation completes
Purpose: Clean up session-specific resources and reset the model
def start_session(self):
    try:
        # Generation loop
        pass
    finally:
        # Reset model to initial state for next user
        self.model.reset()

        # Lightweight cleanup
        torch.cuda.empty_cache()

        # DO NOT unload model weights!
        # Model stays loaded for next user

The Session Context (get_ctx())

During an active session, you can access the runtime context using get_ctx(). This gives you access to methods for communicating with the frontend and streaming video frames.
from reactor_runtime import VideoModel, get_ctx

class YourModel(VideoModel):
    def start_session(self):
        # Call methods directly on get_ctx()
        for frame in self.generate():
            get_ctx().emit_block(frame)
Important: get_ctx() can only be called during an active session (inside start_session() or any methods called from it). Calling it outside a session will raise a RuntimeError.
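This also means helper methods invoked during a session can use get_ctx() freely. A small sketch:
from reactor_runtime import VideoModel, get_ctx

class YourModel(VideoModel):
    def start_session(self):
        # get_ctx() is valid here...
        self._emit_status("session started")

    def _emit_status(self, message):
        # ...and in any method called from start_session()
        get_ctx().send({"type": "status", "message": message})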

Emitting Frames (emit_block)

Stream video frames to the connected client:
def start_session(self):
    # Emit a single frame
    frame = np.random.rand(480, 640, 3)  # (H, W, 3) in RGB
    get_ctx().emit_block(frame)
    
    # Emit multiple frames at once
    frames = np.random.rand(10, 480, 640, 3)  # (N, H, W, 3)
    get_ctx().emit_block(frames)
    
    # Send a black frame
    get_ctx().emit_block(None)
Frame Format:
  • Single frame: np.ndarray with shape (H, W, 3) in RGB color space
  • Multiple frames: np.ndarray with shape (N, H, W, 3) where N is the number of frames
  • Black frame: Pass None to display a black frame on the client
Don’t worry about FPS! You can emit frames as fast as your model generates them. The runtime automatically buffers and smooths frame delivery to maintain consistent playback, even if your generation rate varies. Just focus on producing frames - the runtime handles timing.
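To illustrate, the sketch below emits frames in uneven bursts; playback on the client stays smooth because the runtime buffers delivery (np.random.rand stands in for a real generator):
import time
import numpy as np
from reactor_runtime import get_ctx

def start_session(self):
    # Bursty emission: 8 frames at once, then a pause that simulates
    # a slow or variable generation step
    for _ in range(10):
        frames = np.random.rand(8, 480, 640, 3)  # stand-in frames, (N, H, W, 3)
        get_ctx().emit_block(frames)
        time.sleep(0.5)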

Sending Messages (send)

Send arbitrary data from your model to the frontend:
def start_session(self):
    # Send generation progress
    get_ctx().send({
        "type": "progress",
        "value": 0.5,
        "message": "Halfway done"
    })
    
    # Send model state updates
    get_ctx().send({
        "type": "state_changed",
        "new_state": "generating",
        "metadata": {"step": 42}
    })
    
    # Send any custom data structure
    get_ctx().send({
        "type": "custom_event",
        "data": {"key": "value"}
    })
Message Format:
  • Must be a Python dict
  • Will be automatically wrapped in an ApplicationMessage envelope by the runtime
  • The frontend receives this via WebSocket and can handle it in your React components
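A common pattern is to interleave send() with emit_block() so the frontend can show progress alongside the video. A sketch, assuming self.model.generate() yields a known number of frames:
def start_session(self):
    total = 100  # illustrative; depends on your generator
    for i, frame in enumerate(self.model.generate()):
        get_ctx().emit_block(frame)
        if i % 10 == 0:
            get_ctx().send({
                "type": "progress",
                "value": i / total,
                "message": f"Generated frame {i}/{total}",
            })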

Key Features

Automatic Weight Management

Weights are stored in the Model Registry (S3 bucket). When you run your model:
  1. Runtime checks the weights array in your manifest
  2. Downloads missing weights to ~/.cache/reactor_registry/
  3. Caches them for future runs
No manual Hugging Face downloads. No path configuration. Just works.
Don’t have AWS access? You can load weights normally from local paths during development. When you’re ready to deploy, the Reactor team will handle uploading your weights to S3 and updating your model to use the Model Registry.
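For example, you might toggle between a local checkpoint directory and the registry cache while developing (the environment variable and local path here are hypothetical):
import os
from pathlib import Path
from reactor_runtime import VideoModel

if os.environ.get("USE_LOCAL_WEIGHTS") == "1":
    # Local development path (assumption; adjust to your setup)
    weights_path = Path("./checkpoints/longlive")
else:
    # Resolves the folder downloaded to ~/.cache/reactor_registry/
    weights_path = VideoModel.weights("LongLive")  # name from the manifest "weights" list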

Built-in Networking

LiveKit handles all WebRTC complexity:
  • Bidirectional video streaming
  • Audio support
  • Data channels for commands
  • Adaptive bitrate streaming
You just emit frames. The runtime handles delivery.

Seamless Local → Production

The development workflow mirrors production exactly:
  • Local: All services run on your machine
  • Production: Services run in the cloud
Same code. Same behavior. Deploy with confidence.

Efficient Resource Usage

The VideoModel instance stays loaded throughout the runtime lifecycle:
  • Startup: Heavy weight loading happens once in __init__
  • Session Start: Lightweight setup when user connects
  • Session End: Quick reset when user disconnects
  • Next Session: Model already loaded, instant start
Users transition between sessions in milliseconds.

Development Flow

The complete local workflow:
# 1. Start runtime (starts all components)
reactor run

# 2. In another terminal, start your frontend
pnpm run dev

# 3. Connect from browser (ReactorProvider with local flag)
# → Frontend connects to runtime
# → Runtime starts your session
# → Video streams via LiveKit
No tokens. No credentials. No manual coordination.

Model Structure

Every Reactor model consists of:
your-model/
├── manifest.json              # Model configuration
├── model_yourmodel.py         # VideoModel wrapper
├── requirements.txt           # Dependencies
└── your_existing_model/       # Your original codebase (optional)
    ├── models.py
    ├── utils.py
    └── ...
The wrapper is thin - your existing code stays intact.

Next Steps

Now that you understand the core concepts, you’re ready to learn how to implement your own model.