This page covers the technical details of how Reactor Runtime works. You’ll learn about the runtime architecture, model lifecycle, component orchestration, and best practices for implementation.
The Manifest File
The manifest.json file tells the runtime how to run your model:
- What class implements your model
- What arguments to pass during initialization
- Which weights to download
- Version compatibility information
Example manifest.json:
```json
{
  "reactor-runtime": "0.0.0",
  "class": "model_longlive:LongLiveVideoModel",
  "model_name": "longlive",
  "model_version": "1.0.0",
  "args": {
    "fps": 30,
    "size": [480, 640]
  },
  "weights": ["LongLive", "Wan2_1_VAE", "WanVideo_comfy"]
}
```
Breaking it down:
- reactor-runtime: The runtime version this model requires
- class: Points to your VideoModel implementation (filename:ClassName)
- model_name: Unique identifier for your model
- model_version: Semantic version of your model
- args: Keyword arguments passed to your model's __init__ method
- weights: List of weight folders to download from the Model Registry
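Conceptually, the class field behaves like a Python object reference: split on the colon, import the module, look up the attribute. A minimal sketch of that resolution step (this is an illustration of the convention, not the runtime's actual loader; the example uses a standard-library class as a stand-in for your model module):

```python
import importlib

def resolve_class(spec: str):
    """Resolve a 'module:ClassName' spec, as used by the manifest's class field."""
    module_name, _, class_name = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Stand-in example with a stdlib class; your manifest would use
# something like "model_longlive:LongLiveVideoModel" instead:
cls = resolve_class("collections:OrderedDict")
print(cls.__name__)  # OrderedDict
```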
The VideoModel Class
The VideoModel class is an abstract base class that provides the interface between your model and the runtime:
```python
from reactor_runtime import VideoModel, command

class YourModel(VideoModel):
    def __init__(self, fps=30, size=(480, 640), **kwargs):
        # Load YOUR existing model here
        self.my_model = load_my_existing_model()

    def start_session(self):
        # Called when a user connects
        # Run your generation loop here
        self.my_model.generate()
```
Your model file can import anything - your entire existing codebase, external libraries, custom modules. The VideoModel is just the thin wrapper that makes it work with Reactor.
How the Runtime Works
The Reactor Runtime orchestrates three key components:
1. Runtime Server (Port 8081)
The main FastAPI server that hosts your model. When started:
- Instantiates your VideoModel class (calls __init__)
- Loads all weights into GPU memory
- Exposes API endpoints for session management
- Keeps your model warm and ready for connections
2. Local Coordinator (Port 8080)
Manages session lifecycle and client connections:
- Receives connection requests from frontends
- Starts/stops sessions on the runtime
- Handles WebSocket communication
- Mimics production coordinator behavior for local testing
3. LiveKit Server (Port 7880)
Powers real-time video streaming:
- Creates “rooms” for each session
- Handles WebRTC connections between client and model
- Streams video frames with minimal latency
- Runs entirely locally during development
The VideoModel Lifecycle
Understanding when methods are called is crucial for optimal performance:
```text
RUNTIME STARTUP
       ↓
__init__() called - HEAVY LOADING HERE
       ↓
[Model stays in memory]
       ↓
USER CONNECTS → start_session() called
       ↓
[Session active, frames stream]
       ↓
USER DISCONNECTS → cleanup in finally block
       ↓
[Model stays in memory, ready for next user]
       ↓
NEXT USER → start_session() called again (instant!)
```
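The lifecycle above can be sketched with a plain-Python stub (no real runtime involved; the attributes are hypothetical stand-ins for weight loading and generation):

```python
class StubModel:
    """Stand-in for a VideoModel: __init__ runs once at startup,
    start_session runs once per connecting user on the same instance."""

    def __init__(self):
        # Heavy loading happens here, exactly once
        self.weights_loaded = True
        self.sessions_served = 0

    def start_session(self):
        # Model is already warm - nothing heavy on the session path
        assert self.weights_loaded
        self.sessions_served += 1  # generation loop would run here

model = StubModel()      # runtime startup: heavy load, once
model.start_session()    # first user connects
model.start_session()    # next user: instant, no reload
print(model.sessions_served)  # 2
```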
Phase 1: Initialization (__init__)
When: Once at runtime startup, before any users connect
Purpose: Load everything heavy
Duration: Can take minutes (only happens once!)
```python
def __init__(self, fps=30, size=(480, 640), **kwargs):
    """
    Called ONCE when runtime starts.

    Do ALL heavy lifting here:
    - Load model architectures
    - Download/load weights into GPU
    - Compile models
    - Initialize pipelines
    """
    self._device = torch.device("cuda")

    # Load weights from the Model Registry
    weights_path = VideoModel.weights("YourModel-Weights")

    # Initialize your model (heavy operation)
    self.model = YourHeavyModel()
    self.model.load_state_dict(torch.load(weights_path / "model.pt"))
    self.model.to(self._device)
```
Critical: Everything expensive goes here. The instance stays in memory for the entire lifetime of the runtime.
Phase 2: Session Start (start_session)
When: Every time a user connects
Purpose: Run your generation loop
Duration: Runs until user disconnects or generation completes
```python
def start_session(self):
    """
    Called when a user connects.

    Should be FAST to start (model already loaded).
    Run your generation loop here.
    """
    try:
        # Light session-specific setup
        noise = torch.randn(self._latent_shape, device=self._device)

        # Your generation loop
        for frame in self.model.generate(noise):
            # Emit frame to user in real-time
            get_ctx().emit_block(frame)
    finally:
        # Cleanup session resources. Exceptions propagate to the
        # runtime automatically; the finally block still runs first.
        self.model.reset()
        torch.cuda.empty_cache()
```
Important: Keep this method lightweight. Heavy loading should be in __init__.
Phase 3: Session End
When: User disconnects or generation completes
Purpose: Clean up session-specific resources and reset model
```python
def start_session(self):
    try:
        # Generation loop
        pass
    finally:
        # Reset model to initial state for next user
        self.model.reset()

        # Lightweight cleanup
        torch.cuda.empty_cache()

        # DO NOT unload model weights!
        # Model stays loaded for next user
```
The Session Context (get_ctx())
During an active session, you can access the runtime context using get_ctx(). This gives you access to methods for communicating with the frontend and streaming video frames.
```python
from reactor_runtime import VideoModel, get_ctx

class YourModel(VideoModel):
    def start_session(self):
        # Call methods directly on get_ctx()
        for frame in self.generate():
            get_ctx().emit_block(frame)
```
Important: get_ctx() can only be called during an active session (inside start_session() or any methods called from it). Calling it outside a session will raise a RuntimeError.
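The documented behavior can be modeled with a context variable that is only set while a session is active. This is a stand-in to illustrate the rule, not the runtime's actual implementation (the variable name and setup call are hypothetical):

```python
from contextvars import ContextVar

# Stand-in context slot; the real runtime manages this for you
_session_ctx: ContextVar = ContextVar("session_ctx")

def get_ctx():
    """Mimics the documented behavior: raises RuntimeError
    when no session is active."""
    try:
        return _session_ctx.get()
    except LookupError:
        raise RuntimeError("get_ctx() called outside an active session") from None

# Outside a session:
try:
    get_ctx()
except RuntimeError as e:
    print(e)  # get_ctx() called outside an active session

# The runtime sets the context before invoking start_session():
_session_ctx.set("active-session")
ctx = get_ctx()  # now succeeds
```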
Emitting Frames (emit_block)
Stream video frames to the connected client:
```python
def start_session(self):
    # Emit a single frame
    frame = np.random.rand(480, 640, 3)  # (H, W, 3) in RGB
    get_ctx().emit_block(frame)

    # Emit multiple frames at once
    frames = np.random.rand(10, 480, 640, 3)  # (N, H, W, 3)
    get_ctx().emit_block(frames)

    # Send a black frame
    get_ctx().emit_block(None)
```
Frame Format:
- Single frame: np.ndarray with shape (H, W, 3) in RGB color space
- Multiple frames: np.ndarray with shape (N, H, W, 3), where N is the number of frames
- Black frame: pass None to display a black frame on the client
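A small helper makes the accepted shapes concrete by normalizing every input to a batch of frames. This is an illustrative sketch, not part of the runtime API (the function name and black-frame handling are assumptions based on the format rules above):

```python
import numpy as np

def to_frame_batch(frames, height=480, width=640):
    """Normalize emit_block()-style input to shape (N, H, W, 3).

    Illustrative only: None becomes one black frame,
    a single (H, W, 3) frame becomes a batch of one.
    """
    if frames is None:
        # Black frame
        return np.zeros((1, height, width, 3), dtype=np.float32)
    frames = np.asarray(frames)
    if frames.ndim == 3:           # single frame (H, W, 3)
        frames = frames[np.newaxis]
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError(f"expected (H, W, 3) or (N, H, W, 3), got {frames.shape}")
    return frames

print(to_frame_batch(np.random.rand(480, 640, 3)).shape)  # (1, 480, 640, 3)
print(to_frame_batch(None).shape)                         # (1, 480, 640, 3)
```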
Don’t worry about FPS! You can emit frames as fast as your model generates
them. The runtime automatically buffers and smooths frame delivery to maintain
consistent playback, even if your generation rate varies. Just focus on
producing frames - the runtime handles timing.
Sending Messages (send)
Send arbitrary data from your model to the frontend:
```python
def start_session(self):
    # Send generation progress
    get_ctx().send({
        "type": "progress",
        "value": 0.5,
        "message": "Halfway done"
    })

    # Send model state updates
    get_ctx().send({
        "type": "state_changed",
        "new_state": "generating",
        "metadata": {"step": 42}
    })

    # Send any custom data structure
    get_ctx().send({
        "type": "custom_event",
        "data": {"key": "value"}
    })
```
Message Format:
- Must be a Python dict
- Will be automatically wrapped in an ApplicationMessage envelope by the runtime
- The frontend receives it via WebSocket and can handle it in your React components
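Since messages travel over a WebSocket, a reasonable assumption is that payloads should be JSON-serializable dicts. A quick sanity check you can run on any payload before sending (this helper is an illustration, not part of the runtime):

```python
import json

def is_valid_payload(msg) -> bool:
    """Check that a message is a dict and JSON-serializable
    (an assumption based on WebSocket delivery, not a documented API)."""
    if not isinstance(msg, dict):
        return False
    try:
        json.dumps(msg)
        return True
    except (TypeError, ValueError):
        return False

print(is_valid_payload({"type": "progress", "value": 0.5}))  # True
print(is_valid_payload(["not", "a", "dict"]))                # False
```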
Key Features
Automatic Weight Management
Weights are stored in the Model Registry (S3 bucket). When you run your model:
- Runtime checks the weights array in your manifest
- Downloads missing weights to ~/.cache/reactor_registry/
- Caches them for future runs
No manual Hugging Face downloads. No path configuration. Just works.
Don’t have AWS access? You can load weights normally from local paths
during development. When you’re ready to deploy, the Reactor team will handle
uploading your weights to S3 and updating your model to use the Model
Registry.
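The cache-check step above can be approximated as follows. This is an illustrative sketch only (the function, its signature, and the download callback are hypothetical; the real runtime handles all of this for you):

```python
from pathlib import Path

# Cache location used by the runtime, per the docs above
CACHE_DIR = Path.home() / ".cache" / "reactor_registry"

def ensure_weights(names, download=None, cache_dir=CACHE_DIR):
    """Return the weight folders from the manifest that are not yet
    cached, fetching them via the supplied callback if one is given."""
    missing = [name for name in names if not (cache_dir / name).is_dir()]
    for name in missing:
        if download is not None:
            download(name)  # e.g. fetch the folder from S3 into cache_dir
    return missing
```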
Built-in Networking
LiveKit handles all WebRTC complexity:
- Bidirectional video streaming
- Audio support
- Data channels for commands
- Adaptive bitrate streaming
You just emit frames. The runtime handles delivery.
Seamless Local → Production
The development workflow mirrors production exactly:
- Local: All services run on your machine
- Production: Services run in the cloud
Same code. Same behavior. Deploy with confidence.
Efficient Resource Usage
The VideoModel instance stays loaded throughout the runtime lifecycle:
- Startup: Heavy weight loading happens once in __init__
- Session Start: Lightweight setup when user connects
- Session End: Quick reset when user disconnects
- Next Session: Model already loaded, instant start
Users transition between sessions in milliseconds.
Development Flow
The complete local workflow:
```bash
# 1. Start runtime (starts all components)
reactor run

# 2. In another terminal, start your frontend
pnpm run dev

# 3. Connect from browser (ReactorProvider with local flag)
#    → Frontend connects to runtime
#    → Runtime starts your session
#    → Video streams via LiveKit
```
No tokens. No credentials. No manual coordination.
Model Structure
Every Reactor model consists of:
```text
your-model/
├── manifest.json            # Model configuration
├── model_yourmodel.py       # VideoModel wrapper
├── requirements.txt         # Dependencies
└── your_existing_model/     # Your original codebase (optional)
    ├── models.py
    ├── utils.py
    └── ...
```
The wrapper is thin - your existing code stays intact.
Next Steps
Now that you understand the core concepts, learn how to implement your own model: