Real-time video-to-video transformation with ultra-low latency
The StreamDiffusionV2 model is Reactor’s first Video-to-Video (V2V) transformation system that enables real-time editing of video streams with ultra-low latency. Connect any video source—whether it’s your webcam, a live stream, or any other video feed—and transform it in real-time based on your text prompts.
Transform live video streams in real-time with minimal latency.
Dynamic Prompting: Change prompts on the fly and see instant transformations in your video feed.
Flexible Input Sources: Connect webcams, streaming sources, or any video input for transformation.
The model processes video frames continuously, applying AI-driven transformations based on your text prompts. Longer, more detailed prompts tend to produce the best results, and you can update prompts dynamically to see the video transformation change in real-time.
StreamDiffusionV2 is a video-to-video (V2V) model that requires a video input stream to transform. You must provide a video source (typically from a webcam or screen capture) for the model to process.
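For example, in a browser you can capture a webcam feed with the standard getUserMedia Web API. How the resulting MediaStream is attached to your Reactor session depends on your integration; the `attachVideoSource` call below is a hypothetical placeholder, not part of the documented API.

```javascript
// Capture a webcam feed in the browser (standard Web API).
const videoStream = await navigator.mediaDevices.getUserMedia({
  video: { width: 1280, height: 720, frameRate: 30 },
  audio: false,
});

// Hand the stream to your Reactor connection. `attachVideoSource` is a
// hypothetical placeholder; use whatever your Reactor client actually
// exposes for connecting a video input.
await reactor.attachVideoSource(videoStream);
```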
When you first connect to the StreamDiffusionV2 model, it will be ready to receive commands but won’t start processing video until you follow the proper initialization sequence:
1. Set Initial Prompt: Before starting, you must set at least one prompt using set_prompt
2. Start Generation: Once you have a prompt set, call start to begin the video transformation
3. Dynamic Control: While running, you can change prompts in real-time or reset the system as needed
If you call start before setting an initial prompt, the command will be ignored and the model won’t begin processing.
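A minimal sketch of that sequence follows. It assumes the start and reset commands are sent as `sendMessage` payloads with the same shape as the set_prompt example shown later on this page; adjust to your client’s actual API.

```javascript
// 1. Set an initial prompt first; a `start` sent before this is ignored.
await reactor.sendMessage({
  type: "set_prompt",
  data: { prompt: "A watercolor painting of a person in a sunlit park, soft pastel colors" },
});

// 2. Begin transforming the incoming video stream.
//    (Assumes `start` takes no additional data.)
await reactor.sendMessage({ type: "start" });

// 3. While running, prompts can be swapped at any time and take effect immediately.
await reactor.sendMessage({
  type: "set_prompt",
  data: { prompt: "A neon-lit cyberpunk street at night, rain reflections, atmospheric fog" },
});

// Reset the system if needed (message shape assumed to mirror `start`).
await reactor.sendMessage({ type: "reset" });
```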
Description: Set the prompt for video generation and transformation.
Parameters:
prompt (string, required): The text prompt describing the desired video transformation
Behavior:
Sets the active prompt that will be used to transform the incoming video stream
Can be called at any time to change the transformation style
Longer, more detailed prompts typically produce better results
Changes take effect immediately if generation is already running
Best Practices:
Describe the desired scene: Focus on what should be present in the final video, not the transformation process
Provide context and setting: Include details about the environment, lighting, atmosphere, and overall composition
Specify style and mood: Describe the artistic style, color palette, lighting conditions, and emotional tone
Be descriptive about elements: Instead of “a dog turns into a cat,” write “a cat is sitting in the scene”
Include scene details: Mention backgrounds, objects, textures, and visual elements that should be present
Use comprehensive descriptions: Longer, more detailed prompts typically produce better and more consistent results
Example:
```javascript
// Set a detailed scene description prompt
await reactor.sendMessage({
  type: "set_prompt",
  data: {
    prompt: "A cyberpunk cityscape at night with towering skyscrapers covered in neon signs, rain-soaked streets reflecting purple and blue lights, flying cars moving between buildings, and a person in a futuristic coat walking through the scene with dramatic lighting and atmospheric fog"
  }
});

// Change to a different scene and style
await reactor.sendMessage({
  type: "set_prompt",
  data: {
    prompt: "A serene watercolor painting scene with a person sitting by a peaceful lake surrounded by cherry blossom trees, soft pastel colors throughout, gentle brushstroke textures, warm golden hour lighting, and mountains in the distant background"
  }
});
```
StreamDiffusionV2 is developed by Tianrui Feng, Zhi Li, Haocheng Xi, Muyang Li, Shuo Yang, Xiuyu Li, Lvmin Zhang, Kelly Peng, Song Han, Maneesh Agrawala, Kurt Keutzer, Akio Kodaira, and Chenfeng Xu (UC Berkeley, MIT, Stanford University, First Intelligence, UT Austin).
Project Page - View on GitHub