Real-time video-to-video transformation with ultra-low latency
The StreamDiffusionV2 model is Reactor’s first Video-to-Video (V2V) transformation system that enables real-time editing of video streams with ultra-low latency. Connect any video source—whether it’s your webcam, a live stream, or any other video feed—and transform it in real-time based on your text prompts.
Transform live video streams in real-time with minimal latency.
Dynamic Prompting: Change prompts on the fly and see instant transformations in your video feed.
Flexible Input Sources: Connect webcams, streaming sources, or any video input for transformation.
The model processes video frames continuously, applying AI-driven transformations based on your text prompts. Longer, more detailed prompts tend to produce the best results, and you can update prompts dynamically to see the video transformation change in real-time.
StreamDiffusionV2 is a video-to-video (V2V) model that requires a video input stream to transform. You must provide a video source (typically from a webcam or screen capture) for the model to process.
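For example, in a browser you can capture a webcam feed with the standard getUserMedia Web API. How the resulting MediaStream is attached to your Reactor session depends on your integration; the `attachVideoSource` call below is a hypothetical placeholder, not part of the documented API.

```javascript
// Capture a webcam feed in the browser (standard Web API).
const videoStream = await navigator.mediaDevices.getUserMedia({
  video: { width: 1280, height: 720, frameRate: 30 },
  audio: false,
});

// Hand the stream to your Reactor connection. `attachVideoSource` is a
// hypothetical placeholder; use whatever your Reactor client actually
// exposes for connecting a video input.
await reactor.attachVideoSource(videoStream);
```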
When you first connect to the StreamDiffusionV2 model, it will be ready to receive commands but won’t start processing video until you follow the proper initialization sequence:
1. Set Initial Prompt: Before starting, you must set at least one prompt using set_prompt
2. Start Generation: Once you have a prompt set, call start to begin the video transformation
3. Dynamic Control: While running, you can change prompts in real-time or reset the system as needed
If you call start before setting an initial prompt, the command will be ignored and the model won’t begin processing.
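A minimal sketch of that sequence follows. It assumes the start and reset commands are sent as `sendMessage` payloads with the same shape as the set_prompt example shown later on this page; adjust to your client’s actual API.

```javascript
// 1. Set an initial prompt first; a `start` sent before this is ignored.
await reactor.sendMessage({
  type: "set_prompt",
  data: { prompt: "A watercolor painting of a person in a sunlit park, soft pastel colors" },
});

// 2. Begin transforming the incoming video stream.
//    (Assumes `start` takes no additional data.)
await reactor.sendMessage({ type: "start" });

// 3. While running, prompts can be swapped at any time and take effect immediately.
await reactor.sendMessage({
  type: "set_prompt",
  data: { prompt: "A neon-lit cyberpunk street at night, rain reflections, atmospheric fog" },
});

// Reset the system if needed (message shape assumed to mirror `start`).
await reactor.sendMessage({ type: "reset" });
```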
Description: Set the prompt for video generation and transformation.
Parameters:
prompt (string, required): The text prompt describing the desired video transformation
Behavior:
Sets the active prompt that will be used to transform the incoming video stream
Can be called at any time to change the transformation style
Longer, more detailed prompts typically produce better results
Changes take effect immediately if generation is already running
Best Practices:
Describe the desired scene: Focus on what should be present in the final video, not the transformation process
Provide context and setting: Include details about the environment, lighting, atmosphere, and overall composition
Specify style and mood: Describe the artistic style, color palette, lighting conditions, and emotional tone
Be descriptive about elements: Instead of “a dog turns into a cat,” write “a cat is sitting in the scene”
Include scene details: Mention backgrounds, objects, textures, and visual elements that should be present
Use comprehensive descriptions: Longer, more detailed prompts typically produce better and more consistent results
Example:
```javascript
// Set a detailed scene description prompt
await reactor.sendMessage({
  type: "set_prompt",
  data: {
    prompt: "A cyberpunk cityscape at night with towering skyscrapers covered in neon signs, rain-soaked streets reflecting purple and blue lights, flying cars moving between buildings, and a person in a futuristic coat walking through the scene with dramatic lighting and atmospheric fog"
  }
});

// Change to a different scene and style
await reactor.sendMessage({
  type: "set_prompt",
  data: {
    prompt: "A serene watercolor painting scene with a person sitting by a peaceful lake surrounded by cherry blossom trees, soft pastel colors throughout, gentle brushstroke textures, warm golden hour lighting, and mountains in the distant background"
  }
});
```
StreamDiffusionV2 is developed by Tianrui Feng, Zhi Li, Haocheng Xi, Muyang Li, Shuo Yang, Xiuyu Li, Lvmin Zhang, Kelly Peng, Song Han, Maneesh Agrawala, Kurt Keutzer, Akio Kodaira, and Chenfeng Xu (UC Berkeley, MIT, Stanford University, First Intelligence, UT Austin).
Project Page - View on GitHub