A guided tour of the open-source LingBot Interactive reference app, which demonstrates every important pattern in the LingBot SDK. By the end you’ll know how to start a scene from an image, drive it with WASD, snap clips, and surface model errors.
Installation and setup
Get the example running locally before reading further. Every section below points back at code in the example repo. You will need:
- Node.js 18+.
- pnpm (the example pins its lockfile to pnpm; npm or yarn will work, but you’ll regenerate the lockfile).
- A Reactor API key (starts with rk_).
- Familiarity with the Next.js App Router.
Clone the example
The example lives alongside our other reference apps in
reactor-team/reactor-experiments.
Add your API key
Your rk_… key must never reach the browser; the example reads it server-side and mints a
short-lived JWT for the client. We’ll cover the broker pattern below; for now, drop the key
into .env.
How LingBot works
Building with LingBot is different from calling a typical generative API. There’s no image-in / video-out request. You open a long-lived connection, send a seed image plus a paragraph-length prompt, and the model begins producing a continuous stream of chunks that you steer in real time with WASD. The image anchors the scene and is locked at start; the prompt and movement axes drive everything that happens after. Opening the connection isn’t instant. Reactor provisions a GPU for your session, so the client moves through the same four states as every other Reactor model before media starts flowing (disconnected → connecting → waiting → ready). See
Sessions for the full breakdown.
Three properties of the LingBot API are worth internalizing before you read on, since the rest of
this tutorial assumes them:
- Commands are asynchronous; events are the source of truth. Calling setImage doesn’t mean the next chunk uses it; the model confirms by emitting image_accepted when the upload has been decoded and is ready to use.
- Errors arrive out-of-band. A broken precondition like start before setImage surfaces later as a command_error event, not as a thrown exception.
- Movement axes are persistent state, not pulses. set_movement and the two look axes hold their last value forever; every keydown needs a matching keyup that sends idle.
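To make these three properties concrete, here is a small in-memory stand-in for a session. It is not the SDK: the class FakeLingbotSession, its methods, and the event shapes are hypothetical, chosen only to mirror the behavior described above. Commands return immediately, confirmations and precondition errors arrive later as events, and the movement axis keeps whatever value it was last given. As a simplification, the fake treats an image as registered the moment setImage is called.

```typescript
// Hypothetical in-memory stand-in for a LingBot session; NOT the SDK's API.
type SessionEvent =
  | { type: "image_accepted" }
  | { type: "command_error"; command: string; reason: string };

type SessionListener = (event: SessionEvent) => void;

class FakeLingbotSession {
  private listeners: SessionListener[] = [];
  private hasImage = false;
  private hasPrompt = false;
  // Persistent axis: keeps its value until a new one is sent.
  movement = "idle";

  on(listener: SessionListener): void {
    this.listeners.push(listener);
  }

  // Deliver on a later microtask, never synchronously inside the command call.
  private emitLater(event: SessionEvent): void {
    Promise.resolve().then(() => {
      this.listeners.forEach((listener) => listener(event));
    });
  }

  // Returns immediately; image_accepted confirms the upload later.
  // Simplification: the image counts as registered right away.
  setImage(): void {
    this.hasImage = true;
    this.emitLater({ type: "image_accepted" });
  }

  setPrompt(): void {
    this.hasPrompt = true;
  }

  // A broken precondition is reported as an event, not thrown.
  start(): void {
    if (!this.hasImage || !this.hasPrompt) {
      this.emitLater({
        type: "command_error",
        command: "start",
        reason: "start requires both a prompt and a reference image",
      });
    }
  }

  // Axis state: stays at `value` until the next call (e.g. "idle" on keyup).
  setMovement(value: string): void {
    this.movement = value;
  }
}
```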
Authentication
LingBot uses the same broker pattern as every browser-side Reactor app: your rk_… key stays on the
server and the client receives a short-lived JWT minted from it. The server-side route at
app/api/reactor/token/route.ts and the mount-time fetch in LingbotApp.tsx are
character-for-character the Helios setup, with the provider swapped to <LingbotProvider>. See
Authentication for the full concept page, including the Express equivalent and
the Python path that skips the broker entirely.
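On the client side, the broker pattern usually comes down to a small cached resolver: fetch a JWT from your token route, reuse it until shortly before it expires, and let concurrent callers share one request. The sketch below is framework-agnostic and assumes a hypothetical MintedToken shape ({ jwt, expiresAtMs }); the real /api/reactor/token response may look different, and the 30-second refresh margin is an arbitrary choice.

```typescript
// Hypothetical response shape for the token route; adjust it to whatever
// your /api/reactor/token handler actually returns.
interface MintedToken {
  jwt: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

// Returns a getJwt-style resolver. It calls `mint` only when there is no
// cached token or the cached one is within `skewMs` of expiring, and it
// shares a single in-flight request between concurrent callers.
function createCachedJwtResolver(
  mint: () => Promise<MintedToken>,
  now: () => number = () => Date.now(),
  skewMs: number = 30000,
): () => Promise<string> {
  let cached: MintedToken | null = null;
  let inFlight: Promise<MintedToken> | null = null;

  return async (): Promise<string> => {
    if (cached !== null && cached.expiresAtMs - skewMs > now()) {
      return cached.jwt; // still fresh: no new mint
    }
    if (inFlight === null) {
      inFlight = mint().then(
        (token) => {
          inFlight = null;
          return token;
        },
        (err) => {
          inFlight = null; // allow a retry after a failed mint
          throw err;
        },
      );
    }
    const token = await inFlight;
    cached = token;
    return token.jwt;
  };
}
```

In a browser you might pass a function that fetches /api/reactor/token and maps its JSON onto these fields as mint. One resolver can then serve both connection setup and any component that needs a token later, so repeat requests reuse the cached JWT.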
Starting a scene from an image
The canonical LingBot launch flow lives in ScenePicker.tsx. Picking a curated scene fires a
five-step sequence: uploadFile → setImage → await image_accepted → setPrompt → start. The wait in
the middle is the part that matters. setImage carries an upload the runtime has to decode and
VAE-encode, but start carries nothing and sails past on the same data channel. Skip the wait and
the first chunk is generated from the prompt alone, with the image landing one chunk later. The
scene visibly “corrects itself” at the first chunk boundary.
setConditioning; the explicit wait is the answer. The example
uses useLingbotImageAccepted with a one-shot ref resolver to gate setPrompt + start on the right
event:
app/components/ScenePicker.tsx
imageReadyRef is the standard pattern for waiting on an event-bus event
from inside an async function. The hook callback is the resolver; the ref makes it one-shot. Without
the ref reset, a second image_accepted from a later run would resolve a Promise nobody is
awaiting.
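Stripped of React, the five-step launch and the one-shot resolver look roughly like the sketch below. The LaunchClient interface is hypothetical: its method names follow this tutorial, but the argument and return types are assumptions rather than the SDK’s real signatures. Ref stands in for a React ref, and onImageAccepted is what the useLingbotImageAccepted callback would run.

```typescript
// Hypothetical client surface. Method names follow the tutorial
// (uploadFile, setImage, setPrompt, start); types are assumptions.
interface LaunchClient {
  uploadFile(file: Uint8Array): Promise<string>; // resolves to an upload reference
  setImage(uploadRef: string): void;
  setPrompt(args: { prompt: string }): void;
  start(): void;
}

// A React-style ref: a mutable box the event callback can read.
interface Ref<T> {
  current: T;
}

type Resolver = () => void;

// Body of the image_accepted callback: resolve the pending wait at most
// once, then clear the ref so a later image_accepted resolves nothing.
function onImageAccepted(imageReadyRef: Ref<Resolver | null>): void {
  const resolve = imageReadyRef.current;
  imageReadyRef.current = null;
  if (resolve) resolve();
}

async function launchScene(
  client: LaunchClient,
  imageReadyRef: Ref<Resolver | null>,
  image: Uint8Array,
  prompt: string,
): Promise<void> {
  const uploadRef = await client.uploadFile(image);
  // Arm the one-shot resolver before sending setImage, so the
  // confirmation always has a listener.
  const accepted = new Promise<void>((resolve) => {
    imageReadyRef.current = () => resolve();
  });
  client.setImage(uploadRef);
  await accepted; // image decoded: safe to start generating
  client.setPrompt({ prompt });
  client.start();
}
```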
The curated scenes live in app/lib/scenes.ts. Each entry pairs a hand-tuned starting prompt with a
reference image in public/. The prompts follow the rules in the Prompt Guide
above: FOV and subject declared up front, near / mid / far object layers filled in, position-only
camera framing, one atmosphere phrase.
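The prompt rules can be encoded as a small builder that keeps every curated scene in the same shape. This is a sketch, not the contents of app/lib/scenes.ts: the ScenePromptParts fields and the sentence templates are hypothetical, and the real entries may simply store a finished prompt string alongside the image path.

```typescript
// Hypothetical prompt ingredients for one curated scene.
interface ScenePromptParts {
  fov: string; // field of view, declared up front
  subject: string; // declared up front
  near: string; // foreground object layer
  mid: string; // middle-distance object layer
  far: string; // background object layer
  camera: string; // position-only camera framing
  atmosphere: string; // exactly one atmosphere phrase
}

// Assembles the parts in rule order: FOV + subject, near/mid/far layers,
// camera position, then a single atmosphere phrase.
function buildScenePrompt(parts: ScenePromptParts): string {
  (Object.keys(parts) as (keyof ScenePromptParts)[]).forEach((key) => {
    if (parts[key].trim() === "") {
      throw new Error(`scene prompt field "${key}" is empty`);
    }
  });
  return [
    `${parts.fov} of ${parts.subject}.`,
    `In the foreground, ${parts.near}.`,
    `In the middle distance, ${parts.mid}.`,
    `In the far distance, ${parts.far}.`,
    `The camera is ${parts.camera}.`,
    `${parts.atmosphere}.`,
  ].join(" ");
}
```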
Custom uploads
CustomStart.tsx handles the second launch path: the user uploads their own image and types their
own prompt. The trick here is to upload as soon as the file is picked, so image_accepted lands
while the user is still typing. When they click Start, the example fires setPrompt + start
directly; no await needed, because the human typing delay is orders of magnitude longer than the
image decode. Contrast with Starting a scene from an image, where
a one-click launch has to bridge that gap explicitly.
The two halves of the flow are wired to different events:
app/components/CustomStart.tsx
Reading has_image off the snapshot rather than tracking a local boolean keeps the UI honest across
edge cases (a reset() from elsewhere in the app, a disconnect mid-upload). The
state payload is the canonical source for what the model thinks is set.
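The two halves of the custom flow can be sketched as three plain functions. The CustomStartClient interface and the argument types are hypothetical; only the command names and the has_image snapshot field come from this tutorial. Gating Start on has_image plus a non-empty draft prompt is an assumption about how such a form would reasonably behave.

```typescript
// Only the snapshot field this sketch needs.
interface StartSnapshot {
  has_image: boolean;
}

// Hypothetical client surface; signatures are assumptions.
interface CustomStartClient {
  uploadFile(file: Uint8Array): Promise<string>;
  setImage(uploadRef: string): void;
  setPrompt(args: { prompt: string }): void;
  start(): void;
}

// File input change handler: upload right away so that image_accepted
// usually lands while the user is still typing.
async function onFilePicked(client: CustomStartClient, file: Uint8Array): Promise<void> {
  const uploadRef = await client.uploadFile(file);
  client.setImage(uploadRef);
}

// Start gate: the model's snapshot decides whether an image is set; the
// prompt is still a local draft because it has not been sent yet.
function startEnabled(snapshot: StartSnapshot, draftPrompt: string): boolean {
  return snapshot.has_image && draftPrompt.trim().length > 0;
}

// Click handler: no await between the commands, because the human typing
// delay already covered the image decode.
function onStartClicked(client: CustomStartClient, draftPrompt: string): void {
  client.setPrompt({ prompt: draftPrompt });
  client.start();
}
```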
Going live
Once snapshot.started === true, the setup panels (ScenePicker and CustomStart) hide and the
live UI takes over: a status badge, a now-playing panel with transport controls, and the video pane.
StatusBadge.tsx is the user’s window into the four-state connection machine. Every state,
including the multi-second waiting step where Reactor is provisioning a GPU, gets a visible label
and color:
app/components/StatusBadge.tsx
useLingbot() exposes status, connect, disconnect, and lastError. The Connect / Disconnect
toggle keys purely off status === "disconnected": that state renders Connect, and every other state renders Disconnect.
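The badge logic reduces to a lookup table over the four connection states plus the toggle rule. The labels and color names below are placeholders, not the ones StatusBadge.tsx uses; only the four state names come from this tutorial.

```typescript
type ConnectionStatus = "disconnected" | "connecting" | "waiting" | "ready";

// Placeholder labels and colors: every state, including the multi-second
// GPU-provisioning "waiting" step, gets something visible.
const STATUS_BADGE: Record<ConnectionStatus, { label: string; color: string }> = {
  disconnected: { label: "Disconnected", color: "gray" },
  connecting: { label: "Connecting…", color: "amber" },
  waiting: { label: "Waiting for GPU…", color: "amber" },
  ready: { label: "Live", color: "green" },
};

// Toggle rule: only "disconnected" offers Connect; every other state
// offers Disconnect.
function toggleAction(status: ConnectionStatus): "connect" | "disconnect" {
  return status === "disconnected" ? "connect" : "disconnect";
}
```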
NowPlaying.tsx is the canonical example of how the rest of the app reads model state: subscribe
once with useLingbotState, hold the latest snapshot in useState, read fields off it. No event
aggregation, no derived booleans, no useReducer over chunk_complete events.
app/components/NowPlaying.tsx
current_action is a LingBot-specific snapshot field: a +-joined composite like "w+left" that
reflects how the model is currently moving and looking. It updates per chunk, so it lags the user’s
key presses by one chunk; that’s fine for a status readout, but as
Driving the scene with WASD covers, it’s the wrong source for button
highlights.
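Because current_action is a plain +-joined string, a status readout only needs to split it. A minimal sketch: the set of tokens that can appear is not documented here, so the helpers treat each part as an opaque string, and the "(no action)" fallback is a hypothetical display choice.

```typescript
// Splits a "+"-joined composite such as "w+left" into its parts.
function parseCurrentAction(action: string | null | undefined): string[] {
  if (!action) return [];
  return action
    .split("+")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Readout text for a now-playing panel. It reflects the model's latest
// chunk, so it trails the keyboard; never drive button highlights from it.
function formatCurrentAction(action: string | null | undefined): string {
  const parts = parseCurrentAction(action);
  return parts.length > 0 ? parts.join(" + ") : "(no action)";
}
```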
The video pane itself is one component:
app/components/Video.tsx
<LingbotMainVideoView /> is a typed wrapper around <ReactorView track="main_video"> that handles
<video> element setup, srcObject binding, and browser autoplay policy quirks. Style the outer
container; never reach for the underlying element.
Driving the scene with WASD
This is LingBot’s signature feature, and MovementControls.tsx is the largest component in the
example. The model exposes three persistent-state axes: set_movement
(forward/back/strafe_left/strafe_right/idle), set_look_horizontal, and set_look_vertical, plus
set_rotation_speed_deg as a slider.
The crucial invariant: axes hold their last value forever until you send a new one. Every
keydown must be paired with a keyup that sends idle, or the camera will keep moving after the user
lets go of the key. This is not a pulse API; the model walks forward at every chunk boundary until
you explicitly tell it to stop.
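A sketch of a key-to-axis mapping that enforces the pairing rule: keydown yields the axis value, keyup yields idle on the same axis. The command names (set_movement, set_look_horizontal, set_look_vertical) and the movement values come from this tutorial; the look values (left, right, up, down), the arrow-key bindings, and KEY_BINDINGS itself are assumptions. It also ignores overlapping keys on one axis (holding W and S together), which a production handler would need to resolve.

```typescript
type AxisCommandName = "set_movement" | "set_look_horizontal" | "set_look_vertical";

interface AxisCommand {
  axis: AxisCommandName;
  value: string;
}

// Hypothetical bindings; look-axis values are assumed for this sketch.
const KEY_BINDINGS: Record<string, AxisCommand> = {
  w: { axis: "set_movement", value: "forward" },
  s: { axis: "set_movement", value: "back" },
  a: { axis: "set_movement", value: "strafe_left" },
  d: { axis: "set_movement", value: "strafe_right" },
  ArrowLeft: { axis: "set_look_horizontal", value: "left" },
  ArrowRight: { axis: "set_look_horizontal", value: "right" },
  ArrowUp: { axis: "set_look_vertical", value: "up" },
  ArrowDown: { axis: "set_look_vertical", value: "down" },
};

// Keydown sets the axis; keyup releases the SAME axis with "idle", because
// axes hold their last value until overwritten.
function commandFor(key: string, phase: "down" | "up"): AxisCommand | null {
  const normalized = key.length === 1 ? key.toLowerCase() : key; // "W" -> "w"
  const binding = KEY_BINDINGS[normalized];
  if (!binding) return null;
  return phase === "down" ? binding : { axis: binding.axis, value: "idle" };
}
```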
The component owns three pieces of local state for highlighting:
app/components/MovementControls.tsx
snapshot.movement reflects what the model is currently
generating with, not what was just pressed. It lags every press by a chunk. If the buttons read
from the snapshot, a quick W tap would never light up: by the time the highlight could appear,
the user would already have released the key and the snapshot would be back to idle. Local state is instant and
matches what the user just did.
The keyboard handler attaches a single keydown / keyup pair to window so the pad responds without
the user having to click into anything:
- preventDefault on arrow keys. Without it, the arrow keys scroll the page while the user is looking around. Browsers don’t scroll on WASD by default, but the preventDefault is harmless there and keeps the handler symmetric.
- Ignore events in inputs and textareas. Otherwise typing “wad” into the custom-prompt textarea drives the camera around. The check against tagName and isContentEditable covers both bases.
- Don’t filter repeat events. Holding a key fires repeated keydowns; the handler re-sends the same axis value. That’s a no-op at the model (same value, same axis), and trying to filter duplicates adds complexity for zero benefit.
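The three handler rules fit in one small factory. To keep it runnable outside a browser, KeyEventLike models only the KeyboardEvent fields the handler touches; createKeyHandler, its lookup/send/pressed parameters, and AxisBinding are hypothetical names, not the example’s code.

```typescript
// Minimal subset of a DOM KeyboardEvent.
interface KeyEventLike {
  key: string;
  target: { tagName?: string; isContentEditable?: boolean } | null;
  preventDefault(): void;
}

interface AxisBinding {
  axis: string;
  value: string;
}

// True when the event comes from a text field, where "wad" is just typing.
function isTypingTarget(target: KeyEventLike["target"]): boolean {
  if (target === null) return false;
  const tag = (target.tagName || "").toUpperCase();
  return tag === "INPUT" || tag === "TEXTAREA" || target.isContentEditable === true;
}

// One handler for both keydown and keyup. lookup maps a key to its axis,
// send forwards a command to the model, pressed is instant local
// highlight state (never read from the lagging snapshot).
function createKeyHandler(
  lookup: (key: string) => AxisBinding | null,
  send: (axis: string, value: string) => void,
  pressed: Set<string>,
): (event: KeyEventLike, phase: "down" | "up") => void {
  return (event, phase) => {
    if (isTypingTarget(event.target)) return;
    const binding = lookup(event.key);
    if (binding === null) return;
    // Stops arrow keys from scrolling the page; harmless for WASD.
    event.preventDefault();
    // No repeat filtering: re-sending the same value is a no-op.
    if (phase === "down") {
      pressed.add(event.key);
      send(binding.axis, binding.value);
    } else {
      pressed.delete(event.key);
      send(binding.axis, "idle"); // every keyup must release its axis
    }
  };
}
```

In the browser you would create the handler once (for example inside a useEffect), register it on window for keydown and keyup with the matching phase, and remove both listeners on cleanup. KeyboardEvent.target is typed as EventTarget, so passing real events to this sketch needs a cast.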
Setting rotation_speed_deg to 0 disables look-axis rotation entirely, even with look_h or
look_v non-idle. This is the lever to expose if you want to detune look responsiveness for a
particular scene.
Snapping a clip
The SDK ships recording primitives so you don’t have to wire up MediaRecorder yourself. The
example’s SnapClip.tsx captures the last 10 seconds of the live stream and opens a modal with the
SDK’s built-in preview player and a download button.
app/components/SnapClip.tsx
SnapClip.tsx imports from @reactor-team/js-sdk, not @reactor-models/lingbot. Recording
is a base-SDK feature: it works identically for every Reactor model, and the typed model packages
don’t re-export the recording surface. So direct base-SDK imports are idiomatic in this one place,
and you can drop the file into any other Reactor example unchanged. The Helios tutorial uses the
same file.
reactor.requestClip(durationSeconds) is the whole capture API. It returns a Clip value that
you hand to <ClipPlayer> to preview and <ClipDownloadButton> to save. The getJwt prop is a
resolver those components call when they need an auth token to fetch the clip. The example reuses
the same cached /api/reactor/token route from Authentication, so repeat
captures don’t trigger new token mints. Errors come back as a RecordingError with a typed code
and reason, distinct from the command_error events covered next.
Surfacing command_error
Every LingBot command can fail a precondition check (most commonly start before both setImage
and setPrompt have landed). The example never lets these fail silently.
app/components/CommandError.tsx
useLingbotCommandError is the typed wrapper for the command_error message: it fires when LingBot
rejects a command, carrying the failing command name and a human-readable reason. The component sits
in the sidebar, renders nothing until an error arrives, and clears itself on the next state snapshot
so a stale banner can’t pile up.
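The banner’s show-then-clear behavior is easiest to see as a tiny reducer over two inputs. The BannerInput and BannerState shapes are hypothetical; in the example the error arrives through useLingbotCommandError and the clearing signal is the next state snapshot. The "rejected:" wording is likewise illustrative.

```typescript
// Hypothetical inputs: a command_error (command name + reason) or a
// state snapshot arriving.
type BannerInput =
  | { type: "command_error"; command: string; reason: string }
  | { type: "state" };

interface BannerState {
  error: { command: string; reason: string } | null;
}

// Show the latest command_error; clear it on the next state snapshot so a
// stale banner cannot pile up.
function bannerReducer(state: BannerState, input: BannerInput): BannerState {
  if (input.type === "command_error") {
    return { error: { command: input.command, reason: input.reason } };
  }
  return state.error === null ? state : { error: null };
}

// Banner text, or null to render nothing.
function bannerText(state: BannerState): string | null {
  return state.error === null ? null : `${state.error.command} rejected: ${state.error.reason}`;
}
```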
A few LingBot-specific failure modes worth knowing about:
- start before conditions are set. The model rejects start unless both a prompt AND a reference image have been registered. The setup-phase UI (ScenePicker, CustomStart) prevents this in practice by disabling the Start button on !snapshot.has_prompt || !snapshot.has_image, but a programmatic start from elsewhere surfaces as command_error.
- setImage during generation is a silent no-op. Unlike start, sending setImage mid-run does not emit command_error; the seed image is locked once a session starts and the new image is just dropped. If you want a “swap reference image” affordance, it has to be a setup-phase control gated on !snapshot.started, or you have to call reset() first.
- setPrompt during generation is fine. It’s not an error path at all: the new prompt takes effect at the next chunk boundary. This is a useful baseline when distinguishing “rejected” from “applied later.”
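The three cases above can be summarized as snapshot-driven guards, which is also how the setup UI avoids the first one. A sketch using the snapshot fields named in this tutorial; the function names and the CommandOutcome labels are hypothetical, and the summary deliberately covers only these three documented cases (it does not model, for example, a repeated start).

```typescript
// Snapshot fields used by the setup UI, as named in the tutorial.
interface SetupSnapshot {
  has_prompt: boolean;
  has_image: boolean;
  started: boolean;
}

type CommandOutcome = "accepted" | "command_error" | "silently_dropped" | "applies_next_chunk";

// Start button rule used by ScenePicker / CustomStart.
function startDisabled(s: SetupSnapshot): boolean {
  return !s.has_prompt || !s.has_image;
}

// A "swap reference image" control belongs to the setup phase only.
function canSwapReferenceImage(s: SetupSnapshot): boolean {
  return !s.started;
}

// What to expect from each command in the documented cases.
function expectedOutcome(command: "start" | "setImage" | "setPrompt", s: SetupSnapshot): CommandOutcome {
  if (command === "start") return startDisabled(s) ? "command_error" : "accepted";
  if (command === "setImage") return s.started ? "silently_dropped" : "accepted";
  return s.started ? "applies_next_chunk" : "accepted";
}
```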
What’s intentionally left out
The demo covers the launch + steer + capture loop. Several LingBot features are deliberately out of scope:
- Mid-stream prompt swap: useLingbot().setPrompt({ prompt }) during the live phase. The reference image stays locked; the new prompt picks up at the next chunk boundary.
- Reproducible runs: setSeed before start.
- Movement-aware prompt schedule: react to useLingbotChunkComplete and fire setPrompt when a target chunk fires. LingBot has no native chunk schedule like Helios’s schedule_prompt.
- Gamepad input: same shape as the keyboard handler; press = direction, release = idle.
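Since LingBot has no native chunk schedule, a movement-aware prompt schedule would live on the client. A minimal sketch, assuming the returned callback is invoked once per completed chunk (for example from a useLingbotChunkComplete subscription). The chunk count is kept locally because this tutorial does not show that event’s payload, and ScheduledPrompt and createPromptSchedule are hypothetical names.

```typescript
// Hypothetical schedule entry: send `prompt` once `atChunk` chunks have completed.
interface ScheduledPrompt {
  atChunk: number;
  prompt: string;
}

// Returns a callback to run once per completed chunk. Each due entry is
// sent through setPrompt, which takes effect at the next chunk boundary.
function createPromptSchedule(
  entries: ScheduledPrompt[],
  setPrompt: (args: { prompt: string }) => void,
): () => void {
  const queue = entries.slice().sort((a, b) => a.atChunk - b.atChunk);
  let completed = 0;
  return () => {
    completed += 1;
    while (queue.length > 0 && queue[0].atChunk <= completed) {
      const next = queue.shift()!;
      setPrompt({ prompt: next.prompt });
    }
  };
}
```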
skill/SKILL.md in the example repo.