Agentic Canvas
A canvas-native agent system that converts natural language into validated, editable, executable creative DAGs.
The planning backend uses route-specific flows instead of one monolithic prompt. Small requests take a fast path, complex briefs run through collaborative creative planning, and edit/chat turns bypass unnecessary work when the user is only modifying an existing canvas.
For complex briefs, the debate phase of collaborative planning produces a creative direction document: scene sequence, transitions, production phases, asset dependency map, and style bible. Continuity rules are explicit. For example, a later video scene can start from the extracted last frame of the previous scene instead of generating a fresh frame and breaking visual continuity.
The planner receives model capability metadata, current uploads, selected context assets, and relevant session history. Current-turn uploads are prioritized, old uploads are ignored unless referenced, and context assets are treated as fixed references so follow-up prompts stay grounded in the canvas.
A plan is not sent to the canvas just because the LLM produced JSON. It has to pass graph validation, model compatibility checks, field checks, media wiring checks, and continuity checks first. If validation fails, a repair pass attempts to fix the graph before the user sees it.
The backend separates structural checks from LLM judgment. Deterministic validation catches broken graph structure, missing dependencies, incompatible references, cycles, unsupported utility steps, and media merge issues without another model call.
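As a rough illustration of what a deterministic structural check can look like, the sketch below flags missing dependencies and cycles without any model call. The graph shape (a dict mapping each step id to the ids it depends on) and the function name validate_graph are assumptions made for this example, not the system's actual schema.

```python
from collections import deque

def validate_graph(steps):
    """steps: hypothetical shape, step id -> list of step ids it depends on."""
    errors = []
    for step_id, deps in steps.items():
        for dep in deps:
            if dep not in steps:
                errors.append(f"{step_id}: missing dependency {dep}")

    # Cycle check: repeatedly remove steps whose known dependencies are all resolved.
    in_degree = {s: sum(1 for d in deps if d in steps) for s, deps in steps.items()}
    dependents = {s: [] for s in steps}
    for s, deps in steps.items():
        for d in deps:
            if d in steps:
                dependents[d].append(s)

    ready = deque(s for s, n in in_degree.items() if n == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for child in dependents[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    if visited != len(steps):
        # Steps left over are either on a cycle or blocked behind one.
        stuck = sorted(s for s, n in in_degree.items() if n > 0)
        errors.append("cycle detected; unresolved steps: " + ", ".join(stuck))
    return errors
```

An empty list means the structure passed; anything else can be handed to a repair pass as concrete, machine-readable findings.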
The executor does a second pass against model schemas before running a step. It normalizes values, removes unsupported fields, fixes common output-format mismatches, and corrects near-miss model references to known catalog entries.
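A minimal sketch of that pre-run pass, assuming a catalog that maps each model id to its set of accepted field names. The catalog shape, the output_format field, and the alias table are hypothetical; the near-miss correction uses the standard library's difflib.get_close_matches as one plausible way to do it.

```python
import difflib

# Hypothetical aliases for a common output-format mismatch.
FORMAT_ALIASES = {"jpg": "jpeg"}

def normalize_step(step, catalog):
    """catalog: hypothetical shape, model id -> {"fields": set of field names}."""
    model = step["model"]
    if model not in catalog:
        # Correct near-miss references to a known catalog entry, or fail loudly.
        matches = difflib.get_close_matches(model, list(catalog), n=1, cutoff=0.8)
        if not matches:
            raise ValueError(f"unknown model: {model}")
        model = matches[0]

    allowed = catalog[model]["fields"]
    # Drop fields the target model does not accept.
    inputs = {k: v for k, v in step["inputs"].items() if k in allowed}

    fmt = inputs.get("output_format")
    if isinstance(fmt, str):
        inputs["output_format"] = FORMAT_ALIASES.get(fmt.lower(), fmt.lower())
    return {"model": model, "inputs": inputs}
```

The cutoff of 0.8 is an arbitrary choice for the example; a stricter value trades fewer wrong corrections for more hard failures.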
Agent behavior is configuration-driven: prompts, model preferences, feature flags, and evaluation variants can be tuned without a backend redeploy.
The canvas preserves the user's selection context. If the user selects one generated image and says "make this dusk", the backend treats it as a node-scoped edit instead of re-planning the whole canvas. If the user selects the whole plan, the same chat box becomes a plan-level editing surface.
Graph edits are structured operations, not free-form string rewrites. The backend validates edits before they land, keeps enough history to reason about changed nodes, and avoids discarding completed outputs when only a small part of the graph changed.
Graph edits propagate state. When an upstream node changes, descendants are marked stale while their previous outputs remain visible. The frontend renders those stale cascades on nodes and wires, and lets the user rerun only the affected subtree instead of paying to regenerate completed independent work.
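The stale cascade amounts to a descendant walk over the graph. The sketch below assumes edges are given as (upstream, downstream) pairs; the function name mark_stale and that edge representation are illustrative only.

```python
from collections import deque

def mark_stale(edges, changed):
    """Return the set of descendants of `changed`; independent nodes are untouched."""
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    stale = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale
```

The returned set is exactly the subtree a user would rerun; everything outside it keeps its completed output.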
The executor uses Kahn's algorithm — in-degree tracking with an adjacency list of dependents. All steps with zero in-degree are launched as concurrent async tasks. When a task completes, its dependents' in-degrees are decremented, and newly-ready steps are added to the next batch. This naturally maximizes parallelism without ever violating dependency order.
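The scheduling loop described above can be sketched with asyncio. This is an illustration under assumptions: steps is a dict of step id to dependency ids, and run_step is a caller-supplied coroutine function. As a small variation on the batching described, this version launches each newly ready step as soon as its last dependency finishes rather than waiting for a batch boundary.

```python
import asyncio

async def execute_dag(steps, run_step):
    """steps: step id -> list of dependency ids; run_step: async callable(step_id)."""
    in_degree = {s: len(deps) for s, deps in steps.items()}
    dependents = {s: [] for s in steps}
    for s, deps in steps.items():
        for d in deps:
            dependents[d].append(s)

    results = {}
    pending = {}  # running task -> step id

    def launch(step_id):
        pending[asyncio.create_task(run_step(step_id))] = step_id

    # Every step with no dependencies starts immediately.
    for s, n in in_degree.items():
        if n == 0:
            launch(s)

    while pending:
        done, _ = await asyncio.wait(set(pending), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            step_id = pending.pop(task)
            results[step_id] = task.result()  # re-raises if the step failed
            for child in dependents[step_id]:
                in_degree[child] -= 1
                if in_degree[child] == 0:
                    launch(child)
    return results
```

The sketch assumes the graph has already been validated as acyclic; steps on a cycle would simply never become ready. Error handling is deliberately minimal here, so a failing step propagates its exception to the caller.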
Dependency wiring is fully automatic. When a step completes, its output is classified by media type and injected into downstream fields by introspecting the target model schema. The same resolver handles scalar URL fields, array inputs for merge models, literal context URLs, and local utility outputs such as extracted video frames.
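One way to picture schema-driven wiring: classify the upstream output by media type, then place it into the first compatible field of the downstream model. The schema shape (field name to a dict with "media" and an optional "array" flag), the extension table, and the function names are assumptions for this sketch, not the real resolver.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension table; a real resolver may rely on richer metadata.
MEDIA_BY_EXT = {".png": "image", ".jpg": "image", ".jpeg": "image",
                ".mp4": "video", ".mp3": "audio", ".wav": "audio"}

def classify(url):
    ext = PurePosixPath(urlparse(url).path).suffix.lower()
    return MEDIA_BY_EXT.get(ext, "unknown")

def wire_output(output_url, schema, inputs):
    """Return a copy of `inputs` with `output_url` placed in a matching field."""
    media = classify(output_url)
    wired = dict(inputs)
    for field, spec in schema.items():
        if spec["media"] != media:
            continue
        if spec.get("array"):
            # Array inputs (e.g. merge models) accumulate upstream outputs in order.
            wired[field] = list(wired.get(field, [])) + [output_url]
            return wired
        if field not in wired:
            # Scalar URL field: fill it only if nothing is wired yet.
            wired[field] = output_url
            return wired
    return wired
```

Parsing the URL path before reading the extension keeps signed URLs with query strings from confusing the classifier.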
Each step's result updates durable session state and a realtime canvas projection. The canvas state channel drives node status, thumbnails, phase labels, discussion turns, and progress overlays; a lightweight event stream carries agent messages and plan proposals. The split keeps the UI resilient if one stream reconnects.
The planner is stateless: it takes a prompt and outputs a DAG. But it lives inside a multi-turn chat with history, uploaded files, and previous generations. A user says "now make that image into a video", and "that image" refers to something from five messages ago.
Before calling either planner (fast path or collaborative), a context window is assembled: conversation history, file references with their storage URLs (labeled by media type), and previous workflow outputs. Uploaded files are collected across the entire session, not just the current message, so multi-turn file references resolve correctly. This context is injected into the planner prompt so it can resolve anaphoric references ("that image", "the video from before") to actual asset URLs.
Session state is structured around plans — each plan has its steps, step results, approval status, and previous execution results. Generated outputs feed back into the chat as assistant messages, maintaining conversational continuity across plan boundaries.
The filmmaking module decomposes high-level episode briefs into a production-ready hierarchy: Episodes → Scenes → Shots. Each shot carries its own shot size (CU, MCU, WS, or LS: close-up, medium close-up, wide shot, long shot), action description, composition reference, and explicit links to character, location, and prop assets. A state machine tracks every entity through its lifecycle, from creation through asset readiness, generation, review, and final approval.
Character consistency is enforced through reference-based generation — the same turnaround sheet is passed to every shot featuring that character, with the prompt explicitly instructing the model to match facial features, clothing, and build. Shots within the same scene share location and character assets, enforcing visual continuity across sequential frames.
Bulk generation queues large shot batches and dispatches them asynchronously. Generated assets pass through iterative review cycles with conversational refinement and visual markup. Quality gates enforce approval before downstream use while keeping production teams in the loop.
Every AI job — from the Agentic Canvas, episodic pipeline, or direct API — routes through a single inference gateway. Jobs are enqueued per tenant with priority and dispatched through a fair scheduler across the model catalog. Before dispatch, the gateway checks tenant and provider capacity so one customer or one saturated model cannot monopolize the system.
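To illustrate the fairness idea, here is a small round-robin scheduler with a per-tenant concurrency cap. It is a simplified sketch: the class name, the single tenant_limit parameter, and the omission of priorities and provider capacity are all simplifications chosen for the example.

```python
from collections import deque

class FairScheduler:
    def __init__(self, tenant_limit):
        self.tenant_limit = tenant_limit
        self.queues = {}       # tenant -> deque of waiting jobs
        self.running = {}      # tenant -> number of in-flight jobs
        self.order = deque()   # round-robin rotation of tenants

    def enqueue(self, tenant, job):
        if tenant not in self.queues:
            self.queues[tenant] = deque()
            self.running[tenant] = 0
            self.order.append(tenant)
        self.queues[tenant].append(job)

    def next_job(self):
        """Return (tenant, job) for the next eligible tenant, or None."""
        for _ in range(len(self.order)):
            tenant = self.order[0]
            self.order.rotate(-1)  # move this tenant to the back of the rotation
            if self.queues[tenant] and self.running[tenant] < self.tenant_limit:
                self.running[tenant] += 1
                return tenant, self.queues[tenant].popleft()
        return None

    def complete(self, tenant):
        self.running[tenant] -= 1
```

With a limit of 1, a tenant that enqueues three jobs still yields to a tenant with a single job after its first dispatch, which is the property the capacity check is meant to protect.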
The gateway is built for multi-replica operation. Jobs are accepted into a durable queue, workers claim work with time-bounded ownership records, and long-running provider calls are tracked so completion, cancellation, retry, and recovery all converge through one handler.
Crash recovery: the gateway records accepted work before dispatch and runs background recovery workers for expired claims, orphaned jobs, queue/datastore drift, and concurrency drift. Failed jobs retry with backoff; exhausted jobs move into an operator-visible failure queue with structured error detail.
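A recovery sweep over expired claims might look roughly like the following. The data shapes (job id to lease expiry, job id to attempt count), the function names, and the backoff constants are assumptions for illustration; jitter is left out for brevity.

```python
def backoff_delay(attempt, base=2.0, cap=300.0):
    """Seconds to wait before retry number `attempt` (1-based), doubling up to `cap`."""
    return min(cap, base * 2 ** (attempt - 1))

def recover(claims, attempts, now, max_attempts=3):
    """claims: job id -> lease expiry timestamp; attempts: job id -> tries so far.

    Returns (retry, dead): retry maps job id -> earliest retry time,
    dead lists jobs that have exhausted their attempts.
    """
    retry, dead = {}, []
    for job_id, expires_at in claims.items():
        if expires_at > now:
            continue  # lease still valid; the owning worker is presumed alive
        tried = attempts.get(job_id, 0)
        if tried >= max_attempts:
            dead.append(job_id)  # goes to the operator-visible failure queue
        else:
            retry[job_id] = now + backoff_delay(tried + 1)
    return retry, dead
```

Because ownership is time-bounded, a crashed worker needs no explicit cleanup: its claims simply expire and the next sweep picks them up.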
Failures can happen at every level of the pipeline — the LLM can truncate a plan, a model can refuse a prompt, or a provider can become unavailable. The system is designed so that planning, execution, and completion each have a controlled fallback path.
Error classification: provider errors are normalized into user-facing categories like capacity, rate limiting, content policy, input image issues, invalid parameters, and unknown failures. That lets the UI offer the right next action instead of showing raw provider text.
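A hedged sketch of this normalization step: an ordered list of patterns maps raw provider text to one of the categories named above. The regular expressions and category identifiers are invented for the example; real provider messages vary and would need their own rules.

```python
import re

# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("rate_limit", re.compile(r"rate.?limit|too many requests|\b429\b", re.I)),
    ("capacity", re.compile(r"overloaded|capacity|unavailable|\b503\b", re.I)),
    ("content_policy", re.compile(r"content policy|safety|moderation", re.I)),
    ("input_image", re.compile(
        r"image.*(invalid|unsupported|too (large|small))|could not (load|decode) image",
        re.I)),
    ("invalid_parameters", re.compile(
        r"invalid (param|argument|value)|validation error|\b422\b", re.I)),
]

def classify_error(message):
    """Map raw provider error text to a user-facing category."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    return "unknown"
```

Keeping the rules as data makes it easy to attach a suggested next action to each category, such as retry later for capacity or edit the prompt for content policy.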
The system uses a dual-write pattern — a durable datastore remains the source of truth, while a realtime projection powers the canvas experience. Job completion, messages, and canvas step progress are mirrored asynchronously so the UI updates quickly without putting the realtime layer on the critical path.
Realtime writes are merge-based so partial updates do not clobber unrelated canvas state. If a realtime write fails, the durable API remains authoritative and the frontend self-heals on the next fetch.
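The merge semantics can be shown with a small recursive merge: nested dicts are merged key by key, and anything else is overwritten. The state shape used in the usage note is hypothetical.

```python
def merge_patch(state, patch):
    """Return a new dict with `patch` merged into `state`, recursing into nested dicts."""
    merged = dict(state)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_patch(merged[key], value)
        else:
            merged[key] = value
    return merged
```

For example, applying {"nodes": {"n2": {"status": "done"}}} to a canvas state that also holds n1 updates only n2's status and leaves n1 and n2's other keys intact, which is why a partial update cannot clobber unrelated state.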
The system runs on managed cloud infrastructure with containerized services, a durable primary datastore, a coordination/cache layer, blob storage for generated media, and a realtime projection for frontend sync.
Agent prompts are stored in the database and hot-swappable without redeploy — allowing A/B testing of planner, validator, and verifier instructions in production. Agent configs also track operational metrics: total invocations, token usage (in/out), average latency, error count, and error rate. This makes it possible to compare prompt variants quantitatively, not just by vibes.
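The per-config metrics listed above could be tracked with a small accumulator like this one. The class and field names are illustrative assumptions; only the set of tracked quantities comes from the description.

```python
from dataclasses import dataclass

@dataclass
class AgentMetrics:
    invocations: int = 0
    tokens_in: int = 0
    tokens_out: int = 0
    total_latency_ms: float = 0.0
    errors: int = 0

    def record(self, tokens_in, tokens_out, latency_ms, ok=True):
        """Accumulate one invocation of the agent."""
        self.invocations += 1
        self.tokens_in += tokens_in
        self.tokens_out += tokens_out
        self.total_latency_ms += latency_ms
        if not ok:
            self.errors += 1

    @property
    def avg_latency_ms(self):
        return self.total_latency_ms / self.invocations if self.invocations else 0.0

    @property
    def error_rate(self):
        return self.errors / self.invocations if self.invocations else 0.0
```

Keeping one such record per prompt variant is enough to compare variants on latency, token cost, and error rate side by side.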
I lead the on-call rotation, maintain runbooks, own monitoring and alerting, and handle incident response for the production infrastructure.