Mohammad Ausaf

AI Studio

Enterprise Agentic AI Platform for Autonomous Content Generation



AI Studio is a multi-tenant agentic AI platform at Galleri5 where users describe what they want in natural language and the system builds it autonomously — generating images, videos, audio, 3D assets, and lip-synced content across a large multi-provider model catalog.

The core of the platform is the Agentic Canvas: a React + tldraw workspace backed by a second-generation agentic backend. The backend turns creative briefs into validated DAGs, the canvas renders every step as an editable node, and users can target a single node, a selected group, or the whole plan for follow-up instructions without rebuilding the entire workflow.

A user can ask for a product image, animate it into a short video, and add a voiceover. The system plans dependent steps, validates model schemas, wires outputs into downstream inputs, executes ready branches in parallel, and streams state back to the canvas. If a step fails, only affected descendants are blocked; independent branches keep moving.

Stack
Python
FastAPI
MongoDB
Redis
Azure
React
tldraw
Firestore
Azure Blob
Docker
SSE
What I Built
Agentic Canvas
Agentic Backend
DAG Executor
Canvas Edits
Targeted Edits
Job Router
Fair Scheduler
Concurrency Control
Provider Adapters
Error Classifier
Recovery Workers
Crash Recovery
Live Platform
Core System

agentic canvas

A canvas-native agent system that converts natural language into validated, editable, executable creative DAGs.

Stage 01
director
Routes the turn as quick generation, full creative planning, edit, chat, or bulk generation. Full runs can launch persona debate before planning.
Stage 02
planner
Builds steps, dependencies, and visual groups using the live model catalog, uploaded assets, context library, and model capability metadata.
Stage 03
edit agent
Repairs invalid plans and handles targeted edits from selected canvas nodes. Every graph edit is tracked so affected downstream nodes can be marked for regeneration.
Stage 04
executor
Runs ready DAG levels concurrently, syncs realtime canvas state, calls the job router, and retries or reroutes failed nodes without stopping independent branches.
Agentic Canvas UI — DAG of connected generation nodes with chat panel

validated agentic planning


The planning backend uses route-specific flows instead of one monolithic prompt. Small requests take a fast path, complex briefs run through collaborative creative planning, and edit/chat turns bypass unnecessary work when the user is only modifying an existing canvas.

For complex briefs, the debate phase produces a creative direction document: scene sequence, transitions, production phases, asset dependency map, and style bible. Continuity rules are explicit. For example, a later video scene can start from the extracted last frame of the previous scene instead of generating a fresh frame and breaking visual continuity.

The planner receives model capability metadata, current uploads, selected context assets, and relevant session history. Current-turn uploads are prioritized, old uploads are ignored unless referenced, and context assets are treated as fixed references so follow-up prompts stay grounded in the canvas.

A plan is not sent to the canvas just because the LLM produced JSON. It has to pass graph validation, model compatibility checks, field checks, media wiring checks, and continuity checks first. If validation fails, a repair pass attempts to fix the graph before the user sees it.

Robustness
The frontend only sees a proposed plan after validation succeeds. That avoids a common failure mode in agentic systems: showing a confident-looking plan that cannot actually execute.

schema-aware plan hardening


The backend separates structural checks from LLM judgment. Deterministic validation catches broken graph structure, missing dependencies, incompatible references, cycles, unsupported utility steps, and media merge issues without another model call.
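The deterministic side of this can be sketched in a few lines. This is a minimal illustration, not the platform's validator: the plan shape (a dict mapping each step id to the ids it depends on) and the name `validate_plan` are assumptions made for the example, and only two of the checks listed above are shown.

```python
from collections import deque


def validate_plan(steps: dict[str, list[str]]) -> list[str]:
    """Return structural errors for a plan; an empty list means it passed.

    `steps` maps a step id to the ids it depends on (hypothetical shape).
    """
    errors = []

    # Missing dependencies: every referenced id must be a known step.
    for step_id, deps in steps.items():
        for dep in deps:
            if dep not in steps:
                errors.append(f"{step_id}: unknown dependency {dep!r}")
    if errors:
        return errors

    # Cycle check: repeatedly resolve steps whose dependencies are all done.
    in_degree = {step_id: len(set(deps)) for step_id, deps in steps.items()}
    dependents = {step_id: [] for step_id in steps}
    for step_id, deps in steps.items():
        for dep in set(deps):
            dependents[dep].append(step_id)

    ready = deque(s for s, degree in in_degree.items() if degree == 0)
    resolved = 0
    while ready:
        current = ready.popleft()
        resolved += 1
        for child in dependents[current]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    # Anything left unresolved sits on a cycle or downstream of one.
    if resolved != len(steps):
        stuck = sorted(s for s, degree in in_degree.items() if degree > 0)
        errors.append(f"cycle detected; unresolved steps: {', '.join(stuck)}")
    return errors
```

Because the function returns a list of messages rather than raising, a repair pass could be handed every structural problem at once instead of fixing them one at a time.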

The executor does a second pass against model schemas before running a step. It normalizes values, removes unsupported fields, fixes common output-format mismatches, and corrects near-miss model references to known catalog entries.
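One way to picture that second pass, as a rough sketch under stated assumptions: the catalog shape (model id mapped to its allowed input fields), the step shape, the `harden_step` name, and the 0.8 similarity cutoff are all hypothetical. The near-miss correction uses the standard library's `difflib.get_close_matches`.

```python
import difflib


def harden_step(step: dict, catalog: dict[str, set[str]]) -> dict:
    """Return a copy of `step` with a known model id and only supported inputs.

    `catalog` maps model id -> allowed input field names (hypothetical shape).
    """
    model = step["model"]
    if model not in catalog:
        # Correct near-miss references to the closest known catalog entry.
        matches = difflib.get_close_matches(model, list(catalog), n=1, cutoff=0.8)
        if not matches:
            raise ValueError(f"unknown model: {model!r}")
        model = matches[0]

    # Drop any input field the target model's schema does not declare.
    allowed = catalog[model]
    inputs = {k: v for k, v in step.get("inputs", {}).items() if k in allowed}
    return {**step, "model": model, "inputs": inputs}
```

A strict cutoff matters here: a loose one would silently reroute a step to an unrelated model, which is worse than failing loudly.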

dependency wiring
The executor resolves upstream step references, maps media outputs into downstream model fields, and preserves literal uploaded URLs for context references.
media preparation
Local utility nodes extract frames from videos, oversized images are compressed before video/lip-sync calls, and mixed-aspect video merge inputs are normalized before they hit downstream provider APIs.

Agent behavior is configuration-driven: prompts, model preferences, feature flags, and evaluation variants can be tuned without a backend redeploy.

scoped graph editing


The canvas preserves the user's selection context. If the user selects one generated image and says "make this dusk", the backend treats it as a node-scoped edit instead of re-planning the whole canvas. If the user selects the whole plan, the same chat box becomes a plan-level editing surface.

Graph edits are structured operations, not free-form string rewrites. The backend validates edits before they land, keeps enough history to reason about changed nodes, and avoids discarding completed outputs when only a small part of the graph changed.

Graph edits propagate state. When an upstream node changes, descendants are marked stale while their previous outputs remain visible. The frontend renders those stale cascades on nodes and wires, and lets the user rerun only the affected subtree instead of paying to regenerate completed independent work.
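The stale cascade is essentially a breadth-first walk over dependents. A minimal sketch, assuming a hypothetical `dependents` map (node id to the ids that consume its output) and a `status` dict. Outputs live elsewhere and are never touched, which is what keeps previous results visible.

```python
from collections import deque


def mark_stale(
    dependents: dict[str, list[str]],
    status: dict[str, str],
    changed: str,
) -> set[str]:
    """Mark every descendant of `changed` as stale and return their ids."""
    stale: set[str] = set()
    queue = deque(dependents.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in stale:
            continue  # already reached through another path
        stale.add(node)
        status[node] = "stale"
        queue.extend(dependents.get(node, []))
    return stale
```

The returned set doubles as the rerun scope: only those nodes need to execute again, and independent branches are left alone.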

DAG execution & incremental streaming


The executor uses Kahn's algorithm — in-degree tracking with an adjacency list of dependents. All steps with zero in-degree are launched as concurrent async tasks. When a task completes, its dependents' in-degrees are decremented, and newly-ready steps are added to the next batch. This naturally maximizes parallelism without ever violating dependency order.
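The batch loop described above can be sketched with `asyncio.gather`. This is an illustration rather than the production executor: it assumes the plan has already been validated as acyclic, `run_step` is a hypothetical coroutine that executes one step, and failure handling is left out.

```python
import asyncio
from typing import Awaitable, Callable


async def run_dag(
    steps: dict[str, list[str]],
    run_step: Callable[[str], Awaitable[str]],
) -> dict[str, str]:
    """Run steps batch by batch; `steps` maps step id -> dependency ids."""
    in_degree = {step_id: len(deps) for step_id, deps in steps.items()}
    dependents: dict[str, list[str]] = {step_id: [] for step_id in steps}
    for step_id, deps in steps.items():
        for dep in deps:
            dependents[dep].append(step_id)

    results: dict[str, str] = {}
    ready = [s for s, degree in in_degree.items() if degree == 0]
    while ready:
        # Every step in the batch is independent, so run them concurrently.
        outputs = await asyncio.gather(*(run_step(s) for s in ready))
        next_batch = []
        for step_id, output in zip(ready, outputs):
            results[step_id] = output
            for child in dependents[step_id]:
                in_degree[child] -= 1
                if in_degree[child] == 0:
                    next_batch.append(child)
        ready = next_batch
    return results
```

A batch only finishes when its slowest step does. Launching each step the moment its own in-degree reaches zero would remove that wait at the cost of slightly more bookkeeping.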

Dependency wiring is fully automatic. When a step completes, its output is classified by media type and injected into downstream fields by introspecting the target model schema. The same resolver handles scalar URL fields, array inputs for merge models, literal context URLs, and local utility outputs such as extracted video frames.
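A simplified sketch of schema-driven wiring follows. The schema shape (field name mapped to a spec with optional `media` and `array` keys), the extension table, and both function names are assumptions for illustration. Classifying by file extension is also a stand-in for whatever the real resolver inspects.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension table; a real classifier might check content type.
MEDIA_BY_EXTENSION = {
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".webp": "image",
    ".mp4": "video", ".mov": "video",
    ".mp3": "audio", ".wav": "audio",
}


def classify_media(url: str) -> Optional[str]:
    """Guess the media type from the URL path, ignoring any query string."""
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return MEDIA_BY_EXTENSION.get(extension)


def wire_output(url: str, schema: dict[str, dict], inputs: dict) -> dict:
    """Return a copy of `inputs` with `url` placed in the first matching field."""
    media = classify_media(url)
    wired = dict(inputs)
    if media is None:
        return wired  # unknown media type: leave the inputs untouched
    for field, spec in schema.items():
        if spec.get("media") != media:
            continue
        if spec.get("array"):
            # Merge-style models accept a list, so append to it.
            wired[field] = [*wired.get(field, []), url]
            return wired
        if field not in wired:
            # Scalar field: fill it only if the plan left it empty.
            wired[field] = url
            return wired
    return wired
```

Leaving an already-populated scalar field alone is what lets a literal uploaded URL in the plan survive wiring.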

Each step's result updates durable session state and a realtime canvas projection. The canvas state channel drives node status, thumbnails, phase labels, discussion turns, and progress overlays; a lightweight event stream carries agent messages and plan proposals. The split keeps the UI resilient if one stream reconnects.

scoped failure recovery
On failure, a repair pass can adjust the failed step and retry it without blocking unrelated branches. Downstream dependents are the only nodes blocked by that failure.
partial reruns & eta
Users can rerun one node, a stale subtree, all stale nodes, or the whole plan. The executor estimates duration by critical path, not by naively summing every parallel branch.
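To make the critical-path point concrete, here is a small sketch. The per-step `durations` (in seconds) and the `estimate_eta` name are hypothetical, and the plan is assumed to be acyclic.

```python
def estimate_eta(steps: dict[str, list[str]], durations: dict[str, float]) -> float:
    """Return the length of the longest dependency chain, in seconds."""
    finish: dict[str, float] = {}

    def finish_time(step_id: str) -> float:
        if step_id not in finish:
            # A step can start only once its slowest dependency has finished.
            start = max((finish_time(dep) for dep in steps[step_id]), default=0.0)
            finish[step_id] = start + durations[step_id]
        return finish[step_id]

    return max((finish_time(step_id) for step_id in steps), default=0.0)


plan = {"image": [], "voiceover": [], "video": ["image"], "merge": ["video", "voiceover"]}
seconds = {"image": 20.0, "voiceover": 10.0, "video": 90.0, "merge": 15.0}
eta = estimate_eta(plan, seconds)  # image -> video -> merge: 20 + 90 + 15
```

Summing every step would give 135 seconds for this plan. The critical path is 125 seconds because the voiceover runs in parallel with the image and video branch.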

multi-turn context & session state


The planner is stateless: it takes a prompt and outputs a DAG. But it runs inside a multi-turn chat with history, uploaded files, and previous generations. When a user says "now make that image into a video", "that image" may refer to something from five messages ago.

Solution
A lightweight routing step decides whether the new message should modify the current canvas or start a fresh workflow. Existing-plan turns carry the current graph and completed outputs into the planner so follow-up prompts can reuse context instead of starting from scratch.

Before the planner is called on either path, a context window is assembled: conversation history, file references with their storage URLs (labeled by media type), and previous workflow outputs. Uploaded files are collected across the entire session, not just the current message, so multi-turn file references resolve correctly. This context is injected into the planner prompt so it can resolve anaphoric references ("that image", "the video from before") to actual asset URLs.

Session state is structured around plans — each plan has its steps, step results, approval status, and previous execution results. Generated outputs feed back into the chat as assistant messages, maintaining conversational continuity across plan boundaries.

episodic content pipeline


The filmmaking module decomposes high-level episode briefs into a production-ready hierarchy: Episodes → Scenes → Shots. Each shot carries its own magnification (CU, MCU, WS, LS), action description, composition reference, and explicit links to character, location, and prop assets. A state machine tracks every entity through its lifecycle — from creation through asset readiness, generation, review, and final approval.
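A lifecycle like this is commonly modeled as an explicit transition table. The sketch below is an assumption-laden illustration: the state names are inferred from the stages listed above, and the edge from review back to generation is my guess at how a rejected shot would be handled.

```python
from enum import Enum


class ShotState(Enum):
    CREATED = "created"
    ASSETS_READY = "assets_ready"
    GENERATING = "generating"
    IN_REVIEW = "in_review"
    APPROVED = "approved"


# Hypothetical transition table; anything not listed is rejected.
ALLOWED = {
    ShotState.CREATED: {ShotState.ASSETS_READY},
    ShotState.ASSETS_READY: {ShotState.GENERATING},
    ShotState.GENERATING: {ShotState.IN_REVIEW},
    ShotState.IN_REVIEW: {ShotState.APPROVED, ShotState.GENERATING},
    ShotState.APPROVED: set(),
}


def advance(current: ShotState, target: ShotState) -> ShotState:
    """Return `target` if the transition is allowed, otherwise raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table as data makes illegal jumps, such as approving a shot that was never generated, fail in one place instead of being policed by every caller.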

Character consistency is enforced through reference-based generation — the same turnaround sheet is passed to every shot featuring that character, with the prompt explicitly instructing the model to match facial features, clothing, and build. Shots within the same scene share location and character assets, enforcing visual continuity across sequential frames.

Bulk generation queues large shot batches and dispatches them asynchronously. Generated assets pass through iterative review cycles with conversational refinement and visual markup. Quality gates enforce approval before downstream use while keeping production teams in the loop.

inference gateway


Every AI job — from the Agentic Canvas, episodic pipeline, or direct API — routes through a single inference gateway. Jobs are enqueued per tenant with priority and dispatched through a fair scheduler across the model catalog. Before dispatch, the gateway checks tenant and provider capacity so one customer or one saturated model cannot monopolize the system.
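As a rough in-memory sketch of fair dispatch with capacity checks: the class name, the round-robin policy, the "lower number means higher priority" convention, and the flat per-tenant and per-model limits are all assumptions. The real gateway's policy may differ.

```python
import heapq
from collections import defaultdict, deque
from itertools import count


class FairScheduler:
    """Round-robin across tenants, priority order within each tenant."""

    def __init__(self, tenant_limit: int, model_limit: int):
        self.tenant_limit = tenant_limit
        self.model_limit = model_limit
        self.queues = defaultdict(list)  # tenant -> heap of queued jobs
        self.rotation = deque()          # tenants in round-robin order
        self.running_by_tenant = defaultdict(int)
        self.running_by_model = defaultdict(int)
        self._seq = count()              # tie-breaker keeps FIFO order

    def enqueue(self, tenant: str, model: str, job_id: str, priority: int = 0):
        if tenant not in self.rotation:
            self.rotation.append(tenant)
        heapq.heappush(self.queues[tenant], (priority, next(self._seq), model, job_id))

    def next_job(self):
        """Return (tenant, model, job_id), or None if nothing can run now."""
        for _ in range(len(self.rotation)):
            tenant = self.rotation[0]
            self.rotation.rotate(-1)  # the next call starts at another tenant
            queue = self.queues[tenant]
            if not queue or self.running_by_tenant[tenant] >= self.tenant_limit:
                continue
            _, _, model, job_id = queue[0]
            if self.running_by_model[model] >= self.model_limit:
                continue  # saturated model: skip this tenant for now
            heapq.heappop(queue)
            self.running_by_tenant[tenant] += 1
            self.running_by_model[model] += 1
            return tenant, model, job_id
        return None

    def complete(self, tenant: str, model: str):
        self.running_by_tenant[tenant] -= 1
        self.running_by_model[model] -= 1
```

One simplification to note: a tenant whose highest-priority job targets a saturated model is skipped entirely for that pass, even if a later job in its queue could run.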

The router is built for multi-replica operation. Jobs are accepted into a durable queue, workers claim work with time-bounded ownership records, and long-running provider calls are tracked so completion, cancellation, retry, and recovery all converge through one handler.

Crash recovery: the gateway records accepted work before dispatch and runs background recovery workers for expired claims, orphaned jobs, queue/datastore drift, and concurrency drift. Failed jobs retry with backoff; exhausted jobs move into an operator-visible failure queue with structured error detail.
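Time-bounded ownership and retry backoff can be illustrated as follows. This is an in-memory stand-in with hypothetical names. In a multi-replica deployment the claim would have to be an atomic conditional write in the shared datastore, which a plain dict does not provide. The backoff base and cap are also illustrative values.

```python
import time
from typing import Optional


class ClaimStore:
    """In-memory stand-in for durable, time-bounded claim records."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self.claims: dict[str, tuple[str, float]] = {}  # job -> (worker, expiry)

    def claim(self, job_id: str, worker_id: str, now: Optional[float] = None) -> bool:
        """Take ownership unless another live lease exists."""
        now = time.monotonic() if now is None else now
        existing = self.claims.get(job_id)
        if existing is not None and existing[1] > now:
            return False  # a lease is still live
        self.claims[job_id] = (worker_id, now + self.lease_seconds)
        return True

    def expired(self, now: Optional[float] = None) -> list[str]:
        """Jobs whose lease ran out; a recovery worker would requeue these."""
        now = time.monotonic() if now is None else now
        return sorted(job for job, (_, expiry) in self.claims.items() if expiry <= now)


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff in seconds, capped so retries never wait unbounded."""
    return min(cap, base * 2 ** attempt)
```

The lease is what makes crash recovery possible without heartbeats from a dead worker: if the owner disappears, its claim simply ages out and the job becomes claimable again.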

durable acceptance
Queue metadata is written before the job becomes visible to dispatchers, reducing races where workers see a job without enough context to execute it.
multi-provider abstraction
Provider adapters sit behind a unified interface. Each adapter handles authentication, polling or callback completion, and output normalization so the canvas executor does not need provider-specific code.
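The adapter boundary might look something like the sketch below. The method names, the `JobResult` shape, and the `EchoAdapter` fake are all hypothetical. The point is only that callers depend on the abstract interface, never on a concrete provider.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class JobResult:
    """Normalized result, identical across providers (hypothetical shape)."""
    status: str  # e.g. "running", "succeeded", "failed"
    output_urls: list[str] = field(default_factory=list)


class ProviderAdapter(ABC):
    @abstractmethod
    def submit(self, model: str, inputs: dict) -> str:
        """Start a job and return a provider-side reference."""

    @abstractmethod
    def poll(self, job_ref: str) -> JobResult:
        """Return the job's current state in normalized form."""


class EchoAdapter(ProviderAdapter):
    """Fake provider that completes instantly; useful only for illustration."""

    def __init__(self):
        self._jobs: dict[str, tuple[str, dict]] = {}

    def submit(self, model: str, inputs: dict) -> str:
        job_ref = f"echo-{len(self._jobs) + 1}"
        self._jobs[job_ref] = (model, inputs)
        return job_ref

    def poll(self, job_ref: str) -> JobResult:
        model, _ = self._jobs[job_ref]
        url = f"https://example.invalid/{model}/{job_ref}.png"
        return JobResult(status="succeeded", output_urls=[url])
```

A callback-driven provider would implement the same interface by recording the callback payload and returning it from `poll`, so the executor never needs to know which completion style is in use.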

fallback at every layer


Failures can happen at every level of the pipeline — the LLM can truncate a plan, a model can refuse a prompt, or a provider can become unavailable. The system is designed so that planning, execution, and completion each have a controlled fallback path.

LLM fallback
Agent calls use primary and secondary model choices based on task type. Retriable model errors can fall back without exposing raw provider failures to the user.
execution fallback
Execution failures are classified before retry. Some are retried on the same model, some are routed to an alternate model, and some are surfaced as user-actionable validation errors.

Error classification: provider errors are normalized into user-facing categories like capacity, rate limiting, content policy, input image issues, invalid parameters, and unknown failures. That lets the UI offer the right next action instead of showing raw provider text.
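A pattern-based classifier is one simple way to get this normalization. In the sketch below the category names follow the list above, but the regular expressions are invented examples. Real provider messages vary and would need their own rules.

```python
import re

# Ordered rules: the first matching pattern wins. Patterns are illustrative.
RULES = [
    ("rate_limit", re.compile(r"rate limit|too many requests|\b429\b", re.I)),
    ("capacity", re.compile(r"overloaded|capacity|unavailable|\b503\b", re.I)),
    ("content_policy", re.compile(r"content policy|safety|nsfw|moderation", re.I)),
    ("input_image", re.compile(r"image.*(too large|invalid|unsupported)", re.I)),
    ("invalid_parameters", re.compile(r"invalid (parameter|argument|value)", re.I)),
]


def classify_error(message: str) -> str:
    """Map raw provider text to a user-facing category."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    return "unknown"
```

Once errors carry a category, the retry policy can key off it: a rate limit is worth retrying after a delay, while a content-policy refusal should go straight back to the user.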

dual-write architecture


The system uses a dual-write pattern — a durable datastore remains the source of truth, while a realtime projection powers the canvas experience. Job completion, messages, and canvas step progress are mirrored asynchronously so the UI updates quickly without putting the realtime layer on the critical path.

Realtime writes are merge-based so partial updates do not clobber unrelated canvas state. If a realtime write fails, the durable API remains authoritative and the frontend self-heals on the next fetch.
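The difference between a merge write and an overwrite is easy to show with a recursive merge. This sketch assumes canvas state is a nested dict, and `deep_merge` is a hypothetical helper. A realtime store with native merge writes would do the equivalent server-side.

```python
def deep_merge(current: dict, patch: dict) -> dict:
    """Return a new dict where `patch` updates only the keys it mentions."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys inside nested objects survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


canvas = {
    "nodes": {
        "image": {"status": "done", "thumbnail": "a.png"},
        "video": {"status": "queued"},
    }
}
update = {"nodes": {"video": {"status": "running"}}}
merged = deep_merge(canvas, update)
```

After the merge, the video node reads `running` while the image node keeps its status and thumbnail. A plain overwrite of `nodes` would have dropped the image node.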

event-driven sync
Canvas progress, job completions, and chat messages mirror as background work. Request latency is kept separate from realtime update latency.
event streaming
Agent messages and plan proposals stream separately from canvas state, which keeps execution progress reliable even when the conversational stream reconnects.

production infrastructure


The system runs on managed cloud infrastructure with containerized services, a durable primary datastore, a coordination/cache layer, blob storage for generated media, and a realtime projection for frontend sync.

Agent prompts are stored in the database and hot-swappable without redeploy — allowing A/B testing of planner, validator, and verifier instructions in production. Agent configs also track operational metrics: total invocations, token usage (in/out), average latency, error count, and error rate. This makes it possible to compare prompt variants quantitatively, not just by vibes.
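The metrics listed above reduce to a handful of counters per prompt variant. A minimal sketch, with a hypothetical `VariantMetrics` class and field names chosen for the example:

```python
from dataclasses import dataclass


@dataclass
class VariantMetrics:
    """Running counters for one prompt variant (hypothetical shape)."""
    invocations: int = 0
    errors: int = 0
    tokens_in: int = 0
    tokens_out: int = 0
    total_latency_ms: float = 0.0

    def record(self, latency_ms: float, tokens_in: int, tokens_out: int, ok: bool = True):
        self.invocations += 1
        self.errors += 0 if ok else 1
        self.tokens_in += tokens_in
        self.tokens_out += tokens_out
        self.total_latency_ms += latency_ms

    @property
    def error_rate(self) -> float:
        return self.errors / self.invocations if self.invocations else 0.0

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.invocations if self.invocations else 0.0
```

Storing totals and deriving the rates on read keeps each update a simple increment, which is easy to apply atomically in most datastores.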

async-first architecture
The backend uses async request handling, async database access, async provider calls, and background workers for non-critical side effects.
caching & performance
Hot-path metadata and model capability summaries are cached, with durable fallback reads when needed. Rate limits and coordination state expire automatically to reduce operational cleanup.

I lead the on-call rotation, maintain runbooks, own monitoring and alerting, and handle incident response for the production infrastructure.

Built with AI Studio
Production content generated using this platform.