02 · CASE STUDY

Kheiron

Personalized video lessons, generated from a 2-minute conversation.

ROLE
Builder
DURATION
24h
TEAM
4 engineers
STACK
Voice AI · LLM orchestration · Video generation · Next.js
Kheiron — Speak. Watch. Learn. The most tailored tutor ever. Three app screens: voice onboarding, course draft, and the generated video feed.
§ CONTEXT · why this exists

Most online courses are built once and watched a million times. We wanted the opposite: a course built once, for you, the moment you ask for it.

The question we started from was narrow on purpose — what's the smallest gap between "I want to learn X" and a video that actually teaches you X? Today that gap is hours of searching, half-relevant playlists, and content pitched at someone who isn't you. A beginner and an expert asking the same question deserve different lessons, and nothing online gives them that.

So we built for the gap itself: a two-minute voice conversation in, a personalized video lesson out — pitched at your level, delivered the way you already watch everything else.

§ WHAT WE BUILT

Kheiron is a three-stage pipeline that turns a short spoken conversation into a finished, research-backed video lesson — built end to end in 24 hours.

  1. Voice onboarding. You talk to an agent for about two minutes. A real-time STT → LLM → TTS loop runs over a WebSocket, so it listens, asks follow-ups, and figures out your background and what you're actually trying to learn.
  2. Course generation. From that conversation an LLM writes its own search queries. Tavily pulls sources, GLiNER2 filters them by entity density so that only dense, on-topic material survives, and GPT-5.5 turns what's left into a structured course with real citations.
  3. Video production. A director agent writes a timestamped script. Seedream generates the images, Seedance animates them, Gradium narrates, and ffmpeg stitches the whole thing into a vertical, subtitled video — delivered in the app in the TikTok-style format people already scroll.
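
The entity-density filter in stage 2 can be illustrated with a small sketch. This is not the hackathon code and it does not call GLiNER2: the `Source` type, the `extract_entities` callable, the `toy_extractor` stand-in, and the `min_density` threshold are all hypothetical names chosen for illustration. The assumption is that density means "extracted entities per whitespace-separated token".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Source:
    url: str
    text: str


def entity_density(text: str, extract_entities: Callable[[str], List[str]]) -> float:
    """Entities per whitespace-separated token; 0.0 for empty text."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return len(extract_entities(text)) / len(tokens)


def filter_by_density(
    sources: List[Source],
    extract_entities: Callable[[str], List[str]],
    min_density: float = 0.05,  # hypothetical threshold, would need tuning
) -> List[Source]:
    """Keep only sources whose entity density reaches the threshold."""
    return [
        s for s in sources
        if entity_density(s.text, extract_entities) >= min_density
    ]


def toy_extractor(text: str) -> List[str]:
    """Stand-in for a real NER model: treats capitalized tokens as entities."""
    return [t for t in text.split() if t[:1].isupper()]


sources = [
    Source("https://example.com/a", "Fourier Transform maps Signals to Frequency space"),
    Source("https://example.com/b", "click here to subscribe to our newsletter today"),
]
kept = filter_by_density(sources, toy_extractor, min_density=0.3)
```

In a real pipeline the extractor would be the NER model and the threshold would be tuned against sources that were judged useful by hand.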
§ STACK & ARCHITECTURE

The backend is FastAPI, with uv managing the Python environment, and the orchestration is built on the OpenAI Agents SDK: agents call Tavily for retrieval, GLiNER2 for filtering, and the FAL.ai models (Seedream, Seedance) for generation, with ffmpeg doing the final assembly. The frontend is Next.js 16 / React 19 with Tailwind 4; voice capture runs in the browser through an AudioWorklet that streams 24 kHz PCM straight into the onboarding WebSocket.
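
To make the audio path concrete, here is a minimal sketch of what the receiving side of a 24 kHz PCM stream might do with each binary chunk. The sample format is an assumption (16-bit signed, little-endian, mono); the text above only fixes the sample rate. The function names are hypothetical.

```python
import struct
from typing import List

SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed little-endian mono


def decode_pcm16(chunk: bytes) -> List[float]:
    """Convert raw PCM16 bytes to floats in [-1.0, 1.0).

    A trailing odd byte (an incomplete sample) is ignored.
    """
    usable = len(chunk) - (len(chunk) % BYTES_PER_SAMPLE)
    count = usable // BYTES_PER_SAMPLE
    ints = struct.unpack(f"<{count}h", chunk[:usable])
    return [i / 32768.0 for i in ints]


def chunk_duration_ms(chunk: bytes) -> float:
    """Playback duration of a chunk at the assumed sample rate and format."""
    samples = len(chunk) // BYTES_PER_SAMPLE
    return samples * 1000 / SAMPLE_RATE_HZ
```

Under these assumptions a 960-byte chunk holds 480 samples, which is 20 ms of audio.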

The hard part wasn't any single model — it was the wiring between them: three stages that each had to hand clean, structured state to the next while a user waited.
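
One way to picture "clean, structured state" between stages is a set of typed hand-off objects with cheap deterministic checks at each boundary. The sketch below is illustrative only: every class, field, and function name is hypothetical, and the real pipeline's schemas may look quite different.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LearnerProfile:
    """Output of voice onboarding, input to course generation."""
    topic: str
    level: str  # e.g. "beginner" or "expert"
    goals: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Lesson:
    title: str
    key_points: List[str]
    citations: List[str]


@dataclass(frozen=True)
class Course:
    """Output of course generation, input to video production."""
    profile: LearnerProfile
    lessons: List[Lesson]


@dataclass(frozen=True)
class Shot:
    """One entry of a timestamped script."""
    start_s: float
    end_s: float
    narration: str


def check_course(course: Course) -> None:
    """Reject a course before the expensive video stage starts."""
    if not course.lessons:
        raise ValueError("course has no lessons")
    for lesson in course.lessons:
        if not lesson.citations:
            raise ValueError(f"lesson {lesson.title!r} has no citations")


def check_script(shots: List[Shot]) -> None:
    """Shots must have positive length and must not overlap."""
    previous_end = 0.0
    for shot in shots:
        if shot.start_s < previous_end or shot.end_s <= shot.start_s:
            raise ValueError(f"bad timing at {shot.start_s}s")
        previous_end = shot.end_s
```

Failing fast at the boundary matters most before video generation, since that is the stage where a bad input costs the waiting user the most time.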

§ HIGHLIGHTS / OUTCOMES

Twelve hours from "can we even glue these models together?" to a watchable lesson, and a full working demo by the end of the hackathon: speak for two minutes, get a personalized video back.

End-to-end demo — voice conversation in, personalized video lesson out.
§ WHAT I LEARNED

The challenge was never the models — it was orchestration. With a pipeline this long, the real questions are which agent owns each decision, and where to draw the line between deterministic code and model-driven steps. Lean too far on the model and the whole thing turns flaky; lean too far on hardcoded logic and you lose the flexibility that made the idea work in the first place.
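
A common way to draw that line, sketched here under assumptions, is to let the model produce the content while deterministic code owns validation and retry policy. Nothing below comes from the Kheiron codebase: `call_model` is a hypothetical callable standing in for any LLM client, and the required keys are invented for the example.

```python
import json
from typing import Callable, Optional

REQUIRED_KEYS = {"title", "sections"}  # hypothetical schema


def parse_outline(raw: str) -> Optional[dict]:
    """Deterministic gate: return the outline if well-formed, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    if not isinstance(data["sections"], list) or not data["sections"]:
        return None
    return data


def generate_outline(
    call_model: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Model-driven step wrapped in a bounded, code-owned retry loop."""
    for _ in range(max_attempts):
        outline = parse_outline(call_model(prompt))
        if outline is not None:
            return outline
    raise RuntimeError(f"no valid outline after {max_attempts} attempts")
```

The model stays free to decide what goes in the outline; the code decides whether the result is usable and when to give up.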

Building it in a day with people I'd just met made that lesson concrete: the team moved fastest once the contract between stages was pinned down and each person could own their stage behind a clean interface.

§ VISUALS
Kheiron product pipeline — onboarding, course creation and video generation agents with the tools each one orchestrates.
Product pipeline — three agents, each orchestrating its own tools.