Most online courses are built once and watched a million times. We wanted the opposite: a course built once, for you, the moment you ask for it.
The question we started from was narrow on purpose: what's the smallest gap between "I want to learn X" and a video that actually teaches you X? Today that gap is hours of searching, half-relevant playlists, and content pitched at someone who isn't you. A beginner and an expert asking the same question deserve different lessons, and almost nothing online gives them that.
So we built for the gap itself: a two-minute voice conversation in, a personalized video lesson out — pitched at your level, delivered the way you already watch everything else.
Kheiron is a three-stage pipeline that turns a short spoken conversation into a finished, research-backed video lesson — built end to end in 24 hours.
- Voice onboarding. You talk to an agent for about two minutes. A real-time STT → LLM → TTS loop runs over a WebSocket, so it listens, asks follow-ups, and figures out your background and what you're actually trying to learn.
- Course generation. From that conversation an LLM writes its own search queries. Tavily pulls sources, GLiNER2 filters them by entity density so only the information-rich, on-topic material survives, and GPT-5.5 turns what's left into a structured course with real citations.
- Video production. A director agent writes a timestamped script. Seedream generates the images, Seedance animates them, Gradium narrates, and ffmpeg stitches the whole thing into a vertical, subtitled video — delivered in the app in the TikTok-style format people already scroll.
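The entity-density filter in the course generation step can be sketched in a few lines. This is a minimal illustration, not our production code: `Source`, `entity_density`, `filter_by_density`, the threshold of 5 entities per 100 tokens, and the capitalized-word counter standing in for GLiNER2 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Source:
    """A retrieved document (hypothetical shape for this sketch)."""
    url: str
    text: str


def entity_density(text: str, count_entities: Callable[[str], int]) -> float:
    """Entities per 100 whitespace-separated tokens; 0.0 for empty text."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return 100.0 * count_entities(text) / len(tokens)


def filter_by_density(
    sources: Iterable[Source],
    count_entities: Callable[[str], int],
    min_density: float = 5.0,  # assumed threshold, not a tuned value
) -> list[Source]:
    """Keep only sources whose entity density meets the threshold."""
    return [
        s for s in sources
        if entity_density(s.text, count_entities) >= min_density
    ]


def count_capitalized(text: str) -> int:
    """Stand-in extractor: counts capitalized tokens instead of real entities.

    In a real pipeline this callable would wrap the entity model.
    """
    return sum(1 for tok in text.split() if tok[:1].isupper())
```

Passing the extractor in as a callable keeps the scoring logic deterministic and testable without loading a model.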
The backend is FastAPI, managed with uv, with the orchestration built on the OpenAI Agents SDK: agents call Tavily for retrieval, GLiNER2 for filtering, and the Seedream and Seedance models hosted on FAL.ai for generation, with ffmpeg doing the final assembly. The frontend is Next.js 16 / React 19 with Tailwind 4; voice capture runs in the browser through an AudioWorklet that streams 24 kHz PCM straight into the onboarding WebSocket.
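On the receiving end, each WebSocket frame is just raw audio bytes. The sketch below shows how a server might decode one frame and work out how much speech it holds. Only the 24 kHz rate comes from our setup as described; the 16-bit signed, mono, little-endian layout and the helper names `decode_pcm16` and `chunk_duration_ms` are assumptions for illustration.

```python
import struct

SAMPLE_RATE_HZ = 24_000  # sample rate of the capture stream
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed mono PCM, little-endian


def decode_pcm16(chunk: bytes) -> list[int]:
    """Decode a binary frame into signed 16-bit samples.

    A trailing odd byte (an incomplete sample) is ignored.
    """
    usable = len(chunk) - (len(chunk) % BYTES_PER_SAMPLE)
    count = usable // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}h", chunk[:usable]))


def chunk_duration_ms(chunk: bytes) -> float:
    """Milliseconds of audio contained in one frame."""
    samples = len(chunk) // BYTES_PER_SAMPLE
    return 1000.0 * samples / SAMPLE_RATE_HZ
```

Under these assumptions a 960-byte frame holds 480 samples, which is 20 ms of audio.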
The hard part wasn't any single model — it was the wiring between them: three stages that each had to hand clean, structured state to the next while a user waited.
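One way to picture "clean, structured state" is as a set of immutable types that each stage consumes and produces. The sketch below is illustrative only: `LearnerProfile`, `Course`, `Citation`, `Video`, and `run_pipeline` are hypothetical names, and the real stages are stood in for by plain callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LearnerProfile:
    """Output of voice onboarding (hypothetical fields)."""
    topic: str
    level: str  # e.g. "beginner" or "expert"


@dataclass(frozen=True)
class Citation:
    title: str
    url: str


@dataclass(frozen=True)
class Course:
    """Output of course generation."""
    profile: LearnerProfile
    sections: tuple[str, ...]
    citations: tuple[Citation, ...]


@dataclass(frozen=True)
class Video:
    """Output of video production."""
    course: Course
    path: str


def run_pipeline(
    transcript: str,
    onboard: Callable[[str], LearnerProfile],
    generate: Callable[[LearnerProfile], Course],
    produce: Callable[[Course], Video],
) -> Video:
    """Run the three stages in order, checking the handoff between them."""
    profile = onboard(transcript)
    course = generate(profile)
    if not course.citations:
        # Fail at the boundary instead of rendering an unsourced lesson.
        raise ValueError("course has no citations")
    return produce(course)
```

Because every stage sees only the previous stage's output type, each one can be stubbed and tested on its own.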
Twelve hours from "can we even glue these models together?" to a watchable lesson, and a full working demo by the end of the hackathon: speak for two minutes, get a personalized video back.
The challenge was never the models — it was orchestration. With a pipeline this long, the real questions are which agent owns each decision, and where to draw the line between deterministic code and model-driven steps. Lean too far on the model and the whole thing turns flaky; lean too far on hardcoded logic and you lose the flexibility that made the idea work in the first place.
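A concrete version of that line between code and model: let the model draft the timestamped script, but let deterministic code decide whether the draft is usable. The sketch below is a hedged illustration of the pattern, not our actual director agent; `Segment`, `validate_script`, `script_with_retries`, and the three-attempt limit are assumptions, and `draft` stands in for whatever model call produces the script.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Segment:
    """One timestamped line of narration (hypothetical shape)."""
    start: float  # seconds
    end: float    # seconds
    narration: str


def validate_script(segments: list[Segment]) -> list[str]:
    """Deterministic checks; returns a list of problems (empty means valid)."""
    if not segments:
        return ["script is empty"]
    errors = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        if seg.end <= seg.start:
            errors.append(f"segment {i}: end must be after start")
        if seg.start < prev_end:
            errors.append(f"segment {i}: overlaps previous segment")
        if not seg.narration.strip():
            errors.append(f"segment {i}: empty narration")
        prev_end = max(prev_end, seg.end)
    return errors


def script_with_retries(
    draft: Callable[[list[str]], list[Segment]],
    max_attempts: int = 3,  # assumed retry budget
) -> list[Segment]:
    """Ask the model for a script, feeding validation errors back on retry."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        segments = draft(feedback)
        feedback = validate_script(segments)
        if not feedback:
            return segments
    raise ValueError("no valid script: " + "; ".join(feedback))
```

The model keeps its flexibility over what to say, while timing rules that ffmpeg depends on are enforced in plain code.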
Building it in a day with people I'd just met made that lesson concrete: the team moved fastest once the contract between stages was pinned down and each person could own their stage behind a clean interface.
