Game of YU Postmortem: Building an AI-Driven RPG as a Graduation Project
A 4-month student project layered ChatGPT-driven NPCs and ElevenLabs voice cloning on a Unity RPG. It won the Most Original Graduation Project award and taught me production AI.
3 min read · #games #ai #graduation #postmortem
Early version. I'll add the architecture diagram and Unity build screenshots in a later pass.
Game of YU was my senior capstone in Yaşar University's Software Engineering programme. Four months, solo, in Unity / C#. The brief was open: anything substantive, anything you could defend in front of the jury. I picked an RPG with three twists: every NPC was driven by ChatGPT, every line of dialogue was generated and then spoken by an ElevenLabs voice clone, and the player's choices were graded by a second LLM acting as a "consequences" judge in the background.
It won the Most Original Graduation Project award at the department's end-of-year defence. That feels worth a few paragraphs of honest reflection.
What the AI layer actually did
Three things, in roughly increasing order of how-much-it-mattered-in-the-end:
Dialogue generation. Each NPC had a character card (background, voice, current goal, what they know). On player interaction I'd send the card plus a short conversation buffer to GPT-4 and stream back response candidates. The first candidate that passed a small validator (no breaking-the-fourth-wall, no naming the player out of character, sane length) was used. Latency was the enemy — I masked it with character animation and an "...thinking" bubble.
Voice synthesis. ElevenLabs had just opened up voice cloning. I trained a small voice clone for each main NPC from public-domain audio. Audio was generated per line and cached aggressively. A single playthrough's worth of audio cost roughly $4 in API credits; I was fine eating that cost during demos and presented it honestly to the jury.
Consequence grading. This was the genuinely novel piece. After each major player choice, a background LLM call would evaluate "did the player just do something that should cost them later?" — and tag the save file with a delayed consequence. The villain in chapter 3 might bring up your bar fight in chapter 1, because GPT had read both transcripts.
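To make two of the pieces above concrete, here is a minimal Python sketch of the candidate validator and the delayed-consequence tag. The game itself was Unity / C#, so treat this as an illustration of the idea, not the original code. The banned phrases, the 200-character limit, the `Consequence` fields, and the `SaveFile` shape are all assumptions, and the LLM calls are left out: candidates and the judge's verdict arrive as plain values.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional

# Hypothetical rules; the post only names the categories of check.
FOURTH_WALL_PHRASES = ("as an ai", "language model", "video game")
MAX_LINE_CHARS = 200


def is_valid_line(line: str, player_real_name: str) -> bool:
    """Reject fourth-wall breaks, out-of-character naming, and bad lengths."""
    text = line.strip()
    lowered = text.lower()
    if not text or len(text) > MAX_LINE_CHARS:
        return False
    if any(phrase in lowered for phrase in FOURTH_WALL_PHRASES):
        return False
    if player_real_name and player_real_name.lower() in lowered:
        return False
    return True


def first_valid(candidates: Iterable[str], player_real_name: str) -> Optional[str]:
    """Return the first candidate that passes validation, or None."""
    for candidate in candidates:
        if is_valid_line(candidate, player_real_name):
            return candidate.strip()
    return None


@dataclass
class Consequence:
    source_chapter: int  # chapter in which the player made the choice
    due_chapter: int     # earliest chapter in which it may come back
    summary: str         # what an NPC could bring up later


@dataclass
class SaveFile:
    chapter: int = 1
    pending: list[Consequence] = field(default_factory=list)

    def tag(self, consequence: Consequence) -> None:
        """Record a delayed consequence reported by the background judge."""
        self.pending.append(consequence)

    def due_now(self) -> list[Consequence]:
        """Consequences that are allowed to surface in the current chapter."""
        return [c for c in self.pending if c.due_chapter <= self.chapter]
```

In this sketch a bar fight in chapter 1 would be stored as `Consequence(source_chapter=1, due_chapter=3, summary="started a bar fight")`, and the chapter 3 villain's character card could be extended with whatever `due_now()` returns.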
What worked
The novelty landed. The jury hadn't seen anything similar from a Yaşar student. The presentation focused on engineering, not on game polish, and that framing matched the project's actual strengths.
Caching saved me. Every LLM call and every TTS call hit a SQLite cache keyed on prompt hash. On demo day my live demo ran from cache; if the Wi-Fi had died, I would still have been fine.
ElevenLabs voices held up. The uncanny valley is real but for an RPG that's 70% reading-text-while-voice-plays, it worked. Players described characters as "kind of unsettling" — which, for a dark fantasy RPG, was a feature.
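The SQLite cache mentioned above is simple enough to sketch. This is a Python approximation using only the standard library, not the project's C# code. The table layout, the `ResponseCache` name, and the choice to hash the model name and parameters together with the prompt are my assumptions; the post only says the key was a prompt hash.

```python
import hashlib
import json
import sqlite3
from typing import Callable


def cache_key(model: str, prompt: str, params: dict) -> str:
    """Hash everything that can change the output, in a stable order."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """Tiny get-or-generate cache for LLM text or TTS audio bytes."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )

    def get_or_generate(self, key: str, generate: Callable[[], bytes]) -> bytes:
        row = self.db.execute(
            "SELECT value FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no network call needed
        value = generate()  # cache miss: this is where the API call would go
        self.db.execute(
            "INSERT OR REPLACE INTO responses (key, value) VALUES (?, ?)",
            (key, value),
        )
        self.db.commit()
        return value
```

Passing a file path instead of `":memory:"` keeps the cache across runs, which is what makes an offline demo possible. One caveat of this design: with non-zero sampling temperature, a cache hit pins the first sample forever, which is fine for a demo but removes variety from repeated lines.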
What broke
Latency made the game unshippable. Even at GPT-3.5-Turbo speeds in 2024, a "Hello, traveller" line took 1.5 to 3 seconds end-to-end. Players don't read dialogue, they hit X to skip, and skipping during a 3-second generation meant the audio ended up playing over an empty bubble. For a real game release I would have had to pre-generate the top 80% of dialogue branches offline. Live generation worked for a 15-minute demo, not a 15-hour game.
No safety net on character drift. Around minute 12 of my longest playtest, one of the NPCs started referring to the player as "Metehan". That is my name, and GPT had picked it up because my prompt construction leaked state from an earlier session. I added stricter prompt sanitization, but the moment of "oh, this thing is leaky" stayed with me.
Pricing didn't scale. $4 of API spend per playthrough is fine for a demo. It's not fine for a $20 indie game, where a single playthrough would eat a fifth of the sticker price. If I were doing this again with Asansoft money, I'd run a small fine-tuned local model for the high-frequency NPCs and reserve frontier models for boss dialogue.
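The fixes proposed above (pre-generated lines for latency, a local model for common NPCs, frontier models for bosses) amount to a small routing decision per dialogue request. A rough Python sketch, assuming hypothetical names throughout (`Backend`, `DialogueRequest`, `choose_backend`) and a pre-generated table keyed on NPC and branch:

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    PREGENERATED = "pregenerated"      # offline-generated line, zero latency
    LOCAL_MODEL = "local_model"        # small fine-tuned model, cheap per call
    FRONTIER_MODEL = "frontier_model"  # expensive API call, best quality


@dataclass(frozen=True)
class DialogueRequest:
    npc_id: str
    branch_id: str
    is_boss: bool


def choose_backend(
    request: DialogueRequest,
    pregenerated: set[tuple[str, str]],
) -> Backend:
    """Pick the cheapest backend that can serve this request."""
    if (request.npc_id, request.branch_id) in pregenerated:
        return Backend.PREGENERATED
    if request.is_boss:
        return Backend.FRONTIER_MODEL
    return Backend.LOCAL_MODEL
```

The ordering is the design choice here: anything already generated offline wins, so live calls only happen on the long tail of branches, and only boss dialogue pays frontier-model prices.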
What this taught me about production AI
Three lessons I carried directly into Asansoft.
Cache like it's free. The single biggest lever for cost and latency is "did I already generate this?" Anything LLM-shaped I build gets a cache layer in front of it. The cache key matters more than the model choice.
Streaming hides everything. Token streaming + thoughtful UI is the difference between "this is broken" and "this is thinking." I reuse the exact pattern Game of YU used anywhere I put an LLM behind a UI — a placeholder bubble, then streamed text, then a fade.
Sanitize your prompt construction. Anything that comes from user state into a prompt is an injection surface. I write little Zod schemas now for every LLM-facing payload and reject anything weird before it gets in.
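As an illustration of that last lesson, here is a stdlib-only Python stand-in for the kind of check a Zod schema performs in TypeScript: validate every field before it is allowed into a prompt. The field names, the 120-character limit, and the "suspicious" patterns are assumptions made for the sketch, and a short deny-list like this is a starting point, not a complete injection defence.

```python
import re
from dataclasses import dataclass

# Hypothetical limits and patterns; tune them to the actual game state.
MAX_FIELD_CHARS = 120
SUSPICIOUS = re.compile(r"ignore (all|previous)|system prompt|</?\w+>", re.IGNORECASE)
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


class PayloadError(ValueError):
    """Raised when a field is not safe to interpolate into a prompt."""


@dataclass(frozen=True)
class NpcPromptPayload:
    npc_id: str
    player_alias: str      # in-game name only, never an account or OS name
    last_player_line: str

    def __post_init__(self) -> None:
        # Validate on construction so an invalid payload can never exist.
        for name in ("npc_id", "player_alias", "last_player_line"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise PayloadError(f"{name} must be a non-empty string")
            if len(value) > MAX_FIELD_CHARS:
                raise PayloadError(f"{name} is too long")
            if CONTROL_CHARS.search(value) or SUSPICIOUS.search(value):
                raise PayloadError(f"{name} contains disallowed content")
```

Building the prompt only from a constructed `NpcPromptPayload` means the string-formatting code never touches raw session state, which is the class of leak that put my own name into an NPC's mouth.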
Why I still talk about it
Game of YU is a student project. I'm not pretending it's a shipped product. But it's the place where I learned the engineering shape of AI features, before AI features became table stakes. Every "AI-native engineer" claim I make on this site routes back through those four months — what cached, what didn't, what broke, and what taught me the difference.
Talk AI? Always.
Discuss: X · LinkedIn. Or email: hello@metehanugus.com.