youngReader
A two-year personal project on agentic LLMs, image generation, and getting a finished children's book out of a story seed.
youngReader is a personal project I built and iterated on between 2023 and 2025: an end-to-end platform that takes a story seed for a child — their name, what they’re into, a vibe — and produces a finished, KDP-printable book. Cover, layout, illustrations, prose, the whole thing.
It started as a curiosity. I wanted a place to push on the parts of agentic AI that don’t get nice demo videos — tool calling that has to actually work, image generation that has to keep the same protagonist’s face across twenty pages, hierarchical orchestration that has to hold a story together, codegen that has to obey real constraints. Children’s books are a small enough domain to fit in your head and big enough to make all of those problems show up at the same time.
The stack settled into Python 3.11 with Poetry, Pydantic v2 for every data model, LangChain + LangGraph for orchestration, ReportLab for PDF layout, Pillow for image composition, and Selenium at the end for the actual KDP publishing handoff. Four chapters below, each one a problem that shaped the architecture.
The blank-page problem.
The earliest version was a hierarchical generator: take a story seed, expand it into a multi-chapter outline with rising action and resolution, then expand each outline node into prose tuned to a target reading level. The orchestration is a small graph of specialised agents, each subclassing a BaseAgent and producing structured output through PydanticOutputParser:
- SeriesCoordinator: turns the seed into a series overview JSON.
- BookPlanningAgent: turns the overview into a chapter-by-chapter plan with scene blocks.
- ContentGenerationAgent: turns each plan node into prose, given a reading level and a previous_chapters_summary for continuity.
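The hand-off into the prose stage can be sketched as a loop that threads a running summary forward. This is a dependency-free illustration, not the project's code: the `LLM` callable, `ChapterPlan`, `generate_chapters`, and the stub model are all hypothetical names, and a standard-library dataclass stands in for the Pydantic models and PydanticOutputParser.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a chat-model call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class ChapterPlan:
    number: int
    scene_blocks: list[str]  # assumed non-empty in this sketch


def generate_chapters(plan: list[ChapterPlan], reading_level: str, llm: LLM) -> list[str]:
    """Expand each plan node into prose, passing a running summary to every call."""
    chapters: list[str] = []
    previous_chapters_summary = ""
    for node in plan:
        prompt = (
            f"Reading level: {reading_level}\n"
            f"Story so far: {previous_chapters_summary or '(none)'}\n"
            f"Scenes: {'; '.join(node.scene_blocks)}"
        )
        chapters.append(llm(prompt))
        # A real system would summarise with the model; this sketch just
        # appends the chapter's first scene as a one-line recap.
        recap = f"Ch{node.number}: {node.scene_blocks[0]}."
        previous_chapters_summary = f"{previous_chapters_summary} {recap}".strip()
    return chapters


# Stub model that records the prompts it receives and echoes the scene line.
seen_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"[prose for: {prompt.splitlines()[-1]}]"


plan = [
    ChapterPlan(1, ["Luna finds a rocket"]),
    ChapterPlan(2, ["Luna lands on the moon"]),
]
chapters = generate_chapters(plan, "ages 4-6", fake_llm)
```

The second prompt carries the recap of chapter one, which is the continuity hook the list above describes.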
Default model is gpt-4o through langchain_openai.ChatOpenAI, with a config layer that also supports Anthropic’s Claude 3.x family without code changes. No silent fallbacks — the provider/model is explicit per agent and the system fails fast if a key is missing.
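A fail-fast key check might look like the sketch below. The names `AgentModelConfig`, `KEY_ENV_VARS`, and `resolve_api_key` are hypothetical; the only assumption about the outside world is the conventional `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` environment variable names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical mapping from provider name to the env var that holds its key.
KEY_ENV_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


@dataclass(frozen=True)
class AgentModelConfig:
    agent: str
    provider: str
    model: str


def resolve_api_key(config: AgentModelConfig, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key for this agent's provider, or raise. Never try another provider."""
    env = os.environ if env is None else env
    if config.provider not in KEY_ENV_VARS:
        raise ValueError(f"{config.agent}: unknown provider {config.provider!r}")
    var = KEY_ENV_VARS[config.provider]
    key = env.get(var, "")
    if not key:
        raise RuntimeError(f"{config.agent}: {var} is not set (model {config.model!r})")
    return key


# Usage: an explicit per-agent config, checked before any model call is made.
cfg = AgentModelConfig("ContentGenerationAgent", "openai", "gpt-4o")
key = resolve_api_key(cfg, {"OPENAI_API_KEY": "sk-test"})
```

Passing `env` explicitly keeps the check testable without touching the real environment.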
The story seed
A child's name, their interests, and a general world-vibe to ground the generation.
> Theme: "Space exploration, bravery"
> Vibe: "Whimsical, Dr. Seuss tone"
What broke was voice. Drafting individual chapters in isolation produced prose that drifted — chapter one was Dr. Seuss, chapter five had crept toward a corporate explainer. The fix was bounded re-passes using a ConsistencyAgent that reads finished chapters with the original tone-spec and the running summary in hand and flags inconsistencies for a targeted rewrite. The model wasn’t asked to “be consistent”; it was asked to find the inconsistencies, which is a much easier prompt.
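The bounded re-pass can be sketched as a loop with a hard cap on rewrites. Everything here is illustrative: `enforce_tone`, the checker and rewriter signatures, and the word-list stubs are assumptions standing in for model-backed calls.

```python
from typing import Callable

# Hypothetical signatures: the checker returns flagged issues (empty means clean),
# the rewriter returns a revised chapter given those issues.
Checker = Callable[[str, str, str], list[str]]
Rewriter = Callable[[str, list[str]], str]


def enforce_tone(
    chapter: str,
    tone_spec: str,
    running_summary: str,
    find_issues: Checker,
    rewrite: Rewriter,
    max_passes: int = 2,
) -> tuple[str, list[str]]:
    """Run at most max_passes rewrite cycles; return the text and unresolved issues."""
    issues = find_issues(chapter, tone_spec, running_summary)
    passes = 0
    while issues and passes < max_passes:
        chapter = rewrite(chapter, issues)
        passes += 1
        issues = find_issues(chapter, tone_spec, running_summary)
    return chapter, issues


# Stubs: flag corporate-sounding words, and fix one flagged word per pass.
def stub_find_issues(chapter: str, tone_spec: str, running_summary: str) -> list[str]:
    return [word for word in ("leverage", "synergy") if word in chapter]


def stub_rewrite(chapter: str, issues: list[str]) -> str:
    return chapter.replace(issues[0], "zoom")


text, unresolved = enforce_tone(
    "Luna will leverage synergy.", "Whimsical", "", stub_find_issues, stub_rewrite
)
```

Returning the unresolved issues instead of looping until clean is what keeps the pass bounded: the caller decides what to do with a chapter that still has flags.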
Holding a face steady.
The hardest part of automated illustrated books is keeping a character’s appearance consistent across twenty images. Diffusion models will happily give you a different protagonist on every page if you let them. The naive prompt, “a young girl with red hair”, yields a different person on every roll.
The model that fixed it is a Pydantic CharacterProfile — not a markdown doc, a typed object — serialised as JSON and cached by a CharacterManager. Roughly:
{
"character_id": "luna_brave",
"physical_description": "curly red hair, green eyes, freckles",
"outfit": "yellow space helmet, mustard suit",
"art_style": "watercolor, studio ghibli inspired",
"color_palette": ["#dc2626", "#fde047", "#0f766e"]
}
Character description engineering
A hyper-detailed Identity Doc anchors the protagonist's appearance in latent space.
"hair": "curly, vivid red, yellow space helmet",
"eyes": "large, green",
"style": "watercolor, studio ghibli inspired"
Three independent mechanisms work together to enforce consistency, layered so each compensates for the others’ limits:
- Identity DNA lives outside any single prompt. Scene prompts reference the protagonist by character_id; a small compiler joins the profile to the scene description right before the API call. No hand-written prompt has to remember to describe Luna; that’s a lookup, not a creative choice.
- DALL-E 3 Generation IDs carry across calls inside an OpenAI session, anchoring the protagonist to the same visual identity across chapters at ~99% consistency in practice.
- Midjourney’s --cref parameter does the same for the Midjourney provider, with a separate MidjourneyImageServer keeping a reference image stable across requests.
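The "lookup, not a creative choice" idea from the first bullet can be sketched as a tiny prompt compiler. This is an assumption-heavy stand-in: a frozen dataclass replaces the Pydantic CharacterProfile, the manager here is in-memory rather than a JSON cache, and `compile_prompt` and the prompt wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterProfile:
    character_id: str
    physical_description: str
    outfit: str
    art_style: str


class CharacterManager:
    """In-memory registry keyed by character_id (the project caches JSON instead)."""

    def __init__(self) -> None:
        self._profiles: dict[str, CharacterProfile] = {}

    def register(self, profile: CharacterProfile) -> None:
        self._profiles[profile.character_id] = profile

    def get(self, character_id: str) -> CharacterProfile:
        # An unknown id raises KeyError rather than yielding an undescribed character.
        return self._profiles[character_id]


def compile_prompt(manager: CharacterManager, character_id: str, scene: str) -> str:
    """Join the stored identity to a scene description just before the image call."""
    p = manager.get(character_id)
    return (
        f"{scene}. Character: {p.physical_description}, "
        f"wearing {p.outfit}. Style: {p.art_style}."
    )


manager = CharacterManager()
manager.register(
    CharacterProfile(
        character_id="luna_brave",
        physical_description="curly red hair, green eyes, freckles",
        outfit="yellow space helmet, mustard suit",
        art_style="watercolor, studio ghibli inspired",
    )
)
prompt = compile_prompt(manager, "luna_brave", "Luna waves from the rocket window")
```

The scene author only writes the scene and the id; the appearance text is identical on every page because it is never retyped.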
A consistency_method enum on the request lets the same pipeline switch providers without the rest of the system caring whether it’s Generation IDs, character refs, LoRAs, or fixed seeds. The lesson generalises beyond books: anytime you need an LLM or diffusion model to be consistent, the answer is rarely “ask harder.” It’s externalising the thing that needs to stay the same, treating it as an addressable piece of state, and composing it into every relevant call.
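One way to picture the consistency_method switch is an enum plus a single translation function. The member names, the `consistency_params` helper, and every request field name below are hypothetical; only the `--cref <image URL>` prompt syntax is Midjourney's own.

```python
from enum import Enum


class ConsistencyMethod(Enum):
    GENERATION_ID = "generation_id"
    CHARACTER_REF = "character_ref"
    LORA = "lora"
    FIXED_SEED = "fixed_seed"


# Illustrative field names only; each real provider has its own request shape.
_FIELD_FOR_METHOD = {
    ConsistencyMethod.GENERATION_ID: "reference_generation_id",
    ConsistencyMethod.LORA: "lora_name",
    ConsistencyMethod.FIXED_SEED: "seed",
}


def consistency_params(method: ConsistencyMethod, anchor: str) -> dict[str, str]:
    """Translate a method plus its anchor value into provider-specific fields."""
    if method is ConsistencyMethod.CHARACTER_REF:
        # Midjourney takes the reference image as a prompt suffix.
        return {"prompt_suffix": f"--cref {anchor}"}
    return {_FIELD_FOR_METHOD[method]: anchor}


params = consistency_params(
    ConsistencyMethod.CHARACTER_REF, "https://example.com/luna.png"
)
```

The rest of the pipeline passes around a method and an anchor; only this one function knows what each provider wants.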
Discipline as the load-bearing piece.
By late 2024 the codebase was big enough that I was leaning hard on Cursor + Claude as a coding partner. The fashionable name is vibe coding; a more honest description is “AI-assisted, type-safety-first.” The agents are good at generating code; they are bad at being right. The discipline around them is what makes the output actually ship.
What that looked like, concretely:
- Pydantic v2 everywhere. Every domain model (CharacterProfile, BookPlan, ContentGenerationRequest, KDPCoverDimensions) is a Pydantic class. Untyped dicts are not allowed past the agent boundary. If the LLM hands back malformed structured output, parsing fails loudly rather than silently producing garbage downstream.
- mypy strict mode. disallow_untyped_defs = true in pyproject.toml, applied to the whole package. The static type system is a second reviewer that doesn’t get bored.
- Pytest with an 80% coverage gate. Tests are split across tests/unit, tests/integration, and tests/e2e, and the e2e tier actually generates a small book end-to-end against real APIs (gated behind @pytest.mark.slow). When a generated change breaks book generation, the suite says so before I do.
- Black, isort, flake8, pre-commit. The boring layer. Every code change normalises before it lands; review never argues about formatting.
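The "fails loudly at the agent boundary" rule from the first bullet can be illustrated without any dependencies. In the project this job belongs to Pydantic v2 validation; the hand-rolled checks, `ChapterOutline`, and `parse_outline` below are hypothetical stand-ins that show the behaviour, not the project's code.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ChapterOutline:
    title: str
    scene_blocks: list[str]


def parse_outline(raw: str) -> ChapterOutline:
    """Turn raw model output into a typed object, raising on any shape mismatch."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(ChapterOutline)}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    if not isinstance(data["title"], str):
        raise ValueError("title must be a string")
    blocks = data["scene_blocks"]
    if not isinstance(blocks, list) or not all(isinstance(b, str) for b in blocks):
        raise ValueError("scene_blocks must be a list of strings")
    return ChapterOutline(title=data["title"], scene_blocks=blocks)


outline = parse_outline('{"title": "Liftoff", "scene_blocks": ["Countdown", "Launch"]}')
```

A response with a missing key, an extra key, or a wrong type stops here with an exception instead of flowing into layout as a half-valid dict.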
1. Supervisor intent: natural-language architecture direction.
2. Rigid bounds & tests: type strictness, outcome-driven testing.
3. Autonomous codegen: agents iterate against the test suite.
This is the part that surprised me most. The bench is the load-bearing piece, not the model. Once the typing is strict, the tests are real, and the agent boundaries are typed, AI-assisted code is just faster code — the failure modes that matter all get caught in CI rather than by a confused reader six weeks later.
Print-ready, deterministically.
The final hurdle was the least AI-shaped one. A finished book has to be print-grade: bleed margins, trim sizes, gutter calculations from page count, spine width derived from page thickness, CMYK-correct cover artwork. None of that wants a language model anywhere near it — it wants math and a layout engine.
The pipeline does three deterministic things, all in Python:
- Layout: ReportLab Platypus maps prose to pages, sizes images to fit text blocks, and applies consistent typography. Trim sizes are KDP enums (KDP_5X8, KDP_506); page size is resolved at run time.
- Bleed and trim: 0.125” bleed on all sides, gutter (inside) margin of 0.375”, outside/top 0.375”, bottom 0.25”, computed at 300 DPI. KDP rejects anything off-spec; the math has to be exact.
- Cover wrap: a KDPCoverGenerator composes back, spine, and front into a single image with the spine width derived from the formula KDP publishes (page_count * 0.0025" for cream paper, page_count * 0.002252" for white). Pillow handles RGBA→RGB normalisation, the 650 MB upload ceiling, and the actual composite. Title typography is overlaid programmatically.
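The cover-wrap arithmetic is small enough to write out. This sketch uses the per-page thicknesses and the 0.125" bleed quoted above, plus the standard wrap layout (bleed, back cover, spine, front cover, bleed); `CoverDimensions` and `cover_dimensions` are hypothetical names, not the project's KDPCoverDimensions model.

```python
from dataclasses import dataclass

BLEED_IN = 0.125  # bleed on each outer edge, in inches
DPI = 300
PAGE_THICKNESS_IN = {"white": 0.002252, "cream": 0.0025}  # inches per page


@dataclass(frozen=True)
class CoverDimensions:
    spine_in: float
    width_in: float
    height_in: float

    @property
    def pixels(self) -> tuple[int, int]:
        """Full wrap size in pixels at 300 DPI."""
        return round(self.width_in * DPI), round(self.height_in * DPI)


def cover_dimensions(
    trim_w_in: float, trim_h_in: float, page_count: int, paper: str = "white"
) -> CoverDimensions:
    """Compute spine width and full wrap size for a paperback cover."""
    spine = page_count * PAGE_THICKNESS_IN[paper]
    # Wrap = bleed + back cover + spine + front cover + bleed.
    width = 2 * trim_w_in + spine + 2 * BLEED_IN
    height = trim_h_in + 2 * BLEED_IN
    return CoverDimensions(spine_in=spine, width_in=width, height_in=height)


dims = cover_dimensions(5.0, 8.0, page_count=100, paper="cream")
```

For a 5" x 8" trim with 100 cream pages this works out to a 0.25" spine and a 10.5" x 8.25" wrap, or 3150 x 2475 pixels.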
The output of the pipeline is two files — book_N_kdp_ready.pdf and book_N_complete_cover.png — that pass Amazon’s automated checks without human intervention. A KDPSeriesAgent handles the actual upload via Selenium, including 2FA, with a KDPPublishingAuditLog model recording every step. That last sentence is the whole project. The early demo of “look, an AI can write a chapter” was easy; the last 20% was doing it deterministically enough to ship to a real distribution channel.
The CLI is a series of session-based commands — generate-idea, generate-book, generate-pdf, prepare-kdp, publish-kdp — each writing artifacts into output/session_<timestamp>/ with a structured registry. State lives on disk, not in memory; any step can be re-run from its inputs.
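State-on-disk can be sketched with a session directory and a JSON registry. The `registry.json` filename, its shape, and the three helper functions are assumptions for illustration; the source only states that each command writes artifacts into output/session_<timestamp>/ with a structured registry.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def new_session(root: Path) -> Path:
    """Create root/session_<timestamp>/ with an empty artifact registry."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    session = root / f"session_{stamp}"
    session.mkdir(parents=True, exist_ok=True)
    registry = {"artifacts": {}}
    (session / "registry.json").write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return session


def record_artifact(session: Path, step: str, filename: str, content: str) -> Path:
    """Write one step's output and note it in the registry."""
    path = session / filename
    path.write_text(content, encoding="utf-8")
    registry_path = session / "registry.json"
    registry = json.loads(registry_path.read_text(encoding="utf-8"))
    registry["artifacts"][step] = filename
    registry_path.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return path


def load_artifact(session: Path, step: str) -> str:
    """Read a previous step's output back from disk, e.g. to re-run a later step."""
    registry = json.loads((session / "registry.json").read_text(encoding="utf-8"))
    return (session / registry["artifacts"][step]).read_text(encoding="utf-8")
```

Because each step reads its inputs through the registry rather than from a live object, a later command such as generate-pdf can be re-run in a fresh process against the same session directory.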
The codebase is open source on GitHub. It’s a snapshot of how to put fast-moving AI components together into a thing that actually produces a finished artifact — less a tutorial, more a worked example.