
We Fixed Genie's Memory Problem, Here's How

A field note on Beads and what persistent task memory does for agentic coding. How we solved the coordination problem in multi-agent workflows.

8 min read
Deva
beads, agentic-coding, multi-agent, task-memory, genie, AI-agents


There's a specific kind of frustration that comes with agentic coding. You set up a task. The agent starts strong. Forty minutes in, it's reintroducing something you already asked it to fix earlier in the session. Or it marks a task complete that clearly isn't. Or two agents working in parallel quietly overwrite each other's progress and neither notices.

The agent isn't broken. It isn't hallucinating wildly. It just has no memory of what came before.

That's the problem Beads solves. And once you've used it, the before feels embarrassingly fragile.

The Memory Problem in Agentic Coding

Most developers working with coding agents hit this wall eventually. Short tasks — write this function, fix this bug, refactor this file — work fine. The context fits inside one session. The agent holds it comfortably.

Longer tasks are where things unravel. The agent starts managing its own memory through markdown files. A TODO.md here. A PLAN.md there. It feels organised. It isn't.

Markdown is just text — unstructured, unvalidated, and completely unaware of task dependencies. An agent updating its own plan file mid-session will, with enough complexity, accidentally delete sections, create conflicts when a second agent touches the same file, or simply misread its own previous entries and proceed as if completed work is still pending.

The deeper issue: there's no enforcement. Nothing stops an agent from starting Task B while Task A is still blocked. Nothing surfaces that two agents are about to work on the same file from different directions. Nothing persists reliably across session restarts.

Markdown is a suggestion. What agentic coding needs is a constraint layer.

What Beads Is

Beads is a distributed, Git-backed task database built specifically for AI agents — not for humans to read, but for agents to operate against.

Three properties define it:

Persistence. An agent's task state survives session restarts, crashes, and model switches. The work context doesn't live in the conversation window — it lives in the repository. An agent coming back after an interruption reads the task state from Beads and continues from exactly where things were.

Graph structure. Tasks have dependencies that Beads enforces. An agent cannot mark Task B in progress while Task A, which blocks it, is still open. This isn't a convention or a comment in a markdown file; it's a constraint the system applies. Agents work in the right order because the structure requires it.

Agent-optimised interface. The CLI and tool outputs are designed to use minimal tokens and return unambiguous state. This isn't a nice-to-have — in agentic workflows, every unnecessary token in a tool response is noise that degrades reasoning quality over long sessions.
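To make the graph-structure property concrete, here is a minimal sketch of dependency gating. It is not Beads's implementation: the `Task` and `TaskGraph` names, the status strings, and the `blocked_by` field are all hypothetical, chosen only to show the difference between an advisory plan and an enforced one.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    status: str = "open"  # hypothetical statuses: "open", "in_progress", "closed"
    blocked_by: set[str] = field(default_factory=set)


class TaskGraph:
    """Toy task graph that refuses to start a task with open blockers."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}

    def add(self, task_id: str, blocked_by: tuple[str, ...] = ()) -> None:
        self.tasks[task_id] = Task(task_id, blocked_by=set(blocked_by))

    def open_blockers(self, task_id: str) -> list[str]:
        """Return the dependencies of task_id that are not yet closed."""
        return sorted(
            dep for dep in self.tasks[task_id].blocked_by
            if self.tasks[dep].status != "closed"
        )

    def ready(self) -> list[str]:
        """Open tasks whose dependencies are all closed."""
        return sorted(
            t.id for t in self.tasks.values()
            if t.status == "open" and not self.open_blockers(t.id)
        )

    def start(self, task_id: str) -> None:
        """Move a task to in_progress, or raise if anything still blocks it."""
        blockers = self.open_blockers(task_id)
        if blockers:
            raise ValueError(f"{task_id} is blocked by {blockers}")
        self.tasks[task_id].status = "in_progress"

    def close(self, task_id: str) -> None:
        self.tasks[task_id].status = "closed"


graph = TaskGraph()
graph.add("task-a")
graph.add("task-b", blocked_by=("task-a",))
print(graph.ready())  # ['task-a']
```

Calling `graph.start("task-b")` at this point raises instead of silently proceeding, which is the behaviour a markdown plan cannot give you.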

Under the hood, tasks are stored in a .beads/ directory using JSONL format — append-only, line-by-line, which makes Git merges conflict-resistant by design. A local SQLite database sits alongside it as a read-model cache for performance.
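As a rough illustration of why an append-only JSONL log survives restarts, the sketch below appends one JSON object per line and rebuilds the current state by replaying the file. The file name, the event fields, and the "later lines override earlier ones" rule are assumptions made for this example, not the actual Beads schema.

```python
import json
import tempfile
from pathlib import Path


def append_event(log: Path, event: dict) -> None:
    """Append one JSON object as a single line; earlier lines are never rewritten."""
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")


def load_state(log: Path) -> dict[str, dict]:
    """Replay the log top to bottom; later lines override earlier fields."""
    state: dict[str, dict] = {}
    if not log.exists():
        return state
    for line in log.read_text(encoding="utf-8").splitlines():
        if line.strip():
            event = json.loads(line)
            state.setdefault(event["id"], {}).update(event)
    return state


workdir = Path(tempfile.mkdtemp())
log = workdir / "tasks.jsonl"  # hypothetical file name
append_event(log, {"id": "task-1", "title": "Write parser", "status": "open"})
append_event(log, {"id": "task-1", "status": "in_progress"})

# A fresh process (an agent after a restart) rebuilds state from disk alone.
state = load_state(log)
print(state["task-1"]["status"])  # in_progress
```

Nothing in `load_state` depends on what the previous session remembered, which is the point: the state lives in the file, not in the conversation.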

It installs in three lines and initialises in any project directory.

Figure: Beads task status and dependency graph

Before Beads — What Genie's Setup Actually Looked Like

We run Genie — our autonomous agent platform — across multiple concurrent sessions. Agents handling research, development, testing, documentation, running in parallel, reading shared context, building toward the same product goals.

Before Beads, the memory layer was markdown. MEMORY.md for high-level context. CURRENT_WORK.md for active tasks. Date-stamped log files for history. Agents wrote to these files, read from them, and updated them as work progressed.

It worked well enough for single-agent, short-horizon tasks. At scale, the cracks were consistent.

Agents would end a session mid-task, restart, re-read the markdown, and either repeat work they'd already done, because the file hadn't been updated cleanly before the restart, or skip work that was marked complete but wasn't.

Two agents working in parallel on related tasks would occasionally reach the same file from different directions. The conflict wasn't always caught immediately.

The most common failure mode wasn't dramatic. It was quiet. An agent confidently working on the wrong priority because the markdown state had drifted from reality.

Genie was moving. It wasn't always moving in the right direction.

Figure: Before Beads vs After Beads

After Beads — What Actually Changed

The immediate difference was restarts. An agent interrupted mid-task — session timeout, manual stop, model switch — comes back to the same task state it left. Not because we wrote better handoff prompts. Because the state is stored in Git, not in the conversation window. The agent reads Beads, understands exactly where things are, and continues.

The subtler difference was dependency enforcement. With markdown, task ordering was advisory. An agent could read the plan and decide Task B was ready even if Task A had an open blocker — because nothing in the system prevented that decision. With Beads, the dependency graph enforces order. Agents work on what's actually unblocked. The queue reflects reality, not intent.

For parallel agents, the change was most visible. Two Genie agents working concurrently now operate against the same Beads task board. Each agent knows what the other has claimed. Because the JSONL store is append-only, simultaneous writes rarely produce merge conflicts: each agent's update is a new line, not an overwrite.
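A small sketch of the idea behind "a new line, not an overwrite": if two agents only ever append to a shared base, their logs can be combined by keeping every distinct line. This is an assumed line-union strategy for illustration, not a description of Beads's merge logic; Git's default text merge can still flag two appends at the same position, so some union-style merge handling is assumed here. The `union_merge` function and the record fields are hypothetical.

```python
def union_merge(base: list[str], ours: list[str], theirs: list[str]) -> list[str]:
    """Merge two append-only logs that both extend the same base."""
    for name, branch in (("ours", ours), ("theirs", theirs)):
        if branch[: len(base)] != base:
            raise ValueError(f"{name} rewrote history; log is not append-only")
    merged = list(base)
    seen = set(base)
    # Keep each new line once, our additions first, then theirs.
    for line in ours[len(base):] + theirs[len(base):]:
        if line not in seen:
            merged.append(line)
            seen.add(line)
    return merged


base = ['{"id": "task-1", "status": "open"}']
ours = base + ['{"id": "task-1", "status": "in_progress", "agent": "agent-a"}']
theirs = base + ['{"id": "task-2", "status": "closed", "agent": "agent-b"}']

merged = union_merge(base, ours, theirs)
for line in merged:
    print(line)
```

Both agents' updates survive the merge, and neither had to know the other was writing. The same approach applied to a file that agents edit in place, like a shared TODO.md, has no such guarantee.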

The coordination that previously required careful prompt engineering and manual task assignment happens at the infrastructure level.

Genie's agents didn't get smarter. The environment got more honest.

Figure: After Beads (structured workflow)

Why It Matters More in Multi-Agent Setups

Single-agent Beads integration is useful. Multi-agent is where it earns its place.

The core challenge in multi-agent orchestration isn't intelligence — it's coordination. Agents need to know what's been claimed, what's blocked, what's complete, and what's next — without that information living inside any single agent's context window, where it's invisible to everyone else.

Beads is the shared task board that makes this possible. Every agent reads from the same Git-backed state. Every update is immediately visible to the others. Role Beads give agents persistent identities within the system — not just session-level awareness but project-level continuity.
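One way to picture "what's been claimed" living outside any single agent's context is a shared event list that every agent resolves the same way. The sketch below assumes a first-claim-wins rule and hypothetical `claim` and `release` events; the real Beads claim mechanism may differ.

```python
def resolve_claims(events: list[dict]) -> dict[str, str]:
    """Map task id to owning agent; the first claim on a task wins."""
    owners: dict[str, str] = {}
    for event in events:
        task, agent = event["task"], event["agent"]
        if event["type"] == "claim":
            owners.setdefault(task, agent)  # later claims are ignored
        elif event["type"] == "release" and owners.get(task) == agent:
            del owners[task]  # only the owner can release
    return owners


def unclaimed(task_ids: list[str], events: list[dict]) -> list[str]:
    """Tasks that no agent currently owns, in the order given."""
    owners = resolve_claims(events)
    return [task_id for task_id in task_ids if task_id not in owners]


events = [
    {"type": "claim", "task": "task-1", "agent": "research"},
    {"type": "claim", "task": "task-1", "agent": "dev"},  # loses: already claimed
    {"type": "claim", "task": "task-2", "agent": "dev"},
    {"type": "release", "task": "task-2", "agent": "dev"},
]

print(resolve_claims(events))  # {'task-1': 'research'}
print(unclaimed(["task-1", "task-2", "task-3"], events))  # ['task-2', 'task-3']
```

Because every agent derives ownership from the same events with the same rule, two agents reading the board reach the same answer without talking to each other.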

Figure: Multi-agent coordination with Beads

This is also the architecture that GasTown — Steve Yegge's multi-agent workspace environment — runs on at its core. In GasTown, the Mayor generates work as Beads tickets and delegates them to worker rigs. The entire coordination layer runs through Beads.

GasTown needs Beads to function. Beads doesn't need GasTown. That's a meaningful distinction. Beads is a standalone tool. It works with any coding agent — Claude, Cursor, Windsurf — in any project. You're not adopting a framework. You're adding a memory layer.

The Practical Setup

Beads runs as a CLI. Installation is minimal:

# Install
curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash

# Initialise in your project
cd your-project
bd init

# Tell your agent to use it
echo "Use 'bd' for task tracking" >> AGENTS.md

For teams that work in shared repositories but don't want to commit Beads state to the main branch, bd init --stealth keeps it local without touching the repo. For contributors on forked open-source projects, bd init --contributor routes planning state to a separate location entirely.

There's also a community Rust port — br — for those who want a stable, frozen version of the core architecture independent of GasTown's ongoing development. Steve Yegge endorsed it directly.

What This Changes About Agentic Coding

The narrative around AI coding agents focuses heavily on model capability. Which model reasons better. Which one writes cleaner code. Which one handles edge cases.

That's real. But it's not the bottleneck in longer, multi-agent workflows.

The bottleneck is the memory layer. Specifically, the absence of one that's structured, persistent, and enforced.

An agent with good reasoning but no reliable task state will repeat work, skip dependencies, and drift from the actual project state as sessions accumulate. In our experience, an agent with average reasoning and a well-structured Beads setup outperforms it on anything beyond a single sitting.

The memory layer isn't a quality-of-life improvement. It's a reliability primitive.

Beads is currently the most focused solution we've found to that specific problem. Not because it's complex (it isn't) but because it's built for exactly this use case and nothing else.


Agentic coding is only as reliable as the infrastructure underneath it. Right now, for most teams, that infrastructure is a handful of markdown files and optimistic prompt engineering.

Beads is a more honest foundation.

We integrated Beads into Genie's multi-agent setup for NOXX. The observations above are from that experience. If you're running parallel agents on anything beyond short-horizon tasks, it's worth the three-line install to find out what you've been missing.
