Building Blog-Archive Regeneration

24 April 2026 · Ian · 6 min read

What We’re Building

I’m regenerating the future blog archive on debrief.github.io from the specs directory in one shot. The script walks every shipped spec under specs/, emits a Building [Feature] post per standalone spec, a single rollup post per complete epic (replacing per-spec posts for its members), and composite posts where two or three standalone specs shipped close together on a shared theme. The output is a set of generated post files plus one ARCHIVE-REBUILD.md at the repo root — an index, an unresolved-groupings section, and a runbook the debrief.github.io maintainer uses to wipe and republish the archive without a follow-up question.

The reason this exists is that the current archive is a layered mix of planning posts, shipped posts, and LinkedIn drafts written under drifting conventions. It predates the Building [Feature] title pattern and the cached-opener contract that PR #511 introduced. Instead of hand-editing a hundred-plus files, I’d rather regenerate from the source of truth (the specs themselves) and accept that a handful of edge cases will surface in the index for human adjudication.

How It Fits

This is infrastructure, not a platform feature. It lives at scripts/regenerate-blog-archive.py for exactly one PR and is deleted in the same PR that commits its output — FR-009 is explicit about that. It sits one layer above the media workflow already documented in .claude/agents/media/content.md: same voice, same three-section opener structure, same evidence directory conventions. The script reads specs/*/, BACKLOG.md, and docs/ideas/E*.md; writes only new files; refuses to overwrite anything; and stages every write in a tempdir so a mid-run failure rolls back cleanly.
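A minimal sketch of the write discipline described above, not the script's actual code. The function name write_staged and its signature are hypothetical; the only behaviour taken from the post is "new files only, never overwrite, stage in a tempdir first".

```python
import shutil
import tempfile
from pathlib import Path


def write_staged(repo_root: Path, files: dict[str, str]) -> list[Path]:
    """Stage every file in a temp dir, then copy the batch into the repo.

    `files` maps repo-relative paths to file contents (hypothetical shape).
    Raises FileExistsError before touching the repo if any target exists.
    """
    targets = {rel: repo_root / rel for rel in files}
    clashes = sorted(str(p) for p in targets.values() if p.exists())
    if clashes:
        raise FileExistsError(f"refusing to overwrite: {clashes}")

    with tempfile.TemporaryDirectory() as staging:
        staged = {}
        for rel, text in files.items():
            path = Path(staging) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text, encoding="utf-8")
            staged[rel] = path

        # Only reached when every staged write succeeded; an exception
        # above discards the temp dir and leaves the repo untouched.
        written = []
        for rel, src in staged.items():
            dest = targets[rel]
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, dest)
            written.append(dest)
    return written
```

The final copy loop in this sketch is not atomic; a failure there could leave a partial set of new files, which a real implementation would need to clean up.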

Key Decisions

BACKLOG.md is the primary epic charter source, not NNN-epic-*/spec.md. The spec assumes charter directories exist; the repo doesn’t actually have them. Rather than invent synthetic charters or silently paper over the gap, the planner shifted to BACKLOG.md’s Epics table, with docs/ideas/E*.md as enrichment and [Ex] title prefixes as fallback. The spec’s intent — surface mismatches to the author rather than reconcile them silently — survives; the mechanism changed.
Verbatim copy when a cached opener exists; deterministic synthesis when it doesn’t. Synthesised openers get a visible HTML comment and an index flag so the maintainer knows which posts were written from spec slices rather than planning-time framing. No paraphrasing of existing cached openers, ever.
Composite clustering is narrow on purpose. Specs cluster only when they shipped within 5 days of each other and share at least one tag (after filtering the noise tags tracer-bullet, shipped, debrief). 6–10 day near-misses land in the index for manual promotion rather than auto-grouping, because a wrong composite is harder to unpick than a missing one.
Offline-safe, gh optional. When the CLI isn’t available, the script falls back to the committed shipped-post.md as the PR-description proxy and records the provenance source per spec in the index.
One-shot, not productised. Tests, golden fixtures, and the dry-run smoke test all go when the script goes. The cost of maintaining a blog regenerator forever is higher than the cost of rerunning this exercise if we ever need to.
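The composite clustering rule above can be sketched roughly as follows. This is an illustration under assumptions: the Spec dataclass, the cluster function, and treating the ship date as a plain date are all hypothetical. Only the thresholds (5 days, 6–10 day near-misses), the noise-tag list, and the use of union-find come from the post.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations

NOISE_TAGS = {"tracer-bullet", "shipped", "debrief"}


@dataclass(frozen=True)
class Spec:  # hypothetical shape, for illustration only
    slug: str
    shipped: date
    tags: frozenset[str]


def cluster(specs: list[Spec]) -> tuple[list[list[str]], list[tuple[str, str]]]:
    """Return (composite clusters, near-miss pairs) keyed by full slug."""
    parent = {s.slug: s.slug for s in specs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    candidates = []
    for a, b in combinations(specs, 2):
        if not (a.tags & b.tags) - NOISE_TAGS:
            continue  # no meaningful shared tag
        gap = abs((a.shipped - b.shipped).days)
        if gap <= 5:
            parent[find(a.slug)] = find(b.slug)  # union
        elif gap <= 10:
            candidates.append((a.slug, b.slug))

    # Drop near-misses that ended up in the same cluster transitively.
    near_misses = [(a, b) for a, b in candidates if find(a) != find(b)]

    groups: dict[str, list[str]] = {}
    for s in specs:
        groups.setdefault(find(s.slug), []).append(s.slug)
    composites = [sorted(g) for g in groups.values() if len(g) > 1]
    return composites, near_misses
```

Because union-find merges transitively, a chain of pairwise matches can grow into one large cluster, which is why a noise-tag list that is too short produces oversized composites.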

Screenshots

Not applicable — this is infrastructure; the handoff artefact is the output. The real screenshot is the committed diff: 73 new markdown files under specs/*/media/ + one ARCHIVE-REBUILD.md at repo root. The index table is the visual; see specs/228-regenerate-blog-archive/evidence/dry-run-index.md for a snapshot.

By the Numbers


Metric	Value
Spec directories scanned	155
Shipped (eligible)	129
Unified posts generated	56
Epic rollups generated	3
Composite posts generated	14
Epic members (absorbed into rollups)	29
In-flight specs skipped	26
Unresolved groupings surfaced	43
Total generated files	73 posts + 1 index
Existing files modified by the generator	0
Run duration	0.3 s
Tests passing	54 / 54
Coverage	77 %
Contract tests (C1–C11)	11 / 11 green
New runtime dependencies	0
Lines of generator code	~1,100 (deleted in the same PR)

Lessons Learned

The sharp edge was NNN collisions. The speckit numbering scheme was not strictly unique — 001-debrief-stac, 001-shared-react-components, and four others all share the leading number. The first pass keyed everything by number: int, and the coverage invariant tripped on the first real run: “spec 001 classified twice.” The fix was to key every lookup by the full slug ({number:03d}-{slug}) rather than the numeric prefix. It touched dataclasses, the classifier, the union-find clusterer, and the index row lookup — but the invariant surfaced it at the first execution, not the tenth.
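A small sketch of the keying change, assuming a hypothetical SpecKey dataclass and classify_all helper (neither name is from the script). The slug format and the "classified twice" invariant are the ones described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecKey:  # hypothetical name, for illustration only
    number: int
    slug: str

    @property
    def full_slug(self) -> str:
        # Zero-padded number plus slug, e.g. "001-debrief-stac".
        return f"{self.number:03d}-{self.slug}"


def classify_all(keys, classify):
    """Classify each spec once; fail loudly if a key repeats."""
    seen: dict[str, str] = {}
    for key in keys:
        if key.full_slug in seen:
            raise ValueError(f"spec {key.full_slug} classified twice")
        seen[key.full_slug] = classify(key)
    return seen
```

Keyed this way, 001-debrief-stac and 001-shared-react-components are distinct entries, and the invariant only fires on a genuine duplicate.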

The other lesson was tag noise. The initial NOISE_TAGS list (tracer-bullet, shipped, debrief) wasn’t enough; the first dry run produced a 13-member composite cluster because specs broadly share tags like stac, vscode-extension, and typescript. I widened NOISE_TAGS once in situ. The Open Question the spec raised was real, and the remediation path the plan documented worked exactly as written. Two clusters still exceed the 5-member preferred cap; they warn loudly and the human author can split them before publication if desired.

Third: the tense-inverted twin heading pattern. Many shipped posts start with ## What We Built or ## What Shipped, which reads fine on its own but sits awkwardly next to the cached opener’s ## What We're Building. The data-model review patch added a stitch rule that detects the twin, strips the duplicate heading, and splices its opening paragraph onto the tail of ## Key Decisions. Small detail, big readability win.
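One way to read the stitch rule, as a hedged sketch: the function name stitch and the regex are my own, and the sketch assumes the cached opener ends with its ## Key Decisions section, so appending the shipped body lands its first paragraph on the tail of that section.

```python
import re

# Matches a leading "## What We Built" or "## What Shipped" heading line.
TWIN_HEADING = re.compile(r"\A\s*##\s+What (We Built|Shipped)[ \t]*\n+")


def stitch(opener: str, shipped: str) -> str:
    """Join a cached opener and a shipped post, dropping the twin heading.

    Assumes (hypothetically) that `opener` ends with "## Key Decisions".
    """
    body = TWIN_HEADING.sub("", shipped, count=1)
    return opener.rstrip() + "\n\n" + body.lstrip()
```

A shipped post that starts with any other heading passes through unchanged, so the rule only affects the twin case.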

What’s Next

The generator gets deleted in this same PR (FR-009). The debrief.github.io maintainer wipes the existing _posts/future/ directory, copies the 73 generated files across, applies the four-step runbook from ARCHIVE-REBUILD.md, and rebuilds the Jekyll site. The source is preserved in git history against the single-digit-percent chance that we need to regenerate again.

Two near-term follow-ups I’d expect to see surface after the archive is live:

Composite quality review. The 7-member comp-052…098 cluster and the 6-member comp-185…190 cluster are technically valid by the FR-003 rules but thin as narrative. The author may want to break them into smaller pairs or promote the larger one into a retroactive epic rollup. The index flags them clearly; the editorial call is outside the script.
Legacy charter handling for E07 and E10. Both epics exist in BACKLOG with shipped members but no docs/ideas/Exx-*.md companion. They’re flagged as legacy-charter in Unresolved Groupings. The rollup still generates from the BACKLOG description; whether to author the idea docs retroactively is a judgement call.

If either of those turns into ongoing editorial work, the option to rerun this script lives one git show away from the commit that deleted it.