<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://debrief.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://debrief.github.io/" rel="alternate" type="text/html" /><updated>2026-03-06T12:27:26+00:00</updated><id>https://debrief.github.io/feed.xml</id><title type="html">Debrief Website</title><subtitle>Debrief Maritime Analysis Tool – Powerful, Fast, Free, and Intuitive</subtitle><entry><title type="html">Shipped: Dual-Platform E2E Tests — 18 Spec Files With Real Services</title><link href="https://debrief.github.io/shipped-dual-platform-e2e-tests" rel="alternate" type="text/html" title="Shipped: Dual-Platform E2E Tests — 18 Spec Files With Real Services" /><published>2026-03-06T00:00:00+00:00</published><updated>2026-03-06T00:00:00+00:00</updated><id>https://debrief.github.io/shipped-dual-platform-e2e-tests</id><content type="html" xml:base="https://debrief.github.io/shipped-dual-platform-e2e-tests"><![CDATA[<h2 id="what-we-built">What We Built</h2>

<p>A month ago, the VS Code E2E test suite had 8 spec files – all skipped. The web-shell had 81 tests across 13 categories, all passing, but those tests exercised orchestration through mock data. Nothing verified that a scientist could open a real REP file in the real extension, see real tracks parsed by real Python services, select features, and run analysis tools end-to-end.</p>

<p>Now the VS Code E2E suite has 18 active spec files. Four previously-skipped specs have been restored with live assertions. Ten new spec files cover selection sync, time controller, drawing tools, catalog browsing, log panel, edit face, event propagation, styling tools, undo/redo, and evidence capture. The <code class="language-plaintext highlighter-rouge">DebriefWebview</code> page object gained 40+ new selectors and methods to support all of this.</p>

<p>Both suites run in parallel in CI. The web-shell tests (~30 seconds, mock data, 13 specs) catch orchestration regressions fast. The VS Code E2E tests (~3 minutes, real services, 18 specs) catch the integration problems that only surface when debrief-io parses an actual REP file and debrief-stac stores actual STAC Items.</p>

<h2 id="how-it-works">How It Works</h2>

<p>The VS Code E2E tests drive openvscode-server with the Debrief extension sideloaded. Behind the scenes, three Python services – debrief-io, debrief-stac, and debrief-calc – are running and reachable. When a test opens a REP file, the extension calls debrief-io to parse it, debrief-stac to catalogue it, and debrief-calc to run analysis. The test then inspects the webview DOM to verify tracks rendered and results appeared.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">test</span><span class="p">(</span><span class="dl">'</span><span class="s1">loads REP file and shows tracks</span><span class="dl">'</span><span class="p">,</span> <span class="k">async</span> <span class="p">({</span> <span class="nx">codeServerPage</span> <span class="p">})</span> <span class="o">=&gt;</span> <span class="p">{</span>
  <span class="k">await</span> <span class="nx">codeServerPage</span><span class="p">.</span><span class="nx">openFile</span><span class="p">(</span><span class="dl">'</span><span class="s1">samples/boat1.rep</span><span class="dl">'</span><span class="p">);</span>

  <span class="kd">const</span> <span class="nx">frame</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">codeServerPage</span><span class="p">.</span><span class="nx">getWebviewFrame</span><span class="p">();</span>
  <span class="k">await</span> <span class="nx">frame</span><span class="p">.</span><span class="nx">locator</span><span class="p">(</span><span class="dl">'</span><span class="s1">.leaflet-container</span><span class="dl">'</span><span class="p">).</span><span class="nx">waitFor</span><span class="p">({</span> <span class="na">state</span><span class="p">:</span> <span class="dl">'</span><span class="s1">visible</span><span class="dl">'</span> <span class="p">});</span>

  <span class="kd">const</span> <span class="nx">trackCount</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">frame</span><span class="p">.</span><span class="nx">locator</span><span class="p">(</span><span class="dl">'</span><span class="s1">.leaflet-interactive</span><span class="dl">'</span><span class="p">).</span><span class="nx">count</span><span class="p">();</span>
  <span class="nx">expect</span><span class="p">(</span><span class="nx">trackCount</span><span class="p">).</span><span class="nx">toBeGreaterThan</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">});</span>
</code></pre></div></div>

<p>That last assertion – <code class="language-plaintext highlighter-rouge">toBeGreaterThan(0)</code> rather than <code class="language-plaintext highlighter-rouge">toEqual(3)</code> – is deliberate. Real service output varies. Structural assertions (“at least one track exists”) are resilient to changes in sample data or parsing improvements. Value-exact assertions against real data break constantly for the wrong reasons.</p>

<h2 id="by-the-numbers">By the Numbers</h2>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Count</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>VS Code E2E spec files</td>
      <td>18</td>
    </tr>
    <tr>
      <td>VS Code E2E active tests</td>
      <td>~25</td>
    </tr>
    <tr>
      <td>VS Code E2E fixme tests</td>
      <td>~28</td>
    </tr>
    <tr>
      <td>Web-shell spec files</td>
      <td>13</td>
    </tr>
    <tr>
      <td>Web-shell active tests</td>
      <td>81+</td>
    </tr>
    <tr>
      <td>New page object methods</td>
      <td>40+</td>
    </tr>
    <tr>
      <td>Platforms tested in parallel</td>
      <td>2</td>
    </tr>
  </tbody>
</table>

<h2 id="the-testfixme-strategy">The test.fixme() Strategy</h2>

<p>Of the 18 VS Code E2E spec files, 10 contain tests marked <code class="language-plaintext highlighter-rouge">test.fixme()</code>. These cover features that don’t yet exist in the extension – time controller, drawing tools, styling, undo/redo, and others. The tests are written. The assertions are specified. The features aren’t implemented yet.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">test</span><span class="p">.</span><span class="nx">fixme</span><span class="p">(</span><span class="dl">'</span><span class="s1">time scrubber updates map display</span><span class="dl">'</span><span class="p">,</span> <span class="k">async</span> <span class="p">({</span> <span class="nx">codeServerPage</span> <span class="p">})</span> <span class="o">=&gt;</span> <span class="p">{</span>
  <span class="c1">// Time controller not yet implemented in VS Code extension</span>
  <span class="c1">// See backlog: time-controller feature</span>
<span class="p">});</span>
</code></pre></div></div>

<p>We chose <code class="language-plaintext highlighter-rouge">test.fixme()</code> over <code class="language-plaintext highlighter-rouge">.skip()</code> for a specific reason: fixme tests appear in Playwright reports as known gaps. They’re visible. They cross-reference backlog items. When someone implements the time controller feature, the test is already waiting – remove the <code class="language-plaintext highlighter-rouge">.fixme()</code> wrapper and it either passes or tells you what’s broken. With <code class="language-plaintext highlighter-rouge">.skip()</code>, these tests would vanish from reports entirely, and the gaps they represent would be invisible.</p>

<p>This turned the E2E expansion into a feature-completeness audit. Writing 28 fixme tests documented exactly what the extension doesn’t do yet, in executable form.</p>

<h2 id="page-object-architecture">Page Object Architecture</h2>

<p>Two page objects handle the VS Code E2E environment:</p>

<ul>
  <li><strong>CodeServerPage</strong> manages VS Code chrome – command palette, Quick Open, file navigation, the Welcome tab focus trap, keyboard shortcuts</li>
  <li><strong>DebriefWebview</strong> manages the extension’s webview – iframe traversal, Leaflet map interactions, feature list, tools panel, selection state</li>
</ul>

<p>This separation matters because VS Code’s webview sits inside nested iframes. The test code that opens a file through Quick Open is fundamentally different from the code that clicks a track on the map. Mixing them makes tests fragile. Keeping them in separate page objects makes the iframe boundary explicit.</p>

<h2 id="lessons-learned">Lessons Learned</h2>

<p><strong>Structural assertions save maintenance time.</strong> Early drafts of the restored specs used exact value checks against real REP file output. Those broke immediately when we updated a sample file. Switching to existence-based assertions (“at least one track”, “tool result contains measurement text”) made the tests resilient without sacrificing confidence. If zero tracks render, the test still fails.</p>

<p><strong>Writing tests for unimplemented features is useful work.</strong> The 28 fixme tests forced us to think through what each feature’s testable behaviour should look like, before writing any implementation code. Several of those tests revealed UX questions we hadn’t considered – what should the time controller’s DOM look like? How should drawing tool state be inspectable from Playwright? Those questions are now documented in the test files themselves.</p>

<p><strong>Dual-platform testing catches different bugs.</strong> During development, the web-shell tests passed consistently while a VS Code E2E test failed on catalog browsing. The issue was an extension-specific activation timing problem that the web-shell’s simpler lifecycle couldn’t reproduce. Two test surfaces, two classes of bugs caught.</p>

<h2 id="whats-next">What’s Next</h2>

<p>The 28 fixme tests are now a prioritised implementation queue. As each extension feature ships – time controller, drawing tools, styling panel – the corresponding fixme tests activate and immediately verify the feature works end-to-end with real services.</p>

<p>The CI pipeline runs both suites in parallel, so new features get validated against mock data in 30 seconds and against real services in 3 minutes, without blocking each other.</p>

<p>-&gt; <a href="https://github.com/debrief/debrief-future/tree/main/specs/005-e2e-workflow-tests">See the spec</a>
-&gt; <a href="https://github.com/debrief/debrief-future/tree/main/specs/005-e2e-workflow-tests/evidence">View the evidence</a></p>]]></content><author><name>Ian</name></author><category term="tracer-bullet" /><category term="testing" /><category term="e2e" /><category term="playwright" /><category term="vscode" /><category term="web-shell" /><summary type="html"><![CDATA[VS Code E2E suite expanded from 8 skipped specs to 18 active files, driven by real Python services parsing real REP data.]]></summary></entry><entry><title type="html">Shipped: Logical Result ID Registry</title><link href="https://debrief.github.io/logical-result-id-registry" rel="alternate" type="text/html" title="Shipped: Logical Result ID Registry" /><published>2026-02-13T00:00:00+00:00</published><updated>2026-02-13T00:00:00+00:00</updated><id>https://debrief.github.io/logical-result-id-registry</id><content type="html" xml:base="https://debrief.github.io/logical-result-id-registry"><![CDATA[<h2 id="what-we-built">What We Built</h2>

<p>The Logical Result ID Registry provides the indirection layer that lets result views stay synchronized with tool outputs. When a tool produces <code class="language-plaintext highlighter-rouge">bt_plot_001_v1.png</code>, the registry maps the stable ID <code class="language-plaintext highlighter-rouge">bt_plot_001</code> to that file. When the analyst tunes a parameter and the tool produces v2, the registry updates the mapping and emits a change event. Views that subscribed to that result ID get notified. No polling, no scanning, no coupling between tool execution and result display.</p>

<p>37 new tests, 521 total tests passing, zero regressions.</p>

<h2 id="how-it-works">How It Works</h2>

<p>The registry lives in <code class="language-plaintext highlighter-rouge">@debrief/session-state</code> as a pure in-memory Map with callback subscriptions. It populates from two sources: STAC asset metadata when a plot loads, and Log Service entries when tools run.</p>

<p>On plot load, <code class="language-plaintext highlighter-rouge">hydrateFromAssets()</code> scans STAC assets for <code class="language-plaintext highlighter-rouge">debrief:resultId</code> and <code class="language-plaintext highlighter-rouge">debrief:version</code> metadata. For each logical ID, it selects the highest version. A plot with <code class="language-plaintext highlighter-rouge">bt_plot_001_v1</code> and <code class="language-plaintext highlighter-rouge">bt_plot_001_v2</code> assets hydrates with v2 as the current mapping. Hydration is bulk initialization – no change events fire, keeping the initial load quiet.</p>

<p>When a tool executes, the Log Service records a <code class="language-plaintext highlighter-rouge">RecordResult</code> entry with <code class="language-plaintext highlighter-rouge">generatedResultId</code> and <code class="language-plaintext highlighter-rouge">generated</code> fields. The registry observes these entries through <code class="language-plaintext highlighter-rouge">registerFromRecordResult()</code>, extracts the result ID and file path, and updates its mapping. This time a change event fires, notifying any subscribers that the result has updated.</p>

<p>Subscriptions come in two flavors. <code class="language-plaintext highlighter-rouge">subscribe(resultId, callback)</code> delivers notifications only for a specific result ID. <code class="language-plaintext highlighter-rouge">subscribeAll(callback)</code> delivers notifications for any change. Both return an unsubscribe function. When a result view closes, it unsubscribes, and the registry cleans up the callback. When the plot closes, <code class="language-plaintext highlighter-rouge">clear()</code> wipes all mappings and subscriptions.</p>
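<p>The subscription surface described above can be sketched in a few dozen lines. This is illustrative only: the type names (<code class="language-plaintext highlighter-rouge">ResultMapping</code>, <code class="language-plaintext highlighter-rouge">Listener</code>) and method shapes are assumptions, and the real implementation in <code class="language-plaintext highlighter-rouge">@debrief/session-state</code> differs in detail.</p>

```typescript
// Illustrative sketch of the registry core: a Map plus callback sets.
// Names and shapes are assumptions, not the actual @debrief/session-state API.
type ResultMapping = { path: string; version: number | null; mimeType: string | null };
type Listener = (resultId: string, mapping: ResultMapping) => void;

class ResultIdRegistry {
  private mappings = new Map<string, ResultMapping>();
  private byId = new Map<string, Set<Listener>>();
  private all = new Set<Listener>();

  /** Bulk initialization from STAC assets: keeps the highest version, fires no events. */
  hydrate(entries: Array<[string, ResultMapping]>): void {
    for (const [id, mapping] of entries) {
      const current = this.mappings.get(id);
      if (!current || (mapping.version ?? 0) > (current.version ?? 0)) {
        this.mappings.set(id, mapping);
      }
    }
  }

  /** Runtime registration: updates the mapping, then notifies subscribers. */
  register(resultId: string, mapping: ResultMapping): void {
    this.mappings.set(resultId, mapping);
    for (const cb of this.byId.get(resultId) ?? []) cb(resultId, mapping);
    for (const cb of this.all) cb(resultId, mapping);
  }

  /** Notifications for one result ID. Returns an unsubscribe function. */
  subscribe(resultId: string, cb: Listener): () => void {
    const set = this.byId.get(resultId) ?? new Set<Listener>();
    set.add(cb);
    this.byId.set(resultId, set);
    return () => set.delete(cb);
  }

  /** Notifications for any change. Returns an unsubscribe function. */
  subscribeAll(cb: Listener): () => void {
    this.all.add(cb);
    return () => this.all.delete(cb);
  }

  current(resultId: string): ResultMapping | undefined {
    return this.mappings.get(resultId);
  }

  /** Wipes all mappings and subscriptions when the plot closes. */
  clear(): void {
    this.mappings.clear();
    this.byId.clear();
    this.all.clear();
  }
}
```

<p>Because everything is synchronous, the ordering guarantee comes for free: two <code class="language-plaintext highlighter-rouge">register()</code> calls in a row deliver two callbacks in the same order, with no queue in between.</p>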

<p>The VS Code extension creates the registry in <code class="language-plaintext highlighter-rouge">extension.ts</code>, hydrates it in <code class="language-plaintext highlighter-rouge">openPlot.ts</code>, updates it in <code class="language-plaintext highlighter-rouge">executeTool.ts</code> after tool runs, and again in <code class="language-plaintext highlighter-rouge">logPanelView.ts</code> after replay operations like tune and revert.</p>

<h2 id="integration-points">Integration Points</h2>

<p>The registry provides three registration methods, each tailored to a different data source:</p>

<p><code class="language-plaintext highlighter-rouge">registerFromLogEntry()</code> handles raw Log entries from the Log Service. Used internally but exposed for completeness.</p>

<p><code class="language-plaintext highlighter-rouge">registerFromRecordResult()</code> consumes the structured result from <code class="language-plaintext highlighter-rouge">logService.recordToolResult()</code>. This is what <code class="language-plaintext highlighter-rouge">executeTool.ts</code> calls after a tool finishes – one line of code, registry updates automatically.</p>

<p><code class="language-plaintext highlighter-rouge">registerFromReplayResult()</code> processes artifacts created during replay operations. When the analyst tunes a parameter, the tuned tool execution creates new artifacts. <code class="language-plaintext highlighter-rouge">logPanelView.ts</code> passes <code class="language-plaintext highlighter-rouge">replayResult.artifactsCreated</code> to this method, and the registry updates with the new versions.</p>

<p>All three methods produce identical change events. A subscriber doesn’t know or care whether an update came from a tool run, a tune, or a revert. It just knows the result changed.</p>

<h2 id="lessons-learned">Lessons Learned</h2>

<p>Synchronous operations simplified everything. The registry uses simple Maps and callbacks – no promises, no async, no queues. The JavaScript event loop guarantees ordering, so rapid successive updates for the same result ID produce correctly sequenced change events. We tested this explicitly: register v1, immediately register v2, verify two change events fire in order.</p>

<p>Observing Log entries rather than hooking tool execution preserved separation of concerns. The registry knows nothing about MCP, tool definitions, or parameter schemas. It consumes structured data from the Log Service. The Log Service knows nothing about result ID mappings or subscriptions. It emits structured data. Neither depends on the other’s internals.</p>

<p>Bulk hydration without events was non-negotiable. Early prototypes fired change events during <code class="language-plaintext highlighter-rouge">hydrateFromAssets()</code>. Every asset produced a notification. For a plot with 20 result artifacts, that meant 20 callbacks before the plot even opened. The current design treats hydration as initialization, not as live updates. Only runtime changes emit events.</p>

<p>Storing MIME type and version during hydration turned out to be useful. The registry started as a pure ID-to-path map. Testing revealed that views often need the MIME type to choose a renderer and the version number to display staleness indicators. We extended the mapping to include these fields when available from STAC metadata, keeping them null when registering from Log entries.</p>

<h2 id="whats-next">What’s Next</h2>

<p>Feature #089 (Auto-Refresh) consumes the registry’s change events. When a result view is open and its underlying result updates, the view subscribes to that result ID. The change event arrives, the view checks the <code class="language-plaintext highlighter-rouge">debrief.autoRefreshArtifacts</code> setting, and if enabled, reloads the content while preserving viewport state.</p>

<p>The registry provides the notification mechanism. Auto-refresh provides the response behavior. Together they complete the workflow where an analyst tunes a parameter, watches the chart update automatically, and never thinks about versioned file paths.</p>

<blockquote>
  <p><a href="https://github.com/debrief/debrief-future/tree/main/specs/087-logical-result-id-registry">See the code</a>
<a href="https://github.com/debrief/debrief-future/blob/main/specs/087-logical-result-id-registry/evidence/test-summary.md">Test summary</a></p>
</blockquote>]]></content><author><name>Ian</name></author><category term="tracer-bullet" /><category term="results-visualization" /><category term="provenance" /><summary type="html"><![CDATA[Maps stable result IDs to current files, emits change events when tools re-run, sets foundation for auto-refresh]]></summary></entry><entry><title type="html">Shipped: Point and Rectangle Drawing</title><link href="https://debrief.github.io/shipped-point-rectangle-drawing" rel="alternate" type="text/html" title="Shipped: Point and Rectangle Drawing" /><published>2026-02-13T00:00:00+00:00</published><updated>2026-02-13T00:00:00+00:00</updated><id>https://debrief.github.io/shipped-point-rectangle-drawing</id><content type="html" xml:base="https://debrief.github.io/shipped-point-rectangle-drawing"><![CDATA[<h2 id="what-we-built">What We Built</h2>

<p>The shape palette from feature 093 let you click ‘+’, pick point or rectangle, and enter drawing mode. But that palette knew nothing about our data model — it just handed back raw Leaflet layers. Feature 094 is the glue: a pure factory function called <code class="language-plaintext highlighter-rouge">createDrawnFeature()</code> that converts Geoman’s raw GeoJSON into schema-compliant features.</p>

<p>Two shape types are supported. Points become <code class="language-plaintext highlighter-rouge">ReferenceLocation</code> features (kind=POINT) with green markers. Rectangles become <code class="language-plaintext highlighter-rouge">RectangleAnnotation</code> features (kind=RECTANGLE) with blue polygons. Each gets a UUID, default styling, and required schema properties. If you click without dragging in rectangle mode, the zero-area geometry is silently discarded — no error, no minimum-size snapping. The analyst didn’t intend a shape, so we don’t create one.</p>

<p>The function is pure. No side effects, no DOM access, no state mutations. You call it with GeoJSON plus the active drawing mode, and you get back a schema-compliant feature or null. This made testing straightforward — 33 unit tests without needing a map or browser.</p>
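<p>A rough sketch of that shape follows. The function and validator names come from the post; the property names, the exact hex colours, and the geometry-handling details are assumptions for illustration.</p>

```typescript
import { randomUUID } from "node:crypto";

// Illustrative sketch of the pure factory. Property shapes beyond those
// mentioned in the post (kind discriminator, UUID, default styling) are assumptions.
type DrawingMode = "point" | "rectangle";
interface DrawnGeometry {
  type: "Point" | "Polygon";
  coordinates: any;
}

function isValidDrawnGeometry(geom: DrawnGeometry): boolean {
  if (geom.type === "Point") return true;
  // Reject zero-area rectangles: a click without a drag produces a
  // degenerate ring whose bounding box has zero width or height.
  const ring: number[][] = geom.coordinates[0];
  const lons = ring.map((c) => c[0]);
  const lats = ring.map((c) => c[1]);
  return Math.max(...lons) > Math.min(...lons) && Math.max(...lats) > Math.min(...lats);
}

function createDrawnFeature(geom: DrawnGeometry, mode: DrawingMode) {
  if (!isValidDrawnGeometry(geom)) return null; // silently discard, no error
  return {
    type: "Feature" as const,
    id: randomUUID(),
    geometry: geom,
    properties: {
      kind: mode === "point" ? "POINT" : "RECTANGLE",
      color: mode === "point" ? "#2e7d32" /* green */ : "#1565c0" /* blue */,
    },
  };
}
```

<p>The caller passes Geoman's raw GeoJSON plus the active mode and gets back either a feature ready for the collection, or null for a degenerate shape.</p>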

<h2 id="screenshots">Screenshots</h2>

<p>Point drawing in action:</p>

<p><img src="/assets/images/future-debrief/shipped-point-rectangle-drawing/storybook-screenshot-point.png" alt="Point drawn on map" /></p>

<p>Rectangle drawing with multiple features:</p>

<p><img src="/assets/images/future-debrief/shipped-point-rectangle-drawing/storybook-screenshot-rectangle.png" alt="Rectangle and point on map" /></p>

<h2 id="callback-based-integration">Callback-Based Integration</h2>

<p>The <code class="language-plaintext highlighter-rouge">LeafletToolbar</code> listens for Geoman’s <code class="language-plaintext highlighter-rouge">pm:create</code> event, extracts the raw GeoJSON, and fires an <code class="language-plaintext highlighter-rouge">onShapeCreated</code> callback. The consumer — VS Code webview or Storybook story — calls <code class="language-plaintext highlighter-rouge">createDrawnFeature()</code> and decides what to do with the result. The shared component library stays generic. The consumer owns the state update.</p>

<p>In VS Code, drawn features are immediately added to the active plot’s feature collection and auto-selected. In Storybook, they go into a <code class="language-plaintext highlighter-rouge">useState</code> array for demonstration purposes. Same conversion function, different destinations.</p>

<h2 id="what-held-up">What Held Up</h2>

<p>The default colour choices — green points, blue rectangles — were deliberately distinct from track colours (blue for ownship, red for contacts). Drawn annotations needed to be visually distinguishable from loaded data. We made these constants in a <code class="language-plaintext highlighter-rouge">drawingDefaults.ts</code> module. They’re fixed for now, but could be made configurable later if users have strong opinions about annotation colours.</p>

<p>The degenerate rectangle case (click without drag) came up during testing. Geoman fires <code class="language-plaintext highlighter-rouge">pm:create</code> even when the drag distance is zero. We added an area check to <code class="language-plaintext highlighter-rouge">isValidDrawnGeometry()</code> and return null if the rectangle has zero width or height. The shape is discarded silently. No toast, no error dialog. The analyst didn’t drag, so we assume they didn’t mean to create a shape.</p>

<p>Auto-selecting the newly drawn feature felt right — you’ve just placed a point or drawn a rectangle, you probably want to inspect or label it. The selection happens in the <code class="language-plaintext highlighter-rouge">onShapeCreated</code> handler, right after adding the feature to the collection. One less click for the analyst.</p>

<h2 id="test-coverage">Test Coverage</h2>

<p>33 unit tests, all passing. These covered geometry validation (zero-area rejection, coordinate pass-through, closed polygon rings), schema compliance (correct kind discriminators, required properties, UUID uniqueness), and default styling (green points, blue rectangles, opacity values).</p>

<p>13 e2e tests via Playwright against the Storybook story. These verified rendering across three theme variants (light, dark, VS Code), point and rectangle creation via actual map clicks, and screenshot capture for visual regression. Total duration 24.8 seconds.</p>

<p>No regressions. The existing ToolMatchHarness e2e suite (12 tests) still passes, confirming that drawing doesn’t interfere with tool matching or existing map interactions.</p>

<h2 id="whats-next">What’s Next</h2>

<p>Drawn features currently live in React state. They disappear when you close the webview. Persistence to STAC is feature 096 — we kept it separate so drawing could ship without coupling to the storage layer. Once 096 lands, drawn annotations will survive across sessions and appear in the STAC catalog alongside loaded data.</p>

<p>The pure function approach paid off. The core conversion logic is in four small files (<code class="language-plaintext highlighter-rouge">drawingDefaults.ts</code>, <code class="language-plaintext highlighter-rouge">isValidDrawnGeometry.ts</code>, <code class="language-plaintext highlighter-rouge">createDrawnFeature.ts</code>, and a barrel export), fully tested in isolation. When we wire up persistence, the conversion logic won’t change. We’ll just add a STAC write after the feature is created.</p>

<blockquote>
  <p><a href="https://github.com/debrief/debrief-future/tree/094-point-rectangle-drawing/shared/components/src/MapView/drawing">See the code</a></p>
</blockquote>]]></content><author><name>Ian</name></author><category term="tracer-bullet" /><category term="shape-drawing" /><category term="annotations" /><category term="geoman" /><summary type="html"><![CDATA[Analysts can now annotate maps with points and rectangles. The implementation is a pure function sitting between Geoman and the schema, with 46 tests.]]></summary></entry><entry><title type="html">Shipped: VS Code E2E Tests in a Sandboxed Environment</title><link href="https://debrief.github.io/shipped-e2e-tests-sandboxed" rel="alternate" type="text/html" title="Shipped: VS Code E2E Tests in a Sandboxed Environment" /><published>2026-02-07T00:00:00+00:00</published><updated>2026-02-07T00:00:00+00:00</updated><id>https://debrief.github.io/shipped-e2e-tests-sandboxed</id><content type="html" xml:base="https://debrief.github.io/shipped-e2e-tests-sandboxed"><![CDATA[<h2 id="what-we-built">What We Built</h2>

<p>Running Playwright against a full VS Code workbench is straightforward on a developer laptop. Running it inside Claude Code’s sandboxed environment – where CDN downloads return 403, snap packages fail, and multi-process Chromium crashes mid-render – took a week of dead ends before anything worked.</p>

<p>The result is an <code class="language-plaintext highlighter-rouge">ensure-chromium.sh</code> script, a set of Chromium flags, and a switch from code-server to openvscode-server. Together they let us drive VS Code through its command palette, Quick Open dialog, and file navigation from within the sandbox. Four screenshots prove it.</p>

<p><img src="/assets/images/future-debrief/shipped-e2e-tests-sandboxed/02-command-palette.png" alt="VS Code command palette running in Claude Code's sandbox" />
<em>Command palette (F1) responding to keyboard input inside the sandboxed browser. This was the moment we knew the infrastructure worked.</em></p>

<h2 id="four-dead-ends">Four Dead Ends</h2>

<p>Each approach seemed reasonable in isolation. Each failed for a different reason.</p>

<p><strong>1. Standard Playwright install.</strong> <code class="language-plaintext highlighter-rouge">npx playwright install chromium</code> downloads from <code class="language-plaintext highlighter-rouge">cdn.playwright.dev</code>. The sandbox returns <code class="language-plaintext highlighter-rouge">403 Forbidden - host_not_allowed</code>. Every Playwright CDN mirror we tried hit the same firewall. Non-starter.</p>

<p><strong>2. @sparticuz/chromium.</strong> This npm package bundles a minimal Chromium binary designed for AWS Lambda. It extracts to <code class="language-plaintext highlighter-rouge">/tmp/chromium</code> and works for simple pages – we had it rendering HTML and running DOM tests within an hour. But the VS Code workbench is not a simple page. The minimal build crashed consistently when rendering VS Code’s complex DOM. The workbench never got past the initial paint.</p>

<p><strong>3. code-server with Playwright.</strong> code-server wraps VS Code as a web application and seemed like the natural host. But its WebSocket authentication depends on <code class="language-plaintext highlighter-rouge">vsda</code>, a proprietary WASM module that isn’t open source. In our environment, the connection handshake failed silently. We spent a day tracing WebSocket frames before finding the dependency.</p>

<p><strong>4. Multi-process Chromium.</strong> Even after solving the browser and server problems, Chromium’s default multi-process architecture caused renderer crashes in the container. Taking screenshots – the thing we needed most for evidence – would kill the renderer process. The workbench would load, we’d call <code class="language-plaintext highlighter-rouge">page.screenshot()</code>, and the browser would crash.</p>

<h2 id="what-actually-worked">What Actually Worked</h2>

<p><strong>GitHub Release browser hosting.</strong> Instead of fighting CDN restrictions, we uploaded a full Chromium build (matching Playwright’s expected version) as a GitHub Release asset under the tag <code class="language-plaintext highlighter-rouge">playwright-browsers-v1</code>. The <code class="language-plaintext highlighter-rouge">ensure-chromium.sh</code> script tries the standard Playwright install first. When that fails, it downloads from the GH release, places the binary where Playwright expects it, and writes a <code class="language-plaintext highlighter-rouge">.chromium-path</code> file that <code class="language-plaintext highlighter-rouge">playwright.config.ts</code> picks up. The script is idempotent – run it twice, it skips the download.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Resolution order in ensure-chromium.sh:</span>
<span class="c"># 1. Already installed? → done</span>
<span class="c"># 2. npx playwright install chromium → try CDN</span>
<span class="c"># 3. CDN blocked? → download from GH release</span>
</code></pre></div></div>

<p><strong>openvscode-server.</strong> Gitpod’s open-source VS Code server, without the vsda dependency. The global setup script checks for it first and falls back to code-server if needed. No authentication tokens, no proprietary modules. It just starts and serves the workbench.</p>

<p><strong>Single-process Chromium.</strong> The flags <code class="language-plaintext highlighter-rouge">--single-process --no-zygote --disable-software-rasterizer</code> collapse Chromium’s process tree into one process. This prevents the renderer crashes that plagued multi-process mode in the container. The trade-off is that a crash in any component takes down the whole browser, but for testing that’s acceptable – a crash is a test failure either way.</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// playwright.config.ts -- sandboxed launch options</span>
<span class="nx">args</span><span class="p">:</span> <span class="p">[</span>
  <span class="dl">'</span><span class="s1">--no-sandbox</span><span class="dl">'</span><span class="p">,</span>
  <span class="dl">'</span><span class="s1">--disable-setuid-sandbox</span><span class="dl">'</span><span class="p">,</span>
  <span class="dl">'</span><span class="s1">--disable-gpu</span><span class="dl">'</span><span class="p">,</span>
  <span class="dl">'</span><span class="s1">--disable-dev-shm-usage</span><span class="dl">'</span><span class="p">,</span>
  <span class="dl">'</span><span class="s1">--disable-software-rasterizer</span><span class="dl">'</span><span class="p">,</span>
  <span class="dl">'</span><span class="s1">--single-process</span><span class="dl">'</span><span class="p">,</span>
  <span class="dl">'</span><span class="s1">--no-zygote</span><span class="dl">'</span><span class="p">,</span>
<span class="p">]</span>
</code></pre></div></div>

<p><strong>Welcome tab workaround.</strong> VS Code’s Getting Started tab renders inside an iframe that captures keyboard focus. The command palette (Ctrl+Shift+P) and Quick Open (Ctrl+P) won’t respond because keystrokes go to the iframe instead of the main window. The fix is two-part: machine-level settings to disable the Welcome tab (<code class="language-plaintext highlighter-rouge">workbench.startupEditor: none</code>), and a <code class="language-plaintext highlighter-rouge">Ctrl+W</code> keystroke on load to close it if it appears anyway, followed by clicking the title bar to return focus to the main window.</p>
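<p>For reference, the machine-level setting is a one-liner – this is the standard VS Code setting; where the <code class="language-plaintext highlighter-rouge">settings.json</code> file lives depends on how the server is provisioned:</p>

```json
{
  "workbench.startupEditor": "none"
}
```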

<h2 id="evidence">Evidence</h2>

<p>Four screenshots taken during a test run inside the sandbox, each proving a different layer works.</p>

<p><img src="/assets/images/future-debrief/shipped-e2e-tests-sandboxed/01-workbench-loaded.png" alt="Workbench loaded" />
<em>The full VS Code workbench rendered inside headless Chromium. Activity bar, editor area, status bar all present.</em></p>

<p><img src="/assets/images/future-debrief/shipped-e2e-tests-sandboxed/02-command-palette.png" alt="Command palette" />
<em>F1 opens the command palette and it responds to typed input. This requires keyboard focus to be on the main window, not trapped in an iframe.</em></p>

<p><img src="/assets/images/future-debrief/shipped-e2e-tests-sandboxed/03-quick-open-search.png" alt="Quick Open search" />
<em>Ctrl+P opens Quick Open with file search suggestions. The workbench’s keyboard shortcut handling is fully functional.</em></p>

<p><img src="/assets/images/future-debrief/shipped-e2e-tests-sandboxed/04-file-opened.png" alt="File search" />
<em>Typing a filename into Quick Open. The search executes against the workspace. File navigation works end-to-end.</em></p>

<h2 id="what-we-learned">What We Learned</h2>

<p><strong>The research note saved days.</strong> Early in the project we documented every Playwright installation approach we tried in <code class="language-plaintext highlighter-rouge">docs/project_notes/playwright-installation-research.md</code>. That note ruled out three dead ends immediately when we circled back to E2E testing. Writing down what doesn’t work is as valuable as writing down what does.</p>

<p><strong>@sparticuz/chromium is for simple pages.</strong> It’s optimized for Lambda functions that render PDFs or take screenshots of single-page apps. VS Code’s workbench – with its nested iframes, service workers, and complex layout engine – overwhelms the minimal build. The right tool for the wrong job.</p>

<p><strong>Single-process mode contradicts most advice.</strong> Chromium documentation and StackOverflow answers consistently warn against <code class="language-plaintext highlighter-rouge">--single-process</code>. For production browsers, they’re right. For headless testing in containers, it’s the only configuration that doesn’t crash. Context matters more than best practices.</p>

<p><strong>The Welcome tab is a focus trap.</strong> This cost half a day. Everything looked correct – the workbench loaded, the browser was stable, screenshots worked – but keyboard shortcuts did nothing. No error messages, no visible problem. The Getting Started tab’s iframe was silently eating every keystroke.</p>

<h2 id="whats-next">What’s Next</h2>

<p>The infrastructure is ready for real test content. When specs 043 (file loading) and 001 (tool execution) ship their TypeScript implementations, we can write tests that exercise the full analyst workflow: open a REP file, see tracks on the map, run analysis, check results. The page object models (<code class="language-plaintext highlighter-rouge">CodeServerPage</code> for VS Code chrome, <code class="language-plaintext highlighter-rouge">DebriefWebview</code> for Debrief components) are waiting.</p>

<p>More immediately, any feature branch can now run <code class="language-plaintext highlighter-rouge">bash tests/e2e/scripts/ensure-chromium.sh</code> and have a working Playwright environment in seconds, even inside Claude Code.</p>

<blockquote>
  <p><a href="https://github.com/debrief/debrief-future/tree/claude/speckit-specify-005-zJrC6/tests/e2e">See the infrastructure code</a><br />
<a href="https://github.com/debrief/debrief-future/tree/claude/speckit-specify-005-zJrC6/specs/005-e2e-workflow-tests">View the spec</a></p>
</blockquote>]]></content><author><name>Ian</name></author><category term="tracer-bullet" /><category term="testing" /><category term="e2e" /><category term="playwright" /><category term="infrastructure" /><category term="sandbox" /><summary type="html"><![CDATA[Playwright driving a full VS Code workbench inside Claude Code's sandbox, after four dead ends.]]></summary></entry><entry><title type="html">Shipped: STAC Catalog Overview Panel</title><link href="https://debrief.github.io/shipped-stac-catalog-overview-panel" rel="alternate" type="text/html" title="Shipped: STAC Catalog Overview Panel" /><published>2026-01-30T00:00:00+00:00</published><updated>2026-01-30T00:00:00+00:00</updated><id>https://debrief.github.io/shipped-stac-catalog-overview-panel</id><content type="html" xml:base="https://debrief.github.io/shipped-stac-catalog-overview-panel"><![CDATA[<h2 id="what-we-built">What We Built</h2>

<p>The catalog overview panel turns a STAC directory into a navigable spatial-temporal index. Double-click a catalog node in the STAC Stores tree view, and VS Code opens a read-only panel showing two synchronized views: a Leaflet map with bounding box rectangles for every item, and an SVG timeline with horizontal bars representing temporal spans.</p>

<p><img src="/assets/images/future-debrief/stac-catalog-overview-panel/catalog-view.png" alt="Catalog Overview panel showing bounding boxes on a Leaflet map with a timeline below" /></p>

<p>Double-clicking an item opens the full plot view with tracks, layers, and time controls:</p>

<p><img src="/assets/images/future-debrief/stac-catalog-overview-panel/plot-view.png" alt="Plot view opened from catalog overview showing vessel tracks on the map" /></p>

<p>The component lives in <code class="language-plaintext highlighter-rouge">shared/components/</code> as a reusable React element, tested in Storybook with 11 stories covering nominal cases and edge conditions (empty catalogs, missing bbox, missing temporal metadata). The panel integrates into VS Code as a <code class="language-plaintext highlighter-rouge">WebviewPanel</code>, using the same message-passing pattern as the existing plot view.</p>

<h2 id="how-it-works">How It Works</h2>

<p>The architecture keeps concerns separated: the React component handles all rendering and layout logic, receiving catalog data as props. The VS Code extension provides a thin wrapper that manages the panel lifecycle, retrieves item metadata via <code class="language-plaintext highlighter-rouge">stacService.listItems()</code>, and bridges the webview’s double-click messages back to the “open plot” flow.</p>
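<p>The bridge is a small message handler. A hedged sketch of the shape it might take – the message names and payload are illustrative assumptions, not the actual protocol:</p>

```typescript
// Sketch of the webview → extension bridge described above.
// Message names and payload shape are assumptions, not the real protocol.
type CatalogMessage =
  | { type: "itemDoubleClicked"; itemId: string }
  | { type: "ready" };

function handleCatalogMessage(
  msg: CatalogMessage,
  openPlot: (itemId: string) => void
): void {
  switch (msg.type) {
    case "itemDoubleClicked":
      // Bridge the double-click back to the existing "open plot" flow.
      openPlot(msg.itemId);
      break;
    case "ready":
      // The panel wrapper would respond by posting catalog metadata.
      break;
  }
}
```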

<p>The map uses the same Leaflet instance configuration as the plot view, so it feels familiar. The timeline is hand-rolled SVG—rectangles for time ranges, circles for single-point temporal markers, and graceful fallback labels when metadata is absent.</p>

<p>A horizontal drag bar lets users resize the split between map and timeline. The ratio persists across sessions via VS Code’s <code class="language-plaintext highlighter-rouge">Memento</code> API, so analysts can save their preferred proportion.</p>
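<p>The persistence piece reduces to a get/update pair. A minimal sketch against a Memento-like interface – the key name and default ratio are hypothetical, not the real implementation:</p>

```typescript
// Sketch of persisting the split ratio via a Memento-like store.
// VS Code's Memento exposes get/update in roughly this shape; the key
// name and 0.6 default below are assumptions for illustration.
interface MementoLike {
  get<T>(key: string, defaultValue: T): T;
  update(key: string, value: unknown): void;
}

const RATIO_KEY = "catalogOverview.splitRatio"; // hypothetical key

function loadRatio(state: MementoLike): number {
  return state.get(RATIO_KEY, 0.6); // assumed default: 60% map, 40% timeline
}

function saveRatio(state: MementoLike, ratio: number): void {
  // Clamp so neither pane can be dragged entirely out of view.
  state.update(RATIO_KEY, Math.min(0.9, Math.max(0.1, ratio)));
}
```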

<h2 id="decisions-and-trade-offs">Decisions and Trade-offs</h2>

<p><strong>Shared component, not a custom editor provider.</strong> STAC catalogs are directories, not files. <code class="language-plaintext highlighter-rouge">CustomReadonlyEditorProvider</code> requires backing documents and would have forced a virtual URI workaround. <code class="language-plaintext highlighter-rouge">WebviewPanel</code> is clearer about representing multi-file structures.</p>

<p><strong>Lightweight metadata loading.</strong> Rather than reading full GeoJSON assets, we extract just <code class="language-plaintext highlighter-rouge">bbox</code>, <code class="language-plaintext highlighter-rouge">start_datetime</code>, <code class="language-plaintext highlighter-rouge">end_datetime</code>, and <code class="language-plaintext highlighter-rouge">datetime</code> from each <code class="language-plaintext highlighter-rouge">item.json</code>. This keeps the panel responsive even with hundreds of items. The trade-off is that we don’t show asset previews—the overview is intentionally aggregate, not detailed.</p>
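<p>The extraction itself is tiny, which is the point. A sketch of the idea – field names follow the STAC Item spec, but the <code class="language-plaintext highlighter-rouge">ItemSummary</code> shape is an assumption, not the real code:</p>

```typescript
// Sketch of the lightweight metadata extraction described above.
// bbox/start_datetime/end_datetime/datetime are standard STAC Item fields;
// the ItemSummary shape is illustrative.
interface ItemSummary {
  id: string;
  bbox?: number[];
  start?: string;
  end?: string;
}

function summariseItem(item: any): ItemSummary {
  const props = item.properties ?? {};
  return {
    id: item.id,
    bbox: item.bbox,
    // Ranged items carry start/end; instantaneous ones carry one datetime.
    start: props.start_datetime ?? props.datetime ?? undefined,
    end: props.end_datetime ?? props.datetime ?? undefined,
  };
}
```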

<p><strong>Overlapping timeline bars by default.</strong> Items with overlapping temporal ranges are rendered in a single row. This is compact and shows temporal density clearly, but can occlude labels. We considered auto-stacking (separate rows per item) but found it consumed too much vertical space for typical catalogs. The Storybook knobs let us test both approaches for different data distributions.</p>

<p><strong>Offline rendering as default.</strong> Bounding box rectangles and timeline bars render without map tiles. If you’re disconnected, you still get the structure. The tile layer is additive—it loads asynchronously if available, but doesn’t block rendering.</p>

<h2 id="what-surprised-us">What Surprised Us</h2>

<p>Testing revealed that many items in the wild have incomplete metadata. A plot might have a complete <code class="language-plaintext highlighter-rouge">bbox</code> but no <code class="language-plaintext highlighter-rouge">start_datetime</code>, or vice versa. Rather than hide incomplete items or throw errors, we show them with a “no time data” label on the timeline or omit them from the map view. Items remain clickable regardless.</p>

<p>The Storybook coverage—11 stories with different fixture configurations—caught edge cases that manual testing would have missed. One story tests an empty catalog (should show a friendly message, not crash). Another tests a single item (should still auto-fit the map, not zoom to maximum). These seem obvious in retrospect but required explicit fixtures to verify.</p>

<h2 id="tests-and-acceptance">Tests and Acceptance</h2>

<p>13 unit tests cover metadata extraction from <code class="language-plaintext highlighter-rouge">item.json</code> files, timeline bar positioning logic, and edge cases like missing fields and empty collections. The Storybook stories serve as both documentation and interaction tests—reviewers can spin up the component in isolation and verify visual behavior without spinning up the full extension.</p>

<p>Acceptance criteria all pass: double-click opens the panel, map shows extent, timeline shows ranges, drag bar resizes, item navigation works, styling adapts to VS Code theme, and missing metadata is handled gracefully.</p>

<h2 id="whats-next">What’s Next</h2>

<p>The overview panel becomes a launchpad for catalog-level operations. Future work could add filtering (show only items from a date range or geographic region), statistics (count items, total coverage area, temporal span), or export capabilities (save the current view as a GeoJSON collection). The shared component architecture means improvements benefit both Storybook and VS Code simultaneously.</p>

<p>The panel is also a foundation for the aggregate analysis track. Imagine comparing two catalogs’ spatial distributions side-by-side, or querying across 100 exercises to find patterns no single analysis could reveal. That’s further down the roadmap, but the catalog overview is where that vision becomes tangible.</p>

<p>→ <a href="https://github.com/debrief/debrief-future/blob/claude/speckit-start-041-GPj21/shared/components/src/CatalogOverview">See the code</a>
→ <a href="https://debrief.github.io/debrief-future/storybook/?path=/story/catalogoverview--default">Storybook stories</a></p>]]></content><author><name>Ian</name></author><category term="stac" /><category term="vscode-extension" /><category term="visualization" /><category term="shared-components" /><summary type="html"><![CDATA[Double-click a STAC catalog to see every item's spatial bounds and temporal range on one screen.]]></summary></entry><entry><title type="html">Shipped: Tool Results Architecture</title><link href="https://debrief.github.io/tool-results-architecture" rel="alternate" type="text/html" title="Shipped: Tool Results Architecture" /><published>2026-01-30T00:00:00+00:00</published><updated>2026-01-30T00:00:00+00:00</updated><id>https://debrief.github.io/tool-results-architecture</id><content type="html" xml:base="https://debrief.github.io/tool-results-architecture"><![CDATA[<h2 id="what-we-built">What We Built</h2>

<p>When a calculation tool smooths a track or computes a closest point of approach, something needs to classify that result, persist it to the STAC catalog, and update the display. We’ve now built the machinery that connects these pieces.</p>

<p>The system introduces four top-level result types: mutation (modified features), addition (new features), deletion (removed features), and artifact (reports, images, datasets). Every tool response is an MCP-compliant content array containing one or more items, each classified into one of these types and carrying three required annotations: the result type path, source feature IDs, and a human-readable label.</p>

<p>On the storage side, debrief-stac exposes four atomic operations: update features, add features, delete features, and store artifact. The service has no knowledge of result types. The orchestrator (frontend or LLM) interprets each content item’s type and calls the appropriate operation. After each operation, a diff utility compares the old and new FeatureCollections and the display updates incrementally.</p>

<h2 id="how-it-works">How It Works</h2>

<p>Here’s a Python tool returning a smoothed track:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">debrief_calc.result_builder</span> <span class="kn">import</span> <span class="n">build_mutation</span><span class="p">,</span> <span class="n">build_response</span>

<span class="n">smoothed</span> <span class="o">=</span> <span class="p">{</span><span class="s">"type"</span><span class="p">:</span> <span class="s">"Feature"</span><span class="p">,</span> <span class="s">"id"</span><span class="p">:</span> <span class="s">"track_a"</span><span class="p">,</span> <span class="s">"geometry"</span><span class="p">:</span> <span class="p">{...},</span> <span class="s">"properties"</span><span class="p">:</span> <span class="p">{...}}</span>
<span class="n">mutation_items</span> <span class="o">=</span> <span class="n">build_mutation</span><span class="p">(</span>
    <span class="n">features</span><span class="o">=</span><span class="p">[</span><span class="n">smoothed</span><span class="p">],</span>
    <span class="n">result_subtype</span><span class="o">=</span><span class="s">"track/smoothed"</span><span class="p">,</span>
    <span class="n">source_feature_ids</span><span class="o">=</span><span class="p">[</span><span class="s">"track_a"</span><span class="p">],</span>
    <span class="n">label</span><span class="o">=</span><span class="s">"Smoothed Track A"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">build_response</span><span class="p">(</span><span class="n">mutation_items</span><span class="p">)</span>
</code></pre></div></div>

<p>The orchestrator receives the response, sees the <code class="language-plaintext highlighter-rouge">mutation</code> type, and calls:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">debrief_stac.features</span> <span class="kn">import</span> <span class="n">update_features</span>
<span class="kn">from</span> <span class="nn">debrief_stac.provenance</span> <span class="kn">import</span> <span class="n">write_provenance</span>

<span class="n">write_provenance</span><span class="p">(</span><span class="n">smoothed</span><span class="p">,</span> <span class="s">"track-smoother"</span><span class="p">,</span> <span class="s">"1.0.0"</span><span class="p">,</span> <span class="p">[</span><span class="s">"track_a"</span><span class="p">])</span>
<span class="n">count</span> <span class="o">=</span> <span class="n">update_features</span><span class="p">(</span><span class="s">"/data/catalog"</span><span class="p">,</span> <span class="s">"plot_001"</span><span class="p">,</span> <span class="p">[</span><span class="n">smoothed</span><span class="p">])</span>
</code></pre></div></div>

<p>For multi-result responses, the content array is processed sequentially. A tool that trims outliers might return two items: a deletion for the removed contacts and an artifact with the analysis report. The orchestrator calls <code class="language-plaintext highlighter-rouge">delete_features</code>, then <code class="language-plaintext highlighter-rouge">store_artifact</code>, diffing and updating the display after each.</p>
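<p>The sequential dispatch loop can be sketched as follows. The operation names mirror the four atomic debrief-stac operations described above, but the <code class="language-plaintext highlighter-rouge">ContentItem</code> shape and handler signatures are illustrative assumptions:</p>

```typescript
// Hedged sketch of sequential dispatch over a multi-result content array.
// ContentItem and StacOps are illustrative shapes, not the real interfaces.
interface ContentItem {
  resultType: string; // e.g. "deletion" or "artifact/report"
  features?: unknown[];
}

interface StacOps {
  updateFeatures(features: unknown[]): void;
  addFeatures(features: unknown[]): void;
  deleteFeatures(features: unknown[]): void;
  storeArtifact(item: ContentItem): void;
}

function processResponse(content: ContentItem[], ops: StacOps): string[] {
  const calls: string[] = [];
  for (const item of content) {
    // Only the top-level type decides which atomic operation to call.
    const top = item.resultType.split("/")[0];
    switch (top) {
      case "mutation": ops.updateFeatures(item.features ?? []); calls.push("update"); break;
      case "addition": ops.addFeatures(item.features ?? []); calls.push("add"); break;
      case "deletion": ops.deleteFeatures(item.features ?? []); calls.push("delete"); break;
      case "artifact": ops.storeArtifact(item); calls.push("store"); break;
    }
    // After each operation the orchestrator would diff the old and new
    // FeatureCollections and update the display incrementally (omitted).
  }
  return calls;
}
```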

<h2 id="result-type-hierarchy">Result Type Hierarchy</h2>

<p>Result types use slash-delimited paths like <code class="language-plaintext highlighter-rouge">artifact/report/ssa_assessment</code>. The four top-level types are fixed and schema-validated. Below that, organisations can introduce sub-types without registration.</p>

<p>A contrib-aware viewer might recognise the full path and open a specialised report viewer. The generic Debrief UI matches <code class="language-plaintext highlighter-rouge">artifact/report</code> and shows a standard report preview. An LLM matches just <code class="language-plaintext highlighter-rouge">artifact</code> and reports “The tool produced a report artifact.” Each consumer degrades to the deepest match it understands.</p>

<p>TypeScript provides utilities for this:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">{</span> <span class="nx">matchesResultType</span><span class="p">,</span> <span class="nx">getTopLevelType</span> <span class="p">}</span> <span class="k">from</span> <span class="dl">"</span><span class="s2">@debrief/diff</span><span class="dl">"</span><span class="p">;</span>

<span class="nx">matchesResultType</span><span class="p">(</span><span class="dl">"</span><span class="s2">artifact/report/ssa_assessment</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">artifact</span><span class="dl">"</span><span class="p">);</span>         <span class="c1">// true</span>
<span class="nx">matchesResultType</span><span class="p">(</span><span class="dl">"</span><span class="s2">artifact/report/ssa_assessment</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">artifact/report</span><span class="dl">"</span><span class="p">);</span>   <span class="c1">// true</span>
<span class="nx">getTopLevelType</span><span class="p">(</span><span class="dl">"</span><span class="s2">artifact/report/ssa_assessment</span><span class="dl">"</span><span class="p">);</span>                        <span class="c1">// "artifact"</span>
</code></pre></div></div>
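<p>The matching shown above reduces to prefix comparison on path segments. A plausible few-line implementation – a sketch, not the actual <code class="language-plaintext highlighter-rouge">@debrief/diff</code> source:</p>

```typescript
// Illustrative implementation of hierarchical result-type matching.
// A path matches a prefix if it equals it or sits below it; the trailing
// "/" check prevents "artifact/reportx" from matching "artifact/report".
function matchesResultType(fullPath: string, prefix: string): boolean {
  return fullPath === prefix || fullPath.startsWith(prefix + "/");
}

function getTopLevelType(fullPath: string): string {
  return fullPath.split("/")[0];
}
```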

<h2 id="lessons-learned">Lessons Learned</h2>

<p>The separation of concerns took a few iterations to settle. Initially, I considered embedding result type interpretation inside debrief-stac. That would have made the persistence service brittle and coupled it to frontend concerns. Moving all type awareness into the orchestrator keeps debrief-stac simple: it receives features, writes them, returns updated FeatureCollections.</p>

<p>Multi-result responses turned out to be more common than I expected. A single tool invocation might remove outliers, update the remaining track, and produce a diagnostic plot. Returning these as separate content items, processed sequentially, is cleaner than trying to bundle them into a single compound result.</p>

<p>The diff utility in TypeScript was straightforward but essential. After each atomic STAC operation, the frontend needs to know what changed without re-rendering the entire plot. The utility compares feature IDs and geometries, returning three sets: added, removed, modified. 24 tests confirm it handles edge cases like identical collections, disjoint collections, and partial overlaps.</p>
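<p>The core of that diff fits in a screenful. A sketch of the approach – compare by feature ID, treat geometry changes as modifications – with simplified shapes, not the real utility:</p>

```typescript
// Sketch of the FeatureCollection diff described above. Feature and Diff
// are simplified shapes; geometry comparison uses JSON serialisation for
// brevity, which the real utility may do differently.
interface Feature { id: string; geometry: unknown; }

interface Diff { added: string[]; removed: string[]; modified: string[]; }

function diffCollections(oldF: Feature[], newF: Feature[]): Diff {
  const oldById = new Map(oldF.map((f) => [f.id, f]));
  const newById = new Map(newF.map((f) => [f.id, f]));
  const diff: Diff = { added: [], removed: [], modified: [] };
  for (const f of newF) {
    const prev = oldById.get(f.id);
    if (!prev) diff.added.push(f.id);
    else if (JSON.stringify(prev.geometry) !== JSON.stringify(f.geometry)) {
      diff.modified.push(f.id);
    }
  }
  // Anything in the old collection but not the new one was removed.
  for (const f of oldF) if (!newById.has(f.id)) diff.removed.push(f.id);
  return diff;
}
```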

<h2 id="test-coverage">Test Coverage</h2>

<p>88 tests passing across Python and TypeScript:</p>

<ul>
  <li>41 tests in debrief-calc (result types, builders, MCP responses)</li>
  <li>23 tests in debrief-stac (provenance, artifacts, feature updates/deletions)</li>
  <li>24 tests in @debrief/diff (FeatureCollection diffing, type matching)</li>
</ul>

<p>The test suite covers all four result types, multi-result responses, hierarchical type matching, atomic STAC operations with provenance, and diff utility correctness.</p>

<h2 id="whats-next">What’s Next</h2>

<p>This architecture supports the workflow where a calculation tool produces results, the orchestrator persists them, and the display updates. The next step is wiring a real calculation tool (track smoothing or CPA analysis) end-to-end through this flow in the VS Code extension.</p>

<p>The hierarchical type system is designed for contrib extensions, but we haven’t tested it with a real organisation-specific sub-type yet. That will be valuable validation once we have contrib partners.</p>

<p>→ <a href="https://github.com/debrief/debrief-future/blob/main/specs/041-document-tool-results/spec.md">See the spec</a>
→ <a href="https://github.com/debrief/debrief-future/pull/136">View the PR</a></p>]]></content><author><name>Ian</name></author><category term="tracer-bullet" /><category term="tool-results" /><category term="mcp" /><category term="stac" /><summary type="html"><![CDATA[Typed result system connecting calculation tools to storage, with 88 tests passing across Python and TypeScript.]]></summary></entry><entry><title type="html">Shipped: TimeController Now Drives Map Track Rendering</title><link href="https://debrief.github.io/timecontroller-temporal-track-rendering" rel="alternate" type="text/html" title="Shipped: TimeController Now Drives Map Track Rendering" /><published>2026-01-29T00:00:00+00:00</published><updated>2026-01-29T00:00:00+00:00</updated><id>https://debrief.github.io/timecontroller-temporal-track-rendering</id><content type="html" xml:base="https://debrief.github.io/timecontroller-temporal-track-rendering"><![CDATA[<p>The time slider in Debrief’s VS Code sidebar now controls what you see on the map. Scrub to any moment and every track updates instantly — either showing a position marker (full mode) or growing as a snail-trail from its start point.</p>

<h2 id="what-we-built">What We Built</h2>

<p>This fix completes the temporal interaction pipeline that was half-wired. The TimeController UI already existed, the TemporalTrackLayer rendering logic already existed, and the message pipeline between them already existed — but the final receiver (the map webview) had a TODO stub that ignored incoming time updates.</p>

<p>We added:</p>

<ul>
  <li><strong>Temporal rendering in TrackRenderer</strong> — the vanilla JS Leaflet map now responds to <code class="language-plaintext highlighter-rouge">setCurrentTime</code> and <code class="language-plaintext highlighter-rouge">setDisplayMode</code> messages</li>
  <li><strong>Binary search algorithms</strong> — ported from the shared React components into a standalone <code class="language-plaintext highlighter-rouge">temporalUtils.ts</code> module with 15 unit tests</li>
  <li><strong>Highlight markers</strong> — <code class="language-plaintext highlighter-rouge">L.circleMarker</code> per track in full mode, efficiently repositioned on each frame</li>
  <li><strong>DisplayMode forwarding</strong> — the <code class="language-plaintext highlighter-rouge">setDisplayMode</code> message type was added to the extension protocol, and MapPanel now forwards mode changes from SessionStore</li>
</ul>

<h2 id="how-it-works">How It Works</h2>

<p>The pipeline is now complete end-to-end:</p>

<ol>
  <li>User scrubs the TimeController slider</li>
  <li>The webview sends a time change message to the extension host</li>
  <li>SessionStore captures the new time</li>
  <li>MapPanel’s temporal subscription fires, forwarding <code class="language-plaintext highlighter-rouge">setCurrentTime</code> and <code class="language-plaintext highlighter-rouge">setDisplayMode</code> to the map webview</li>
  <li>TrackRenderer performs a binary search to find the nearest track point, then updates the polyline coordinates and marker position</li>
</ol>

<p>All timestamp parsing (ISO to epoch) happens once on track load. The binary search is O(log n) per track per frame. Leaflet’s <code class="language-plaintext highlighter-rouge">setLatLngs()</code> handles efficient DOM updates.</p>
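<p>The nearest-point lookup is a standard binary search over the pre-parsed epoch array. A sketch of the idea – not the actual <code class="language-plaintext highlighter-rouge">temporalUtils.ts</code> code:</p>

```typescript
// Sketch of nearest-timestamp lookup: binary search over a sorted array of
// epoch milliseconds, O(log n) per track per frame. Returns -1 on empty input.
function nearestIndex(timestamps: number[], t: number): number {
  if (timestamps.length === 0) return -1;
  let lo = 0;
  let hi = timestamps.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (timestamps[mid] < t) lo = mid + 1;
    else hi = mid;
  }
  // lo is now the first index >= t; the previous index may be closer.
  if (lo > 0 && t - timestamps[lo - 1] <= timestamps[lo] - t) return lo - 1;
  return lo;
}
```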

<h2 id="lessons-learned">Lessons Learned</h2>

<p><strong>Porting beats importing</strong> — Rather than pulling in the shared React components (which would have required React in the vanilla JS webview), we copied the two pure functions (60 lines) and unit-tested them independently. Simple, no dependency chain, easy to verify.</p>

<p><strong>The message pipeline was the easy part</strong> — The infrastructure from Feature #029 (session state integration) meant the wiring was already 80% done. The real work was in TrackRenderer: managing cached timestamps, highlight markers, display mode state, and coordinate updates without flicker.</p>

<h2 id="whats-next">What’s Next</h2>

<ul>
  <li><strong>#030</strong>: Add replay mode and time acceleration to the temporal state schema</li>
  <li><strong>#026</strong>: Add annotation shape renderers to the VS Code extension</li>
  <li><strong>#038</strong>: Context-sensitive tool offering integration</li>
</ul>]]></content><author><name>Ian</name></author><category term="vscode" /><category term="temporal" /><category term="leaflet" /><category term="track-rendering" /><summary type="html"><![CDATA[The time slider in Debrief's VS Code sidebar now controls what you see on the map. Scrub to any moment and every track updates instantly.]]></summary></entry><entry><title type="html">Shipped: Context-Sensitive Tool Offering in VS Code</title><link href="https://debrief.github.io/context-sensitive-tool-offering" rel="alternate" type="text/html" title="Shipped: Context-Sensitive Tool Offering in VS Code" /><published>2026-01-27T00:00:00+00:00</published><updated>2026-01-27T00:00:00+00:00</updated><id>https://debrief.github.io/context-sensitive-tool-offering</id><content type="html" xml:base="https://debrief.github.io/context-sensitive-tool-offering"><![CDATA[<h2 id="what-we-built">What We Built</h2>

<p>The VS Code extension now shows you which analysis tools work with your current selection. Select two tracks on the map, get tools that operate on two tracks. Select one track and a reference point, different tools appear. Right-click for quick access, use Command Palette for keyboard workflows, or browse the sidebar panel.</p>

<p>ToolMatchAdapter bridges session-state’s selection to the matching service we built in #027. The adapter converts feature IDs to kind counts — session-state tracks which features are selected, the adapter looks up what kinds they are (TRACK, POINT, CIRCLE, etc.), and ToolMatchService evaluates tool requirements against those counts.</p>

<p>Tools panel shows active tools by default. Enable “show inactive tools” and you see explanations: “Range &amp; Bearing (inactive): Need 2 TRACK, have 1”. Click an active tool, it executes via debrief-calc, results appear on the map with full provenance metadata.</p>

<h2 id="how-it-connects">How It Connects</h2>

<p>Three pieces came together:</p>

<p><strong>ToolMatchService</strong> (#027) — the matching logic that evaluates tools against selections. We built and tested this in Storybook weeks ago. Now it’s integrated.</p>

<p><strong>SessionManager</strong> (#029) — centralised selection state. When you select features in the map panel, session-state broadcasts that change. The tool adapter subscribes and updates immediately.</p>

<p><strong>CalcService</strong> — the bridge to debrief-calc’s MCP server. Caches tool metadata with a 60-second TTL. Executes tools, creates result layers with provenance, persists to STAC.</p>

<p>The ToolMatchAdapter converts between these contexts. Session-state deals in feature IDs. ToolMatchService deals in kind counts. The adapter has a callback to look up feature kinds from the active collection and does the conversion in-memory.</p>
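<p>That conversion is pure and small, which is what made the adapter easy to test in isolation. A sketch under assumed shapes – the kind-lookup callback and names are illustrative, not the real adapter’s API:</p>

```typescript
// Illustrative sketch of the ID → kind-count conversion the adapter performs.
// The Kind union and lookup callback are assumptions about the real code.
type Kind = "TRACK" | "POINT" | "CIRCLE" | "RECTANGLE" | "LINE" | "VECTOR";

function toKindCounts(
  selectedIds: string[],
  lookupKind: (id: string) => Kind | undefined
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const id of selectedIds) {
    const kind = lookupKind(id);
    if (!kind) continue; // fallback: features not found are skipped
    counts[kind] = (counts[kind] ?? 0) + 1;
  }
  return counts;
}
```

A matching service can then check a requirement like “2 TRACK” directly against the returned counts.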

<h2 id="three-access-points">Three Access Points</h2>

<p><strong>Sidebar Tools Panel</strong>: Browse all available tools, see descriptions, enable the inactive toggle to understand requirements. Click a tool to execute.</p>

<p><strong>Context Menu</strong>: Right-click selected features, see applicable tools in a “Tools” submenu. Fastest path for repetitive workflows.</p>

<p><strong>Command Palette</strong>: Type “Debrief:” and see tool commands. VS Code’s <code class="language-plaintext highlighter-rouge">when</code> clauses hide inapplicable tools automatically. Keyboard-driven workflows work.</p>

<p>All three surfaces share the same underlying state. Selection changes propagate through session-state, adapter recomputes matches, UI surfaces update.</p>

<h2 id="provenance-on-every-result">Provenance on Every Result</h2>

<p>Tool execution produces result layers with inline provenance metadata:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Feature"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"properties"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"provenance"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"toolName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Range &amp; Bearing"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"toolVersion"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1.0.0"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"executionTime"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2026-01-27T23:15:42.123Z"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"sourceFeatureIds"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"track-hms-defender"</span><span class="p">,</span><span class="w"> </span><span class="s2">"track-uss-freedom"</span><span class="p">],</span><span class="w">
      </span><span class="nl">"duration"</span><span class="p">:</span><span class="w"> </span><span class="mi">523</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Every computed result traces back to its inputs. No separate provenance documents to maintain. Query the feature, get its lineage.</p>

<h2 id="testing-approach">Testing Approach</h2>

<p>237 tests pass across 15 test files. The new adapter tests (14 tests) verify selection conversion, kind lookups, and fallback handling when features aren’t found. CalcService tests (8 tests) verify tool execution lifecycle and result layer creation.</p>

<p>We extended the test data in <code class="language-plaintext highlighter-rouge">apps/vscode/test-data/local-store/</code> to include all supported feature kinds: tracks, circles, rectangles, lines, vectors, and point types. The adapter needs that variety to verify kind grouping works correctly.</p>


<h2 id="lessons-learned">Lessons Learned</h2>

<p>The adapter pattern worked well. Session-state stays generic (just feature IDs), the adapter adds Debrief-specific knowledge (feature kinds). Testing the adapter in isolation was straightforward because it’s pure conversion logic with a callback for kind lookup.</p>
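<p>The shape of that adapter can be sketched in a few lines: session state supplies bare feature ids, and the adapter groups them by kind through a lookup callback, with a fallback for features it can't find. These names are illustrative, not the real adapter's API:</p>

```typescript
// Minimal sketch of the adapter idea: generic ids in, Debrief-specific
// kind groupings out. The kind lookup is injected as a callback, which
// is what makes the adapter easy to test in isolation.
type KindLookup = (featureId: string) => string | undefined;

function groupSelectionByKind(
  selectedIds: string[],
  lookupKind: KindLookup,
): Map<string, string[]> {
  const byKind = new Map<string, string[]>();
  for (const id of selectedIds) {
    // Fall back to "unknown" when a feature cannot be found.
    const kind = lookupKind(id) ?? "unknown";
    const bucket = byKind.get(kind) ?? [];
    bucket.push(id);
    byKind.set(kind, bucket);
  }
  return byKind;
}
```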

<p>VS Code’s Command Palette doesn’t support disabled commands with explanations. You can only show or hide commands via <code class="language-plaintext highlighter-rouge">when</code> clauses. That’s why inactive tools live in the sidebar panel only — the panel can show grayed items with text. Command Palette just hides inapplicable tools.</p>
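<p>For reference, a command's Command Palette visibility is controlled through a <code class="language-plaintext highlighter-rouge">when</code> clause in the extension manifest. The command id and context key below are hypothetical placeholders, shown only to illustrate the mechanism:</p>

```json
{
  "contributes": {
    "menus": {
      "commandPalette": [
        {
          "command": "debrief.runRangeBearing",
          "when": "debrief.hasApplicableSelection"
        }
      ]
    }
  }
}
```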

<p>Inline provenance was simpler than linking to separate documents. The Constitution requires provenance; we chose the simplest implementation that satisfies it. Future work might add queryable provenance relationships, but for now, results carry their own metadata.</p>

<p>CalcService caching (60-second TTL) prevents re-fetching tool metadata on every selection change. Tool inventories don’t change frequently. If debrief-calc restarts or tools are added, waiting 60 seconds for refresh is acceptable.</p>

<h2 id="whats-next">What’s Next</h2>

<p>This completes the tool offering integration. The matching service (Phase 0-2 in #027) is now wired into VS Code (Phase 3). Analysts can discover tools, execute them, and trace results.</p>

<p>Future enhancements might include tool search/filtering, tool categories, or parameter UI for tools that accept user inputs. This iteration covers parameterless tools operating on selection only.</p>

<p>→ <a href="https://github.com/debrief/debrief-future/pull/125">See the feature PR</a></p>]]></content><author><name>Ian</name></author><category term="tracer-bullet" /><category term="vscode-extension" /><category term="tool-matching" /><category term="analysis-tools" /><summary type="html"><![CDATA[Analysis tools now appear based on selection in VS Code, with three access points and provenance on every result.]]></summary></entry><entry><title type="html">Shipped: Epic Support for Large Feature Breakdown</title><link href="https://debrief.github.io/epic-workflow-support" rel="alternate" type="text/html" title="Shipped: Epic Support for Large Feature Breakdown" /><published>2026-01-23T00:00:00+00:00</published><updated>2026-01-23T00:00:00+00:00</updated><id>https://debrief.github.io/epic-workflow-support</id><content type="html" xml:base="https://debrief.github.io/epic-workflow-support"><![CDATA[<p>We’ve shipped the <code class="language-plaintext highlighter-rouge">/epic</code> command—AI-assisted breakdown of large features into deliverable backlog items with full traceability.</p>

<h2 id="what-we-built">What We Built</h2>

<p>Large features like Storyboarding don’t fit single backlog items. The <code class="language-plaintext highlighter-rouge">/epic</code> command uses Opus as both Business Analyst and Technical Architect to decompose them into 3-10 independently valuable items.</p>

<h3 id="the-workflow">The Workflow</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/epic docs/feature-spec.md
    ↓
Parse input (text, local path, or GitHub URL)
    ↓
Opus analysis (BA + Architect roles)
    ↓
Generate 3-10 items with:
  - [Ex] prefix for traceability
  - Category, complexity, dependencies
    ↓
Update BACKLOG.md (Epics table + Items table)
    ↓
Create GitHub issues (with offline fallback)
</code></pre></div></div>

<h3 id="key-features">Key Features</h3>

<p><strong>Three input modes</strong>: Feed it a text description, local markdown file, or GitHub URL. The command fetches and parses whatever you provide.</p>

<p><strong>Opus-powered analysis</strong>: The breakdown follows clear principles—vertical slices over horizontal layers, infrastructure first if it unblocks, research spikes early to reduce uncertainty.</p>

<p><strong>Full traceability</strong>: Items get an <code class="language-plaintext highlighter-rouge">[E01]</code> prefix linking back to their parent epic. The Epics table tracks status and child items.</p>

<p><strong>Offline support</strong>: Core breakdown works without network. GitHub issues are created when <code class="language-plaintext highlighter-rouge">gh</code> is available, with fallback to local files.</p>

<h2 id="example-output">Example Output</h2>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## Epic Created: E01 - Storyboarding Briefings</span>

<span class="gu">### Breakdown (7 items)</span>

| ID | Category | Title | Complexity |
|----|----------|-------|------------|
| 024 | Infrastructure | [E01] Add storyboard schema | Low |
| 025 | Feature | [E01] Create storyboard panel | Medium |
| 026 | Feature | [E01] Implement scene capture | Medium |
...

<span class="gu">### Next Steps</span>
<span class="p">1.</span> Score items with backlog-prioritizer
<span class="p">2.</span> Approve with the-ideas-guy
<span class="p">3.</span> Start with /speckit.start {ID}
</code></pre></div></div>

<h2 id="technical-details">Technical Details</h2>

<p>The command is implemented as a Claude Code skill at <code class="language-plaintext highlighter-rouge">.claude/commands/epic.md</code>. It’s a prompt-based workflow that orchestrates:</p>

<ul>
  <li>Input parsing and document fetching</li>
  <li>Opus model analysis with structured guidelines</li>
  <li>BACKLOG.md table manipulation</li>
  <li>GitHub issue creation via <code class="language-plaintext highlighter-rouge">gh</code> CLI</li>
</ul>

<p>No new code dependencies—just workflow orchestration using existing tools.</p>

<h2 id="whats-next">What’s Next</h2>

<p>The first epic to use this workflow will be Storyboarding Briefings—a feature complex enough to validate the decomposition approach. Stay tuned for that breakdown.</p>

<hr />

<p><em>Epic workflow support is live. Try <code class="language-plaintext highlighter-rouge">/epic</code> with your next large feature.</em></p>]]></content><author><name>Ian</name></author><category term="ai" /><category term="workflow" /><category term="agile" /><summary type="html"><![CDATA[AI-assisted breakdown of large features into deliverable backlog items with full traceability.]]></summary></entry><entry><title type="html">Shipped: Focused Analysis Environment</title><link href="https://debrief.github.io/focused-analysis-environment" rel="alternate" type="text/html" title="Shipped: Focused Analysis Environment" /><published>2026-01-23T00:00:00+00:00</published><updated>2026-01-23T00:00:00+00:00</updated><id>https://debrief.github.io/focused-analysis-environment</id><content type="html" xml:base="https://debrief.github.io/focused-analysis-environment"><![CDATA[<h2 id="what-we-built">What We Built</h2>

<p>The Debrief VS Code extension now automatically hides non-essential activity bar items when it activates. Instead of seeing Explorer, Search, Source Control, Debug, Extensions, and Testing, analysts see just two activities: <strong>Explorer</strong> (for browsing STAC stores) and <strong>Debrief</strong> (for analysis tools).</p>

<p>Five distractions removed. Zero functionality lost. The hidden activities still work if you need them — right-click the activity bar to restore any of them.</p>

<h2 id="how-it-works">How It Works</h2>

<p>On first activation, the extension modifies VS Code’s <code class="language-plaintext highlighter-rouge">workbench.activity.pinnedViewlets2</code> setting to hide:</p>
<ul>
  <li>Search</li>
  <li>Source Control</li>
  <li>Run and Debug</li>
  <li>Extensions</li>
  <li>Testing</li>
</ul>

<p>The hiding is <strong>reversible</strong> and <strong>respects user choice</strong>:</p>
<ul>
  <li>Right-click activity bar → Show hidden activities</li>
  <li>Command Palette → <code class="language-plaintext highlighter-rouge">Debrief: Restore Default Activities</code></li>
  <li>Settings → <code class="language-plaintext highlighter-rouge">debrief.hideActivities.enabled: false</code></li>
</ul>

<p>If you re-enable an activity manually, it stays visible on subsequent launches. We track your choices and don’t override them.</p>

<h2 id="technical-details">Technical Details</h2>

<p>The implementation adds an <code class="language-plaintext highlighter-rouge">ActivityBarService</code> that:</p>
<ol>
  <li>Checks if hiding is enabled (default: yes)</li>
  <li>Checks if this is the first run</li>
  <li>Modifies visibility for target activities only</li>
  <li>Stores a snapshot to detect user overrides later</li>
</ol>
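<p>The decision logic in those four steps can be sketched as a pure function. Storage and settings access are abstracted away here, and all names are illustrative rather than the real <code class="language-plaintext highlighter-rouge">ActivityBarService</code> API:</p>

```typescript
// Sketch of the hide/override logic: hide all targets on first run,
// and on later runs keep hiding only what the user has not restored.
interface ActivityState {
  firstRunDone: boolean;
  hiddenByUs: string[];   // snapshot of activities we hid
  userRestored: string[]; // activities the user re-enabled manually
}

function planHiddenActivities(
  enabled: boolean,
  targets: string[],
  state: ActivityState,
): string[] {
  if (!enabled) return []; // setting disabled: hide nothing
  if (!state.firstRunDone) return targets; // first run: hide all targets
  // Respect user overrides detected via the stored snapshot.
  return state.hiddenByUs.filter((id) => !state.userRestored.includes(id));
}
```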

<p><strong>76 tests pass</strong>, including tests for:</p>
<ul>
  <li>First-run hiding behavior</li>
  <li>Protected views (Explorer and Debrief never hidden)</li>
  <li>User override detection</li>
  <li>Restore command functionality</li>
</ul>

<p>All operations are local. No network calls. Works completely offline.</p>

<h2 id="configuration">Configuration</h2>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"debrief.hideActivities.enabled"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
  </span><span class="nl">"debrief.hideActivities.viewIds"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="s2">"workbench.view.search"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"workbench.view.scm"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"workbench.view.debug"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"workbench.view.extensions"</span><span class="p">,</span><span class="w">
    </span><span class="s2">"workbench.view.testing"</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Advanced users can customize which activities get hidden.</p>

<h2 id="whats-next">What’s Next</h2>

<p>This is the first of several UX improvements for the analysis environment. Next up: workspace configuration and panel layouts that make better use of screen real estate for map-centric workflows.</p>

<blockquote>
  <table>
    <tbody>
      <tr>
        <td><a href="https://github.com/debrief/debrief-future/pull/74">View the PR</a></td>
        <td><a href="https://github.com/debrief/debrief-future/blob/main/specs/017-vscode-hide-activities/spec.md">Read the spec</a></td>
      </tr>
    </tbody>
  </table>
</blockquote>]]></content><author><name>Ian</name></author><category term="vscode-extension" /><category term="ux" /><category term="tracer-bullet" /><summary type="html"><![CDATA[VS Code activity bar now shows only what matters for maritime analysis]]></summary></entry></feed>