When Your Own Framework Calls You Out

We built a tool for residents, then ran it through our own governance assessment. It didn't go easy on us.

Dustin Good

CivicBrief is a tool we built to solve a straightforward problem: city council agendas are published as PDFs that most residents never read. Not because they don't care, but because a 47-page packet full of petition numbers, legal references, and procedural language isn't designed for someone checking in between school pickup and dinner. CivicBrief takes those PDFs, extracts each agenda item, and generates plain-language briefings in the format and language a resident actually wants. Written summary, Q&A, audio, visual. Sixty-plus languages, with all the caveats about translation quality that implies.

The engineering works. Upload a PDF, get structured briefings, cache them so the next person gets instant results. We were proud of it.
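The pipeline's shape can be sketched roughly like this. To be clear, this is a hypothetical illustration, not CivicBrief's actual code: `BriefingCache` and `cache_key` are made-up names, and the in-memory dict stands in for whatever store a real deployment would use. The point is the caching pattern itself: briefings are keyed by packet content, format, and language, so the second resident who asks for the same item gets an instant result instead of a second expensive generation call.

```python
import hashlib

def cache_key(pdf_bytes: bytes, fmt: str, language: str) -> str:
    """Key briefings by packet content, not filename, so re-uploads still hit the cache."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return f"{digest}:{fmt}:{language}"

class BriefingCache:
    """Illustrative sketch: generate a briefing once per (packet, format, language)."""

    def __init__(self, generate):
        self._store = {}            # stand-in for a real cache (Redis, S3, ...)
        self._generate = generate   # the expensive generation step lives here

    def get_briefing(self, pdf_bytes: bytes, fmt: str = "summary", language: str = "en") -> str:
        key = cache_key(pdf_bytes, fmt, language)
        if key not in self._store:
            self._store[key] = self._generate(pdf_bytes, fmt, language)
        return self._store[key]
```

Keying on a hash of the PDF bytes rather than the filename means two residents uploading the same packet under different names still share one cached briefing.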

Then we ran it through our own Governance Assessment tool.

The assessment uses CivicWork's Verifiability Framework and Trust Stack to evaluate municipal AI use cases. We built it for cities evaluating vendor proposals or internal pilots. We didn't build it to evaluate ourselves. But the use case fit, so we typed in the description and hit submit.

The result was uncomfortable in the best possible way.

The framework classified it as an edge case

Not fully automatable, not purely augmentable. The PDF extraction, formatting, and caching? Automatable. The “plain-language interpretation layer”? That's where it gets complicated. The framework's own language: “involves editorial judgment about what to emphasize, how to simplify legal language, and whether context is preserved across languages.”

That's accurate. When CivicBrief turns a rezoning petition into a two-paragraph summary, someone (or something) decided what to include and what to leave out. When it generates that summary in Spanish, it made choices about how to convey context that doesn't translate literally. Those aren't bugs. They're editorial decisions baked into the prompts we wrote.

The Trust Stack went further

Process Trust and Outcome Trust were tractable. We can audit data practices and measure accuracy. But the assessment flagged Layer 3, Representation Trust, as the critical gap:

“Did Spanish-speaking and Polish-speaking residents have input into what ‘useful’ briefings look like for their communities, or did English-speaking developers decide?”

We hadn't asked. We built the tool, chose the output formats, wrote the prompts, and assumed that “plain language” meant the same thing across communities. That's a classic Layer 3 failure: solving a Representation Trust problem with Layer 1 and 2 tools.

Layer 4, Sovereignty Trust, was equally pointed. CivicBrief is a third-party tool. If it becomes the primary way residents understand council business, the city has effectively outsourced a core democratic communication function to something it doesn't govern, can't modify, and whose continuity it can't guarantee. The assessment's summary: “The biggest risk isn't that the tool is bad. It's that it's good enough to become essential while remaining outside the city's control.”

This is why we build open source. Not as a branding decision, but because the assessment is right. A tool that sits between government and residents in the democratic communication chain should be forkable, hostable, and governable by the city that uses it. That's not a feature. It's a requirement.

What we're doing about it

We're running a two-week accuracy audit, comparing AI-generated briefings against actual agenda content across multiple meeting cycles. We're having a native Spanish speaker evaluate the Spanish outputs (starting with my wife, who has already agreed to be brutally honest). We're documenting the prompt decisions we made and why, so that when a city asks “who decided what plain language means for our residents?” we have an answer that's better than “we did.”
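One mechanical check in an audit like this can be automated: flagging figures in a generated briefing that never appear in the source agenda item. Numbers are where summaries most visibly go wrong (dollar amounts, dates, parcel counts), and the check is cheap enough to run across every meeting cycle. The sketch below is a hypothetical helper, not our actual audit harness, and a naive regex like this will produce false positives (reworded dates, spelled-out numbers); it narrows what a human reviewer has to read, it doesn't replace them.

```python
import re

# Matches digit runs with common separators: "12", "5,000", "2024-03-12", "47.5"
FIGURE = re.compile(r"\d[\d,./-]*\d|\d")

def unsupported_figures(briefing: str, source: str) -> list[str]:
    """Return figures in the briefing that do not occur verbatim in the source text."""
    source_figures = set(FIGURE.findall(source))
    return [fig for fig in FIGURE.findall(briefing) if fig not in source_figures]
```

A briefing that claims "$5,000" for an item whose packet text never mentions that amount gets surfaced for human review; a briefing whose figures all trace back to the source passes silently.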

We're also leaving the tool imperfect. Not because we're lazy, but because the imperfection is the point. CivicBrief demonstrates what AI can do for municipal transparency. The governance assessment demonstrates what's still missing. Together, they're a more honest pitch than a polished product with no self-awareness.

The Verifiability Framework asks: can we verify this? The Trust Stack asks: should we trust this, and at what layer? We built both frameworks for municipalities evaluating AI. We didn't expect them to be most useful when pointed at ourselves.