Castles in the sandbox

Published February 9, 2026

clocksmith: 88% | AI: 12%

The browser is the most ubiquitous runtime in history. It is also a sandbox by default. With WebGPU, the browser is now a portable accelerator runtime, which is exactly why deterministic boundaries matter when building castles.

But a sandbox is not a boundary on its own. The browser restricts capabilities, but it does not enforce our invariants. We need constraints, and a deterministic checker between intent and side effects, whether the intent is human or machine.

Thesis

For any workflow that a sandbox can contain, SaaS is shimware today and vaporware tomorrow. "Proofware" is now: formally verified boundaries with traceable rejection.

Pythons don't prefer sand

You can build castles in the sky and win the pitch: infinite backend, infinite trust, infinite rent. Or you can build them in the sandbox, where the castles are cheap to rebuild and the constraints outlast every one of them.

You lean into formal methods and formal verification. You want the rules to be part of the product, not a comment beside the code. Proof assistants like Lean 4 are one way to do that.

Maybe you run the tests and everything is green. Maybe you run them ten times because they are flaky. A proof closes the gap by guaranteeing a property holds for all inputs, not just the ones you or your agent decided to check.

Interface vs boundary

LLMs are great at inference, which makes them great at interface. They will decode your intent into something that looks right, and usually it is. An energy-based reasoning model (EBRM) takes that something and optimizes it until the checker agrees.

interface (LLM):   intent     -> candidate
reasoning (EBRM):  candidate  -> low-energy state
checker (proof):   state      -> accept | reject(trace)

The 4-color problem asks whether you can color every region on a map using at most four colors so that no two adjacent regions share the same color. Finding a coloring is hard (NP-hard, in fact!); verifying one is easy. This toy sandbox is an odd wheel: a hub plus a 15-region ring. The hub touches every ring region, which forces all four colors into play. The left panel generates a full candidate and hopes the checker accepts. The right panel scores the whole state and repairs conflicts until it converges.

Compare token-by-token decoding vs state scoring on a tight 4-color map. Run both and watch boundary rejections spike for the autoregressive path while the energy-based path repairs conflicts until it converges.

Boundary rule

If we treat model output as untrusted input, then every side effect flows through a deterministic checker that can accept or reject with a reason trail.

What the verifier sees

LLM inference output: accept or reject structured claims.
Graphing high fidelity data: verify the data before it hits the canvas.
Monitoring a P2P network: parsers that cannot lie and filters that cannot silently drop.

Shimware

Shimware ships and it works, but it wraps capabilities without moving the trust boundary. Most software falls somewhere on the spectrum from vaporware to proofware.

Term	Value	What it is
Vaporware	zero	announced and delayed into irrelevance
Vapidware (Nothingware)	negative	ships, but mostly filler
Shimware	minuscule	workflow wrapper; trust boundary unchanged
Proofware	maximum	formally verified boundaries; constraints enforced by a proof

A lot of modern tooling is shimware: glossy UI on top of remote execution. SaaS dashboards, collaborative design tools, agentic app makers. If the core is probabilistic, shimware just moves the uncertainty around.

Agentic coding tools differentiate like wine labels: enormous discourse about which is better, strong brand loyalty, and most users cannot tell them apart in a blind test.

SaaS is not dead as a billing model, but as an architecture for anything that can be generated and verified locally, it is shimware yesterday and vapidware today.

"Proof-by-definition": SaaS is shimware

Toy proof: once you pin down what you mean by shimware, you can say which layer you are shipping.

namespace Shimware

inductive Product where
  | saas
  | cli
  | localTool
  | boundary

def StrengthensInvariants : Product -> Prop
  | Product.boundary => True
  | _ => False

def IsShimware (p : Product) : Prop :=
  Not (StrengthensInvariants p)

theorem saas_is_shimware : IsShimware Product.saas := by
  simp [IsShimware, StrengthensInvariants]

end Shimware

The proof is trivial by design. If the definition says SaaS does not strengthen invariants, there is nothing left to prove. Either we start building at the boundary, or we keep iterating on the recursive excretion of pure vibe-slop.

The stack

If the browser is the sandbox and proofware is the product, the stack can use as few as three languages:

JavaScript: orchestration, UI, storage, networking, and audit logs. The interface layer lives here.
WGSL: accelerator kernels. Local inference, scoring, and constraint evaluation get fast.
Lean: boundaries. We write constraints as code and produce proofs and checkers that can reject unsafe actions.

LLMs should dominate interface and EBRMs should rise wherever constraint satisfaction matters. Build the castles out of affiliate links and free trials if you want, as long as the sandbox is formally verified.