Web · WebGPU · WebRTC · Consensus · Economics

Author: Cody Boyd · Light Studios · cody@lightstudios.io

Abstract

We present BIM, a browser‑native, tokenized inference network that executes machine‑learning models on contributor devices using WebGPU with WASM/CPU fallback, connected via WebRTC. BIM introduces (i) a subnet abstraction that binds a specific model build to policy, pricing, and verification rules; (ii) a tolerance‑aware consensus scheme that validates outputs under hardware‑ and browser‑induced nondeterminism; and (iii) a lightweight reputation and staking mechanism resistant to Sybil and collusion attacks. We describe the control plane (registry), contributor and user clients, job contracts, routing, and economics (40/40/20 splits). We outline an evaluation plan and discuss privacy, licensing, and ethical trade‑offs. A minimal viable prototype (embeddings subnet) demonstrates feasibility with deterministic verification and low overhead.

1. Introduction

Server‑side inference for large models strains compute supply and centralizes control. Simultaneously, modern browsers provide access to heterogeneous, globally distributed compute via WebGPU and WebRTC. We propose BIM, a practical design for a zero‑install in‑browser inference mesh where contributors supply compute by visiting a web app, and users pay tokenized credits for verified inference results. BIM targets models that can be reduced or adapted to browser‑friendly artifacts (quantized LLMs, embeddings, small vision models, diffusion variants), while being agnostic to model family.

1.1 Problem

Distributed compute markets face three recurring challenges: (1) verification of results on untrusted, heterogeneous nodes; (2) allocation that is fair, efficient, and collusion‑resistant; and (3) practical distribution of model weights and kernels to resource‑constrained devices. BIM addresses all three in the browser context.

1.2 Contributions

  • Subnet abstraction. Each model build (weights checksum + runtime/kernels) is a subnet with its own tolerances, pricing, license, and policies.
  • Tolerance‑aware consensus. Outputs are verified under non‑bit‑exact conditions using task‑appropriate similarity metrics and K‑of‑N adjudication (default 2‑of‑2 with 3rd‑node tie‑break).
  • Diversity‑aware routing. Weighted rendezvous hashing with diversity constraints (ASN/IP/OS/GPU families) mitigates Sybil and collusion.
  • Browser‑native distribution. CDN + Service Worker deliver content‑addressed model chunks to IndexedDB; WebGPU acceleration with WASM/CPU fallback.
  • Economics and governance. Credits ledger with 40/40/20 payout, soft strikes, time‑boxed bans, and small escrow for slashing on fraud.

2. System Overview

Control plane (Registry). Maintains subnets, contributor presence, node capabilities and scores, reputation, job lifecycle, and settlements. Exposes WebSockets/WebTransport for control and signaling; integrates STUN/TURN for WebRTC bootstrapping.

Contributor app. A web client that: (i) downloads and caches model artifacts; (ii) runs a registry‑issued benchmark suite; (iii) advertises availability; (iv) executes jobs in WebGPU/WASM workers; (v) returns signed results and metrics.

User SDK. Submits jobs, streams partial outputs, receives proofs (node IDs, hashes, verification path), and settles credits.

Connectivity. Signaling via WebSockets/WebTransport; data via WebRTC (SCTP data channels). Weights always come from the CDN; only prompts and results traverse P2P, limiting TURN expense.
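
As a concrete sketch, a contributor might bootstrap its job channel as follows; signaling and handleJob are hypothetical helpers over the registry control socket, and the STUN/TURN endpoints are placeholders:

// Contributor-side bootstrap of the job data channel (sketch).
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "user", credential: "pass" },
  ],
});

// Prompts and results flow over an SCTP data channel; weights never do (CDN-only).
const jobs = pc.createDataChannel("jobs", { ordered: true });
jobs.onmessage = (e) => handleJob(JSON.parse(e.data)); // handleJob: hypothetical executor entry

pc.onicecandidate = (e) => { if (e.candidate) signaling.send({ candidate: e.candidate }); };
await pc.setLocalDescription(await pc.createOffer());
signaling.send({ sdp: pc.localDescription }); // registry relays to the matched peer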

Figure 1 (conceptual). Registry (control plane) orchestrates users and contributors per subnet; jobs dispatch to two nodes; verification occurs at registry; consensus triggers settlement.

3. Subnets & Model Packaging

A subnet specifies (a manifest sketch in code follows the list):

  • model_id, model_checksum, runtime_id (e.g., ORT‑Web/WebGPU, WebLLM build), quantization, tokenizer.
  • Policies: determinism flags (temperature=0, top_p=1; fixed seed for diffusion), tolerances per task.
  • Economics: price per token/step/call; payout split; escrow requirement.
  • Licensing & EULA: model license terms; contributor acceptance records.
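
In TypeScript terms, a manifest might be typed as follows (a sketch; the nesting and any field names beyond the bullets above are illustrative):

interface SubnetManifest {
  model_id: string;
  model_checksum: string;            // SHA-256 of the packaged weights
  runtime_id: string;                // e.g., "ort-web-webgpu" or a WebLLM build tag
  quantization: string;              // e.g., "q4f16"
  tokenizer: string;
  policies: {
    temperature: number;             // 0 for deterministic decoding
    top_p: number;                   // 1 for deterministic decoding
    seed?: number;                   // fixed seed for diffusion subnets
    tolerances: { edit_distance?: number; cosine_min?: number; ssim_min?: number };
  };
  economics: {
    price_per_unit: number;          // per token/step/call
    split: [number, number, number]; // e.g., [0.4, 0.4, 0.2]
    escrow: number;                  // stake withheld from earnings
  };
  license: { terms_url: string; requires_acceptance: boolean };
}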

Packaging & caching. Artifacts are chunked and content‑addressed (SHA‑256). A Service Worker manages downloads, SRI verification, and stores chunks in IndexedDB (with eviction policy). Pages are cross‑origin isolated to unlock SharedArrayBuffer and WASM threads.
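
A sketch of chunk verification inside the Service Worker, using Web Crypto for the digest check; idbPut is a hypothetical IndexedDB helper:

// Fetch a content-addressed chunk, verify its digest, and persist it for reuse.
async function fetchChunk(url: string, expectedSha256: string): Promise<ArrayBuffer> {
  const buf = await (await fetch(url)).arrayBuffer();
  const digest = await crypto.subtle.digest("SHA-256", buf);
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  if (hex !== expectedSha256) throw new Error(`chunk digest mismatch: ${url}`);
  await idbPut("model-chunks", expectedSha256, buf); // hypothetical IndexedDB write
  return buf;
}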

4. Benchmarking & Scoring

On join, contributors run short, ephemeral benchmarks:

  • LLM: tokens/s on fixed prompts; tail latency (p95), stability across runs.
  • Diffusion: steps/s and time‑to‑first‑image with fixed CFG and scheduler.
  • Embeddings: QPS over fixed batch vectors.

Scores are taken as the median across runs and combined into a single node metric:

score = w_s · speed + w_l · (1 / latency_p50) + w_u · uptime + w_f · (1 / (1 + fail_rate))

The registry periodically re‑benchmarks (e.g., every N jobs or 24h) and decays stale measurements.
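
As a sketch (the weights are operator‑chosen, and the failure term is damped as in Appendix B):

interface NodeStats { speed: number; latencyP50: number; uptime: number; failRate: number }

// Combine median benchmark measurements into a single selection score.
function nodeScore(
  s: NodeStats,
  w = { speed: 1, latency: 1, uptime: 1, fail: 1 }, // w_s, w_l, w_u, w_f
): number {
  return (
    w.speed * s.speed +
    w.latency * (1 / s.latencyP50) +
    w.uptime * s.uptime +
    w.fail * (1 / (1 + s.failRate))
  );
}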

5. Job Contract & Lifecycle

Job contract (immutable for a job):

  • Model + runtime hashes, determinism flags; tolerances; input hash; max tokens/steps; latency SLO.
  • Node attestations (env fingerprint, benchmark level) and per‑session ephemeral keys.

Lifecycle: (1) user submits; (2) registry selects two nodes via weighted rendezvous hashing with diversity constraints; (3) both execute; (4) registry verifies; (5) on mismatch, a third node adjudicates majority; (6) settle and update reputation.

Streaming. For chat LLMs, streams are tentative until consensus; clients may show provisional text with a small marker (replaced on finalize) to keep UX responsive.

6. Verification Under Nondeterminism

Bit‑exact equality is unreliable across browsers/GPUs. BIM enforces deterministic decoding and compares results with task‑specific tolerances:

  • Text (LLM): exact token equality preferred; otherwise normalized edit distance ≤ n and identical stop conditions.
  • Embeddings: cosine similarity ≥ τe with fixed tokenizer and normalization.
  • Vision: SSIM ≥ τv or perceptual hash distance ≤ d with fixed schedulers and seeds.

Adjudication. Default 2‑of‑2 acceptance; on mismatch, dispatch a third node (2‑of‑3 majority). Periodic sentinel re‑runs by registry ensure long‑term drift does not accumulate.
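
A sketch of the embeddings acceptance test and the tie‑break decision; the vector and return types are illustrative:

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// 2-of-2 acceptance; on mismatch, a third result adjudicates 2-of-3 majority.
function adjudicateEmbed(
  rA: number[], rB: number[], rC: number[] | undefined, tauE: number,
): string {
  if (cosine(rA, rB) >= tauE) return "accept A,B";   // primary pair agrees
  if (!rC) return "dispatch third node";
  if (cosine(rA, rC) >= tauE) return "majority A,C"; // B is the minority
  if (cosine(rB, rC) >= tauE) return "majority B,C"; // A is the minority
  return "no majority: fail job";
}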

Security note. Perceptual checks are robust to small numeric drift but can be adversarially manipulated; BIM uses hidden honeypot tasks and reputation to counter targeted evasion.

7. Routing, Diversity & Sybil Resistance

Selection. Weighted rendezvous hashing over eligible nodes using w = score × reputation. Apply diversity constraints (different ASNs/IP blocks/OS/GPU families) to limit collusion.
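
A sketch of the selection step; FNV‑1a stands in for whatever hash the registry actually uses, and the −w/ln(u) form is the standard weighted‑rendezvous construction:

// 32-bit FNV-1a over a string (placeholder for a production hash).
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

interface Candidate { id: string; weight: number } // weight = score × reputation

// Each node receives -w / ln(u), with u ∈ (0,1) derived from hash(node, job);
// the two highest-ranked nodes win. Diversity filtering happens before this call.
function selectPair(nodes: Candidate[], jobKey: string): [Candidate, Candidate] {
  const ranked = nodes
    .map((n) => {
      const u = (fnv1a(`${n.id}|${jobKey}`) + 1) / 4294967297; // strictly inside (0,1)
      return { n, rank: -n.weight / Math.log(u) };
    })
    .sort((a, b) => b.rank - a.rank);
  return [ranked[0].n, ranked[1].n];
}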

Stake & slashing. A small escrow accumulates from earnings and is slashed on proven fraud (failed honeypots, repeated minority in adjudications). Reputation grows slowly, decays on inactivity, and penalizes recent mismatches.

8. Economics & Policy

Credits. An off‑chain ledger authorizes jobs; on‑chain settlement is optional and batched. Default split is 40/40/20 (node A/node B/registry); when a tie‑breaker is needed, the majority pair share earnings while the minority receives zero for that job.
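
As a sketch, with abstract credit units; majority is the pair left standing after a clean 2‑of‑2 or a 2‑of‑3 tie‑break:

// Default 40/40/20: each consensus node gets 40% and the registry 20%.
// After a tie-break, the two majority nodes share the node portion and the
// minority node receives nothing for the job.
function settle(totalCredits: number, majority: [string, string]): Map<string, number> {
  const registryCut = 0.2;
  const nodeShare = (totalCredits * (1 - registryCut)) / 2;
  return new Map([
    [majority[0], nodeShare],
    [majority[1], nodeShare],
    ["registry", totalCredits * registryCut],
  ]);
}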

Costs. TURN and CDN usage are tracked; surcharges may apply per job. Subnets can set prices independently to reflect model size and expected compute cost.

9. Privacy, Safety, and Licensing

Data visibility. Contributors see prompts and outputs. Users opt in per subnet; sensitive workloads may use trusted subnets operated by the registry with stricter policies and higher registry cuts.

Licensing. Subnets encode model licenses; contributors must accept per‑model terms. The registry enforces license restrictions (e.g., disallowing redistribution or paid hosting).

Safety. Honeypot tasks audit node behavior; content‑policy enforcement (e.g., refusal rules) is embedded in job contracts for safety‑critical models.

10. Implementation Sketch

  • Registry: Postgres + Redis; WS/WT for control; job state machine (queued → dispatched → verifying → settled; sketched below). Metrics via Prometheus‑compatible exports.
  • Contributor: PWA with Service Worker; cross‑origin isolation; workers for compute; model adapters for ORT‑Web/WebLLM; secure ephemeral keys for signing results.
  • User SDK: TypeScript client with retries, streaming, and receipt verification.

Figure 2 (sequence). Submit → match (A, B) → execute → verify (match? yes: settle | no: add C) → finalize → payout.
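
A sketch of that state machine; the failed terminal state is an assumption for jobs that exhaust adjudication:

type JobState = "queued" | "dispatched" | "verifying" | "settled" | "failed";

// Legal transitions; "verifying" → "dispatched" models adding the third node.
const TRANSITIONS: Record<JobState, JobState[]> = {
  queued: ["dispatched"],
  dispatched: ["verifying"],
  verifying: ["settled", "dispatched", "failed"],
  settled: [],
  failed: [],
};

function advance(from: JobState, to: JobState): JobState {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}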

11. Evaluation Plan

Workloads. (i) LLM‑embed (e.g., 384‑d vectors), (ii) LLM‑chat (7–8B, quantized), (iii) a small image‑diffusion model.

Metrics. Tokens/s, p50/p95 latency, verification success rate, false reject/accept rates under tolerance, TURN/CDN overhead per job, cache hit rates, earnings distribution, node churn resilience, audit coverage.

Methodology. Planet‑scale browser fleet via crowd testers; controlled lab with heterogeneous GPUs/OS; fault injection (adversarial nodes, colluding pairs, packet loss). Ablations on tolerance thresholds and diversity constraints.

Hypotheses. (H1) Tolerance‑aware consensus achieves ≥99.5% agreement without material false accepts; (H2) diversity constraints reduce collusion acceptance probability by >10× at modest cost; (H3) CDN‑first weights + P2P data keeps bandwidth costs sub‑linear in jobs.

12. Limitations & Future Work

  • Model size & memory. Very large models remain challenging; progressive layer streaming and LoRA adapters can help.
  • Mobile throttling. Background tabs and thermal limits reduce stability; scheduling can down‑rank mobile.
  • Verification leakage. Honeypots must be rotated and concealed; long‑term adversaries may learn heuristics.
  • Formal proofs. Explore zk/optimistic verification for selective high‑value tasks; extend to verifiable randomness and TEE attestation where available.

13. Ethical Considerations

Energy usage, opt‑in transparency for contributors, clear data‑handling policies for users, and license compliance are preconditions for deployment. BIM’s open marketplace risks misuse without policy enforcement; subnets may embed safety layers and rate limits.

14. Conclusion

BIM shows that browser‑native inference marketplaces are practical with today’s web standards. By combining a subnet abstraction, tolerance‑aware consensus, and diversity‑aware routing with pragmatic economics, BIM turns the browser into an instantly deployable, verifiable edge swarm. We hope BIM catalyzes research on open, user‑owned AI infrastructure.


Appendix A — Job Contract (JSON Schema, sketch)

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "BIMJobContract",
  "type": "object",
  "properties": {
    "job_id": {"type": "string"},
    "model_id": {"type": "string"},
    "model_checksum": {"type": "string"},
    "runtime_id": {"type": "string"},
    "determinism": {"type": "object", "properties": {"temperature": {"type": "number"}, "top_p": {"type": "number"}, "seed": {"type": "integer"}}},
    "tolerances": {"type": "object", "properties": {"edit_distance": {"type": "integer"}, "cosine_min": {"type": "number"}, "ssim_min": {"type": "number"}}},
    "input_hash": {"type": "string"},
    "limits": {"type": "object", "properties": {"max_tokens": {"type": "integer"}, "max_steps": {"type": "integer"}, "latency_ms": {"type": "integer"}}},
    "session_key_pub": {"type": "string"}
  },
  "required": ["job_id", "model_id", "model_checksum", "runtime_id", "determinism", "tolerances", "input_hash", "limits", "session_key_pub"]
}

Appendix B — Node Score & Reputation (sketch)

base = α · s + β · (1/l) + γ · u + δ · (1/(1 + f)), where s is benchmark speed, l is median latency, u is uptime, and f is the failure rate (Section 4).

Reputation R ∈ [0,1] updates per job: R ← (1 − λ)R + λ · r_j,
with r_j = 1 for consensus winner, 0 for minority in tie‑break or failed honeypot.

Selection weight: w = base · R. Diversity constraints filter candidates before hashing.
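
Equivalently, in code (λ is the operator‑chosen update rate):

// Exponential moving average over per-job outcomes: r = 1 for consensus
// winners, 0 for tie-break minorities or failed honeypots.
function updateReputation(R: number, r: 0 | 1, lambda = 0.05): number {
  return (1 - lambda) * R + lambda * r;
}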

Appendix C — Pseudocode

function match(job):
  E ← diversity_filter(eligible_nodes(job.subnet))
  ranked ← rendezvous_hash(E, job.key, weight = node.weight)
  A, B ← top2(ranked)
  return [A, B]

function verify(task, r1, r2, tol):
  if task == TEXT:
    return tokens_equal(r1, r2) or edit_distance(norm(r1), norm(r2)) ≤ tol.n
  if task == EMBED:
    return cosine(r1, r2) ≥ tol.τ_e
  if task == VISION:
    return ssim(r1, r2) ≥ tol.τ_v or phash_dist(r1, r2) ≤ tol.d
