Membrane
Reliable structured extraction on large documents without context decay or surprise token bills.
Split documents into isolated chunks, run the same analysis in parallel with early validation and pre-flight forecasting, then reduce the results β through a drop-in OpenAI-compatible gateway.
Free for local development forever β’ $29/mo flat for production
Or run python server.py locally
See the architectural difference on real documents.
Left column = direct call to your chosen model with the full document in one context. Right columns = the same task through Membrane's chunked parallel swarm + reduction layer (you supply the key for the underlying model).
Compare structure extraction accuracy, processing latency, and token costs in real time.
Model Agnostic Playground: Membrane vs. Direct LLM
Compare a standard sequential LLM request against Membrane's parallel swarm routing. Upload up to 100 pages and test your keys.
Shared with Docs + Console via your session. Use the same key for real swarm calls.
Choose your target model for the Direct LLM run. Membrane automatically coordinates its own model-agnostic swarm routing profile (membrane-engagement-layer).
Canary Sentinel mode processes Chunk 0 first to protect against malformed prompts/refusals before fanning out in parallel.
Under the Hood: Membrane is a Multi-Agent Execution Layer, Not Just a Cache.
When you query a target model (like gpt-4o-mini) via Membrane's drop-in API proxy, the gateway automatically coordinates a distributed execution sequence:
Parallel Swarm Ingestion
Splits documents into isolated parallel chunks and coordinates concurrent extraction threads. Bypasses model context limits entirely to eliminate attention degradation and prompt noise.
Schema Invariant Gating (Canaries)
Enforces code-AST and strict JSON conformance at early checks. Fails fast, halting the swarm execution immediately to prevent wasteful upstream token charges.
Model-Agnostic Routing & Caching
Routes model-agnostically with automatic fallback and checks L1 semantic history to drop input costs to $0.00 on cache hits.
The Three Levers
This solves the exact painful thing you hit on long documents or bulk structured work.
Pre-flight Planning
GET /v1/swarm/planGet an explicit forecast of estimated tokens, cost, latency, recommended concurrency, and risk score before you spend anything.
Early Rejection Modes
early_gatecanaryBad jobs fail fast and cheap. early_gate validates structure with zero model calls. canary runs only the first chunk and aborts on failure.
Isolation + Schema Controls
POST /v1/swarm/stateEach chunk runs independently. Extraction criteria are validated. Agent-generated code can be sandbox-compiled and signed before you trust it.
Drop-in Compatibility
Your existing OpenAI SDK code works. The gateway adds planning, isolation, early rejection, and caching on top.
Important Context Preservation Note
By default, conversational routes isolate system directives and prune middle messages to prevent agent drift. To keep full conversational history, simply pass the X-Membrane-Preserve-Context: true header.
Self-Host & Control
Full data stays in your environment. No external logging of prompts unless you configure it.
Essential for teams processing proprietary, compliance-heavy, or regulated documents.
Pricing (Open Core)
Membrane is completely free for personal use, experimentation, and development. If you are using Membrane for commercial production work, a paid license is required.
Developer Sandbox
For local development & sandbox testing
- β Full Swarm Map-Reduce access
- β Dynamic Model Routing
- β L1 Semantic Caching
- β Any custom key works locally
Commercial Production
For cloud deployments
Get 20% off: $278.40 billed annually
- β Unrestricted cloud usage
- β Pure honor-based model
- β Commercial use authorization
- β Direct team value reporting
Founding License
Only first 75 buyers
Lifetime commercial license
- β Permanent production authorization
- β No monthly subscription fees
- β Priority founder support channel
- β Limited to first 75 developers
What counts as Commercial Production?
**Commercial Production** is defined as any deployment of Membrane on public cloud infrastructure (such as AWS, Render, GCP, Fly.io, Vercel) that powers an active application, API, or service outside of a developer's local machine (`localhost`) or private personal network.
This is a trust and honor-based model. We do not enforce hard blocks, key truncation, or usage caps in your cloud environmentsβthe software runs fully uninhibited to ensure maximum production stability. We prioritize developer trust and expect production users to subscribe to support our work.
Real applications
Membrane is used in production on document-heavy, high-stakes extraction workloads.
- Contract analysis β Contract Pulseuses Membrane's parallel swarm extraction to surface hidden risks and special clauses across long legal PDFs with strong isolation and early validation.
- Policy & government documents β PennerAI runs Membrane swarms over state audits, city council minutes, and regulatory files to extract structured signals that direct LLM calls frequently miss.
More examples and technical detail are in the docs.
Honest Limitations
We want you to trust Membrane. Here is what it is NOT built for:
- Repetitive Structured Extraction Focus: Membrane is strongest on repetitive structured extraction across many similar items (contracts, logs, transcripts, policy docs).
- Less magic on dynamic chat: It is less magic on highly dynamic, open-ended conversational agents.
- Model Costs Still Apply:You still pay for the underlying model calls β we just try to make fewer of them wasteful.