INVARIANT-FIRST ORCHESTRATION

Membrane

Reliable structured extraction on large documents without context decay or surprise token bills.

Split documents into isolated chunks, run the same analysis in parallel with early validation and pre-flight forecasting, then reduce the results — through a drop-in OpenAI-compatible gateway.

Free for local development forever • $29/mo flat for production

Open the Model Agnostic Playground • Get Founding License ($490 lifetime)

$ docker run -d -p 8000:8000 membraneapi/gateway

Or run python server.py locally

AGNOSTIC EVALUATION

See the architectural difference on real documents.

Left column = direct call to your chosen model with the full document in one context. Right columns = the same task through Membrane's chunked parallel swarm + reduction layer (you supply the key for the underlying model).

Compare structure extraction accuracy, processing latency, and token costs in real time.

Model Agnostic Playground: Membrane vs. Direct LLM

Compare a standard sequential LLM request against Membrane's parallel swarm routing. Upload up to 100 pages and test your keys.

LLM API CONFIGURATION

GPT (OpenAI) API Key

Shared with Docs + Console via your session. Use the same key for real swarm calls.

Select Model to Test

Choose your target model for the Direct LLM run. Membrane automatically coordinates its own model-agnostic swarm routing profile (membrane-engagement-layer).

Swarm Execution Mode

Canary Sentinel mode processes Chunk 0 first to protect against malformed prompts/refusals before fanning out in parallel.

Verifiable Cost rates:

Input: $0.150 / 1M tokens • Output: $0.600 / 1M tokens

Standard Retail Rates

SELECT DOCUMENT (MAX 100 PAGES)

MEMBRANE ENTERPRISE SERVICES MASTER AGREEMENT
Document Reference: MSA-2026-0524
Effective Date: May 24, 2026

SECTION 1: LICENSE GRANT & USAGE RIGHTS
Licensor grants Licensee a non-transferable, non-exclusive, revocable license to access the Membrane Ingestion Layer. Usage is strictly bound to local developer evaluations unless commercial licensing is declared.

SECTION 2: SERVICE LEVEL AGREEMENTS (SLA)
Licensor guarantees a 99.9% uptime for cloud-hosted routing endpoints. Any downtime extending past 4 consecutive hours will trigger a credit offset of 5% of monthly billing.

SECTION 5: INDEMNIFICATION CLAUSES
5.1 Licensee Indemnity: Licensee agrees to defend, indemnify, and hold harmless Licensor against any claims, losses, or damages arising from illegal data payloads passed through the proxy network.
5.2 Licensor Indemnity: Licensor shall indemnify Licensee against intellectual property infringement claims brought by third parties, provided Licensee gives immediate written notice.

SECTION 9: GENERAL CAP ON LIABILITY
IN NO EVENT SHALL EITHER PARTY'S AGGREGATE LIABILITY ARISING OUT OF OR RELATED TO THIS AGREEMENT, WHETHER IN CONTRACT, TORT, OR UNDER ANY OTHER THEORY OF LIABILITY, EXCEED THE TOTAL FEES PAID BY LICENSEE IN THE TWELVE (12) MONTHS PRECEDING THE CLAIM, UP TO A MAXIMUM OF $50,000.

SECTION 10: SPECIAL INDEMNITY EXCLUSIONS (BURIED EXCLUSION)
EXCLUSION 10.1: Notwithstanding Section 5.2, Licensor shall NOT indemnify Licensee for any IP claims if the infringement arises from modifications made to the Membrane open-source core by Licensee's developer team.
EXCLUSION 10.2: Licensee is fully liable for any downstream LLM token consumption bills incurred due to recursive loop conditions triggered in custom agent routing configurations.

SECTION 14: TERMINATION & WIND-DOWN
Either party may terminate this agreement upon 30 days written notice. Upon termination, Licensee must delete all cached tokens and local Docker volumes containing proprietary proxy code.

Privacy & Global Semantic Cache Note

Uploaded info is hashed and cached in the global semantic cache to accelerate future lookups for all users. No raw documents, plain text, or sensitive private data is ever retained in a searchable form or exposed to anyone else. All lookups are strictly anonymized, cryptographically secure, and run within a local execution sandbox.

Extraction Task Prompt

Comparison results (populated after a run)

1. Straight LLM (Direct): delivered extracted content, latency, cost
2. Membrane (Cold, Swarms): delivered extracted content, latency, cost

Under the Hood: Membrane is a Multi-Agent Execution Layer, Not Just a Cache.

When you query a target model (like gpt-4o-mini) via Membrane's drop-in API proxy, the gateway automatically coordinates a distributed execution sequence:

STEP 01

Parallel Swarm Ingestion

Splits documents into isolated chunks and runs the extraction on them concurrently. Each chunk stays well inside the model's context window, which avoids the attention degradation and prompt noise of a single oversized prompt.
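The split, parallel map, and reduce pattern can be sketched in a few lines of standard-library Python. This is an illustration of the general technique only, not Membrane's internal implementation: `split_into_chunks`, `map_reduce`, the paragraph-based splitting rule, and the character budget are all assumptions made for the example, and `analyze` stands in for a real model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Pack paragraphs into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is hard-split so no chunk exceeds the limit.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


def map_reduce(
    chunks: list[str],
    analyze: Callable[[str], T],
    reduce_fn: Callable[[list[T]], R],
    max_workers: int = 4,
) -> R:
    """Run analyze on every chunk concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so the reducer sees them in document order.
        partials = list(pool.map(analyze, chunks))
    return reduce_fn(partials)
```

Because each call to `analyze` sees only its own chunk, a failure or a noisy passage in one chunk cannot leak into the others; the reducer is the only place where results meet.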

STEP 02

Schema Invariant Gating (Canaries)

Enforces strict JSON conformance and code-AST validity in early checks. A failing job halts the swarm immediately, before further upstream token charges accrue.
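As a rough picture of what such a gate checks, the sketch below validates a chunk result against a required-keys schema and confirms that generated Python source parses. The names `InvariantViolation`, `check_json_invariant`, and `check_code_invariant` are hypothetical, and the schema format (a mapping from key to expected Python type) is an assumption for illustration, not the gateway's actual schema language.

```python
import ast
import json


class InvariantViolation(ValueError):
    """Raised when a chunk result breaks a declared invariant."""


def check_json_invariant(raw: str, required: dict[str, type]) -> dict:
    """Parse raw as a JSON object and confirm each required key has the expected type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvariantViolation(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise InvariantViolation("top-level value must be a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise InvariantViolation(f"missing required key: {key}")
        if not isinstance(data[key], expected_type):
            raise InvariantViolation(f"key {key!r} should be {expected_type.__name__}")
    return data


def check_code_invariant(source: str) -> ast.Module:
    """Confirm that generated Python source parses into an AST (it is never executed)."""
    try:
        return ast.parse(source)
    except SyntaxError as exc:
        raise InvariantViolation(f"generated code does not parse: {exc.msg}") from exc
```

A model refusal such as "I cannot help with that." fails the JSON check on the first chunk, which is the signal a fail-fast gate needs to stop before the remaining chunks are sent.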

STEP 03

Model-Agnostic Routing & Caching

Routes model-agnostically with automatic fallback and checks L1 semantic history to drop input costs to $0.00 on cache hits.
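The routing idea can be sketched as a cache lookup followed by an ordered list of backends. Everything here is a simplified assumption: `CachedRouter` and `cache_key` are hypothetical names, the backends are plain callables standing in for model providers, and the cache is an exact-match lookup on a normalized prompt hash rather than a true semantic (similarity-based) cache.

```python
import hashlib
from typing import Callable

ModelCall = Callable[[str], str]


def cache_key(prompt: str) -> str:
    """Hash a whitespace- and case-normalized prompt; only the digest is stored as the key."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CachedRouter:
    """Try the cache first, then each backend in order until one succeeds."""

    def __init__(self, backends: list[tuple[str, ModelCall]]):
        self.backends = backends
        self.cache: dict[str, str] = {}
        self.upstream_calls = 0  # counts attempts that would incur upstream cost

    def complete(self, prompt: str) -> str:
        key = cache_key(prompt)
        if key in self.cache:
            return self.cache[key]  # cache hit: no upstream call is made
        errors: list[str] = []
        for name, call in self.backends:
            self.upstream_calls += 1
            try:
                result = call(prompt)
            except Exception as exc:  # any backend failure triggers fallback
                errors.append(f"{name}: {exc}")
                continue
            self.cache[key] = result
            return result
        raise RuntimeError("all backends failed: " + "; ".join(errors))
```

On a repeat request the router returns the stored result without touching any backend, which is where the zero input cost on cache hits comes from.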

CORE MECHANISMS

The Three Levers

These three levers target the pain points of long documents and bulk structured work: context decay, tokens wasted on jobs that were never going to succeed, and surprise bills.

Pre-flight Planning

GET /v1/swarm/plan

Get an explicit forecast of estimated tokens, cost, latency, recommended concurrency, and risk score before you spend anything.
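The response shape of `GET /v1/swarm/plan` is not described on this page, so the sketch below does not call it. Instead it shows the kind of arithmetic a cost forecast rests on, using the rates quoted in the playground ($0.150 per 1M input tokens, $0.600 per 1M output tokens). The `Forecast` type, the `forecast` function, the 4-characters-per-token heuristic, and the default chunk size and output budget are all assumptions for illustration.

```python
import math
from dataclasses import dataclass

INPUT_USD_PER_MTOK = 0.150   # rate quoted in the playground
OUTPUT_USD_PER_MTOK = 0.600  # rate quoted in the playground
CHARS_PER_TOKEN = 4          # rough heuristic, not a real tokenizer


@dataclass
class Forecast:
    chunks: int
    input_tokens: int
    output_tokens: int
    cost_usd: float


def forecast(
    doc_chars: int,
    prompt_chars: int,
    chunk_chars: int = 8000,
    output_tokens_per_chunk: int = 300,
) -> Forecast:
    """Estimate chunk count, token usage, and cost before any model call is made."""
    chunks = max(1, math.ceil(doc_chars / chunk_chars))
    # The task prompt is resent with every chunk, so it is counted once per chunk.
    input_tokens = math.ceil((doc_chars + chunks * prompt_chars) / CHARS_PER_TOKEN)
    output_tokens = chunks * output_tokens_per_chunk
    cost_usd = (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return Forecast(chunks, input_tokens, output_tokens, cost_usd)
```

Under these assumptions, a 400,000-character document with a 400-character prompt yields 50 chunks, 105,000 input tokens, 15,000 output tokens, and an estimated cost of roughly $0.025.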

Early Rejection Modes

early_gate • canary

Bad jobs fail fast and cheap. early_gate validates structure with zero model calls. canary runs only the first chunk and aborts on failure.
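The difference between the two modes can be sketched as follows: one check inspects the job itself and never calls a model, the other spends exactly one model call on chunk 0 before deciding whether to fan out. The function names, the `JobRejected` exception, and the specific structural checks are hypothetical; the real gateway's rules may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class JobRejected(RuntimeError):
    """Raised when a job is stopped before (or right after) its first model call."""


def early_gate(chunks: list[str], required_keys: list[str]) -> None:
    """Structural checks on the job itself; makes zero model calls."""
    if not chunks or any(not chunk.strip() for chunk in chunks):
        raise JobRejected("early_gate: empty document or empty chunk")
    if not required_keys or len(set(required_keys)) != len(required_keys):
        raise JobRejected("early_gate: schema must list unique required keys")


def run_with_canary(
    chunks: list[str],
    analyze: Callable[[str], str],
    is_valid: Callable[[str], bool],
    max_workers: int = 4,
) -> list[str]:
    """Run chunk 0 alone; fan out to the remaining chunks only if its result validates."""
    if not chunks:
        raise JobRejected("canary: no chunks to process")
    first = analyze(chunks[0])
    if not is_valid(first):
        raise JobRejected("canary: chunk 0 failed validation; remaining chunks were not sent")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rest = list(pool.map(analyze, chunks[1:]))
    return [first, *rest]
```

With a malformed prompt that makes the model refuse, the canary path costs one chunk's worth of tokens instead of the whole document's.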

Isolation + Schema Controls

POST /v1/swarm/state

Each chunk runs independently. Extraction criteria are validated. Agent-generated code can be sandbox-compiled and signed before you trust it.
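The request body for `POST /v1/swarm/state` is not described on this page, so the following is a local sketch of the compile-then-sign idea rather than a client for that endpoint. It assumes an HMAC-SHA256 signature over the source text and uses Python's built-in `compile`, which checks that code compiles without executing it. That is only a syntax gate, not a sandbox; real isolation would need a separate process or container. The names `compile_and_sign` and `verify_signature` are hypothetical.

```python
import hashlib
import hmac


def compile_and_sign(source: str, secret: bytes) -> str:
    """Compile agent-generated source without running it, then return an HMAC-SHA256 signature."""
    # compile() raises SyntaxError on invalid code; it does not execute anything.
    compile(source, "<agent-generated>", "exec")
    return hmac.new(secret, source.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_signature(source: str, signature: str, secret: bytes) -> bool:
    """Return True only if the source is byte-for-byte what was signed."""
    expected = hmac.new(secret, source.encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)
```

A caller would verify the signature immediately before executing the code, so that any change made after signing is rejected.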

DROP-IN REPLACEMENT

Drop-in Compatibility

Your existing OpenAI SDK code works. The gateway adds planning, isolation, early rejection, and caching on top.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # or your production gateway
    api_key="your_key_here",
)

# Then use it exactly as before
response = client.chat.completions.create(
    model="membrane-engagement-layer",
    messages=[{"role": "user", "content": "Extract liabilities from my contract."}],
)

Important Context Preservation Note

By default, conversational routes isolate system directives and prune middle messages to prevent agent drift. To keep full conversational history, simply pass the X-Membrane-Preserve-Context: true header.
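With the OpenAI Python SDK, a per-request header can be passed through the `extra_headers` argument of `chat.completions.create`. The sketch below shows one way to set the header named above; the helper names `completion_kwargs` and `ask` are hypothetical, and the base URL and key are the placeholders used elsewhere on this page.

```python
from typing import Any, Optional

PRESERVE_CONTEXT_HEADER = "X-Membrane-Preserve-Context"


def completion_kwargs(
    messages: list[dict[str, str]], preserve_context: bool = False
) -> dict[str, Any]:
    """Build the arguments for chat.completions.create, adding the header on request."""
    kwargs: dict[str, Any] = {
        "model": "membrane-engagement-layer",
        "messages": messages,
    }
    if preserve_context:
        # extra_headers is the OpenAI SDK's option for per-request HTTP headers.
        kwargs["extra_headers"] = {PRESERVE_CONTEXT_HEADER: "true"}
    return kwargs


def ask(messages: list[dict[str, str]], preserve_context: bool = True) -> Optional[str]:
    """Send the conversation through a locally running gateway (requires the openai package)."""
    from openai import OpenAI  # imported here so the helper above has no dependency

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="your_key_here")
    response = client.chat.completions.create(
        **completion_kwargs(messages, preserve_context)
    )
    return response.choices[0].message.content
```

Setting the header per request, rather than on the client, keeps the default pruning behavior for every other call made with the same client.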

PRIVACY & COMPLIANCE

Self-Host & Control

Full data stays in your environment. No external logging of prompts unless you configure it.

Docker Run Command

docker run -p 8000:8000 membraneapi/gateway

Essential for teams processing proprietary, compliance-heavy, or regulated documents.

Pricing (Open Core)

Membrane is completely free for personal use, experimentation, and development. If you are using Membrane for commercial production work, a paid license is required.

Developer Sandbox

For local development & sandbox testing

Free

$0 / forever

✓ Full Swarm Map-Reduce access
✓ Dynamic Model Routing
✓ L1 Semantic Caching
✓ Any custom key works locally

Read Developer Docs

Production

Commercial Production

For cloud deployments

$29 / month flat

Get 20% off: $278.40 billed annually

✓ Unrestricted cloud usage
✓ Pure honor-based model
✓ Commercial use authorization
✓ Direct team value reporting

Activate License on Polar.sh

Founding

Founding License

Only first 75 buyers

Lifetime

$490 / one-time

Lifetime commercial license

✓ Permanent production authorization
✓ No monthly subscription fees
✓ Priority founder support channel
✓ Limited to first 75 developers

Purchase Founding License

What counts as Commercial Production?

**Commercial Production** is defined as any deployment of Membrane on public cloud infrastructure (such as AWS, Render, GCP, Fly.io, Vercel) that powers an active application, API, or service outside of a developer's local machine (`localhost`) or private personal network.

This is a trust and honor-based model. We do not enforce hard blocks, key truncation, or usage caps in your cloud environments—the software runs fully uninhibited to ensure maximum production stability. We prioritize developer trust and expect production users to subscribe to support our work.

Case Studies

Real applications

Membrane is used in production on document-heavy, high-stakes extraction workloads.

Contract analysis: Contract Pulse uses Membrane's parallel swarm extraction to surface hidden risks and special clauses across long legal PDFs with strong isolation and early validation.
Policy & government documents: PennerAI runs Membrane swarms over state audits, city council minutes, and regulatory files to extract structured signals that direct LLM calls frequently miss.

More examples and technical detail are in the docs.

LIMITATIONS

Honest Limitations

We want you to trust Membrane. Here is what it is built for, and what it is not:

Repetitive Structured Extraction Focus: Membrane is strongest on repetitive structured extraction across many similar items (contracts, logs, transcripts, policy docs).
Less Suited to Dynamic Chat: The gains are smaller on highly dynamic, open-ended conversational agents.
Model Costs Still Apply: You still pay for the underlying model calls; we just try to make fewer of them wasteful.