Retail & Operations · AI-native copilot for enterprise SOPs & POs

One AI copilot for every Standard Operating Procedure your operations team runs.

OpenLi OpenSOP is an AI-native, 3rd-generation agentic copilot for the mundane-but-mission-critical work that lives in operations SOPs — from Purchase-Order exception triage in merch to vendor on-boarding, returns disposition, capacity reallocation, and audit-trail compilation.

Three interchangeable agent runners — OpenLi Codex (our OpenCodex runner), the Claude Agent SDK, and a dedicated pgvector RAG agent — emit the same structured recommendation, with the same citations, under the same three guardrails (PII scrub, threshold-contradiction detection, low-retrieval-confidence). Pick the runner that fits your cost / latency / autonomy profile, per call.


v0.1.9 preview · Triple-runner architecture · pgvector RAG (HNSW cosine) · PII + contradiction + low-confidence guardrails · ASOS reference deployment

3 interchangeable agent runners on one recommendation contract

3 production guardrails: PII scrub, contradiction, low-retrieval

20/20 Playwright E2E tests pass in Docker (~60s)

9/9 evaluation cases pass: 3 PO scenarios × 3 runners

Operations runs on SOPs. AI now runs SOPs alongside your team.

Every operations team — merch buying, supply chain, customer care, vendor management, internal audit — runs on dozens of Standard Operating Procedures. Most exceptions today are routed to a planner who reads the SOP, reconciles it with live data, and writes a decision. OpenSOP collapses that loop into seconds, with full audit trace and the planner firmly in the loop.

Three runners, one contract

OpenLi Codex, Claude Agent SDK, and a dedicated pgvector RAG agent all emit the same structured recommendation JSON. Switch per call. The contract — closed-enum actions, citations, confidence, role — is the platform’s value-add.
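A minimal sketch of what such a contract could look like in Python, assuming each runner returns a JSON object. The field names mirror the terms on this page (action, rationale, citations, confidence, role). The `Action` members are only the two actions this page names (amend, escalate); the three-level `Confidence` scale and the `parse_recommendation` helper are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Action(str, Enum):
    # Only the two actions named on this page; the real closed enum may hold more.
    AMEND = "amend"
    ESCALATE = "escalate"


class Confidence(str, Enum):
    # Hypothetical three-level scale; the page only confirms "low".
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Recommendation:
    action: Action
    rationale: str
    citations: list[str]  # each of the form "<filename> §<section>"
    confidence: Confidence
    role: str


def parse_recommendation(raw: str) -> Recommendation:
    """Parse one runner's JSON output into the shared contract.

    The Enum lookups raise ValueError for any value outside the closed sets,
    and a missing key raises KeyError.
    """
    data = json.loads(raw)
    return Recommendation(
        action=Action(data["action"]),
        rationale=str(data["rationale"]),
        citations=[str(c) for c in data["citations"]],
        confidence=Confidence(data["confidence"]),
        role=str(data["role"]),
    )
```

Because the enum lookups fail loudly, a runner that invents an action outside the closed set is rejected instead of reaching the cockpit.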

Three guardrails, always on

(1) PII scrub at retrieval AND at output — names and emails from your escalation matrix never surface; (2) threshold-contradiction detection forces escalate when SOPs disagree; (3) low-retrieval-confidence forces escalate when the agent can’t find a relevant policy.
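Guardrail (1) can be sketched as one deterministic function applied twice: once to each retrieved passage and once to the final output. This is an illustration under stated assumptions, not OpenSOP's code. The email regex is a simplified assumption, `scrub_pii` is a hypothetical name, and names are matched case-sensitively on word boundaries against a closed set supplied by the caller (for example, the people listed in an escalation matrix).

```python
import re

# Simplified email pattern (an assumption, not the product's actual regex).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_pii(text: str, known_names: frozenset[str]) -> str:
    """Replace emails and a closed set of known names with fixed placeholders."""
    # Emails first, so a name embedded in an address is removed with the address.
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Longest names first so "Jane Doe" is replaced before "Jane";
    # ties are broken alphabetically to keep the output deterministic.
    for name in sorted(known_names, key=lambda n: (-len(n), n)):
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    return text
```

Running the same function at retrieval and at output means a name that slips into the model's answer by another route is still caught before the planner sees it.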

Cited, audit-defensible

Every recommendation carries a closed array of citations of the form <filename> §<section>. The agent’s tool calls and retrieval scores are preserved in the trace panel so any planner audit can verify both sides of the decision in seconds.
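One way to enforce the `<filename> §<section>` form is a regex check plus a membership test against the sections that were actually retrieved. Reading "closed array" as "may only cite retrieved sections" is our assumption, as are the dotted-number section format (taken from the §2.1 / §3.4 examples on this page) and the `validate_citations` name.

```python
import re

# "<filename> §<section>", with sections assumed to be dotted numbers such as 2.1.
CITATION_RE = re.compile(r"^(?P<filename>\S+) §(?P<section>[0-9]+(?:\.[0-9]+)*)$")


def validate_citations(citations: list[str], retrieved: set[tuple[str, str]]) -> list[str]:
    """Return one problem string per malformed or never-retrieved citation."""
    problems = []
    for cite in citations:
        match = CITATION_RE.match(cite)
        if match is None:
            problems.append(f"malformed: {cite!r}")
        elif (match["filename"], match["section"]) not in retrieved:
            problems.append(f"not retrieved: {cite!r}")
    return problems
```

An empty list means every citation points at a section the agent actually saw, which is what lets an auditor check the citation against the retrieval scores in the trace panel.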

How an exception gets triaged

Eight clear steps from a Planner’s natural-language question to a cited, role-routed recommendation — with the planner firmly in control of every escalation.

1. Planner asks

Free-text question on the cockpit page or via POST /triage — e.g. “PO-10355 is 18% short on Womenswear retail, what should I do?”

2. Seed retrieval

pgvector HNSW cosine search with top_k=10 over section-anchored SOP chunks. A generous top_k makes it more likely that contradicting sections co-occur in the context.

3. PII scrub (in)

Every retrieved passage is scrubbed for names and emails BEFORE it reaches the LLM. Closed-set known PII; regex email pattern; deterministic.

4. Agent reasons

The chosen runner reads PO and forecast data via tool calls, consults the SOPs, and emits a structured JSON recommendation. A typical run takes 4 agent-loop turns; the loop is capped at 6.

5. Contradiction detected?

When two retrieved sections cite conflicting numeric thresholds AND the PO’s actual variance sits in the disputed gap, action is forced to escalate, confidence to low.

6. Low retrieval?

If the top similarity score falls below the configurable threshold (default 0.20), action is forced to escalate, so the agent cannot act on a possibly irrelevant policy match.

7. Role override

Deterministic lookup of the escalation role from the PO’s value band — not the LLM’s guess. 0–50k → Senior Planner, 50–200k → Head of Buying, 200k+ → Director.

8. Planner decides

Recommendation, rationale, citations, agent trace, retrieval scores — all rendered in the cockpit. The planner accepts, adjusts, or overrides; everything is logged.
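Steps 5 to 7 are deterministic post-processing over the runner's draft and can be sketched as follows. Assumptions: the caller has already extracted the numeric thresholds that retrieved sections state for the same rule (the hard part, not shown); a variance is "in the disputed gap" when it exceeds the lower threshold but not the higher one; value bands include their lower edge; and low retrieval forces escalate without touching confidence, since the page only specifies the action there. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

LOW_RETRIEVAL_THRESHOLD = 0.20  # default stated on this page; configurable


@dataclass(frozen=True)
class Draft:
    action: str
    confidence: str
    role: str


@dataclass(frozen=True)
class ThresholdClaim:
    citation: str         # "<filename> §<section>"
    threshold_pct: float  # threshold this section states for the same rule


def disputed_pair(claims: list[ThresholdClaim],
                  variance_pct: float) -> tuple[ThresholdClaim, ThresholdClaim] | None:
    """Return (lower, higher) claims if the PO's variance falls between them."""
    if len(claims) < 2:
        return None
    lower = min(claims, key=lambda c: c.threshold_pct)
    higher = max(claims, key=lambda c: c.threshold_pct)
    # Exceeds the stricter threshold but not the looser one: the SOPs disagree.
    if lower.threshold_pct < variance_pct <= higher.threshold_pct:
        return lower, higher
    return None


def escalation_role(po_value: float) -> str:
    """Step 7: role from the PO value band, never from the LLM."""
    if po_value < 50_000:
        return "Senior Planner"
    if po_value < 200_000:
        return "Head of Buying"
    return "Director"


def apply_guardrails(draft: Draft, claims: list[ThresholdClaim], variance_pct: float,
                     top_similarity: float, po_value: float,
                     threshold: float = LOW_RETRIEVAL_THRESHOLD) -> Draft:
    out = draft
    if disputed_pair(claims, variance_pct) is not None:  # step 5
        out = replace(out, action="escalate", confidence="low")
    if top_similarity < threshold:  # step 6
        out = replace(out, action="escalate")
    return replace(out, role=escalation_role(po_value))  # step 7
```

With thresholds shaped like the PO-10361 demo conflict (§2.1 at 10%, §3.4 at 15%), an illustrative 12% variance would force escalate with low confidence, while 18% sits above both thresholds and leaves the draft action alone.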

Live demo — ASOS Merch PO Exception Triage

The ASOS reference deployment is the canonical “see it for yourself” demo of OpenSOP: 20 POs across 6 categories × 2 channels, 5 merch SOPs (~6 pages), and 6 personas modelled on a real ASOS merchandising org. Open the URL, pick a persona, click a PO.

Live URL

sop.openli.ai, once nginx and Let’s Encrypt are wired up (post-launch); until then, via the in-browser VM tunnel. Recommendations stream in within 5–30s when running on the real gpt-4o-mini model.

Best persona for the pitch

senior.planner@asos-demo.com / Senior2026! · full read/write on triage, prompts, and workspaces, with cross-channel split decisions enabled. The walk-through is documented in the ASOS Demo Guide.

Headline moments

PO-10342 → clean amend on gpt-4o-mini in 4 tool-call turns. PO-10355 → escalate with role Director of Merch (no name). PO-10361 → contradiction banner: §2.1 (10%) vs §3.4 (15%) conflict, forced escalate, confidence low.

Audit defensibility & non-functional posture

The same governance posture every OpenLI product inherits from the OpenLI Codex foundation, tuned for the operations + retail buyer.

Audit defensibility

Every recommendation carries citations[], trace.tool_calls[] and trace.seed_retrieval[] with similarity scores. Planner audit trails are reconstructible from logs alone.
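Reconstructing an audit trail from logs alone is easiest when each recommendation is written as one self-contained JSON line. A minimal sketch under that assumption: the `po_id` field, the JSON-lines format and the helper names are hypothetical, while the `citations`, `trace.tool_calls` and `trace.seed_retrieval` keys follow this page.

```python
import json
from typing import Any, Iterable


def audit_line(po_id: str, recommendation: dict[str, Any],
               tool_calls: list[dict[str, Any]],
               seed_retrieval: list[dict[str, Any]]) -> str:
    """Serialise one triage result as a single JSON line for an append-only log."""
    record = {
        "po_id": po_id,
        **recommendation,  # action, rationale, citations, confidence, role
        "trace": {"tool_calls": tool_calls, "seed_retrieval": seed_retrieval},
    }
    # ensure_ascii=False keeps "§" readable in the log file.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def trail_for(lines: Iterable[str], po_id: str) -> list[dict[str, Any]]:
    """Rebuild every logged recommendation for one PO, in log order."""
    return [rec for rec in map(json.loads, lines) if rec.get("po_id") == po_id]
```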

Provider neutrality

Same recommendation contract from OpenLi Codex (our OpenCodex runner), the Claude Agent SDK, and the dedicated RAG agent. Per-call runner toggle. No vendor lock-in.
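A per-call toggle can be reduced to a registry keyed by runner name, with every result checked against the same contract before it leaves the platform. The sketch uses hypothetical stub runners and names (`RUNNERS`, `triage`, and the keys `codex`, `claude`, `rag`); in a real deployment each entry would call the OpenLi Codex runner, the Claude Agent SDK or the pgvector RAG agent.

```python
from typing import Callable

RunnerFn = Callable[[str], dict]

CONTRACT_KEYS = frozenset({"action", "rationale", "citations", "confidence", "role"})


def _stub(name: str) -> RunnerFn:
    # Hypothetical stand-in for a real runner.
    def run(question: str) -> dict:
        return {"action": "escalate", "rationale": f"{name} stub: {question}",
                "citations": [], "confidence": "low", "role": "Senior Planner"}
    return run


RUNNERS: dict[str, RunnerFn] = {
    "codex": _stub("codex"),
    "claude": _stub("claude"),
    "rag": _stub("rag"),
}


def triage(question: str, runner: str = "rag") -> dict:
    """Dispatch one call to the chosen runner and enforce the shared contract."""
    try:
        run = RUNNERS[runner]
    except KeyError:
        raise ValueError(f"unknown runner {runner!r}; expected one of {sorted(RUNNERS)}") from None
    result = run(question)
    missing = CONTRACT_KEYS.difference(result)  # iterating a dict yields its keys
    if missing:
        raise ValueError(f"{runner} broke the contract; missing {sorted(missing)}")
    return result
```

Keeping the contract check outside the runners is what makes them interchangeable: swapping a runner changes cost and latency, not the shape of what the cockpit receives.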

Cost discipline

Setting MOCK_LLM=true switches to a deterministic mock for cost-safe demos; the 9/9 eval suite passes in under 60s without an API key. Production runs gpt-4o-mini by default.
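A sketch of how a `MOCK_LLM` switch might sit in front of the model call, assuming the real client is injected as a `call_model(model, prompt)` function. Treating "1", "true" and "yes" as enabled, the canned JSON and the helper names are our assumptions; the page confirms only `MOCK_LLM=true` and the gpt-4o-mini default.

```python
import json
import os
from typing import Callable, Mapping

DEFAULT_MODEL = "gpt-4o-mini"

# Hypothetical canned response; a fixed string keeps mock runs deterministic.
MOCK_RESPONSE = json.dumps({
    "action": "escalate", "rationale": "mock response", "citations": [],
    "confidence": "low", "role": "Senior Planner",
})


def mock_enabled(env: Mapping[str, str] = os.environ) -> bool:
    return env.get("MOCK_LLM", "").strip().lower() in {"1", "true", "yes"}


def complete(prompt: str, call_model: Callable[[str, str], str],
             env: Mapping[str, str] = os.environ) -> str:
    """Return the mock without calling the model when MOCK_LLM is set."""
    if mock_enabled(env):
        return MOCK_RESPONSE
    return call_model(DEFAULT_MODEL, prompt)
```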

Tenant isolation

Each customer is its own tenant with its own group set (e.g. ASOS has 6: Buying Ops, Senior Planners, Planners, Merch Ops, CX Liaison, Internal Audit). Per-tenant pgvector partitioning is on the v0.2 roadmap.

Test discipline

32/32 pytest (22 unit + 10 integration) · 20/20 Playwright E2E in Docker (opentax pattern, ~60s) · 9/9 evals across 3 runners · CI grep gate on design-token residue.

Operational maturity

10-service Docker Compose stack on ports 9400–9409. A sequential-build deploy script runs a /health smoke check after each service comes up. Live on a shared AWS VM alongside OpenCT, GSJ, OpenMPI and OpenTrials.
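The build-then-smoke loop can be sketched as follows. Service names and their port assignments are hypothetical (only the 9400–9409 range is stated), as is the convention that `/health` answers HTTP 200 when healthy. `compose_up` shells out to `docker compose up -d --build <service>`; the probe and sleep are injectable so the loop can be exercised without Docker.

```python
import subprocess
import time
import urllib.request
from typing import Callable, Sequence


def compose_up(service: str) -> None:
    """Build and start one Compose service in the background."""
    subprocess.run(["docker", "compose", "up", "-d", "--build", service], check=True)


def health_ok(url: str, timeout: float = 2.0) -> bool:
    """True if GET url answers 200; connection and HTTP errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False


def sequential_deploy(services: Sequence[tuple[str, int]],
                      start: Callable[[str], None] = compose_up,
                      probe: Callable[[str], bool] = health_ok,
                      retries: int = 30, delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Start services one at a time; stop at the first that fails its /health smoke."""
    for name, port in services:
        start(name)
        url = f"http://localhost:{port}/health"
        for _ in range(retries):
            if probe(url):
                break
            sleep(delay)
        else:
            raise RuntimeError(f"{name} failed /health smoke at {url}")
```

Failing fast on the first unhealthy service keeps later services from starting against a broken dependency, which is the point of building sequentially rather than bringing the whole stack up at once.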

Three generations of agent paradigms — in one product

The reason we ship three runners is that no single agent paradigm is best for every call. OpenSOP lets you pick the right one per cost / latency / autonomy profile, with the same contract and guardrails applied to all three.

1st gen — RAG

Chunk + embed + retrieve. Deterministic, citation-friendly. Can’t plan or execute on its own. Our dedicated pgvector RAG agent represents this paradigm, with structural guardrails layered on top.
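A minimal sketch of section-anchored chunking and cosine top-k seed retrieval, assuming SOPs are Markdown files whose section headings start with a dotted number (for example `## 2.1 Short shipments`) and that the caller supplies an `embed` function. Anchoring each chunk to one section is what lets a citation be emitted as `<filename> §<section>`. The Python scoring is an exact brute-force search for illustration; pgvector's HNSW index performs the same cosine ranking approximately inside Postgres, with a query shaped like `SEED_QUERY` (table and column names are hypothetical).

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumed heading convention: "## 2.1 Title" -> section "2.1".
HEADING_RE = re.compile(r"^#+\s+(\d+(?:\.\d+)*)\b")

# `<=>` is pgvector's cosine-distance operator; an index such as
# CREATE INDEX ON sop_chunks USING hnsw (embedding vector_cosine_ops);
# lets Postgres serve this ORDER BY approximately.
SEED_QUERY = """
SELECT filename, section, 1 - (embedding <=> %(q)s) AS similarity
FROM sop_chunks
ORDER BY embedding <=> %(q)s
LIMIT %(top_k)s
"""


@dataclass(frozen=True)
class Chunk:
    filename: str
    section: str
    text: str

    @property
    def citation(self) -> str:
        return f"{self.filename} §{self.section}"


def chunk_by_section(filename: str, markdown: str) -> list[Chunk]:
    """One chunk per numbered section; text before the first one is dropped."""
    chunks: list[Chunk] = []
    section, buf = None, []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            if section is not None:
                chunks.append(Chunk(filename, section, "\n".join(buf).strip()))
            section, buf = match.group(1), [line]
        elif section is not None:
            buf.append(line)
    if section is not None:
        chunks.append(Chunk(filename, section, "\n".join(buf).strip()))
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def seed_retrieval(query: str, chunks: Sequence[Chunk],
                   embed: Callable[[str], Sequence[float]],
                   top_k: int = 10) -> list[tuple[float, Chunk]]:
    """Exact cosine top-k over all chunks, highest similarity first."""
    q = embed(query)
    scored = [(cosine(q, embed(c.text)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The similarity here is 1 minus pgvector's cosine distance, which is presumably the score a low-retrieval threshold such as 0.20 would be compared against.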

2nd gen — LangGraph-era

Hard-coded DAGs over LLM calls. Predictable, but brittle when SOPs evolve. We’ve deliberately moved past this paradigm; it is mentioned here only for completeness.

3rd gen — autonomous agents

Claude-Code-style, file-aware autonomous agents that plan and execute on their own. OpenLi Codex (our OpenCodex runner) and the Claude Agent SDK both represent this paradigm; they can reason past a contradiction that the deterministic path would refuse.

Sister products in the OpenLI family

OpenSOP shares the same OpenLI Codex foundation, the same agent runtime, the same governance posture and the same audit story as every other OpenLI product. One security review covers the whole portfolio.

OpenLI Codex foundation

The agentic runtime that powers OpenSOP’s triple-runner architecture, OpenMPI’s rationale layer, HIE’s plain-English authoring, and every other OpenLI product. One security review covers all.

OpenTrials

Sister 3rd-gen agentic product, in the Pharma cluster. Risk-based monitoring for clinical trials — same Claude/OpenAI runner pattern, same 21 CFR Part 11 audit-trail discipline.

OpenCT

Sister 3rd-gen product in Finance. UK Corporation Tax filing direct to HMRC. Same governance and multi-runner pattern; it demonstrates the platform’s sector neutrality.

OpenMPI

Sister 3rd-gen product in Healthcare. AI-rationale-assisted Master Patient Index — same “AI proposes, human decides” pattern OpenSOP uses for escalations.

GSJ Platform

Sister partner-led reference deployment, in Travel & Experience. Different sector; same foundation, same lifecycle gates, same operational discipline.

OpenLI HIE

The flagship healthcare integration engine. Demonstrates the foundation’s plain-English authoring at production-system scale.

Where OpenSOP is today — and what’s next.

v0.1.9 is live on AWS at /zhong/opensop/opensop/, sibling to OpenCT, GSJ, OpenMPI and OpenTrials. The ASOS reference deployment is the demo target. Public DNS sop.openli.ai wires up once nginx + TLS land (operator-side).

Live in v0.1.9

Triple-runner architecture · pgvector HNSW cosine RAG · section-anchored chunking · 3 guardrails (PII + contradiction + low-retrieval) · deterministic role override · in-browser FileBrowser for workspace SOPs · design-token system (residue = 0) · 32/32 pytest · 20/20 Playwright in Docker · 9/9 evals.

Coming v0.2

nginx + DNS + Let’s Encrypt for sop.openli.ai · ChatGPT-style streaming UI for the Triage cockpit · per-tenant pgvector partitioning · layered contradiction guardrail upstream of the Codex / Claude paths · continuous-eval harness (LangSmith hooks).

Beyond v0.2

Operations templates beyond merch PO triage — vendor on-boarding, returns disposition, capacity reallocation, audit-trail compilation, compliance-incident triage. Same foundation, sector-tuned SOP corpus per tenant.

RETAIL & OPERATIONS

Ready to see the AI copilot on your SOPs?

OpenSOP is in pre-release; we are prioritising operations-heavy enterprises with mature SOP corpora and an exception-triage pain point. Talk to the team about your SOP set, the systems you already run, and whether OpenSOP fits the next 12–24 months of your operations roadmap.
