AtlasRAG

Persistent Memory Engine for AI Agents

AtlasRAG provides durable memory APIs, multi-tenant governance, and retrieval workflows so teams can store, recall, and operationalize knowledge for agents in production.

Protected endpoints accept Authorization: Bearer <jwt> or X-API-Key: <token>.

Production Memory Platform

AtlasRAG turns retrieval into durable, governed memory for AI agents.

AtlasRAG combines a high-speed C++ vector store with a secure Node gateway so teams can ingest knowledge, retrieve context, and continuously improve memory quality in production without building custom control loops.

This gives product teams a stable memory foundation for customer assistants, internal copilots, and workflow agents, with strong tenant boundaries, operational controls, and measurable runtime behavior.

Full stack placement

AtlasRAG full architecture (diagram summary). End users (web app, mobile, and Slack/Teams users) connect to builder-owned agent apps outside AtlasRAG: chatbot UIs, customer copilots, support assistants, internal knowledge portals, specs workbenches, and whitepaper systems, built on agent runtimes/orchestrators (LangChain or custom code) with internal agentic search, retrieval QA surfaces, multi-agent workflows, approval chains, tool routing, domain actions, compliance workflows, and business automation pipelines. These apps call the AtlasRAG platform layer through gateway APIs (/v1/ask, /v1/search, /v1/memory/*, /v1/feedback) across five stages: INGEST (docs/URLs, SDK/API writes, tenant + ACL policy), RUN (AtlasRAG gateway, C++ vector core, Postgres memory store), SERVE (search, ask, memory recall, job status), OBSERVE (telemetry, metrics, traces/logs), and IMPROVE (AMV-L, TTL sweeps, reflection and feedback jobs), then return responses to end users.
C++ vector core | Node gateway APIs | Tenant + collection isolation | AMV-L lifecycle automation

What AtlasRAG solves for client programs

Answer drift over time

Durable memory and recall controls keep agent outputs grounded in approved, reusable context.

Governance and isolation pressure

Tenant boundaries, visibility modes, ACLs, and role checks make data access explicit and auditable.

Cost growth from unmanaged memory

AMV-L lifecycle policies and TTL sweeps help bound memory growth and reduce unnecessary token use.

Opaque production behavior

Gateway telemetry, metrics, and job status endpoints create observability for operators and stakeholders.

01

Unified ingest, search, and answer APIs

AtlasRAG provides one operational surface for ingestion, retrieval, and answer generation, so implementation teams can ship memory-backed experiences without stitching multiple services together.

  • /v1/docs and /v1/docs/url for indexing.
  • /v1/search and /v1/ask for retrieval and answer workflows.
  • Requests are scoped by tenant and collection to prevent cross-environment leakage.
  • Idempotency keys on write operations reduce duplicate ingestion risk in distributed clients.
02

Durable memory model, not just transient context

Beyond chunks, AtlasRAG stores typed memory objects so agents can preserve and reuse knowledge with explicit meaning, provenance, and lifecycle controls.

  • Six memory types: artifact, semantic, procedural, episodic, conversation, and summary.
  • /v1/memory/write, /v1/memory/recall, and /v1/memory/reflect.
  • Memory scoring signals from /v1/feedback and /v1/memory/event.
  • Async reflection jobs generate derived knowledge while preserving links to source artifacts.
03

Enterprise governance and access control

Security and governance are built into the request path and data model, so client deployments can enforce policy without custom wrappers.

  • JWT and service-token auth with role-based endpoint controls.
  • Visibility modes: tenant, private, and acl.
  • SSO support for Google, Azure, and Okta with tenant-level auth mode controls.
  • Principal-aware recall constraints protect private and ACL-scoped memory access.
04

AMV-L keeps memory quality high over time

AtlasRAG applies Adaptive Memory Value + Lifecycle (AMV-L) to evaluate memory utility continuously and automate retention decisions based on measured behavior.

  • Lifecycle actions promote, compact, retain, or delete by policy thresholds.
  • TTL sweep remains the hard retention boundary for expired memory.
  • Async job processing keeps reflection and lifecycle operations non-blocking.
  • Value decay and redundancy sweeps reduce stale or duplicate memories in long-running deployments.
05

Operationally ready for developer teams

Integration and operations are designed for real delivery teams, from pilots through production scale-up.

  • OpenAPI schema, Swagger UI, and official Node SDK for integration.
  • Idempotency support for write/index/reflect endpoints and idempotent job reruns.
  • Metrics, telemetry, and structured request logs for production visibility.
  • Admin endpoints support service tokens, usage reporting, and tenant policy management.

Runtime architecture

Gateway API Layer

Handles auth, tenancy, ACL policy, idempotency, ingestion, retrieval, ask, and memory endpoints.

Persistent Data Plane

C++ TCP vector index stores embeddings while Postgres stores chunk text, memory items, links, and jobs.

Lifecycle & Telemetry Plane

Scheduled jobs run TTL, AMV-L lifecycle tasks, and telemetry snapshots for ongoing optimization.

Playground

Ingest content, then search or ask with the same context — all in one place.


Ingest

Paste text, upload a file, or index a link.

Collection is your content bucket/namespace. Use default if you do not need separation. Valid names use only letters, numbers, ., -, and _ (example: team_docs, v2.alpha). Invalid: spaces or characters like /, #, ?.
We split the text into chunks, embed each chunk, store vectors in the C++ TCP service, and store chunk text in Postgres for previews and citations.
You can also load a local file or index directly from a URL.
Text files are read directly. PDF and .docx are extracted in your browser into Document Text.
If a URL is provided, the server fetches and indexes it.

Search

Semantic search across your indexed documents.

Higher K returns more chunks (slower + more context).
Scope retrieval to one collection, or keep All collections.

Ask

RAG answer grounded in your indexed sources.

We retrieve top-K chunks, then ask the model to answer using them.
Scope answer retrieval to one collection, or keep All collections.
Choose response depth. Auto adapts to available source detail.

Backend metrics

Health, storage, and vector index stats for operators.


Organization usage

Admin view of requests, tokens, storage, and latency for your tenant.


Requires admin privileges.


Collections

All collections for your tenant, with document titles and delete controls.


Deleting a collection removes stored chunk text and memory items. Vector deletion is not yet supported.

Developer documentation

Everything you need to integrate quickly: quickstart, auth, APIs, SDKs, and Python examples.

Quickstart
  1. Get your AtlasRAG base URL and user credentials.
  2. Log in to get a JWT (required to create API keys):
curl -X POST https://YOUR_BASE_URL/v1/login \
  -H "Content-Type: application/json" \
  -d '{ "username": "YOUR_USERNAME", "password": "YOUR_PASSWORD" }'
  3. Create a service token (API key) for your app:
curl -X POST https://YOUR_BASE_URL/v1/admin/service-tokens \
  -H "Authorization: Bearer YOUR_ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "app-server",
    "principalId": "svc:app",
    "roles": ["indexer"],
    "expiresAt": "2026-12-31T00:00:00Z"
  }'
  4. Index your first document:
curl -X POST https://YOUR_BASE_URL/v1/docs \
  -H "X-API-Key: YOUR_SERVICE_TOKEN" \
  -H "Idempotency-Key: idx-001" \
  -H "Content-Type: application/json" \
  -d '{
    "docId": "swe_notes",
    "collection": "default",
    "text": "Your document text..."
  }'

Playground upload supports .txt, .md, .json, .csv, .log, .pdf, and .docx. PDF and .docx are extracted to plain text in the browser before indexing. API ingest via POST /v1/docs remains JSON text (no multipart file upload).

  5. Search or Ask with /v1/search and /v1/ask.
Authentication

Protected endpoints require either a JWT or a service token. Send Authorization: Bearer <jwt> or X-API-Key: <token> (also supports Authorization: ApiKey <token>).

Multi-tenant isolation is enforced by the auth token. JWTs must include a tenant identifier using one of these claims: tenant, tid, or sub. Service tokens inherit the tenant from the admin who created them.

Admins can issue service tokens via POST /v1/admin/service-tokens. Store the returned token securely; it is shown only once.

Create API key (admin)
curl -X POST http://localhost:3000/v1/admin/service-tokens \
  -H "Authorization: Bearer YOUR_ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ci-pipeline",
    "principalId": "svc:ci",
    "roles": ["indexer"],
    "expiresAt": "2026-12-31T00:00:00Z"
  }'

Requires role admin (or a token whose principal matches the tenant).

Tenant auth mode

Admins can set per-tenant login policy: sso_only, sso_plus_password, or password_only. sso_only disables password login; password_only disables SSO.

You can also restrict which SSO providers are allowed by supplying ssoProviders (google, azure, okta).

Get auth mode (admin)
curl -X GET http://localhost:3000/v1/admin/tenant \
  -H "Authorization: Bearer YOUR_ADMIN_JWT"
Set auth mode (admin)
curl -X PATCH http://localhost:3000/v1/admin/tenant \
  -H "Authorization: Bearer YOUR_ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "authMode": "sso_only", "ssoProviders": ["google"] }'

Valid values: sso_only, sso_plus_password, password_only.

Security
Built-in controls
  • Auth required: JWTs or service tokens protect all sensitive endpoints.
  • Tenant isolation: data is partitioned by tenant and enforced in every query.
  • Role-based access: reader/indexer/admin control access to privileged actions.
  • Tenant auth mode: enforce sso_only or password_only per tenant.
  • Tenant SSO allowlist: restrict SSO to google, azure, or okta.
  • Visibility + ACL: tenant, private, acl rules apply to search/recall.
  • Idempotency: write/index/reflect accept Idempotency-Key to prevent double writes.
  • Audit logs: sensitive actions (auth policy changes, key revokes, deletes) are recorded with actor + request metadata.
  • Rate limits + lockout: login throttling + failed-login lockout are enabled.
  • URL ingestion safety: private IP ranges are blocked (SSRF protection).
  • Prompt injection guard: source sanitization is enabled by default.
RBAC by endpoint
Endpoint | Role required
GET /v1/health | Public
POST /v1/login | Public
/v1/auth/* | Public (SSO)
GET /v1/stats | Reader+
GET /v1/metrics | Reader+
GET /v1/docs | Reader+
GET /v1/collections | Reader+
GET /v1/search | Reader+
POST /v1/ask | Reader+
POST /v1/memory/recall | Reader+
POST /v1/feedback | Reader+
POST /v1/memory/event | Reader+
POST /v1/docs | Indexer+
POST /v1/docs/url | Indexer+
DELETE /v1/docs/:docId | Indexer+
POST /v1/memory/write | Indexer+
POST /v1/memory/reflect | Indexer+
GET /v1/jobs | Reader+
GET /v1/jobs/:id | Reader+
POST /v1/memory/cleanup | Admin
POST /v1/memory/compact | Admin
DELETE /v1/collections/:collection | Admin
/v1/admin/service-tokens | Admin
/v1/admin/tenant | Admin
/v1/admin/usage | Admin

Reader+ means reader, indexer, or admin. Indexer+ means indexer or admin.
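The Reader+/Indexer+ convention amounts to a ranked role hierarchy. A minimal sketch of that check (an illustrative helper, not the gateway's actual middleware):

```python
# Illustrative role-hierarchy check; role names mirror the table above,
# but this is a sketch, not AtlasRAG's real authorization code.
ROLE_RANK = {"reader": 1, "indexer": 2, "admin": 3}

def has_role(token_role: str, required: str) -> bool:
    """True if token_role meets or exceeds the required role."""
    return ROLE_RANK.get(token_role, 0) >= ROLE_RANK[required]
```

Under this scheme, "Reader+" is `has_role(role, "reader")` and "Indexer+" is `has_role(role, "indexer")`; unknown roles rank below reader and are denied.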

API Reference

Full OpenAPI schema is available at /openapi.json, with Swagger UI at /docs. All versioned routes live under /v1; legacy routes remain as aliases.

File uploads are currently a Playground UI capability. Public ingest APIs expect docId + text JSON payloads; use POST /v1/docs/url for URL-based ingestion.

  • /v1/docs index + list documents
  • /v1/docs/url index content from a URL
  • /v1/docs/:docId delete a document
  • /v1/search retrieve top‑K chunks
  • /v1/ask RAG answers with citations and controllable response length
  • /v1/memory/write durable memory items
  • /v1/memory/recall filtered recall
  • /v1/memory/reflect async reflection jobs
  • /v1/feedback user feedback signals
  • /v1/memory/event task outcome signals
  • /v1/metrics Prometheus metrics (per tenant)
  • /v1/jobs/:id job status
  • /v1/admin/tenant tenant auth mode (admin)
  • /v1/admin/service-tokens issue API keys (admin)
POST /v1/ask request fields
Field | Type | Required | Notes
question | string | Yes | User question to answer from retrieved sources.
k | integer | No | Top-K chunks retrieved before answer generation (default: 5).
docIds | string[] | No | Optional doc filter; only these doc IDs are searched.
collection | string | No | Limit retrieval to one collection. Use collectionScope=all to search all collections.
answerLength | enum | No | auto (default), short, medium, long.

auto adapts answer size to available evidence. The response includes data.answerLength with the effective mode used.

Response format (v1): all /v1 endpoints return a consistent envelope.

Success
{
  "ok": true,
  "data": { ... },
  "meta": {
    "tenantId": "acme",
    "collection": "default",
    "timestamp": "2026-02-12T12:00:00.000Z"
  }
}
Error
{
  "ok": false,
  "error": { "message": "Invalid input", "code": "INVALID_INPUT" },
  "meta": { "tenantId": "acme", "collection": "default", "timestamp": "..." }
}
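Client code can branch on the envelope's ok flag before touching data. A minimal unwrap helper, assuming only the envelope shape shown above (`unwrap` and `UnwrapError` are hypothetical client-side names, not part of the API):

```python
# Unwrap the v1 response envelope shown above.
# `UnwrapError` is a hypothetical client-side exception, not an API type.
class UnwrapError(Exception):
    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code

def unwrap(envelope: dict) -> dict:
    """Return envelope['data'] on success; raise UnwrapError with the
    error code from the table below on failure."""
    if envelope.get("ok"):
        return envelope.get("data", {})
    err = envelope.get("error", {})
    raise UnwrapError(err.get("code", "UNKNOWN"), err.get("message", ""))
```

This keeps error-code handling (for example retrying on RATE_LIMITED) in one place instead of scattering ok checks across call sites.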
Error codes (common)
Code | Meaning / cause
AUTH_REQUIRED | Missing or malformed auth header.
AUTH_INVALID | Invalid JWT/API key or invalid login credentials.
AUTH_EXPIRED | API key expired.
AUTH_REVOKED | API key revoked.
AUTH_CONFIG | Server auth misconfiguration.
AUTH_LOOKUP_FAILED | Auth lookup failed in the database.
RATE_LIMITED | Too many requests in the current window.
FORBIDDEN | Insufficient role/permissions for the endpoint.
NOT_FOUND | Requested resource does not exist.
INVALID_INPUT | Missing or invalid parameters in the request.
INVALID_DOC_ID | Doc ID failed validation (format/characters).
ACCOUNT_LOCKED | Account locked after repeated failed logins.
ACCOUNT_DISABLED | User account disabled by admin.
SSO_ONLY | Account requires SSO login.
IDEMPOTENCY_KEY_REQUIRED | Missing Idempotency-Key header.
IDEMPOTENCY_KEY_INVALID | Invalid or oversized idempotency key.
IDEMPOTENCY_KEY_REUSED | Idempotency key reused with different payload.
IDEMPOTENCY_IN_PROGRESS | Request with the same key is already running.

Some endpoints also return operation-specific codes like *_FAILED for internal failures.

AI assistant connections

Connect AtlasRAG documentation to your assistants so they can fetch current integration guidance directly from your docs stack. This includes llms.txt, a built-in MCP endpoint, and quick access patterns for ChatGPT and Claude.

Quick access URLs
Use our MCP server

The AtlasRAG docs MCP server is available at (loading).

Once connected, your assistant can search AtlasRAG documentation in real time for API usage, authentication setup, AMV-L and TTL lifecycle behavior, and implementation patterns.

Connect with Claude Code
(loading command)

Project (local) scope: adds the MCP server only for the current working directory.

(loading command)
Connect with Claude Desktop
  1. Open Claude Desktop.
  2. Go to Settings > Connectors.
  3. Add this MCP server URL: (loading).
Connect with Codex CLI
(loading command)
Connect with Cursor
(loading config)
Connect with VS Code
(loading config)
Connect with Antigravity
(loading config)
Quick prompt for ChatGPT and Claude
(loading prompt)
SDKs

Official SDK: sdk/node (Node.js). It supports JWT or API key auth, idempotency headers, and memory APIs. Additional SDKs can be added using the OpenAPI schema.

Examples

cURL:

Playground upload (UI)

In the Playground Ingest tab, choose a local file to auto-fill Document Text. Supported: .txt, .md, .json, .csv, .log, .pdf, .docx. Legacy .doc is not supported.

Index text
curl -X POST http://localhost:3000/v1/docs \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Idempotency-Key: idx-001" \
  -H "Content-Type: application/json" \
  -d '{
    "docId": "swe_notes",
    "collection": "default",
    "text": "Your document text..."
  }'
Search
curl "http://localhost:3000/v1/search?q=write-ahead%20logging&k=5&collection=default" \
  -H "X-API-Key: YOUR_SERVICE_TOKEN"
Ask
curl -X POST http://localhost:3000/v1/ask \
  -H "X-API-Key: YOUR_SERVICE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "question": "How is durability handled?", "k": 5, "answerLength": "medium" }'
Delete a document
curl -X DELETE "http://localhost:3000/v1/docs/swe_notes?collection=default" \
  -H "X-API-Key: YOUR_SERVICE_TOKEN"
Memory write + recall
curl -X POST http://localhost:3000/v1/memory/write \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Idempotency-Key: mem-001" \
  -H "Content-Type: application/json" \
  -d '{ "text": "Release shipped on Friday", "type": "semantic", "collection": "default", "agentId": "agent:release-bot", "tags": ["release", "ops"], "importanceHint": 0.6, "pinned": false }'

curl -X POST http://localhost:3000/v1/memory/recall \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "query": "release day", "types": ["semantic"], "tags": ["release"], "agentId": "agent:release-bot", "k": 5 }'
Memory feedback
curl -X POST http://localhost:3000/v1/feedback \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "memoryId": "mem_123", "feedback": "positive", "eventValue": 0.8 }'
Task outcome event
curl -X POST http://localhost:3000/v1/memory/event \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "memoryId": "mem_123", "eventType": "task_success", "eventValue": 0.9 }'

Recall filters support types, time range (since/until), tags, agentId, and collection.

Supported memory types: artifact, semantic, procedural, episodic, conversation, summary.

Memory types
Type | Use case
artifact | Source documents or raw inputs.
semantic | Facts or knowledge extracted from artifacts.
procedural | How-to steps or workflows.
episodic | Events with time/context.
conversation | Dialogue snippets or chat history.
summary | Compressed rollups or compactions.
Client examples (Python + Node.js)
Install + env
pip install requests
export ATLASRAG_URL="http://localhost:3000"
export ATLASRAG_API_KEY="YOUR_SERVICE_TOKEN"
End-to-end flow
import os
import time
import requests

BASE = os.getenv("ATLASRAG_URL", "http://localhost:3000")
API_KEY = os.getenv("ATLASRAG_API_KEY")  # or use JWT

headers = {
  "Content-Type": "application/json",
  "X-API-Key": API_KEY
}

# 1) Index
doc = {
  "docId": "swe_notes",
  "collection": "default",
  "text": "WAL keeps vectors durable across restarts."
}
res = requests.post(f"{BASE}/v1/docs", headers={**headers, "Idempotency-Key": "idx-001"}, json=doc)
res.raise_for_status()
print(res.json())

# 2) Search
res = requests.get(f"{BASE}/v1/search", headers=headers, params={
  "q": "durability",
  "k": 5,
  "collection": "default"
})
res.raise_for_status()
print(res.json())

# 3) Ask
res = requests.post(f"{BASE}/v1/ask", headers=headers, json={
  "question": "How do we persist vectors?",
  "k": 5,
  "answerLength": "short"
})
res.raise_for_status()
print(res.json())

# 4) Memory write
res = requests.post(f"{BASE}/v1/memory/write", headers={**headers, "Idempotency-Key": "mem-001"}, json={
  "type": "semantic",
  "collection": "default",
  "text": "Vector WAL is enabled in production.",
  "agentId": "agent:ops-bot",
  "tags": ["infra", "wal"],
  "importanceHint": 0.6,
  "pinned": False
})
res.raise_for_status()

# 5) Recall
res = requests.post(f"{BASE}/v1/memory/recall", headers=headers, json={
  "query": "WAL enabled",
  "types": ["semantic"],
  "tags": ["infra"],
  "k": 5
})
res.raise_for_status()
print(res.json())

# 5b) Feedback
res = requests.post(f"{BASE}/v1/feedback", headers=headers, json={
  "memoryId": "mem_123",
  "feedback": "positive",
  "eventValue": 0.8
})
res.raise_for_status()

# 6) Reflect (async job)
res = requests.post(f"{BASE}/v1/memory/reflect", headers={**headers, "Idempotency-Key": "reflect-001"}, json={
  "docId": "swe_notes",
  "types": ["semantic", "summary"],
  "collection": "default"
})
res.raise_for_status()
job_id = res.json()["data"]["job"]["id"]
# You can also reflect from a conversation memory item via "conversationId".

# 7) Poll job
while True:
  job = requests.get(f"{BASE}/v1/jobs/{job_id}", headers=headers).json()
  status = job["data"]["job"]["status"]
  if status in ("succeeded", "failed"):
    print(job)
    break
  time.sleep(2)
Collections + ACL visibility
# Restrict a document to an ACL list (inside the tenant)
res = requests.post(f"{BASE}/v1/docs", headers={**headers, "Idempotency-Key": "idx-002"}, json={
  "docId": "private_notes",
  "collection": "finance",
  "text": "Confidential budget details...",
  "visibility": "acl",
  "acl": ["user:alice", "user:bob"]
})
Server-side privileges (no end-user login)
# Requires ALLOW_PRINCIPAL_OVERRIDE=1 and an admin service token
res = requests.post(f"{BASE}/v1/memory/recall", headers=headers, json={
  "query": "policy details",
  "collection": "internal",
  "principalId": "user:alice",
  "privileges": ["role:employee", "dept:hr"],
  "types": ["semantic"],
  "k": 5
})
Install + env (Node 18+)
export ATLASRAG_URL="http://localhost:3000"
export ATLASRAG_API_KEY="YOUR_SERVICE_TOKEN"
node app.mjs
End-to-end flow
const BASE = process.env.ATLASRAG_URL || "http://localhost:3000";
const API_KEY = process.env.ATLASRAG_API_KEY;
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const headers = {
  "Content-Type": "application/json",
  "X-API-Key": API_KEY
};

const post = async (path, body, extra = {}) => {
  const res = await fetch(`${BASE}${path}`, {
    method: "POST",
    headers: { ...headers, ...extra },
    body: JSON.stringify(body)
  });
  const data = await res.json();
  if (!res.ok) throw new Error(JSON.stringify(data));
  return data;
};

const get = async (path, params = {}) => {
  const url = new URL(`${BASE}${path}`);
  Object.entries(params).forEach(([k, v]) => url.searchParams.set(k, v));
  const res = await fetch(url, { headers });
  const data = await res.json();
  if (!res.ok) throw new Error(JSON.stringify(data));
  return data;
};

// 1) Index
await post("/v1/docs", {
  docId: "swe_notes",
  collection: "default",
  text: "WAL keeps vectors durable across restarts."
}, { "Idempotency-Key": "idx-001" });

// 2) Search
console.log(await get("/v1/search", { q: "durability", k: "5", collection: "default" }));

// 3) Ask
console.log(await post("/v1/ask", { question: "How do we persist vectors?", k: 5, answerLength: "long" }));

// 4) Memory write
await post("/v1/memory/write", {
  type: "semantic",
  collection: "default",
  text: "Vector WAL is enabled in production.",
  agentId: "agent:ops-bot",
  tags: ["infra", "wal"],
  importanceHint: 0.6,
  pinned: false
}, { "Idempotency-Key": "mem-001" });

// 5) Recall
console.log(await post("/v1/memory/recall", { query: "WAL enabled", types: ["semantic"], tags: ["infra"], k: 5 }));

// 5b) Feedback
await post("/v1/feedback", { memoryId: "mem_123", feedback: "positive", eventValue: 0.8 });

// 6) Reflect + job polling
const reflect = await post("/v1/memory/reflect", {
  docId: "swe_notes",
  types: ["semantic", "summary"],
  collection: "default"
}, { "Idempotency-Key": "reflect-001" });
// You can also pass conversationId to reflect from a conversation memory item.

const jobId = reflect.data.job.id;
while (true) {
  const job = await get(`/v1/jobs/${jobId}`);
  const status = job.data.job.status;
  if (status === "succeeded" || status === "failed") {
    console.log(job);
    break;
  }
  await sleep(2000);
}
Collections + ACL visibility
await post("/v1/docs", {
  docId: "private_notes",
  collection: "finance",
  text: "Confidential budget details...",
  visibility: "acl",
  acl: ["user:alice", "user:bob"]
}, { "Idempotency-Key": "idx-002" });
Server-side privileges (no end-user login)
await post("/v1/memory/recall", {
  query: "policy details",
  collection: "internal",
  principalId: "user:alice",
  privileges: ["role:employee", "dept:hr"],
  types: ["semantic"],
  k: 5
});

Visibility can be tenant, private, or acl. For ACL, include a list of allowed principals.

Visibility without end-user login

If your app does not want users to log in directly, you can still enforce visibility by having your backend call AtlasRAG with a service token and pass principalId and/or privileges in the payload. The server matches these against the item's visibility and ACL list. Enable this by setting ALLOW_PRINCIPAL_OVERRIDE=1 and using an admin service token. Never expose this token to the browser.
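The matching described here can be sketched as a pure check (illustrative only; `can_read` is a hypothetical helper, and the gateway's actual evaluation may differ, e.g. in how private ownership is stored):

```python
# Illustrative visibility/ACL check. In this sketch the item's ACL list
# also carries the owner principal for "private" items; the real data
# model may represent ownership differently.
def can_read(item_visibility, item_acl, principal_id, privileges=()):
    """'tenant' items are readable by any principal in the tenant;
    'private' only by a principal listed on the item; 'acl' when the
    principal or one of its privileges appears in the ACL list."""
    if item_visibility == "tenant":
        return True
    grants = set(item_acl or [])
    if principal_id in grants:
        return True
    return item_visibility == "acl" and bool(grants & set(privileges))
```

This is why the recall payload above carries both principalId and privileges: either can satisfy an ACL entry such as "dept:hr".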

Write with ACL (server-side principal)
curl -X POST http://localhost:3000/v1/docs \
  -H "X-API-Key: ADMIN_SERVICE_TOKEN" \
  -H "Idempotency-Key: idx-003" \
  -H "Content-Type: application/json" \
  -d '{
    "docId": "hr_policy",
    "collection": "internal",
    "text": "Confidential HR policy...",
    "principalId": "user:alice",
    "privileges": ["role:employee", "dept:hr"],
    "visibility": "acl",
    "acl": ["user:alice", "dept:hr"]
  }'
Recall as a principal (server-side)
curl -X POST http://localhost:3000/v1/memory/recall \
  -H "X-API-Key: ADMIN_SERVICE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "policy details",
    "collection": "internal",
    "principalId": "user:alice",
    "privileges": ["role:employee", "dept:hr"],
    "types": ["semantic"],
    "k": 5
  }'

If you use JWTs, principalId is derived from the token and should not be provided in the payload. privileges is only honored for admin service tokens when ALLOW_PRINCIPAL_OVERRIDE=1.

Architecture
  • Gateway (Node) handles auth, routing, and orchestration.
  • Vector store (C++ TCP service) stores embeddings and serves similarity search.
  • Postgres stores chunks, memory items, links, jobs, and idempotency keys.
  • OpenAI (or compatible) provides embeddings and generation.
  • Background jobs handle reflection, summarization, redundancy scoring, and lifecycle tasks.
  • Expired items are automatically swept and removed (vectors + DB rows), preventing orphan vectors.
  • Jobs retry with exponential backoff before transitioning to failed.
  • Job reruns are idempotent (derived memories are replaced, not duplicated).
  • Structured logs include request_id, tenant_id, and collection.
  • Prometheus metrics are exposed at /metrics (scoped to the tenant; admin sees all tenants).

Technical White Paper

Adaptive Memory Value + Lifecycle (AMV-L)

A value-driven tiering and retrieval-control policy for keeping long-term memory high-signal and cost-bounded.

Objective
Retain useful memory while bounding retrieval and prompt costs.
Output
Per-item tiering plus lifecycle actions (retain, compact, evict, synthesize).
Execution Model
Request-path event queue + asynchronous sweeps and telemetry checks.
1. Overview

AtlasRAG implements Adaptive Memory Value + Lifecycle (AMV-L) as a managed-resource policy, not a passive memory store. Every memory item carries a scalar value and an explicit tier: HOT, WARM, or COLD.

The core contract is bounded retrieval: R = HOT union Sample_k(WARM) (optionally tiny cold probe), then vector search is scoped to this set so online cost is driven by working-set size instead of total memory size.

2. Problem Framing

Production memory systems fail when retrieval cost scales with total retained memory:

  • Store-everything approaches create retrieval dilution and prompt bloat.
  • TTL-only approaches can remove useful knowledge while still allowing expensive wide scans before expiry.

AMV-L addresses both with two controls: value-driven lifecycle transitions and hard retrieval gating that excludes cold memory by default.

3. Incremental Value Update V(m)

Each memory item m has value V(m) >= 0. AtlasRAG updates value incrementally per event using recency decay plus event reinforcement:

Request-Time Update
V_next = clamp(
  V_prev * exp(-lambda * delta_t_days)
  + alpha * I_access
  + beta * I_contribute
  - gamma * I_negative,
  0,
  V_max
)
Notation
Symbol | Interpretation
delta_t_days | Time since value_last_update_ts
I_access | 1 for retrieval/use access events, else 0
I_contribute | 1 only when memory contributed to a successful answer path
I_negative | Failure/negative-feedback penalty term
V_max | Hard upper cap (default 1.0)

Default runtime parameters are environment-driven (for example MEMORY_ACCESS_ALPHA, MEMORY_CONTRIBUTION_BETA, MEMORY_VALUE_DECAY_LAMBDA, MEMORY_VALUE_MAX).
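For intuition, the update can be written directly; the coefficient values below are illustrative placeholders, not the shipped environment defaults:

```python
import math

def update_value(v_prev, dt_days, access, contribute, negative,
                 lam=0.05, alpha=0.1, beta=0.2, gamma=0.3, v_max=1.0):
    """Incremental AMV-L value update: recency decay plus event
    reinforcement, clamped to [0, V_max]. Coefficients are illustrative;
    real defaults come from env (MEMORY_VALUE_DECAY_LAMBDA, etc.)."""
    v = (v_prev * math.exp(-lam * dt_days)   # recency decay over delta_t
         + alpha * access                     # I_access reinforcement
         + beta * contribute                  # I_contribute reinforcement
         - gamma * negative)                  # I_negative penalty
    return min(max(v, 0.0), v_max)
```

Note that decay applies only to the previous value, so a long-idle item converges toward 0 unless events keep reinforcing it.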

4. Tier Model + Hysteresis

Memory is partitioned into lifecycle tiers with hysteresis thresholds:

Tiering
HOT:   V >= theta_hot_up (or remain HOT until V < theta_hot_down)
WARM:  between hot and warm thresholds
COLD:  below theta_warm_down (or remain COLD until V >= theta_warm_up)

Separate up/down boundaries avoid oscillation around a single threshold. Pinned memories are exempt from demotion in the tier-transition logic.
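A sketch of the transition rule with hysteresis (threshold values here are illustrative, not the configured defaults):

```python
def next_tier(current, v, pinned=False,
              hot_up=0.7, hot_down=0.5, warm_up=0.3, warm_down=0.2):
    """Tier transition with hysteresis: promotion and demotion use
    separate thresholds, and pinned items never demote. Threshold
    values are illustrative."""
    if current == "HOT":
        if v < hot_down and not pinned:
            return "WARM"
        return "HOT"                 # remain HOT until V < theta_hot_down
    if current == "WARM":
        if v >= hot_up:
            return "HOT"
        if v < warm_down and not pinned:
            return "COLD"
        return "WARM"
    if v >= warm_up:                 # COLD: remain until V >= theta_warm_up
        return "WARM"
    return "COLD"
```

Because hot_down < hot_up (and warm_down < warm_up), a value jittering inside the band cannot flip the tier back and forth on every update.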

5. Initialization Safety

New memory writes initialize in WARM with an initial value constrained to the warm band (theta_warm_up <= V_init < theta_hot_up). This prevents new uploads from being immediately cold-evictable.

Initialization also stamps value_last_update_ts and tier_last_update_ts, enabling true incremental decay from first write.

6. Retrieval Gating

Candidate memories are hard-bounded before vector search:

Bounded Retrieval Set
R = HOT union Sample_k(WARM)
COLD intersection R = empty

Optional cold probing exists as a tiny budget (MEMORY_RETRIEVAL_COLD_PROBE_EPSILON), disabled by default. When disabled, runtime checks enforce cold_candidates == 0.
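The gating contract can be sketched as follows (illustrative; a seeded generator is used so the sketch is deterministic):

```python
import random

def bounded_retrieval_set(hot, warm, k, cold=None, cold_probe=0):
    """R = HOT union Sample_k(WARM); COLD is excluded unless a tiny
    cold-probe budget is enabled. Sketch of the gating contract only."""
    rng = random.Random(0)  # deterministic sampling for this sketch
    r = list(hot) + rng.sample(list(warm), min(k, len(warm)))
    if cold_probe and cold:
        r += rng.sample(list(cold), min(cold_probe, len(cold)))
    return r
```

With cold_probe left at 0 (the default posture), cold items can never enter R, which is exactly the cold_candidates == 0 invariant the runtime checks enforce.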

7. Scoped Vector Search

Similarity search is executed only on chunk vectors belonging to the bounded memory set (R), not the full corpus. The gateway sends explicit candidate IDs to the vector service.

Complexity Target
online scan cost ~= O(|HOT| + k) at memory-selection stage
vector scan is bounded to chunks mapped from that set
8. Prompt Insertion Set

After dense/lexical fusion and reranking, only top results are inserted into prompts: S = Top_n(sim(q, R)). This bounds memory token footprint independently of total stored items.

Prompt telemetry records prompt_tokens_est, memory_tokens_est, and total_tokens_est per request.
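The insertion step reduces to a top-n cut over fused similarity scores; a sketch:

```python
def prompt_insertion_set(scored, n):
    """S = Top_n(sim(q, R)): keep only the n highest-scoring results
    after dense/lexical fusion and reranking. `scored` is a list of
    (item, score) pairs; only items are returned."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [item for item, _score in ranked[:n]]
```

Since n is fixed per request, the memory token footprint of the prompt is bounded regardless of how many items survive retrieval.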

9. Contribution Correctness

Contribution boost is applied after successful answer completion and only to memories actually injected into the generated prompt. This avoids over-crediting memories that were retrieved but not used.

Negative outcomes (task_fail, user_negative) apply a penalty term in the incremental update.

10. Request-Path Overhead

Request path records memory events into an in-memory queue. Background flush workers persist batched updates asynchronously, reducing synchronous DB write overhead on latency-sensitive paths.

This keeps online work close to "compute bounded candidates + run scoped retrieval", while value decay and lifecycle management run in periodic background sweeps.

11. Lifecycle Policy

Lifecycle sweeps enforce tier-safe policies on non-expired items:

Condition | Action
expired | delete
pinned | keep
tier == COLD and V < theta_evict | evict
tier == COLD and V < summary_threshold | compact
tier == HOT and policy says synthesize | promote/summarize
otherwise | retain
Policy Skeleton
if expired: delete
else if pinned: retain
else if tier == COLD and V < theta_evict: evict
else if tier == COLD and V < summary_threshold: compact
else if tier == HOT and synthesis_enabled: promote
else: retain
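The skeleton can be made concrete as a pure decision function (threshold defaults are illustrative; names follow the table above):

```python
def lifecycle_action(expired, pinned, tier, v,
                     theta_evict=0.05, summary_threshold=0.15,
                     synthesis_enabled=False):
    """Tier-safe lifecycle decision mirroring the policy skeleton.
    Threshold defaults are illustrative, not the configured values."""
    if expired:
        return "delete"          # TTL takes precedence over value/tier
    if pinned:
        return "retain"
    if tier == "COLD" and v < theta_evict:
        return "evict"
    if tier == "COLD" and v < summary_threshold:
        return "compact"
    if tier == "HOT" and synthesis_enabled:
        return "promote"
    return "retain"
```

Ordering matters: the expiry and pin checks come first, so value-based eviction and compaction can only ever touch unpinned, non-expired cold items.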
12. Cold Eviction Operator

Value-based eviction is restricted to cold memory: tier == COLD and V < theta_evict. Deletion removes both metadata rows and vector entries (with reconcile jobs when vector deletion fails).

13. Compaction Operator

Compaction targets cold, lower-value groups and replaces multiple items with a denser summary memory. This preserves semantic coverage while shrinking retrieval noise and storage footprint.

14. Promotion / Synthesis Operator

Promotion synthesizes reusable higher-order memories from high-value context (for example semantic or procedural distilled items). Tier transitions are always value-based; synthesis is an optional additional lifecycle operator.

15. TTL Precedence

TTL remains a hard retention boundary. Expiry deletes take precedence over value and tier state, and AMV-L operates only on non-expired items.

16. Telemetry + Acceptance Checks

AMV-L emits per-request telemetry for bounded-set verification and cost tracking:

  • hot_count, warm_sampled, cold_candidates.
  • retrieval_set_size, retrieval_bound.
  • vector_search_scanned_count.
  • prompt_tokens_est, memory_tokens_est, total_tokens_est.
  • Lifecycle transition/action events for promote, demote, compact, and delete.

Key acceptance targets: cold_candidates == 0 by default, bounded retrieval size, and latency percentiles (p50/p95/p99) that track bounded prompt and retrieval sets.

17. Resulting System Properties
  • Effective working set is explicitly bounded before similarity search.
  • Cold memory is excluded from default retrieval, reducing noise.
  • Prompt memory token usage is observable and controllable.
  • Value/tier state adapts online while heavy maintenance runs asynchronously.
18. Conclusion

AMV-L in AtlasRAG is implemented as an incremental, tiered control loop with explicit retrieval gates and scoped vector search. The net effect is better cost predictability and cleaner retrieval behavior than unconstrained global-memory scanning.

Playground notes
  • Use service tokens for server-to-server integrations. Credentials entered in the Playground are saved locally in your browser (localStorage).
  • Use /login to generate a JWT and save it automatically.
  • Index at least one document before searching or asking.
  • If you see Unauthorized, your token is invalid, expired, or revoked.
  • The health check only verifies gateway-to-TCP reachability.
  • API keys are created by admins via POST /v1/admin/service-tokens.