AtlasRAG

Persistent Memory Engine for AI Agents

AtlasRAG provides durable memory APIs, multi-tenant governance, and retrieval workflows so teams can store, recall, and operationalize knowledge for agents in production.

Protected endpoints accept Authorization: Bearer <jwt> or X-API-Key: <token>.

Production Memory Platform

AtlasRAG turns retrieval into durable, governed memory for AI agents.

AtlasRAG combines a high-speed C++ vector store with a secure Node gateway so teams can ingest knowledge, retrieve context, and continuously improve memory quality in production without building custom control loops.

This gives product teams a stable memory foundation for customer assistants, internal copilots, and workflow agents, with strong tenant boundaries, operational controls, and measurable runtime behavior.

Full stack placement

AtlasRAG full architecture (diagram summary). End users (web app, mobile, and Slack/Teams users) connect to builder-owned agent apps outside AtlasRAG: chatbot UIs, customer copilots, support assistants, internal knowledge portals, specs workbenches, and whitepaper systems, built on agent runtimes/orchestrators (LangChain or custom code) with internal agentic search, retrieval QA surfaces, multi-agent workflows, approval chains, tool routing, domain actions, compliance workflows, and business automation pipelines. These apps call the AtlasRAG platform layer through gateway APIs (/v1/ask, /v1/search, /v1/memory/*, /v1/feedback) across five stages: INGEST (docs/URLs, SDK/API writes, tenant + ACL policy), RUN (AtlasRAG gateway, C++ vector core, Postgres memory store), SERVE (search, ask, memory recall, job status), OBSERVE (telemetry, metrics, traces/logs), and IMPROVE (AMV-L, TTL sweeps, reflection and feedback jobs), then return responses to end users.
C++ vector core | Node gateway APIs | Tenant + collection isolation | AMV-L lifecycle automation

What AtlasRAG solves for client programs

Answer drift over time

Durable memory and recall controls keep agent outputs grounded in approved, reusable context.

Governance and isolation pressure

Tenant boundaries, visibility modes, ACLs, and role checks make data access explicit and auditable.

Cost growth from unmanaged memory

AMV-L lifecycle policies and TTL sweeps help bound memory growth and reduce unnecessary token use.

Opaque production behavior

Gateway telemetry, metrics, and job status endpoints create observability for operators and stakeholders.

01

Unified ingest, search, and answer APIs

AtlasRAG provides one operational surface for ingestion, retrieval, and answer generation, so implementation teams can ship memory-backed experiences without stitching multiple services together.

  • /v1/docs and /v1/docs/url for indexing.
  • /v1/search and /v1/ask for retrieval and answer workflows.
  • Requests are scoped by tenant and collection to prevent cross-environment leakage.
  • Idempotency keys on write operations reduce duplicate ingestion risk in distributed clients.
02

Durable memory model, not just transient context

Beyond chunks, AtlasRAG stores typed memory objects so agents can preserve and reuse knowledge with explicit meaning, provenance, and lifecycle controls.

  • Six memory types: artifact, semantic, procedural, episodic, conversation, and summary.
  • /v1/memory/write, /v1/memory/recall, and /v1/memory/reflect.
  • Memory scoring signals from /v1/feedback and /v1/memory/event.
  • Async reflection jobs generate derived knowledge while preserving links to source artifacts.
03

Enterprise governance and access control

Security and governance are built into the request path and data model, so client deployments can enforce policy without custom wrappers.

  • JWT and service-token auth with role-based endpoint controls.
  • Visibility modes: tenant, private, and acl.
  • SSO support for Google, Azure, and Okta with tenant-level auth mode controls.
  • Principal-aware recall constraints protect private and ACL-scoped memory access.
04

AMV-L keeps memory quality high over time

AtlasRAG applies Adaptive Memory Value + Lifecycle (AMV-L) to evaluate memory utility continuously and automate retention decisions based on measured behavior.

  • Lifecycle actions promote, compact, retain, or delete by policy thresholds.
  • TTL sweep remains the hard retention boundary for expired memory.
  • Async job processing keeps reflection and lifecycle operations non-blocking.
  • Value decay and redundancy sweeps reduce stale or duplicate memories in long-running deployments.
05

Operationally ready for developer teams

Integration and operations are designed for real delivery teams, from pilots through production scale-up.

  • OpenAPI schema, Swagger UI, and official Node SDK for integration.
  • Idempotency support for write/index/reflect endpoints and idempotent job reruns.
  • Metrics, telemetry, and structured request logs for production visibility.
  • Admin endpoints support service tokens, usage reporting, and tenant policy management.

Runtime architecture

Gateway API Layer

Handles auth, tenancy, ACL policy, idempotency, ingestion, retrieval, ask, and memory endpoints.

Persistent Data Plane

C++ TCP vector index stores embeddings while Postgres stores chunk text, memory items, links, and jobs.

Lifecycle & Telemetry Plane

Scheduled jobs run TTL, AMV-L lifecycle tasks, and telemetry snapshots for ongoing optimization.

Playground

Ingest content, then search or ask with the same context — all in one place.


Ingest

Paste text, upload a file, or index a link.

Collection is your content bucket/namespace. Use default if you do not need separation. Valid names use only letters, numbers, ., -, and _ (example: team_docs, v2.alpha). Invalid: spaces or characters like /, #, ?.
We split the text into chunks, embed each chunk, store vectors in the C++ TCP service, and store chunk text in Postgres for previews and citations.
You can also load a local file or index directly from a URL.
Text files are read directly. PDF and .docx are extracted in your browser into Document Text.
If a URL is provided, the server fetches and indexes it.

Search

Semantic search across your indexed documents.

Higher K returns more chunks (slower + more context).
Scope retrieval to one collection, or keep All collections.

Ask

RAG answer grounded in your indexed sources.

We retrieve top-K chunks, then ask the model to answer using them.
Scope answer retrieval to one collection, or keep All collections.
Choose response depth. Auto adapts to available source detail.

Backend metrics

Health, storage, and vector index stats for operators.


Organization usage

Admin view of requests, tokens, storage, and latency for your tenant.


Requires admin privileges.


Collections

All collections for your tenant, with document titles and delete controls.


Deleting a collection removes stored chunk text and memory items. Vector deletion is not yet supported.

Developer documentation

Everything you need to integrate quickly: quickstart, auth, APIs, SDKs, and Python examples.

Quickstart
  1. Get your AtlasRAG base URL and user credentials.
  2. Log in to get a JWT (required to create API keys):
curl -X POST https://YOUR_BASE_URL/v1/login \
  -H "Content-Type: application/json" \
  -d '{ "username": "YOUR_USERNAME", "password": "YOUR_PASSWORD" }'
  3. Create a service token (API key) for your app:
curl -X POST https://YOUR_BASE_URL/v1/admin/service-tokens \
  -H "Authorization: Bearer YOUR_ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "app-server",
    "principalId": "svc:app",
    "roles": ["indexer"],
    "expiresAt": "2026-12-31T00:00:00Z"
  }'
  4. Index your first document:
curl -X POST https://YOUR_BASE_URL/v1/docs \
  -H "X-API-Key: YOUR_SERVICE_TOKEN" \
  -H "Idempotency-Key: idx-001" \
  -H "Content-Type: application/json" \
  -d '{
    "docId": "swe_notes",
    "collection": "default",
    "text": "Your document text..."
  }'

Playground upload supports .txt, .md, .json, .csv, .log, .pdf, and .docx. PDF and .docx are extracted to plain text in the browser before indexing. API ingest via POST /v1/docs remains JSON text (no multipart file upload).

  5. Search or Ask with /v1/search and /v1/ask.
Authentication

Protected endpoints require either a JWT or a service token. Send Authorization: Bearer <jwt> or X-API-Key: <token> (also supports Authorization: ApiKey <token>).

Multi-tenant isolation is enforced by the auth token. JWTs must include a tenant identifier using one of these claims: tenant, tid, or sub. Service tokens inherit the tenant from the admin who created them.

Admins can issue service tokens via POST /v1/admin/service-tokens. Store the returned token securely; it is shown only once.

Create API key (admin)
curl -X POST http://localhost:3000/v1/admin/service-tokens \
  -H "Authorization: Bearer YOUR_ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "ci-pipeline",
    "principalId": "svc:ci",
    "roles": ["indexer"],
    "expiresAt": "2026-12-31T00:00:00Z"
  }'

Requires role admin (or a token whose principal matches the tenant).

Tenant auth mode

Admins can set per-tenant login policy: sso_only, sso_plus_password, or password_only. sso_only disables password login; password_only disables SSO.

You can also restrict which SSO providers are allowed by supplying ssoProviders (google, azure, okta).

Get auth mode (admin)
curl -X GET http://localhost:3000/v1/admin/tenant \
  -H "Authorization: Bearer YOUR_ADMIN_JWT"
Set auth mode (admin)
curl -X PATCH http://localhost:3000/v1/admin/tenant \
  -H "Authorization: Bearer YOUR_ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "authMode": "sso_only", "ssoProviders": ["google"] }'

Valid values: sso_only, sso_plus_password, password_only.

Security
Built-in controls
  • Auth required: JWTs or service tokens protect all sensitive endpoints.
  • Tenant isolation: data is partitioned by tenant and enforced in every query.
  • Role-based access: reader/indexer/admin control access to privileged actions.
  • Tenant auth mode: enforce sso_only or password_only per tenant.
  • Tenant SSO allowlist: restrict SSO to google, azure, or okta.
  • Visibility + ACL: tenant, private, acl rules apply to search/recall.
  • Idempotency: write/index/reflect accept Idempotency-Key to prevent double writes.
  • Audit logs: sensitive actions (auth policy changes, key revokes, deletes) are recorded with actor + request metadata.
  • Rate limits + lockout: login throttling + failed-login lockout are enabled.
  • URL ingestion safety: private IP ranges are blocked (SSRF protection).
  • Prompt injection guard: source sanitization is enabled by default.
RBAC by endpoint
Endpoint | Role required
GET /v1/health | Public
POST /v1/login | Public
/v1/auth/* | Public (SSO)
GET /v1/stats | Reader+
GET /v1/metrics | Reader+
GET /v1/docs | Reader+
GET /v1/collections | Reader+
GET /v1/search | Reader+
POST /v1/ask | Reader+
POST /v1/memory/recall | Reader+
POST /v1/feedback | Reader+
POST /v1/memory/event | Reader+
POST /v1/docs | Indexer+
POST /v1/docs/url | Indexer+
DELETE /v1/docs/:docId | Indexer+
POST /v1/memory/write | Indexer+
POST /v1/memory/reflect | Indexer+
GET /v1/jobs | Reader+
GET /v1/jobs/:id | Reader+
POST /v1/memory/cleanup | Admin
POST /v1/memory/compact | Admin
DELETE /v1/collections/:collection | Admin
/v1/admin/service-tokens | Admin
/v1/admin/tenant | Admin
/v1/admin/usage | Admin

Reader+ means reader, indexer, or admin. Indexer+ means indexer or admin.
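The Reader+/Indexer+ convention amounts to a ranked role hierarchy. A minimal sketch of that check (an illustrative helper, not the gateway's actual middleware):

```python
# Illustrative role-hierarchy check; role names mirror the table above,
# but this is a sketch, not AtlasRAG's real authorization code.
ROLE_RANK = {"reader": 1, "indexer": 2, "admin": 3}

def has_role(token_role: str, required: str) -> bool:
    """True if token_role meets or exceeds the required role."""
    return ROLE_RANK.get(token_role, 0) >= ROLE_RANK[required]
```

Under this scheme, "Reader+" is `has_role(role, "reader")` and "Indexer+" is `has_role(role, "indexer")`; unknown roles rank below reader and are denied.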

API Reference

Full OpenAPI schema is available at /openapi.json, with Swagger UI at /docs. All versioned routes live under /v1; legacy routes remain as aliases.

File uploads are currently a Playground UI capability. Public ingest APIs expect docId + text JSON payloads; use POST /v1/docs/url for URL-based ingestion.

  • /v1/docs index + list documents
  • /v1/docs/url index content from a URL
  • /v1/docs/:docId delete a document
  • /v1/search retrieve top‑K chunks
  • /v1/ask RAG answers with citations and controllable response length
  • /v1/memory/write durable memory items
  • /v1/memory/recall filtered recall
  • /v1/memory/reflect async reflection jobs
  • /v1/feedback user feedback signals
  • /v1/memory/event task outcome signals
  • /v1/metrics Prometheus metrics (per tenant)
  • /v1/jobs/:id job status
  • /v1/admin/tenant tenant auth mode (admin)
  • /v1/admin/service-tokens issue API keys (admin)
POST /v1/ask request fields
Field | Type | Required | Notes
question | string | Yes | User question to answer from retrieved sources.
k | integer | No | Top-K chunks retrieved before answer generation (default: 5).
docIds | string[] | No | Optional doc filter; only these doc IDs are searched.
collection | string | No | Limit retrieval to one collection. Use collectionScope=all to search all collections.
answerLength | enum | No | auto (default), short, medium, long.

auto adapts answer size to available evidence. The response includes data.answerLength with the effective mode used.

Response format (v1): all /v1 endpoints return a consistent envelope.

Success
{
  "ok": true,
  "data": { ... },
  "meta": {
    "tenantId": "acme",
    "collection": "default",
    "timestamp": "2026-02-12T12:00:00.000Z"
  }
}
Error
{
  "ok": false,
  "error": { "message": "Invalid input", "code": "INVALID_INPUT" },
  "meta": { "tenantId": "acme", "collection": "default", "timestamp": "..." }
}
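Client code can branch on the envelope's ok flag before touching data. A minimal unwrap helper, assuming only the envelope shape shown above (`unwrap` and `UnwrapError` are hypothetical client-side names, not part of the API):

```python
# Unwrap the v1 response envelope shown above.
# `UnwrapError` is a hypothetical client-side exception, not an API type.
class UnwrapError(Exception):
    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code

def unwrap(envelope: dict) -> dict:
    """Return envelope['data'] on success; raise UnwrapError with the
    error code from the table below on failure."""
    if envelope.get("ok"):
        return envelope.get("data", {})
    err = envelope.get("error", {})
    raise UnwrapError(err.get("code", "UNKNOWN"), err.get("message", ""))
```

This keeps error-code handling (for example retrying on RATE_LIMITED) in one place instead of scattering ok checks across call sites.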
Error codes (common)
Code | Meaning / cause
AUTH_REQUIRED | Missing or malformed auth header.
AUTH_INVALID | Invalid JWT/API key or invalid login credentials.
AUTH_EXPIRED | API key expired.
AUTH_REVOKED | API key revoked.
AUTH_CONFIG | Server auth misconfiguration.
AUTH_LOOKUP_FAILED | Auth lookup failed in the database.
RATE_LIMITED | Too many requests in the current window.
FORBIDDEN | Insufficient role/permissions for the endpoint.
NOT_FOUND | Requested resource does not exist.
INVALID_INPUT | Missing or invalid parameters in the request.
INVALID_DOC_ID | Doc ID failed validation (format/characters).
ACCOUNT_LOCKED | Account locked after repeated failed logins.
ACCOUNT_DISABLED | User account disabled by admin.
SSO_ONLY | Account requires SSO login.
IDEMPOTENCY_KEY_REQUIRED | Missing Idempotency-Key header.
IDEMPOTENCY_KEY_INVALID | Invalid or oversized idempotency key.
IDEMPOTENCY_KEY_REUSED | Idempotency key reused with different payload.
IDEMPOTENCY_IN_PROGRESS | Request with the same key is already running.

Some endpoints also return operation-specific codes like *_FAILED for internal failures.

AI assistant connections

Connect AtlasRAG documentation to your assistants so they can fetch current integration guidance directly from your docs stack. This includes llms.txt, a built-in MCP endpoint, and quick access patterns for ChatGPT and Claude.

Quick access URLs
Use our MCP server

The AtlasRAG docs MCP server is available at (loading).

Once connected, your assistant can search AtlasRAG documentation in real time for API usage, authentication setup, AMV-L and TTL lifecycle behavior, and implementation patterns.

Connect with Claude Code
(loading command)

Project (local) scope: adds the MCP server only for the current working directory.

(loading command)
Connect with Claude Desktop
  1. Open Claude Desktop.
  2. Go to Settings > Connectors.
  3. Add this MCP server URL: (loading).
Connect with Codex CLI
(loading command)
Connect with Cursor
(loading config)
Connect with VS Code
(loading config)
Connect with Antigravity
(loading config)
Quick prompt for ChatGPT and Claude
(loading prompt)
SDKs

Official SDK: sdk/node (Node.js). It supports JWT or API key auth, idempotency headers, and memory APIs. Additional SDKs can be added using the OpenAPI schema.

Examples

cURL:

Playground upload (UI)

In the Playground Ingest tab, choose a local file to auto-fill Document Text. Supported: .txt, .md, .json, .csv, .log, .pdf, .docx. Legacy .doc is not supported.

Index text
curl -X POST http://localhost:3000/v1/docs \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Idempotency-Key: idx-001" \
  -H "Content-Type: application/json" \
  -d '{
    "docId": "swe_notes",
    "collection": "default",
    "text": "Your document text..."
  }'
Search
curl "http://localhost:3000/v1/search?q=write-ahead%20logging&k=5&collection=default" \
  -H "X-API-Key: YOUR_SERVICE_TOKEN"
Ask
curl -X POST http://localhost:3000/v1/ask \
  -H "X-API-Key: YOUR_SERVICE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "question": "How is durability handled?", "k": 5, "answerLength": "medium" }'
Delete a document
curl -X DELETE "http://localhost:3000/v1/docs/swe_notes?collection=default" \
  -H "X-API-Key: YOUR_SERVICE_TOKEN"
Memory write + recall
curl -X POST http://localhost:3000/v1/memory/write \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Idempotency-Key: mem-001" \
  -H "Content-Type: application/json" \
  -d '{ "text": "Release shipped on Friday", "type": "semantic", "collection": "default", "agentId": "agent:release-bot", "tags": ["release", "ops"], "importanceHint": 0.6, "pinned": false }'

curl -X POST http://localhost:3000/v1/memory/recall \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "query": "release day", "types": ["semantic"], "tags": ["release"], "agentId": "agent:release-bot", "k": 5 }'
Memory feedback
curl -X POST http://localhost:3000/v1/feedback \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "memoryId": "mem_123", "feedback": "positive", "eventValue": 0.8 }'
Task outcome event
curl -X POST http://localhost:3000/v1/memory/event \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{ "memoryId": "mem_123", "eventType": "task_success", "eventValue": 0.9 }'

Recall filters support types, time range (since/until), tags, agentId, and collection.

Supported memory types: artifact, semantic, procedural, episodic, conversation, summary.

Memory types
Type | Use case
artifact | Source documents or raw inputs.
semantic | Facts or knowledge extracted from artifacts.
procedural | How-to steps or workflows.
episodic | Events with time/context.
conversation | Dialogue snippets or chat history.
summary | Compressed rollups or compactions.
Client examples (Python + Node.js)
Install + env
pip install requests
export ATLASRAG_URL="http://localhost:3000"
export ATLASRAG_API_KEY="YOUR_SERVICE_TOKEN"
End-to-end flow
import os
import time
import requests

BASE = os.getenv("ATLASRAG_URL", "http://localhost:3000")
API_KEY = os.getenv("ATLASRAG_API_KEY")  # or use JWT

headers = {
  "Content-Type": "application/json",
  "X-API-Key": API_KEY
}

# 1) Index
doc = {
  "docId": "swe_notes",
  "collection": "default",
  "text": "WAL keeps vectors durable across restarts."
}
res = requests.post(f"{BASE}/v1/docs", headers={**headers, "Idempotency-Key": "idx-001"}, json=doc)
res.raise_for_status()
print(res.json())

# 2) Search
res = requests.get(f"{BASE}/v1/search", headers=headers, params={
  "q": "durability",
  "k": 5,
  "collection": "default"
})
res.raise_for_status()
print(res.json())

# 3) Ask
res = requests.post(f"{BASE}/v1/ask", headers=headers, json={
  "question": "How do we persist vectors?",
  "k": 5,
  "answerLength": "short"
})
res.raise_for_status()
print(res.json())

# 4) Memory write
res = requests.post(f"{BASE}/v1/memory/write", headers={**headers, "Idempotency-Key": "mem-001"}, json={
  "type": "semantic",
  "collection": "default",
  "text": "Vector WAL is enabled in production.",
  "agentId": "agent:ops-bot",
  "tags": ["infra", "wal"],
  "importanceHint": 0.6,
  "pinned": False
})
res.raise_for_status()

# 5) Recall
res = requests.post(f"{BASE}/v1/memory/recall", headers=headers, json={
  "query": "WAL enabled",
  "types": ["semantic"],
  "tags": ["infra"],
  "k": 5
})
res.raise_for_status()
print(res.json())

# 5b) Feedback
res = requests.post(f"{BASE}/v1/feedback", headers=headers, json={
  "memoryId": "mem_123",
  "feedback": "positive",
  "eventValue": 0.8
})
res.raise_for_status()

# 6) Reflect (async job)
res = requests.post(f"{BASE}/v1/memory/reflect", headers={**headers, "Idempotency-Key": "reflect-001"}, json={
  "docId": "swe_notes",
  "types": ["semantic", "summary"],
  "collection": "default"
})
res.raise_for_status()
job_id = res.json()["data"]["job"]["id"]
# You can also reflect from a conversation memory item via "conversationId".

# 7) Poll job
while True:
  job = requests.get(f"{BASE}/v1/jobs/{job_id}", headers=headers).json()
  status = job["data"]["job"]["status"]
  if status in ("succeeded", "failed"):
    print(job)
    break
  time.sleep(2)
Collections + ACL visibility
# Restrict a document to an ACL list (inside the tenant)
res = requests.post(f"{BASE}/v1/docs", headers={**headers, "Idempotency-Key": "idx-002"}, json={
  "docId": "private_notes",
  "collection": "finance",
  "text": "Confidential budget details...",
  "visibility": "acl",
  "acl": ["user:alice", "user:bob"]
})
Server-side privileges (no end-user login)
# Requires ALLOW_PRINCIPAL_OVERRIDE=1 and an admin service token
res = requests.post(f"{BASE}/v1/memory/recall", headers=headers, json={
  "query": "policy details",
  "collection": "internal",
  "principalId": "user:alice",
  "privileges": ["role:employee", "dept:hr"],
  "types": ["semantic"],
  "k": 5
})
Install + env (Node 18+)
export ATLASRAG_URL="http://localhost:3000"
export ATLASRAG_API_KEY="YOUR_SERVICE_TOKEN"
node app.mjs
End-to-end flow
const BASE = process.env.ATLASRAG_URL || "http://localhost:3000";
const API_KEY = process.env.ATLASRAG_API_KEY;
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const headers = {
  "Content-Type": "application/json",
  "X-API-Key": API_KEY
};

const post = async (path, body, extra = {}) => {
  const res = await fetch(`${BASE}${path}`, {
    method: "POST",
    headers: { ...headers, ...extra },
    body: JSON.stringify(body)
  });
  const data = await res.json();
  if (!res.ok) throw new Error(JSON.stringify(data));
  return data;
};

const get = async (path, params = {}) => {
  const url = new URL(`${BASE}${path}`);
  Object.entries(params).forEach(([k, v]) => url.searchParams.set(k, v));
  const res = await fetch(url, { headers });
  const data = await res.json();
  if (!res.ok) throw new Error(JSON.stringify(data));
  return data;
};

// 1) Index
await post("/v1/docs", {
  docId: "swe_notes",
  collection: "default",
  text: "WAL keeps vectors durable across restarts."
}, { "Idempotency-Key": "idx-001" });

// 2) Search
console.log(await get("/v1/search", { q: "durability", k: "5", collection: "default" }));

// 3) Ask
console.log(await post("/v1/ask", { question: "How do we persist vectors?", k: 5, answerLength: "long" }));

// 4) Memory write
await post("/v1/memory/write", {
  type: "semantic",
  collection: "default",
  text: "Vector WAL is enabled in production.",
  agentId: "agent:ops-bot",
  tags: ["infra", "wal"],
  importanceHint: 0.6,
  pinned: false
}, { "Idempotency-Key": "mem-001" });

// 5) Recall
console.log(await post("/v1/memory/recall", { query: "WAL enabled", types: ["semantic"], tags: ["infra"], k: 5 }));

// 5b) Feedback
await post("/v1/feedback", { memoryId: "mem_123", feedback: "positive", eventValue: 0.8 });

// 6) Reflect + job polling
const reflect = await post("/v1/memory/reflect", {
  docId: "swe_notes",
  types: ["semantic", "summary"],
  collection: "default"
}, { "Idempotency-Key": "reflect-001" });
// You can also pass conversationId to reflect from a conversation memory item.

const jobId = reflect.data.job.id;
while (true) {
  const job = await get(`/v1/jobs/${jobId}`);
  const status = job.data.job.status;
  if (status === "succeeded" || status === "failed") {
    console.log(job);
    break;
  }
  await sleep(2000);
}
Collections + ACL visibility
await post("/v1/docs", {
  docId: "private_notes",
  collection: "finance",
  text: "Confidential budget details...",
  visibility: "acl",
  acl: ["user:alice", "user:bob"]
}, { "Idempotency-Key": "idx-002" });
Server-side privileges (no end-user login)
await post("/v1/memory/recall", {
  query: "policy details",
  collection: "internal",
  principalId: "user:alice",
  privileges: ["role:employee", "dept:hr"],
  types: ["semantic"],
  k: 5
});

Visibility can be tenant, private, or acl. For ACL, include a list of allowed principals.

Visibility without end-user login

If your app does not want users to log in directly, you can still enforce visibility by having your backend call AtlasRAG with a service token and pass principalId and/or privileges in the payload. The server matches these against the item's visibility and ACL list. Enable this by setting ALLOW_PRINCIPAL_OVERRIDE=1 and using an admin service token. Never expose this token to the browser.
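The matching described here can be sketched as a pure check (illustrative only; `can_read` is a hypothetical helper, and the gateway's actual evaluation may differ, e.g. in how private ownership is stored):

```python
# Illustrative visibility/ACL check. In this sketch the item's ACL list
# also carries the owner principal for "private" items; the real data
# model may represent ownership differently.
def can_read(item_visibility, item_acl, principal_id, privileges=()):
    """'tenant' items are readable by any principal in the tenant;
    'private' only by a principal listed on the item; 'acl' when the
    principal or one of its privileges appears in the ACL list."""
    if item_visibility == "tenant":
        return True
    grants = set(item_acl or [])
    if principal_id in grants:
        return True
    return item_visibility == "acl" and bool(grants & set(privileges))
```

This is why the recall payload above carries both principalId and privileges: either can satisfy an ACL entry such as "dept:hr".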

Write with ACL (server-side principal)
curl -X POST http://localhost:3000/v1/docs \
  -H "X-API-Key: ADMIN_SERVICE_TOKEN" \
  -H "Idempotency-Key: idx-003" \
  -H "Content-Type: application/json" \
  -d '{
    "docId": "hr_policy",
    "collection": "internal",
    "text": "Confidential HR policy...",
    "principalId": "user:alice",
    "privileges": ["role:employee", "dept:hr"],
    "visibility": "acl",
    "acl": ["user:alice", "dept:hr"]
  }'
Recall as a principal (server-side)
curl -X POST http://localhost:3000/v1/memory/recall \
  -H "X-API-Key: ADMIN_SERVICE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "policy details",
    "collection": "internal",
    "principalId": "user:alice",
    "privileges": ["role:employee", "dept:hr"],
    "types": ["semantic"],
    "k": 5
  }'

If you use JWTs, principalId is derived from the token and should not be provided in the payload. privileges is only honored for admin service tokens when ALLOW_PRINCIPAL_OVERRIDE=1.

Architecture
  • Gateway (Node) handles auth, routing, and orchestration.
  • Vector store (C++ TCP service) stores embeddings and serves similarity search.
  • Postgres stores chunks, memory items, links, jobs, and idempotency keys.
  • OpenAI (or compatible) provides embeddings and generation.
  • Background jobs handle reflection, summarization, redundancy scoring, and lifecycle tasks.
  • Expired items are automatically swept and removed (vectors + DB rows), preventing orphan vectors.
  • Jobs retry with exponential backoff before transitioning to failed.
  • Job reruns are idempotent (derived memories are replaced, not duplicated).
  • Structured logs include request_id, tenant_id, and collection.
  • Prometheus metrics are exposed at /metrics (scoped to the tenant; admin sees all tenants).

Technical White Paper

Adaptive Memory Value + Lifecycle (AMV-L)

A value-driven tiering and retrieval-control policy for keeping long-term memory high-signal and cost-bounded.

Objective
Retain useful memory while bounding retrieval and prompt costs.
Output
Per-item tiering plus lifecycle actions (retain, compact, evict, synthesize).
Execution Model
Request-path event queue + asynchronous sweeps and telemetry checks.
1. Overview

AtlasRAG implements Adaptive Memory Value + Lifecycle (AMV-L) as a managed-resource policy, not a passive memory store. Every memory item carries a scalar value and an explicit tier: HOT, WARM, or COLD.

The core contract is bounded retrieval: R = HOT union Sample_k(WARM) (optionally tiny cold probe), then vector search is scoped to this set so online cost is driven by working-set size instead of total memory size.

2. Problem Framing

Production memory systems fail when retrieval cost scales with total retained memory:

  • Store-everything approaches create retrieval dilution and prompt bloat.
  • TTL-only approaches can remove useful knowledge while still allowing expensive wide scans before expiry.

AMV-L addresses both with two controls: value-driven lifecycle transitions and hard retrieval gating that excludes cold memory by default.

3. Incremental Value Update V(m)

Each memory item m has value V(m) >= 0. AtlasRAG updates value incrementally per event using recency decay plus event reinforcement:

Request-Time Update
V_next = clamp(
  V_prev * exp(-lambda * delta_t_days)
  + alpha * I_access
  + beta * I_contribute
  - gamma * I_negative,
  0,
  V_max
)
Notation
Symbol | Interpretation
delta_t_days | Time since value_last_update_ts
I_access | 1 for retrieval/use access events, else 0
I_contribute | 1 only when memory contributed to a successful answer path
I_negative | Failure/negative-feedback penalty term
V_max | Hard upper cap (default 1.0)

Default runtime parameters are environment-driven (for example MEMORY_ACCESS_ALPHA, MEMORY_CONTRIBUTION_BETA, MEMORY_VALUE_DECAY_LAMBDA, MEMORY_VALUE_MAX).
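For intuition, the update can be written directly; the coefficient values below are illustrative placeholders, not the shipped environment defaults:

```python
import math

def update_value(v_prev, dt_days, access, contribute, negative,
                 lam=0.05, alpha=0.1, beta=0.2, gamma=0.3, v_max=1.0):
    """Incremental AMV-L value update: recency decay plus event
    reinforcement, clamped to [0, V_max]. Coefficients are illustrative;
    real defaults come from env (MEMORY_VALUE_DECAY_LAMBDA, etc.)."""
    v = (v_prev * math.exp(-lam * dt_days)   # recency decay over delta_t
         + alpha * access                     # I_access reinforcement
         + beta * contribute                  # I_contribute reinforcement
         - gamma * negative)                  # I_negative penalty
    return min(max(v, 0.0), v_max)
```

Note that decay applies only to the previous value, so a long-idle item converges toward 0 unless events keep reinforcing it.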

4. Tier Model + Hysteresis

Memory is partitioned into lifecycle tiers with hysteresis thresholds:

Tiering
HOT:   V >= theta_hot_up (or remain HOT until V < theta_hot_down)
WARM:  between hot and warm thresholds
COLD:  below theta_warm_down (or remain COLD until V >= theta_warm_up)

Separate up/down boundaries avoid oscillation around a single threshold. Pinned memories are exempt from demotion in the tier-transition logic.
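A sketch of the transition rule with hysteresis (threshold values here are illustrative, not the configured defaults):

```python
def next_tier(current, v, pinned=False,
              hot_up=0.7, hot_down=0.5, warm_up=0.3, warm_down=0.2):
    """Tier transition with hysteresis: promotion and demotion use
    separate thresholds, and pinned items never demote. Threshold
    values are illustrative."""
    if current == "HOT":
        if v < hot_down and not pinned:
            return "WARM"
        return "HOT"                 # remain HOT until V < theta_hot_down
    if current == "WARM":
        if v >= hot_up:
            return "HOT"
        if v < warm_down and not pinned:
            return "COLD"
        return "WARM"
    if v >= warm_up:                 # COLD: remain until V >= theta_warm_up
        return "WARM"
    return "COLD"
```

Because hot_down < hot_up (and warm_down < warm_up), a value jittering inside the band cannot flip the tier back and forth on every update.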

5. Initialization Safety

New memory writes initialize in WARM with an initial value constrained to the warm band (theta_warm_up <= V_init < theta_hot_up). This prevents new uploads from being immediately cold-evictable.

Initialization also stamps value_last_update_ts and tier_last_update_ts, enabling true incremental decay from first write.

6. Retrieval Gating

Candidate memories are hard-bounded before vector search:

Bounded Retrieval Set
R = HOT union Sample_k(WARM)
COLD intersection R = empty

Optional cold probing exists as a tiny budget (MEMORY_RETRIEVAL_COLD_PROBE_EPSILON), disabled by default. When disabled, runtime checks enforce cold_candidates == 0.
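The gating contract can be sketched as follows (illustrative; a seeded generator is used so the sketch is deterministic):

```python
import random

def bounded_retrieval_set(hot, warm, k, cold=None, cold_probe=0):
    """R = HOT union Sample_k(WARM); COLD is excluded unless a tiny
    cold-probe budget is enabled. Sketch of the gating contract only."""
    rng = random.Random(0)  # deterministic sampling for this sketch
    r = list(hot) + rng.sample(list(warm), min(k, len(warm)))
    if cold_probe and cold:
        r += rng.sample(list(cold), min(cold_probe, len(cold)))
    return r
```

With cold_probe left at 0 (the default posture), cold items can never enter R, which is exactly the cold_candidates == 0 invariant the runtime checks enforce.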

7. Scoped Vector Search

Similarity search is executed only on chunk vectors belonging to the bounded memory set (R), not the full corpus. The gateway sends explicit candidate IDs to the vector service.

Complexity Target
online scan cost ~= O(|HOT| + k) at memory-selection stage
vector scan is bounded to chunks mapped from that set
8. Prompt Insertion Set

After dense/lexical fusion and reranking, only top results are inserted into prompts: S = Top_n(sim(q, R)). This bounds memory token footprint independently of total stored items.

Prompt telemetry records prompt_tokens_est, memory_tokens_est, and total_tokens_est per request.
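The insertion step reduces to a top-n cut over fused similarity scores; a sketch:

```python
def prompt_insertion_set(scored, n):
    """S = Top_n(sim(q, R)): keep only the n highest-scoring results
    after dense/lexical fusion and reranking. `scored` is a list of
    (item, score) pairs; only items are returned."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [item for item, _score in ranked[:n]]
```

Since n is fixed per request, the memory token footprint of the prompt is bounded regardless of how many items survive retrieval.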

9. Contribution Correctness

Contribution boost is applied after successful answer completion and only to memories actually injected into the generated prompt. This avoids over-crediting memories that were retrieved but not used.

Negative outcomes (task_fail, user_negative) apply a penalty term in the incremental update.

10. Request-Path Overhead

Request path records memory events into an in-memory queue. Background flush workers persist batched updates asynchronously, reducing synchronous DB write overhead on latency-sensitive paths.

This keeps online work close to "compute bounded candidates + run scoped retrieval", while value decay and lifecycle management run in periodic background sweeps.

11. Lifecycle Policy

Lifecycle sweeps enforce tier-safe policies on non-expired items:

Condition | Action
expired | delete
pinned | keep
tier == COLD and V < theta_evict | evict
tier == COLD and V < summary_threshold | compact
tier == HOT and policy says synthesize | promote/summarize
otherwise | retain
Policy Skeleton
if expired: delete
else if pinned: retain
else if tier == COLD and V < theta_evict: evict
else if tier == COLD and V < summary_threshold: compact
else if tier == HOT and synthesis_enabled: promote
else: retain
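The skeleton can be made concrete as a pure decision function (threshold defaults are illustrative; names follow the table above):

```python
def lifecycle_action(expired, pinned, tier, v,
                     theta_evict=0.05, summary_threshold=0.15,
                     synthesis_enabled=False):
    """Tier-safe lifecycle decision mirroring the policy skeleton.
    Threshold defaults are illustrative, not the configured values."""
    if expired:
        return "delete"          # TTL takes precedence over value/tier
    if pinned:
        return "retain"
    if tier == "COLD" and v < theta_evict:
        return "evict"
    if tier == "COLD" and v < summary_threshold:
        return "compact"
    if tier == "HOT" and synthesis_enabled:
        return "promote"
    return "retain"
```

Ordering matters: the expiry and pin checks come first, so value-based eviction and compaction can only ever touch unpinned, non-expired cold items.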
12. Cold Eviction Operator

Value-based eviction is restricted to cold memory: tier == COLD and V < theta_evict. Deletion removes both metadata rows and vector entries (with reconcile jobs when vector deletion fails).

13. Compaction Operator

Compaction targets cold, lower-value groups and replaces multiple items with a denser summary memory. This preserves semantic coverage while shrinking retrieval noise and storage footprint.

14. Promotion / Synthesis Operator

Promotion synthesizes reusable higher-order memories from high-value context (for example semantic or procedural distilled items). Tier transitions are always value-based; synthesis is an optional additional lifecycle operator.

15. TTL Precedence

TTL remains a hard retention boundary. Expiry deletes take precedence over value and tier state, and AMV-L operates only on non-expired items.

16. Telemetry + Acceptance Checks

AMV-L emits per-request telemetry for bounded-set verification and cost tracking:

  • hot_count, warm_sampled, cold_candidates.
  • retrieval_set_size, retrieval_bound.
  • vector_search_scanned_count.
  • prompt_tokens_est, memory_tokens_est, total_tokens_est.
  • Lifecycle transition/action events for promote, demote, compact, and delete.

Key acceptance targets: cold_candidates == 0 by default, bounded retrieval size, and latency percentiles (p50/p95/p99) that track bounded prompt and retrieval sets.

17. Resulting System Properties
  • Effective working set is explicitly bounded before similarity search.
  • Cold memory is excluded from default retrieval, reducing noise.
  • Prompt memory token usage is observable and controllable.
  • Value/tier state adapts online while heavy maintenance runs asynchronously.
18. Conclusion

AMV-L in AtlasRAG is implemented as an incremental, tiered control loop with explicit retrieval gates and scoped vector search. The net effect is better cost predictability and cleaner retrieval behavior than unconstrained global-memory scanning.

Playground notes
  • Use service tokens for server-to-server integrations. Credentials entered in the Playground are saved locally in your browser (localStorage).
  • Use /login to generate a JWT and save it automatically.
  • Index at least one document before searching or asking.
  • If you see Unauthorized, your token is invalid, expired, or revoked.
  • The health check only verifies gateway-to-TCP reachability.
  • API keys are created by admins via POST /v1/admin/service-tokens.