Version: v0.7.6

Horizontal scaling

This page explains why OpenWA runs as a single instance, which parts of it you can scale out, and how to grow capacity safely within that model. Read it before you reach for replicas: 3.

One instance per session volume

OpenWA is a single-process application. Run exactly one API instance per session-data volume (replicas: 1). Two instances sharing the same session storage corrupt WhatsApp authentication and can get the linked account logged out or banned. There is no safe way around this today — sticky sessions and shared storage do not fix it.

The core constraint: engine state lives in memory

Every running session is a live, stateful connection to WhatsApp, held open by one of two engines:

a Chromium browser, on the default whatsapp-web.js engine, or
a WebSocket client, on the baileys engine (set ENGINE_TYPE=baileys).

OpenWA holds that connection — plus its reconnect, error, and status state — in in-memory maps inside the API process. There is no shared registry of which instance owns which session, no node-claim or lease, and no message-broker adapter coordinating engines across processes. When the process stops, the live connections stop with it; another process cannot pick them up.

That single fact drives the entire scaling model. A second instance has no way to know a session is already live elsewhere, so it tries to bring that session online too.

Two instances writing the same on-disk auth directory for one session corrupt it. WhatsApp's linked-device auth is not designed for concurrent writers, so you get a forced logout, a re-scan loop, or, in the worst case, a banned account. The danger peaks with AUTO_START_SESSIONS=true, where every instance tries to bring every session online the moment it boots.

Sticky sessions are not a workaround

Cookie or IP affinity at the load balancer reduces the windows where two instances touch the same session, but it does not close them — failover, a rolling deploy, or a brief overlap still produces two writers. Affinity is not a substitute for the missing claim/lease mechanism. Keep one instance.

What you can share, and what you can't

OpenWA already supports shared external datastores. That is what lets one instance grow, and it is the foundation a future multi-instance design would build on. What it does not have is a way to share live engine state.

State	Where it lives	Shareable today?
Persistent data (session metadata, messages, webhooks, contacts)	SQLite, or PostgreSQL with `DATABASE_TYPE=postgres`	Yes — point the instance at a managed PostgreSQL
Cache and queue	In-memory, or Redis with `REDIS_ENABLED=true`	Yes — external Redis is supported
Media files	Local disk, or S3 / MinIO with `STORAGE_TYPE=s3`	Yes — S3-compatible storage is shareable
Live WhatsApp engine connection	In-memory map in the API process	No — this is the single-instance constraint

The first three rows move state out of the container, onto networked services that survive restarts and could one day back multiple instances. The fourth row cannot move — it is the WebSocket or browser the process is actively holding.

For how to migrate each layer onto its networked counterpart, see the Migration Guide. The environment variables themselves are in Configuration.
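Here is a minimal sketch of the three switches from the table, written as shell exports. The same KEY=VALUE lines work in a Docker env file without the export keyword. Only the variable names and values come from this page. Each backend also needs connection settings such as host and credentials, and those are left out because their exact names are listed in Configuration.

```shell
# The three shareable layers from the table, moved off the container.
# Each backend also needs connection settings (host, credentials, bucket);
# their variable names are in Configuration and are not guessed here.
export DATABASE_TYPE=postgres   # persistent data -> PostgreSQL instead of SQLite
export REDIS_ENABLED=true       # cache and queue -> external Redis
export STORAGE_TYPE=s3          # media files -> S3 / MinIO instead of local disk
# No variable moves the live engine connection; it always stays in the API process.
```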

Scale up, not out

Because you are capped at one instance, capacity comes from vertical scaling (a bigger host) and engine choice — not more replicas.

Add CPU and RAM. The default whatsapp-web.js engine runs a Chromium instance per session, so memory is usually the first ceiling you hit. More sessions and higher message volume both want more RAM.
Consider Baileys. ENGINE_TYPE=baileys is a browser-free WebSocket client with a much smaller per-session footprint, so you fit more sessions on the same hardware. Review the engine trade-offs in Sessions before switching a live deployment.
Move the shared layers off-box. Switch to PostgreSQL and Redis so the database and cache stop competing with the engines for the host's resources under load.
Run independent instances for isolation. Need more total sessions than one host carries, or hard tenant separation? Run separate OpenWA deployments, each with its own session-data volume and its own routing. They are independent single-instance deployments — they do not share live session state, and that is exactly why this is safe.
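The last lever can be sketched as a shell helper that starts one independent deployment per tenant. Several values here are hypothetical placeholders that this page does not document: the image reference, the in-container session-data path, and the assumption that the API listens on port 2785 inside the container. The point the sketch illustrates is that each deployment gets its own container, its own named volume, and its own host port. Set DRY_RUN=1 to print the docker command instead of running it.

```shell
# Hypothetical placeholders: substitute your real image and session-data mount path.
IMAGE="${IMAGE:-openwa:0.7.6}"
DATA_DIR="${DATA_DIR:-/app/data}"

# start_tenant NAME HOST_PORT [ENGINE]
# Starts one independent single-instance deployment: its own container, its own
# named session volume, and its own host port. Nothing is shared between tenants.
start_tenant() {
  tenant="$1"; host_port="$2"; engine="${3:-}"
  # Assumes the API listens on 2785 inside the container (hypothetical).
  set -- docker run -d --name "openwa-$tenant" \
    -v "openwa-$tenant-sessions:$DATA_DIR" \
    -p "$host_port:2785"
  # Only override the engine when asked; otherwise the image default applies.
  if [ -n "$engine" ]; then
    set -- "$@" -e "ENGINE_TYPE=$engine"
  fi
  set -- "$@" "$IMAGE"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"     # print the command instead of running it
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 start_tenant globex 2786 baileys prints a single docker run line that uses a volume named openwa-globex-sessions and sets ENGINE_TYPE=baileys. A second tenant on another port gets a different volume, so no two processes ever write the same auth directory.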

Rough starting points

These are unbenchmarked starting figures for the default whatsapp-web.js engine, not guarantees. Baileys needs considerably less. Always size from your own monitoring.

Sessions	RAM	CPU
1–5	2 GB	2 cores
6–10	4 GB	4 cores
11–20	8 GB	8 cores
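If you script host provisioning, the sizing table reduces to a small lookup. This sketch encodes the same unbenchmarked whatsapp-web.js figures and treats each range as an upper bound. It refuses to guess beyond 20 sessions because the page gives no figure there. The helper name suggest_host is hypothetical.

```shell
# suggest_host SESSIONS
# Rough starting point for the default whatsapp-web.js engine, transcribed from
# the sizing table. Unbenchmarked: size from your own monitoring. Baileys needs less.
suggest_host() {
  sessions="$1"
  case "$sessions" in
    ''|*[!0-9]*) echo "usage: suggest_host SESSIONS (a positive integer)" >&2; return 2 ;;
  esac
  if [ "$sessions" -lt 1 ]; then
    echo "usage: suggest_host SESSIONS (a positive integer)" >&2; return 2
  elif [ "$sessions" -le 5 ]; then
    echo "2 GB RAM, 2 cores"
  elif [ "$sessions" -le 10 ]; then
    echo "4 GB RAM, 4 cores"
  elif [ "$sessions" -le 20 ]; then
    echo "8 GB RAM, 8 cores"
  else
    # No figure is published past 20 sessions; measure rather than extrapolate.
    echo "no published starting point above 20 sessions" >&2
    return 1
  fi
}
```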

Know when you are at the ceiling

Watch host memory and CPU, and poll the health endpoints. /api/health returns the basic status; /api/health/ready returns 503 when a required database is unreachable or the instance is draining, which is what your orchestrator should gate traffic on.

The basic check is public — it takes no X-API-Key header — and reports liveness plus the running version:

curl http://localhost:2785/api/health

{
  "status": "ok",
  "timestamp": "2026-06-26T10:15:00.000Z",
  "version": "0.7.6"
}
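To confirm which build is actually serving after a deploy, you can extract the version field from that response. A sed expression is enough for this flat body. The function name is illustrative, and jq is the sturdier choice if it is installed.

```shell
# health_version: read an /api/health JSON body on stdin and print its "version" field.
# Works for both pretty-printed and compact bodies of this flat, fixed shape.
health_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage: curl -fsS http://localhost:2785/api/health | health_version
```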

The readiness check is also public. It probes both the auth/audit (main) database and the data database:

curl -i http://localhost:2785/api/health/ready

When both databases respond, it returns 200 with each dependency marked up:

{
  "status": "ok",
  "details": {
    "mainDatabase": { "status": "up" },
    "dataDatabase": { "status": "up" }
  }
}

When a required database is unreachable, it returns 503 with the failing dependency marked down — this is the response your probe will hit, so configure it to treat 503 as "not ready":

{
  "status": "error",
  "details": {
    "mainDatabase": { "status": "down" },
    "dataDatabase": { "status": "up" }
  }
}
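Scripts that must wait for readiness, such as a deploy hook or a smoke test, can use a polling loop that treats every status other than 200 as not ready. Here is a minimal sketch. It uses only standard curl options, and the function name, default attempt count, and interval are assumptions. Kubernetes httpGet probes already count any status outside 200 to 399 as a failure, so they need no extra handling for 503.

```shell
# wait_ready URL [ATTEMPTS] [INTERVAL_SECONDS]
# Polls the readiness endpoint until it answers 200. Any other result counts as
# not ready: 503 (database down or instance draining) or 000 (no connection).
# Returns 0 once ready, 1 after ATTEMPTS failed polls.
wait_ready() {
  url="$1"; attempts="${2:-30}"; interval="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the status code; --max-time stops a hung request stalling the loop.
    code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")"
    [ "$code" = "200" ] && return 0
    echo "not ready (HTTP ${code:-none}), attempt $i of $attempts" >&2
    [ "$i" -lt "$attempts" ] && sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Usage: wait_ready http://localhost:2785/api/health/ready 30 2 && echo "ready"
```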

While the instance is draining during a graceful shutdown, it returns 503 with a shutdown detail even if the databases are still up, so traffic stops before teardown:

{
  "status": "error",
  "details": {
    "shutdown": { "status": "draining" }
  }
}
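When a probe fails, the body says why. This hypothetical helper prints each details entry whose status is not up, so it covers both the database case and the draining case. It exits 0 only when it found something to report. Because it relies on grep and sed rather than a JSON parser, treat it as a sketch for this fixed response shape.

```shell
# not_up_details: read a /api/health/ready body on stdin and print
# "name status" for every details entry whose status is not "up".
# Exit status is 0 when something was reported, 1 when everything is up.
not_up_details() {
  grep -oE '"[A-Za-z]+"[[:space:]]*:[[:space:]]*\{[[:space:]]*"status"[[:space:]]*:[[:space:]]*"[A-Za-z]+"' |
    sed -E 's/^"([A-Za-z]+)".*"([A-Za-z]+)"$/\1 \2/' |
    grep -v ' up$'
}

# Usage: curl -s http://localhost:2785/api/health/ready | not_up_details
```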

When memory sits high under steady load, you have three levers, in order of effort: switch sessions to Baileys, give the host more RAM, or split sessions across a second independent instance. Adding replicas to the same deployment is not on that list.
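To put a number on "memory sits high", a Linux host exposes MemTotal and MemAvailable in /proc/meminfo. This sketch prints the percentage in use. The helper name is illustrative, and the 85% threshold in the usage comment is an arbitrary example rather than a recommendation from this page. Run it on the host, because inside a container /proc/meminfo usually reports the host's memory rather than the container's limit.

```shell
# mem_used_pct [MEMINFO_FILE]
# Prints the share of host memory in use as a whole percentage:
# (MemTotal - MemAvailable) * 100 / MemTotal, read from /proc/meminfo (Linux only).
# Fails if either field is missing.
mem_used_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2; seen = 1 }
       END { if (total > 0 && seen) printf "%d\n", (total - avail) * 100 / total; else exit 1 }' \
    "${1:-/proc/meminfo}"
}

# Usage (arbitrary example threshold):
# pct="$(mem_used_pct)" && [ "$pct" -ge 85 ] && echo "memory high: ${pct}%"
```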

The session, message, and other resource endpoints you would call alongside these checks do require an X-API-Key header — see Authentication for how to obtain a key.
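Here is a minimal sketch of such an authenticated call, assuming you keep the key in an environment variable. OPENWA_API_KEY and BASE_URL are hypothetical names for this example. The /api/sessions path in the usage comment also stands in for whichever resource endpoint you actually need.

```shell
# openwa_get PATH
# Sends an authenticated GET to the API. The key comes from OPENWA_API_KEY and the
# base URL from BASE_URL (both hypothetical names), defaulting to the local port used here.
openwa_get() {
  if [ -z "${OPENWA_API_KEY:-}" ]; then
    echo "set OPENWA_API_KEY first (see Authentication)" >&2
    return 1
  fi
  curl -fsS -H "X-API-Key: $OPENWA_API_KEY" "${BASE_URL:-http://localhost:2785}$1"
}

# Usage: OPENWA_API_KEY=... openwa_get /api/sessions   (path is illustrative)
```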

The multi-node design is a sketch, not a feature

The project documents a future design for true horizontal scaling — a database-backed session registry, per-node session claims and leases, sticky routing, and a message-broker adapter for the real-time channel. None of it is implemented in v0.7.6.

So treat any multi-replica example you come across — a Docker Swarm service with replicas: 3, or a Kubernetes Deployment whose pods share one session volume — as a planning sketch, not a supported topology. Following it will corrupt your sessions. Until the claim/lease mechanism ships, the only correct value is replicas: 1.
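The failure stays silent until sessions break, so it can be worth asserting the replica count before every deploy. This sketch checks a replica count and, optionally, a Kubernetes update strategy. The strategy check matters for a one-replica Deployment. Under the default RollingUpdate strategy, a 25% surge rounds up to one pod, so the new pod starts before the old one stops. That is exactly the rolling-deploy overlap described under sticky sessions. Recreate stops the old pod first. The workload name openwa in the usage comments is hypothetical.

```shell
# check_single_writer REPLICAS [STRATEGY]
# Fails unless REPLICAS is exactly 1 and, when STRATEGY is given, it is "Recreate"
# (so a rollout never runs the old and new pod against the same volume at once).
check_single_writer() {
  replicas="$1"; strategy="${2:-}"
  if [ "$replicas" != "1" ]; then
    echo "OpenWA needs replicas: 1 (found: $replicas)" >&2
    return 1
  fi
  if [ -n "$strategy" ] && [ "$strategy" != "Recreate" ]; then
    echo "use the Recreate strategy so pods never overlap (found: $strategy)" >&2
    return 1
  fi
  return 0
}

# Usage with Kubernetes:
#   check_single_writer \
#     "$(kubectl get deployment/openwa -o jsonpath='{.spec.replicas}')" \
#     "$(kubectl get deployment/openwa -o jsonpath='{.spec.strategy.type}')"
# Usage with Docker Swarm:
#   check_single_writer "$(docker service inspect --format '{{.Spec.Mode.Replicated.Replicas}}' openwa)"
```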

Next steps

Deployment — production Docker setup, health checks, and reverse-proxy TLS.
Migration Guide — move persistent data to PostgreSQL, cache to Redis, and media to S3 / MinIO.
Configuration — the DATABASE_TYPE, REDIS_ENABLED, STORAGE_TYPE, and ENGINE_TYPE variables referenced above.
