DataSitr — Architecture Whitepaper

01 Architecture whitepaper

Foreword.

To our guests, partners, and evaluators:

DataSitr lets Saudi organizations use modern AI systems without sending raw personal data across borders by default. The gateway sits between your apps and the AI providers your tenants choose; it detects personal data, applies the privacy transformation the lane requires, and routes the request by sensitivity, tenant policy, and Saudi residency rules.

DataSitr is not a chatbot wrapper or a generic gateway. It is a Saudi-context privacy boundary built around three commitments: detect personal data thoroughly in both Arabic and English, vault what should not leave the Kingdom, and record every routing decision so a buyer or a regulator can verify what happened — not take our word for it.

This is a live-pilot architecture description, not a certification or accreditation claim. What follows is intentionally precise: each section describes what the system does today, the artifacts it produces, and the limits we are honest about. Read this whitepaper alongside the trust, compliance, and resources pages — those carry the dated evidence behind the architecture described here.

Every technical claim in this document links to the page that proves it live; where a number matters, we send you to the artifact rather than printing a figure that can go stale.

Context worth stating up front: PDPL enforcement is operational. SDAIA confirmed 48 enforcement decisions in January 2026. Penalties under the law include administrative fines of up to SAR 5 million (doubled for repeat violations) and up to two years' imprisonment for intentional sensitive-data violations. The technical work this whitepaper describes exists to make compliance demonstrable, not asserted, in that environment.

Data source date: 2026-01 (SDAIA enforcement context, attested). See /trust and /compliance for the dated evidence behind every figure above.

Sulaiman Husam Abonami, Founder & Architect

02 Operating model

Gateway operating model

The gateway intercepts each request before it reaches an AI provider, changes the payload according to privacy risk, and records the decision for later review. The four steps below — intercept, detect, transform, route + record — happen for every request in scope.

01 Intercept

The request enters DataSitr before any model call. Tenant policy, route configuration, and request metadata are loaded at the edge of the pipeline.

02 Detect

Presidio, spaCy, Saudi recognizers, and Arabic NER operate together. Regex is one layer, not the architecture.

03 Transform

Direct identifiers become typed placeholders where reversible protection is allowed. Highest-risk text can remain intact only for in-Kingdom processing.

04 Route + record

The policy engine chooses green, amber, red, or blocked. Compliance metadata is written alongside the processing record.
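
The four steps above can be sketched as a single request handler. This is a minimal illustration, not DataSitr's implementation: the `Request` and `Decision` types, the `lane_when_pii` policy key, and the single email regex standing in for the full detector stack are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Request:
    tenant: str
    text: str


@dataclass
class Decision:
    lane: str
    payload: str
    entities: list = field(default_factory=list)


# Hypothetical stand-in for the detector stack: one email pattern only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def detect(text):
    # De-duplicate while keeping first-seen order.
    return list(dict.fromkeys(m.group(0) for m in EMAIL.finditer(text)))


def transform(text, entities):
    # Replace each detected value with a typed, numbered placeholder.
    for i, value in enumerate(entities, start=1):
        text = text.replace(value, f"[[EMAIL:{i:02d}]]")
    return text


def route(entities, policy):
    # Toy rule (an assumption): no entities means green, otherwise ask policy.
    if not entities:
        return "green"
    return policy.get("lane_when_pii", "blocked")


def handle(request, policies, audit_log):
    policy = policies.get(request.tenant, {})      # 01 intercept: load tenant policy
    entities = detect(request.text)                # 02 detect
    payload = transform(request.text, entities)    # 03 transform
    lane = route(entities, policy)                 # 04 route ...
    audit_log.append(                              # ... + record
        {"tenant": request.tenant, "lane": lane, "entity_count": len(entities)}
    )
    return Decision(lane, payload, entities)
```

Calling `handle(Request("acme", "Email sara@example.com today"), {"acme": {"lane_when_pii": "amber"}}, log)` returns a decision whose payload carries the placeholder instead of the address, and appends one record to `log`. A tenant with no configured lane falls through to `blocked`, which mirrors the conservative default described in the routing section.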

Current production posture: as of 2026-05-20, the live DataSitr deployment runs on Alibaba ACK in Riyadh as primary, with scoped drill-standby evidence for GCP Dammam. The May 4 customer-route cutover to ACK passed a 4-hour soak; the May 16 Dammam drill covers DNS, GKE, and TLS routing only. The platform runs Alibaba KMS startup bootstrap, encrypted vault storage, machine-readable compliance records, active alerting, and dated continuity evidence. The following remain outside current claims: cross-cloud database replication, auth failover, data-tier failover, HSM-backed custody, and a fully refreshed same-origin browser/session proof pack on this exact baseline.

03 Detector research

Arabic detector research

The hard part is Arabic and Saudi context: names, organizations, government identifiers, mixed Arabic-English prompts, and safe Arabic prose that should not be redacted. DataSitr treats this as a measured detector program, not a single model toggle.

Backbone: Wojood-warmstarted Arabic NER

The Arabic model sits behind structural and contextual gates. It improves recall without letting every Arabic noun become a person.

Saudi layer: National IDs, IBANs, phones, CRs, local names

Saudi-specific recognizers cover local document shapes and name signals that generic PII libraries do not prioritize.
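
As one illustration of a Saudi-specific recognizer, the sketch below pairs a shape pattern for Saudi IBANs with the standard IBAN mod-97 checksum, so that a string that merely looks like an IBAN is not flagged. It uses only the Python standard library and is not taken from DataSitr's recognizers; the function names are hypothetical. The assumed shape is 24 characters: `SA`, two check digits, then 20 alphanumerics, optionally grouped in fours.

```python
import re

# Shape only: "SA", two check digits, then five groups of four characters,
# each optionally preceded by a single space.
SA_IBAN = re.compile(r"\bSA\d{2}(?: ?[0-9A-Z]{4}){5}\b")


def iban_checksum_ok(iban: str) -> bool:
    """Standard IBAN check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and test the result mod 97."""
    compact = iban.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_sa_ibans(text: str) -> list[str]:
    """Return IBAN-shaped substrings that also pass the checksum."""
    return [
        m.group(0)
        for m in SA_IBAN.finditer(text)
        if iban_checksum_ok(m.group(0))
    ]
```

For example, `find_sa_ibans("Pay to SA03 8000 0000 6080 1016 7519 please")` returns the IBAN, while the same string with one digit altered is rejected by the checksum. Layering a structural check on top of the pattern is one way to keep precision up without giving up recall.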

Benchmark: Measured recall gain over vanilla Presidio

The trust and benchmark pages publish dated, sanitized detector artifacts — the exact precision and recall figures live on the benchmark page so the claim can be inspected without a slide deck.

Guardrail: False-positive controls for Arabic safe text

Hard negatives, literary Arabic, and support-text cases are part of the detector discipline so privacy protection does not become unusable redaction.
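
To make the precision and recall terms concrete, here is a small sketch of exact-match span scoring. It is not the benchmark's actual scoring code; representing spans as `(start, end, label)` tuples and scoring an empty set as 1.0 are assumptions made for the example.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Exact-match scoring over (start, end, label) spans.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold spans
    An empty denominator is scored as 1.0 (a convention chosen here).
    """
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(gold) if gold else 1.0
    return precision, recall
```

A hard negative is a text whose gold set is empty. If the detector still predicts a span there, for instance `precision_recall({(0, 4, "PERSON")}, set())`, precision drops to 0.0 while recall stays at 1.0. That is why safe Arabic prose belongs in the evaluation set: recall alone cannot reveal over-redaction.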

04 Vault

Vault, tokenization, rehydration

For reversible lanes, DataSitr replaces detected entities with typed placeholders and stores originals in a tenant-scoped encrypted vault. Rehydration is allowed only within the original tenant and request context.

Implemented behavior:

1. Detect entities in the inbound prompt.
2. Generate typed placeholders such as [[PERSON:01]].
3. Encrypt original values with AES-256-GCM under tenant-scoped keys.
4. Re-scan transformed text before external eligibility.
5. Rehydrate approved responses only for the original tenant and request context.
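
The tokenize, re-scan, and rehydrate steps above can be sketched with an in-memory stand-in for the vault. This is an illustration under stated assumptions: the `Vault` class, its method names, and the `leaks` helper are hypothetical, and the dictionary stores plaintext, so the AES-256-GCM encryption of step 3 is deliberately left out to keep the example dependency-free.

```python
import re
from collections import defaultdict

# Matches typed placeholders such as [[PERSON:01]].
PLACEHOLDER = re.compile(r"\[\[([A-Z_]+):(\d{2})\]\]")


class Vault:
    """In-memory stand-in for the encrypted vault (no encryption here)."""

    def __init__(self):
        self._store = {}

    def tokenize(self, tenant, request_id, text, entities):
        """Swap each (type, value) entity for a numbered placeholder and
        store the original under (tenant, request_id, placeholder)."""
        counters = defaultdict(int)
        for entity_type, value in entities:
            counters[entity_type] += 1
            token = f"[[{entity_type}:{counters[entity_type]:02d}]]"
            self._store[(tenant, request_id, token)] = value
            text = text.replace(value, token)
        return text

    def rehydrate(self, tenant, request_id, text):
        """Restore originals only for the same tenant and request;
        unknown placeholders are left untouched."""
        def restore(match):
            key = (tenant, request_id, match.group(0))
            return self._store.get(key, match.group(0))
        return PLACEHOLDER.sub(restore, text)


def leaks(text, entities):
    """Re-scan helper: original values still present after tokenization."""
    return [value for _, value in entities if value in text]
```

Because the lookup key includes both tenant and request ID, a response rehydrated under a different tenant comes back with its placeholders intact. A real re-scan would run the full detector again rather than the simple substring check in `leaks`.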

This is not format-preserving encryption and not stateless masking. The encrypted state exists because the product must support auditable, tenant-scoped rehydration for approved responses.

05 Routing

Three-lane routing

The policy decision is intentionally conservative. Each request becomes green, amber, red, or blocked based on identifiability, sensitivity, tenant policy, and configured provider paths.

Green — tokenized external
Detected direct identifiers become typed placeholders, then the transformed text is rescanned before it can route to eligible global providers.
Amber — pseudonymized in-Kingdom
The text is transformed, but processing stays on operator-configured Saudi-hosted provider paths.
Red — raw in-Kingdom or blocked
Highest-risk categories, including PDPL Article 1(11)-defined sensitive data, stay intact only for configured in-Kingdom execution. If no compliant path exists, the request is blocked.
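
The lane descriptions above can be read as a small decision function. The sketch below is one possible encoding, not the product's policy engine: the `Signals` fields and the order in which they are checked are assumptions chosen to match the prose.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class Signals:
    sensitive: bool         # highest-risk category detected in the text
    rescan_clean: bool      # no identifiers found after tokenization
    external_allowed: bool  # tenant policy permits global providers
    in_kingdom_path: bool   # a Saudi-hosted provider path is configured


def choose_lane(s: Signals) -> Lane:
    # Highest-risk text stays raw and in-Kingdom, or does not go anywhere.
    if s.sensitive:
        return Lane.RED if s.in_kingdom_path else Lane.BLOCKED
    # External routing requires both policy permission and a clean re-scan.
    if s.external_allowed and s.rescan_clean:
        return Lane.GREEN
    # Otherwise fall back to transformed, in-Kingdom processing if available.
    if s.in_kingdom_path:
        return Lane.AMBER
    return Lane.BLOCKED
```

Note that every branch that cannot find a compliant path ends in `BLOCKED`, so a missing configuration fails closed rather than defaulting to an external provider.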

06 Compliance records

Compliance records

Every architecture decision needs a review surface. DataSitr records machine-readable processing metadata per request: classification, destination, legal or policy basis, and evidence material for export.

RoPA: Records of Processing Activities

Structured processing records connect each routed request to its classification and purpose context.

Transfers: Transfer register entries

External eligibility and in-Kingdom routing decisions become inspectable records rather than hidden provider calls.

Rights: Subject-rights workflows

Export, deletion, consent, and breach-register workflows are part of the same compliance operating surface.

Export: Signed evidence packs

Reviewer exports are designed for verification. The public compliance page explains what is available and what remains outside current claims.
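
To show what a machine-readable, verifiable record could look like, the sketch below builds a per-request record and attaches an integrity tag. Everything here is an assumption for illustration: the field names are hypothetical, and HMAC-SHA256 over canonical JSON is a stand-in, since the whitepaper does not say how evidence packs are signed (a reviewer-facing export would more plausibly use an asymmetric signature).

```python
import hashlib
import hmac
import json


def build_record(request_id, classification, destination, basis):
    """Assemble a per-request processing record (hypothetical field names)."""
    return {
        "request_id": request_id,
        "classification": classification,  # e.g. "green", "amber", "red", "blocked"
        "destination": destination,
        "basis": basis,                    # legal or policy basis for processing
    }


def sign_record(record: dict, key: bytes) -> str:
    """Serialize with sorted keys so the same record always yields the
    same bytes, then compute an HMAC-SHA256 tag over those bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_record(record, key), signature)
```

Changing any field after signing, for instance rewriting `destination`, makes `verify_record` return `False`. Canonical serialization matters here: without sorted keys, two equal records could serialize differently and fail verification for no real reason.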

07 Evidence boundary

Evidence boundary

The whitepaper is intentionally precise about what it claims: a live Saudi-hosted pilot, detector benchmarks, encrypted vaulting, compliance records, customer-route HA evidence, and dated continuity evidence. It does not convert those facts into claims of regulator approval, SOC 2, ISO 27001, external pen-testing, HSM custody, full-vault verification, or full-region tolerance.

08 What to verify

Buyer verification path

A serious whitepaper should end with things a buyer can open. Start with the public artifacts, then request the signed reviewer bundle when control-level mappings are needed.

Three claim boundaries hold the whole document together. First, the gateway sits before the model provider, so the provider sees only the payload allowed by lane policy. Second, the Arabic NER work is presented as verifiable engineering evidence (what it catches, how it is gated, and where to verify it), not academic theater. Third, every public constraint resolves to the compliance page, which keeps the canonical constraints list. This is a live-pilot architecture description, with customer-route HA now proven on ACK and Alibaba KMS startup bootstrap running; it is not a regulator-approval, certification, HSM-custody, or full-region tolerance claim.

Controls: 177-control public matrix summary. Open the JSON or Markdown summary, then request the signed reviewer bundle for control-level references.
Detector: public precision / recall artifacts. Review dated detector outputs and compare the Arabic/Saudi PII story against the benchmark page.
Runtime: status, trust, and deployment pages. Use status for current response checks, trust for dated proof, and deployment for topology and lane diagrams.

Evaluate the gateway with the evidence open.

Read the architecture here, then verify every claim against the dated artifacts — nothing is asserted that you cannot open.
