The request enters DataSitr before any model call. Tenant policy, route configuration, and request metadata are loaded at the edge of the pipeline.
01 Architecture whitepaper
Foreword
To our guests, partners, and evaluators:
DataSitr lets Saudi organizations use modern AI systems without sending raw personal data across borders by default. The gateway sits between your apps and the AI providers your tenants choose. It detects personal data, applies the privacy transformation its routing lane requires, and routes the request by sensitivity, tenant policy, and Saudi residency rules.
DataSitr is not a chatbot wrapper or a generic gateway. It is a Saudi-context privacy boundary built around three commitments: detect personal data thoroughly in both Arabic and English, vault what should not leave the Kingdom, and record every routing decision so a buyer or a regulator can verify what happened instead of taking our word for it.
This is a live-pilot architecture description, not a certification or accreditation claim. What follows is intentionally precise: each section describes what the system does today, the artifacts it produces, and the limits we are honest about. Read this whitepaper alongside the trust, compliance, and resources pages — those carry the dated evidence behind the architecture described here.
Every technical claim in this document links to the page that proves it live; where a number matters, we send you to the artifact rather than printing a figure that can go stale.
Context worth stating up front: PDPL (Saudi Arabia's Personal Data Protection Law) enforcement is operational. SDAIA (the Saudi Data and Artificial Intelligence Authority) confirmed 48 enforcement decisions in January 2026, with administrative fines of up to SAR 5 million (doubled for repeat violations) and up to two years' imprisonment for intentional sensitive-data violations. The technical work this whitepaper describes exists to make compliance demonstrable, not asserted, in that environment.
Data source date: 2026-01 · SDAIA enforcement context · See /trust and /compliance for the dated evidence behind every figure above.
02 Operating model
Gateway operating model
The gateway intercepts each request before it reaches an AI provider, changes the payload according to privacy risk, and records the decision for later review. Four steps run for every request in scope: intercept, detect, transform, and route + record. Interception is the entry point just described; the other three work as follows.
- Detect. Presidio, spaCy, Saudi recognizers, and Arabic NER operate together. Regex is one layer, not the architecture.
- Transform. Direct identifiers become typed placeholders where reversible protection is allowed. Highest-risk text can remain intact only for in-Kingdom processing.
- Route + record. The policy engine assigns the request to green, amber, red, or blocked. Compliance metadata is written alongside the processing record.
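The four steps can be sketched in Python as follows. This is a minimal sketch, not DataSitr's implementation: two toy regex layers stand in for the Presidio, spaCy, Saudi-recognizer, and Arabic NER layers, and the names `regex_layer`, `detect`, `transform`, `handle`, and `AUDIT_LOG`, plus the one-line routing rule, are hypothetical. What it illustrates is the shape of the flow: detection is a union of layers, the transformed text is re-scanned, and a record is appended for every request regardless of outcome.

```python
import re

# Toy stand-ins for real detector layers (hypothetical patterns, not production rules).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_SA = re.compile(r"\+9665\d{8}")  # assumed Saudi mobile shape: +966 5XXXXXXXX

AUDIT_LOG = []  # hypothetical in-memory processing-record store


def regex_layer(text):
    """One detector layer: returns (start, end, entity_type) spans."""
    spans = [(m.start(), m.end(), "EMAIL") for m in EMAIL.finditer(text)]
    spans += [(m.start(), m.end(), "PHONE") for m in PHONE_SA.finditer(text)]
    return spans


def detect(text, layers):
    """Union the spans from every layer; on overlap keep the earliest, longest span."""
    candidates = sorted({s for layer in layers for s in layer(text)}, key=lambda s: (s[0], -s[1]))
    merged = []
    for span in candidates:
        if merged and span[0] < merged[-1][1]:
            continue  # overlaps a span already kept
        merged.append(span)
    return merged


def transform(text, spans):
    """Replace each span with a typed placeholder such as [[EMAIL:01]]."""
    out, last, counts = [], 0, {}
    for start, end, etype in spans:
        counts[etype] = counts.get(etype, 0) + 1
        out.append(text[last:start])
        out.append(f"[[{etype}:{counts[etype]:02d}]]")
        last = end
    out.append(text[last:])
    return "".join(out)


def handle(request_id, tenant, text, layers):
    """Intercept -> detect -> transform -> route + record, for one request."""
    spans = detect(text, layers)
    transformed = transform(text, spans)
    residual = detect(transformed, layers)         # re-scan before external eligibility
    lane = "green" if not residual else "blocked"  # toy rule; the real decision has four outcomes
    AUDIT_LOG.append({
        "request_id": request_id,
        "tenant": tenant,
        "lane": lane,
        "entity_types": sorted({etype for _, _, etype in spans}),
    })
    return lane, transformed
```

For example, `handle("req-1", "tenant-a", "Email ali@example.com or call +966501234567", [regex_layer])` returns the lane together with text in which both identifiers have been replaced by typed placeholders.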
Current production posture: as of 2026-05-20, the live DataSitr deployment runs on Alibaba ACK in Riyadh as primary, with a GCP Dammam standby backed by scoped drill evidence. The May 4 cutover of customer routes to ACK passed a 4-hour soak; the May 16 Dammam drill covers DNS, GKE, and TLS routing only. In place today: Alibaba KMS bootstrap at startup, encrypted vault storage, machine-readable compliance records, active alerting, and dated continuity evidence. Not covered by current claims: cross-cloud database replication, auth failover, data-tier failover, HSM-backed custody, and a fully refreshed same-origin browser/session proof pack on this exact baseline.
03 Detector research
Arabic detector research
The hard part is Arabic and Saudi context: names, organizations, government identifiers, mixed Arabic-English prompts, and safe Arabic prose that should not be redacted. DataSitr treats this as a measured detector program, not a single model toggle.
The Arabic model sits behind structural and contextual gates. This raises recall without tagging every Arabic noun as a person name.
Saudi-specific recognizers cover local document shapes and name signals that generic PII libraries do not prioritize.
The trust and benchmark pages publish dated, sanitized detector artifacts — the exact precision and recall figures live on the benchmark page so the claim can be inspected without a slide deck.
Hard negatives, literary Arabic, and support-text cases are part of the detector discipline so privacy protection does not become unusable redaction.
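The gating idea can be illustrated with a short Python sketch. It assumes a common 10-digit Saudi ID shape (first digit 1 or 2) and uses hypothetical cue lists, honorifics, and function names; DataSitr's actual recognizers, gates, and thresholds are not described here and may differ. The point is that a pattern match or a model proposal counts only when structural or contextual evidence backs it up.

```python
import re

# Assumed Saudi national ID / Iqama shape: 10 digits starting with 1 or 2.
# Checksum validation is deliberately omitted from this sketch.
SAUDI_ID = re.compile(r"(?<!\d)[12]\d{9}(?!\d)")

# Hypothetical context cues: a bare 10-digit number is not enough on its own.
ID_CONTEXT = ("رقم الهوية", "هوية", "إقامة", "national id", "iqama", "id number")

# Hypothetical gates for Arabic PERSON candidates proposed by an NER model.
HONORIFICS = ("السيد", "السيدة", "الدكتور", "الدكتورة", "الأستاذ", "المهندس")
NAME_LINKS = (" بن ", " بنت ")


def find_saudi_ids(text, window=30):
    """Return (start, end, 'SA_NATIONAL_ID') only when an ID cue precedes the number."""
    hits = []
    for m in SAUDI_ID.finditer(text):
        preceding = text[max(0, m.start() - window):m.start()].lower()
        if any(cue in preceding for cue in ID_CONTEXT):
            hits.append((m.start(), m.end(), "SA_NATIONAL_ID"))
    return hits


def gate_person(text, start, end, window=15):
    """Accept an NER person span only with a preceding honorific or a bin/bint name chain."""
    preceding = text[max(0, start - window):start]
    padded = f" {text[start:end]} "
    return any(h in preceding for h in HONORIFICS) or any(link in padded for link in NAME_LINKS)
```

A hard yes/no gate trades some recall for precision. A production recognizer would more likely adjust a confidence score from context, in the style of Presidio's context-aware enhancement, than reject a candidate outright.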
04 Vault
Vault, tokenization, rehydration
For reversible lanes, DataSitr replaces detected entities with typed placeholders and stores originals in a tenant-scoped encrypted vault. Rehydration is allowed only in the requesting tenant and request context.
1. Detect personal-data entities in the request text.
2. Generate typed placeholders such as [[PERSON:01]].
3. Encrypt original values with AES-256-GCM under tenant-scoped keys.
4. Re-scan transformed text before external eligibility.
5. Rehydrate approved responses only for the original tenant and request context.
This is not format-preserving encryption and not stateless masking. The encrypted state exists because the product must support auditable, tenant-scoped rehydration for approved responses.
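The tokenize/rehydrate cycle can be sketched in Python as a pair of operations keyed by tenant and request. This is a minimal sketch under stated assumptions: the `TenantVault` class and its methods are hypothetical, entries live in an in-memory dict, and the AES-256-GCM encryption of originals under tenant-scoped keys (step 3) is omitted, so this sketch holds originals in plaintext. It shows why the design needs state: rehydration looks up the exact (tenant, request, placeholder) key and refuses anything else.

```python
import re
from dataclasses import dataclass, field

PLACEHOLDER = re.compile(r"\[\[([A-Z_]+):(\d{2})\]\]")


@dataclass
class TenantVault:
    # Hypothetical stand-in: (tenant, request_id, placeholder) -> original value.
    # The described system encrypts originals with AES-256-GCM; omitted here.
    entries: dict = field(default_factory=dict)

    def tokenize(self, tenant, request_id, text, spans):
        """Swap each (start, end, type) span for a typed placeholder and vault the original."""
        out, last, counts = [], 0, {}
        for start, end, etype in sorted(spans):
            counts[etype] = counts.get(etype, 0) + 1
            ph = f"[[{etype}:{counts[etype]:02d}]]"
            self.entries[(tenant, request_id, ph)] = text[start:end]
            out.append(text[last:start])
            out.append(ph)
            last = end
        out.append(text[last:])
        return "".join(out)

    def rehydrate(self, tenant, request_id, response):
        """Restore placeholders only for the tenant and request that created them."""
        def swap(match):
            key = (tenant, request_id, match.group(0))
            if key not in self.entries:
                raise PermissionError(f"no vault entry for {match.group(0)} in this context")
            return self.entries[key]
        return PLACEHOLDER.sub(swap, response)
```

Raising on an unknown placeholder is a fail-closed choice for this sketch; leaving the placeholder unrestored would be an equally defensible alternative.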
05 Routing
Three-lane routing
The policy decision is intentionally conservative. Each request becomes green, amber, red, or blocked based on identifiability, sensitivity, tenant policy, and configured provider paths.
- Green (tokenized external): Detected direct identifiers become typed placeholders, then the transformed text is rescanned before it can route to eligible global providers.
- Amber (pseudonymized in-Kingdom): The text is transformed, but processing stays on operator-configured Saudi-hosted provider paths.
- Red (raw in-Kingdom) or blocked: Highest-risk categories, including sensitive data as defined in PDPL Article 1(11), stay intact only for configured in-Kingdom execution. If no compliant path exists, the request is blocked.
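The lane decision can be sketched as a small Python function. The inputs are simplified assumptions: two booleans summarize detection (sensitive categories present, identifiers still found on re-scan), and three booleans summarize tenant policy and configured provider paths. All names are hypothetical. The function leans conservative in the same direction as the text: sensitive data is only ever routed to a raw in-Kingdom path, and when no compliant path exists the answer is `BLOCKED`.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    GREEN = "green"      # tokenized, eligible for global providers
    AMBER = "amber"      # pseudonymized, Saudi-hosted providers only
    RED = "red"          # raw text, configured in-Kingdom execution only
    BLOCKED = "blocked"  # no compliant path


@dataclass(frozen=True)
class Assessment:
    has_sensitive: bool         # e.g. PDPL sensitive-data categories detected
    residual_identifiers: bool  # re-scan still finds identifiers after tokenization


@dataclass(frozen=True)
class TenantPolicy:
    allow_external: bool                 # tenant permits global providers at all
    in_kingdom_pseudonymized_path: bool  # an amber provider path is configured
    in_kingdom_raw_path: bool            # a red provider path is configured


def choose_lane(a: Assessment, p: TenantPolicy) -> Lane:
    if a.has_sensitive:
        # Highest-risk text stays intact only for in-Kingdom execution.
        return Lane.RED if p.in_kingdom_raw_path else Lane.BLOCKED
    if p.allow_external and not a.residual_identifiers:
        return Lane.GREEN
    if p.in_kingdom_pseudonymized_path:
        return Lane.AMBER
    # No amber path: fail closed rather than fall back to a raw path.
    return Lane.BLOCKED
```

Failing closed instead of falling back to the red lane for non-sensitive text is a choice made for this sketch, not a documented DataSitr rule.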
06 Compliance records
Compliance records
Every architecture decision needs a review surface. DataSitr records machine-readable processing metadata per request: classification, destination, legal or policy basis, and evidence material for export.
Structured processing records connect each routed request to its classification and purpose context.
External eligibility and in-Kingdom routing decisions become inspectable records rather than hidden provider calls.
Export, deletion, consent, and breach-register workflows are part of the same compliance operating surface.
Reviewer exports are designed for verification. The public compliance page explains what is available and what remains outside current claims.
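A per-request processing record could be serialized as JSON along the lines below. The field names (`classification`, `destination`, `basis`, `evidence`, and so on) and the example values are hypothetical; they mirror the categories named above rather than DataSitr's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ProcessingRecord:
    request_id: str
    tenant_id: str
    classification: str  # lane: green / amber / red / blocked
    destination: str     # provider path identifier, or "none" when blocked
    basis: str           # legal or policy basis label
    entity_types: list = field(default_factory=list)
    evidence: dict = field(default_factory=dict)  # e.g. detector version, re-scan result
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Stable, machine-readable form suitable for export."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)
```

Sorted keys and a UTC timestamp keep exported records easy to diff and order; one JSON object per line is a simple export format for many such records.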
07 Evidence boundary
Evidence boundary
The whitepaper claims exactly these facts: a live Saudi-hosted pilot, detector benchmarks, encrypted vaulting, compliance records, customer-route HA evidence, and dated continuity evidence. It does not convert those facts into regulator approval, SOC 2, ISO 27001, external pen-test, HSM custody, full-vault verification, or full-region tolerance claims.
08 What to verify
Buyer verification path
A serious whitepaper should end with things a buyer can open. Start with the public artifacts, then request the signed reviewer bundle when control-level mappings are needed.
Three claim boundaries hold the whole document together:
- The gateway sits before the model provider, so the provider sees only the payload the lane policy allows.
- The Arabic NER work is presented as verifiable engineering evidence (what it catches, how it is gated, and where to verify it), not academic theater.
- Every public constraint resolves to the compliance page, which keeps the canonical constraints list.
This is a live-pilot architecture description with customer-route HA now proven on ACK and Alibaba KMS startup bootstrap. It is not a regulator-approval, certification, HSM-custody, or full-region tolerance claim.
- Controls: 177-control public matrix summary. Open the JSON or Markdown summary, then request the signed reviewer bundle for control-level references.
- Detector: public precision / recall artifacts. Review dated detector outputs and compare the Arabic/Saudi PII story against the benchmark page.
- Runtime: status, trust, and deployment pages. Use status for current response checks, trust for dated proof, and deployment for topology and lane diagrams.
Evaluate the gateway with the evidence open.
Read the architecture here, then verify every claim against the dated artifacts — nothing is asserted that you cannot open.