Drift Detection & Semantic Verification

Experimental Feature

Drift Detection and Semantic Verification are experimental features. They are provided for observability and governance purposes only. Drift signals are advisory and never affect the ALLOW/DENY verification decision. The API surface, scoring algorithms, and data formats may change without notice.

The Problem: Intent vs. Permission

Mandaitor's core verification answers a precise question: "Is this action within the delegate's permitted scope?" This is a necessary condition for trustworthy delegation, but it may not be sufficient.

Consider a mandate that allows an AI agent to perform construction.validation.* actions on a project. The agent could:

Inspect foundations, then approve structural elements — aligned with intent
Approve every element without inspection — technically permitted, but drifting from intent
Repeatedly re-inspect already-approved elements — permitted, but anomalous

All three scenarios pass the existing ALLOW/DENY check. Drift Detection adds a second, advisory signal layer that helps principals and governance teams detect when an agent's behavior pattern diverges from the originally intended purpose of the delegation.

How It Works

Intent Declaration

When creating a mandate, the principal can optionally declare an intent:

{
  "principal": { "type": "NATURAL_PERSON", "subject_id": "..." },
  "delegate": { "type": "AGENT", "subject_id": "agent:validator-v3" },
  "scope": {
    "actions": ["construction.validation.*"],
    "resources": ["monco:project:proj_ABC/*"],
    "effect": "ALLOW"
  },
  "intent": {
    "purpose": "Automated structural validation of Zone EG installations",
    "expected_clusters": ["validation", "inspection"],
    "drift_threshold": 0.7
  }
}

The intent object is entirely optional. Mandates without intent declarations continue to work exactly as before — drift detection simply has less context to work with.
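
As a rough illustration of what a client might check before submitting a mandate, the sketch below models the `intent` object and validates it locally. The field names come from the example above; the `MandateIntent` type, the `checkIntent` helper, and the rule that `drift_threshold` must lie between 0.0 and 1.0 are assumptions made for this sketch, not documented API behavior.

```typescript
// Hypothetical client-side model of the optional `intent` object.
// Field names follow the example payload; the types are assumptions.
interface MandateIntent {
  purpose: string;
  expected_clusters: string[];
  drift_threshold: number;
}

// Returns a list of problems; an empty list means the intent looks usable.
function checkIntent(intent: MandateIntent | undefined): string[] {
  // The intent object is optional, so "absent" is not an error.
  if (intent === undefined) return [];

  const problems: string[] = [];
  if (intent.purpose.trim() === "") {
    problems.push("purpose is empty");
  }
  if (intent.expected_clusters.length === 0) {
    problems.push("expected_clusters is empty");
  }
  // Assumed rule: the threshold is compared against an aggregate score
  // in [0.0, 1.0], so values outside that range cannot be meaningful.
  if (!(intent.drift_threshold >= 0 && intent.drift_threshold <= 1)) {
    problems.push("drift_threshold must be between 0.0 and 1.0");
  }
  return problems;
}
```

Calling `checkIntent` with the intent from the example mandate returns an empty list under these assumed rules.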

Semantic Graph

Each taxonomy now includes a semantic graph that models relationships between actions beyond simple hierarchy. The graph captures:

Relationship	Meaning	Example
`IMPLIES`	Action A logically leads to Action B	inspect → approve
`CONFLICTS`	Actions A and B should not co-occur	approve ↔ reject
`ESCALATES_TO`	Action B is a higher-authority version of Action A	flag → halt
`PART_OF`	Action A is a sub-step of Action B	measure → inspect
`PRECEDES`	Action A should happen before Action B	plan → execute

These relationships are auto-inferred from the taxonomy's existing structure (parent actions, tags, risk levels) and can be manually refined by taxonomy contributors.
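
One way to picture the graph is as a list of typed edges. The sketch below is an assumption about representation only: the `Edge` type and the `related` helper are hypothetical, and the edges are the short example names from the table rather than real taxonomy identifiers.

```typescript
// The five relationship types listed in the table.
type Relation = "IMPLIES" | "CONFLICTS" | "ESCALATES_TO" | "PART_OF" | "PRECEDES";

// Hypothetical edge representation; not the service's storage format.
interface Edge {
  from: string;
  to: string;
  relation: Relation;
}

// Example edges copied from the table.
const edges: Edge[] = [
  { from: "inspect", to: "approve", relation: "IMPLIES" },
  { from: "approve", to: "reject", relation: "CONFLICTS" },
  { from: "flag", to: "halt", relation: "ESCALATES_TO" },
  { from: "measure", to: "inspect", relation: "PART_OF" },
  { from: "plan", to: "execute", relation: "PRECEDES" },
];

// CONFLICTS is treated as symmetric (the table uses a two-way arrow);
// every other relation is treated as directed from `a` to `b`.
function related(a: string, b: string, relation: Relation): boolean {
  return edges.some(
    (e) =>
      e.relation === relation &&
      ((e.from === a && e.to === b) ||
        (relation === "CONFLICTS" && e.from === b && e.to === a)),
  );
}
```

With these edges, `related("reject", "approve", "CONFLICTS")` is true in either order, while `related("approve", "inspect", "IMPLIES")` is false because the edge only runs from inspect to approve.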

Multi-Dimensional Drift Scoring

When drift detection is enabled (?drift=true), each verification request is scored across four dimensions:

Dimension	Weight	What It Measures
Semantic Distance	0.4	How far the current action is from the expected action clusters in the semantic graph
Sequence Deviation	0.2	Whether actions follow expected workflow patterns (e.g., inspect before approve)
Frequency Anomaly	0.1	Whether action frequencies deviate from historical baselines
Scope Expansion	0.3	Whether actions are creeping toward the edges of the permitted scope

The weighted aggregate produces a score between 0.0 (no drift) and 1.0 (maximum drift). When the aggregate exceeds the mandate's drift_threshold, the drift_detected flag is set to true.
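
The aggregation described above can be sketched as a plain weighted sum. The weights are the ones in the table; the `DriftDimensions` type, the function names, and the clamping step are assumptions for illustration, and the service may apply normalization that is not documented here.

```typescript
// The four per-dimension scores, each assumed to lie in [0.0, 1.0].
interface DriftDimensions {
  semantic_distance: number;
  sequence_deviation: number;
  frequency_anomaly: number;
  scope_expansion: number;
}

// Weights from the table; they sum to 1.0.
const WEIGHTS: DriftDimensions = {
  semantic_distance: 0.4,
  sequence_deviation: 0.2,
  frequency_anomaly: 0.1,
  scope_expansion: 0.3,
};

// Weighted sum of the four dimensions, clamped to [0.0, 1.0] as a
// defensive measure (the clamp is an assumption of this sketch).
function aggregateDrift(d: DriftDimensions): number {
  const sum =
    WEIGHTS.semantic_distance * d.semantic_distance +
    WEIGHTS.sequence_deviation * d.sequence_deviation +
    WEIGHTS.frequency_anomaly * d.frequency_anomaly +
    WEIGHTS.scope_expansion * d.scope_expansion;
  return Math.min(1, Math.max(0, sum));
}

// The flag is set only when the aggregate strictly exceeds the threshold.
function driftDetected(d: DriftDimensions, threshold: number): boolean {
  return aggregateDrift(d) > threshold;
}
```

For example, dimension scores of 0.9, 0.8, 0.5, and 0.7 (in the order of the interface fields) work out to 0.36 + 0.16 + 0.05 + 0.21, roughly 0.78, which exceeds a threshold of 0.7. Because the weights sum to 1.0, the aggregate stays within 0.0 to 1.0 whenever each dimension does.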

Session Windows

Drift is computed over a sliding 1-hour session window. Each verification event is recorded in the session, and the drift score reflects the behavioral pattern within that window. Sessions are keyed by mandate_id + delegate_subject_id.
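
A minimal in-memory sketch of such a window follows. It assumes events carry a millisecond timestamp and that anything one hour or more older than the newest event is dropped; the `SessionStore` class and its method names are hypothetical, and the real service presumably uses persistent storage.

```typescript
const WINDOW_MS = 60 * 60 * 1000; // the fixed 1-hour window

// Hypothetical event record; only the fields this sketch needs.
interface SessionEvent {
  action: string;
  timestampMs: number;
}

class SessionStore {
  private sessions = new Map<string, SessionEvent[]>();

  // Sessions are keyed by mandate_id + delegate_subject_id. The pair is
  // serialized as JSON so that ids containing ":" cannot collide.
  private static key(mandateId: string, delegateSubjectId: string): string {
    return JSON.stringify([mandateId, delegateSubjectId]);
  }

  // Records an event and returns the events still inside the window.
  record(mandateId: string, delegateSubjectId: string, event: SessionEvent): SessionEvent[] {
    const key = SessionStore.key(mandateId, delegateSubjectId);
    const cutoff = event.timestampMs - WINDOW_MS;
    const previous = this.sessions.get(key) || [];
    // Keep only events newer than the cutoff, then append the new one.
    const kept = previous.filter((e) => e.timestampMs > cutoff);
    kept.push(event);
    this.sessions.set(key, kept);
    return kept;
  }
}
```

Recording events at minute 0, minute 30, and minute 61 for the same mandate and delegate leaves two events in the window after the third call, because the first one has aged out.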

Cross-Mandate Agent Drift

In addition to per-mandate drift, the system computes an aggregate agent drift score across all active mandates for a given delegate. This captures scenarios where an agent behaves normally within each individual mandate but exhibits concerning patterns when viewed holistically.
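
This page does not describe how the cross-mandate score is computed. The toy example below only illustrates why a pooled view can surface something that per-mandate views miss; the anomaly formula, the baselines, and all counts are invented for the illustration.

```typescript
// Toy anomaly score: 0 at or below the baseline, approaching 1 as the
// observed count grows past it. Hypothetical formula, illustration only.
function frequencyAnomaly(observed: number, baseline: number): number {
  if (observed <= baseline) return 0;
  return 1 - baseline / observed;
}

// Approvals issued under four mandates within the same hour (made up).
const approvalsPerMandate = [9, 8, 10, 9];
const perMandateBaseline = 10; // assumed baseline for a single mandate
const agentBaseline = 20; // assumed baseline for the agent as a whole

// Each mandate, viewed alone, is at or below its baseline.
const perMandateScores = approvalsPerMandate.map((n) =>
  frequencyAnomaly(n, perMandateBaseline),
);

// Pooled across mandates, the same activity is well above the agent baseline.
const totalApprovals = approvalsPerMandate.reduce((a, b) => a + b, 0);
const pooledScore = frequencyAnomaly(totalApprovals, agentBaseline);
```

Here every per-mandate score is 0, while the pooled score is positive because 36 approvals exceed the assumed agent-wide baseline of 20.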

Using Drift Detection

Requesting Drift Signals

Add ?drift=true to any verification request:

curl -X POST "https://api.mandaitor.io/v1/verify?drift=true" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "delegate_subject_id": "agent:validator-v3",
    "action": "construction.validation.approve",
    "resource": "monco:project:proj_ABC/zone:EG/installation:stk_42"
  }'
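
For callers that do not use curl, the same request can be assembled with the standard `URL` API. The endpoint, headers, and body fields are taken from the curl example; the `buildVerifyRequest` helper and its option names are hypothetical and are not part of the SDK.

```typescript
// Body fields as shown in the curl example.
interface VerifyBody {
  delegate_subject_id: string;
  action: string;
  resource: string;
}

// Hypothetical helper: builds the URL and fetch options without sending.
function buildVerifyRequest(
  apiKey: string,
  body: VerifyBody,
  opts: { drift?: boolean; pom?: string } = {},
) {
  const url = new URL("https://api.mandaitor.io/v1/verify");
  // Drift signals are opt-in per call, so the flag is added only on request.
  if (opts.drift) url.searchParams.set("drift", "true");
  if (opts.pom) url.searchParams.set("pom", opts.pom);

  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed to `fetch(request.url, request.init)`. Building the query string through `searchParams` also avoids the shell-quoting concerns that come with a literal `?` in a URL.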

Response with Drift Signals

{
  "decision": "ALLOW",
  "mandate_id": "mnd_abc123",
  "event_id": "evt_xyz789",
  "semantic_signals": {
    "drift_score": {
      "semantic_distance": 0.12,
      "sequence_deviation": 0.05,
      "frequency_anomaly": 0.02,
      "scope_expansion": 0.08,
      "aggregate": 0.08,
      "drift_detected": false
    },
    "conflicts": [],
    "sequence_violations": [],
    "agent_drift_score": 0.11
  }
}
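
The response shape can be modelled with a few TypeScript interfaces. These types are inferred from the example response and are assumptions: the element types of `conflicts` and `sequence_violations` are not documented here, so they are left as `unknown`, and `semantic_signals` is treated as optional because it is only present when drift is requested.

```typescript
interface DriftScore {
  semantic_distance: number;
  sequence_deviation: number;
  frequency_anomaly: number;
  scope_expansion: number;
  aggregate: number;
  drift_detected: boolean;
}

interface SemanticSignals {
  drift_score: DriftScore;
  conflicts: unknown[]; // element shape not documented on this page
  sequence_violations: unknown[]; // element shape not documented on this page
  agent_drift_score: number;
}

interface VerifyResponse {
  decision: "ALLOW" | "DENY";
  mandate_id: string;
  event_id: string;
  semantic_signals?: SemanticSignals; // absent unless drift was requested
}

// Hypothetical helper: one-line summary suitable for a log entry.
function summarize(res: VerifyResponse): string {
  const drift = res.semantic_signals ? res.semantic_signals.drift_score : undefined;
  if (!drift) return `${res.decision} (no drift signals)`;
  const flag = drift.drift_detected ? "drift detected" : "no drift";
  return `${res.decision} (${flag}, aggregate ${drift.aggregate.toFixed(2)})`;
}
```

Note that the summary reports the decision and the drift flag side by side; an `ALLOW` with drift detected is still an `ALLOW`.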

SDK Usage

import { MandaitorClient } from "@mandaitor/sdk";

const client = new MandaitorClient({ apiKey: "..." });

const result = await client.verifyWithDrift(
  {
    delegate_subject_id: "agent:validator-v3",
    action: "construction.validation.approve",
    resource: "monco:project:proj_ABC/zone:EG/installation:stk_42",
  },
  { drift: true },
);

if (result.semantic_signals?.drift_score?.drift_detected) {
  console.warn("⚠️ Drift detected:", result.semantic_signals.drift_score);
  // Trigger governance workflow, notify principal, etc.
}

Drift Signals in Proof-of-Mandate VCs

When both ?pom=sd-jwt-vc and ?drift=true are requested, drift signals are included in the Proof-of-Mandate VC as selectively disclosable claims. This means:

The VC holder (delegate) can choose whether to reveal drift data to a verifier
Drift data is cryptographically bound to the verification event
Third-party auditors can verify drift claims without trusting the reporter

The following claims are added as SD-JWT disclosures:

drift_score — the full multi-dimensional drift score object
semantic_conflicts — any detected conflicts between actions
agent_drift_score — the cross-mandate aggregate drift score
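
Selective disclosure can be sketched as follows. In the SD-JWT format, the issuer-signed JWT is followed by `~`-separated disclosures, each a base64url-encoded JSON array of `[salt, claim_name, claim_value]` for object properties. The helpers below are hypothetical and assume a presentation without a key-binding JWT and with object-property disclosures only; they do not verify signatures or digests.

```typescript
interface Disclosure {
  salt: string;
  name: string;
  value: unknown;
}

// Decodes base64url (no padding) to a UTF-8 string using web-standard APIs.
function base64UrlDecode(input: string): string {
  let b64 = input.replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "=";
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Decodes one object-property disclosure: [salt, claim_name, claim_value].
function decodeDisclosure(disclosure: string): Disclosure {
  const [salt, name, value] = JSON.parse(base64UrlDecode(disclosure));
  return { salt, name, value };
}

// Keeps only the disclosures whose claim name is in `reveal`, so the
// holder presents, for example, agent_drift_score but not drift_score.
function selectDisclosures(sdJwt: string, reveal: Set<string>): string {
  const parts = sdJwt.split("~");
  const jwt = parts[0];
  const disclosures = parts.slice(1).filter((p) => p !== "");
  const kept = disclosures.filter((d) => reveal.has(decodeDisclosure(d).name));
  // Without key binding, the presentation ends with a trailing "~".
  return [jwt].concat(kept).join("~") + "~";
}
```

A delegate that wants to share only the cross-mandate score would call `selectDisclosures(vc, new Set(["agent_drift_score"]))`. The verifier can still check the revealed claim against the digests in the signed JWT, while the withheld claims stay hidden.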

Important Limitations

Advisory only: Drift signals never change the ALLOW/DENY decision
No enforcement: Mandaitor does not block actions based on drift scores
Fixed weights: During the experimental phase, drift dimension weights are fixed
1-hour window: Session windows are fixed at 1 hour
Opt-in: Drift detection must be explicitly requested per verification call
Experimental API: The response format and scoring algorithm may change
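
On the caller's side, the advisory nature of drift signals can be kept explicit by deriving the two outcomes separately. The `decide` helper and the `Outcome` type below are hypothetical; they show one possible client-side policy, not behavior built into Mandaitor.

```typescript
// Hypothetical outcome of handling a verification result.
interface Outcome {
  proceed: boolean; // driven by the ALLOW/DENY decision only
  notifyGovernance: boolean; // driven by the drift flag only
}

// `driftDetected` is undefined when drift signals were not requested.
function decide(decision: "ALLOW" | "DENY", driftDetected: boolean | undefined): Outcome {
  return {
    proceed: decision === "ALLOW",
    notifyGovernance: driftDetected === true,
  };
}
```

An `ALLOW` with drift detected therefore still proceeds, and the drift flag only triggers a notification.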
