
Drift Detection & Semantic Verification

Experimental Feature

Drift Detection and Semantic Verification are experimental features. They are provided for observability and governance purposes only. Drift signals are advisory and never affect the ALLOW/DENY verification decision. The API surface, scoring algorithms, and data formats may change without notice.

The Problem: Intent vs. Permission

Mandaitor's core verification answers a precise question: "Is this action within the delegate's permitted scope?" This is a necessary condition for trustworthy delegation, but it may not be sufficient.

Consider a mandate that allows an AI agent to perform construction.validation.* actions on a project. The agent could:

  1. Inspect foundations, then approve structural elements — aligned with intent
  2. Approve every element without inspection — technically permitted, but drifting from intent
  3. Repeatedly re-inspect already-approved elements — permitted, but anomalous

All three scenarios pass the existing ALLOW/DENY check. Drift Detection adds a second, advisory signal layer that helps principals and governance teams detect when an agent's behavior pattern diverges from the originally intended purpose of the delegation.

How It Works

Intent Declaration

When creating a mandate, the principal can optionally declare an intent:

{
  "principal": { "type": "NATURAL_PERSON", "subject_id": "..." },
  "delegate": { "type": "AGENT", "subject_id": "agent:validator-v3" },
  "scope": {
    "actions": ["construction.validation.*"],
    "resources": ["monco:project:proj_ABC/*"],
    "effect": "ALLOW"
  },
  "intent": {
    "purpose": "Automated structural validation of Zone EG installations",
    "expected_clusters": ["validation", "inspection"],
    "drift_threshold": 0.7
  }
}

The intent object is optional. Mandates without an intent declaration continue to work exactly as before; drift detection simply has less context to work with, because no purpose, expected clusters, or threshold have been declared.

Semantic Graph

Each taxonomy now includes a semantic graph that models relationships between actions beyond simple hierarchy. The graph captures:

| Relationship | Meaning | Example |
| --- | --- | --- |
| IMPLIES | Action A logically leads to Action B | inspect → approve |
| CONFLICTS | Actions A and B should not co-occur | approve ↔ reject |
| ESCALATES_TO | Action A is a higher-authority version of B | flag → halt |
| PART_OF | Action A is a sub-step of Action B | measure → inspect |
| PRECEDES | Action A should happen before Action B | plan → execute |

These relationships are auto-inferred from the taxonomy's existing structure (parent actions, tags, risk levels) and can be manually refined by taxonomy contributors.
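To make the relationship types concrete, here is a minimal TypeScript sketch of how such a graph could be represented and queried on the client side. The `SemanticEdge` shape and the `conflicts` and `findConflicts` helpers are hypothetical names chosen for illustration; the actual graph format used by the taxonomies is not documented here. The edges reuse the example pairs listed above.

```typescript
// Relationship kinds listed in the documentation.
type Relationship =
  | "IMPLIES"
  | "CONFLICTS"
  | "ESCALATES_TO"
  | "PART_OF"
  | "PRECEDES";

// Hypothetical edge shape; the real graph format is an assumption.
interface SemanticEdge {
  from: string;
  to: string;
  relationship: Relationship;
}

const edges: SemanticEdge[] = [
  { from: "inspect", to: "approve", relationship: "IMPLIES" },
  { from: "approve", to: "reject", relationship: "CONFLICTS" },
  { from: "flag", to: "halt", relationship: "ESCALATES_TO" },
  { from: "measure", to: "inspect", relationship: "PART_OF" },
  { from: "plan", to: "execute", relationship: "PRECEDES" },
];

// CONFLICTS is treated as symmetric (approve ↔ reject), so both directions match.
function conflicts(a: string, b: string, graph: SemanticEdge[]): boolean {
  return graph.some(
    (e) =>
      e.relationship === "CONFLICTS" &&
      ((e.from === a && e.to === b) || (e.from === b && e.to === a)),
  );
}

// Returns every conflicting pair among the distinct actions seen so far.
function findConflicts(
  actions: string[],
  graph: SemanticEdge[],
): [string, string][] {
  const seen = Array.from(new Set(actions));
  const pairs: [string, string][] = [];
  for (let i = 0; i < seen.length; i++) {
    for (let j = i + 1; j < seen.length; j++) {
      if (conflicts(seen[i], seen[j], graph)) {
        pairs.push([seen[i], seen[j]]);
      }
    }
  }
  return pairs;
}
```

Storing relationships as a flat edge list keeps the sketch short; a production graph would more likely index edges by action for faster lookups.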

Multi-Dimensional Drift Scoring

When drift detection is enabled (?drift=true), each verification request is scored across four dimensions:

| Dimension | Weight | What It Measures |
| --- | --- | --- |
| Semantic Distance | 0.4 | How far the current action is from the expected action clusters in the semantic graph |
| Sequence Deviation | 0.2 | Whether actions follow expected workflow patterns (e.g., inspect before approve) |
| Frequency Anomaly | 0.1 | Whether action frequencies deviate from historical baselines |
| Scope Expansion | 0.3 | Whether actions are creeping toward the edges of the permitted scope |

The weighted aggregate produces a score between 0.0 (no drift) and 1.0 (maximum drift). When the aggregate exceeds the mandate's drift_threshold, the drift_detected flag is set to true.
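The aggregation step can be sketched as follows. This assumes the aggregate is a plain weighted sum of the four dimension scores, each already in the range 0.0 to 1.0; the actual scoring algorithm is not specified here and may differ. The `aggregateDrift` and `driftDetected` functions are hypothetical names, and the sample values are made up for illustration.

```typescript
interface DriftDimensions {
  semantic_distance: number;
  sequence_deviation: number;
  frequency_anomaly: number;
  scope_expansion: number;
}

// Fixed weights from the documentation; they sum to 1.0.
const WEIGHTS: DriftDimensions = {
  semantic_distance: 0.4,
  sequence_deviation: 0.2,
  frequency_anomaly: 0.1,
  scope_expansion: 0.3,
};

// Assumption: a simple weighted sum, clamped to [0, 1] to absorb
// floating-point rounding at the extremes.
function aggregateDrift(d: DriftDimensions): number {
  const sum =
    WEIGHTS.semantic_distance * d.semantic_distance +
    WEIGHTS.sequence_deviation * d.sequence_deviation +
    WEIGHTS.frequency_anomaly * d.frequency_anomaly +
    WEIGHTS.scope_expansion * d.scope_expansion;
  return Math.min(1, Math.max(0, sum));
}

// "Exceeds" is read as strictly greater than the threshold.
function driftDetected(d: DriftDimensions, threshold: number): boolean {
  return aggregateDrift(d) > threshold;
}

// Hypothetical scores for an agent approving elements without inspection.
const sample: DriftDimensions = {
  semantic_distance: 0.9,
  sequence_deviation: 0.8,
  frequency_anomaly: 0.5,
  scope_expansion: 0.6,
};

// 0.4*0.9 + 0.2*0.8 + 0.1*0.5 + 0.3*0.6 = 0.75, which exceeds a threshold of 0.7.
console.log(aggregateDrift(sample).toFixed(2), driftDetected(sample, 0.7));
```

Because the weights sum to 1.0, the aggregate stays within the same 0.0 to 1.0 range as the individual dimensions.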

Session Windows

Drift is computed over a sliding 1-hour session window. Each verification event is recorded in the session, and the drift score reflects the behavioral pattern within that window. Sessions are keyed by mandate_id + delegate_subject_id.
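A sliding session window of this kind could look roughly like the sketch below. The `SessionStore` class and `VerificationEvent` shape are hypothetical and are not the service's implementation; the sketch also assumes events arrive in timestamp order and that an event exactly one hour old has already left the window.

```typescript
const WINDOW_MS = 60 * 60 * 1000; // fixed 1-hour session window

// Hypothetical minimal event record.
interface VerificationEvent {
  action: string;
  timestampMs: number;
}

class SessionStore {
  private sessions = new Map<string, VerificationEvent[]>();

  // Sessions are keyed by mandate_id + delegate_subject_id. JSON encoding
  // avoids ambiguity, since subject IDs may themselves contain separators.
  private static key(mandateId: string, delegateSubjectId: string): string {
    return JSON.stringify([mandateId, delegateSubjectId]);
  }

  // Records an event and returns the events still inside the window.
  record(
    mandateId: string,
    delegateSubjectId: string,
    event: VerificationEvent,
  ): VerificationEvent[] {
    const key = SessionStore.key(mandateId, delegateSubjectId);
    const cutoff = event.timestampMs - WINDOW_MS;
    // Drop everything that has slid out of the window, then append.
    const kept = (this.sessions.get(key) ?? []).filter(
      (e) => e.timestampMs > cutoff,
    );
    kept.push(event);
    this.sessions.set(key, kept);
    return kept;
  }
}
```

Any drift score computed from the returned list then reflects only the last hour of behavior for that mandate and delegate pair.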

Cross-Mandate Agent Drift

In addition to per-mandate drift, the system computes an aggregate agent drift score across all active mandates for a given delegate. This captures scenarios where an agent behaves normally within each individual mandate but exhibits concerning patterns when viewed holistically.

Using Drift Detection

Requesting Drift Signals

Add ?drift=true to any verification request:

curl -X POST "https://api.mandaitor.io/v1/verify?drift=true" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "delegate_subject_id": "agent:validator-v3",
    "action": "construction.validation.approve",
    "resource": "monco:project:proj_ABC/zone:EG/installation:stk_42"
  }'

Response with Drift Signals

{
  "decision": "ALLOW",
  "mandate_id": "mnd_abc123",
  "event_id": "evt_xyz789",
  "semantic_signals": {
    "drift_score": {
      "semantic_distance": 0.12,
      "sequence_deviation": 0.05,
      "frequency_anomaly": 0.02,
      "scope_expansion": 0.08,
      "aggregate": 0.08,
      "drift_detected": false
    },
    "conflicts": [],
    "sequence_violations": [],
    "agent_drift_score": 0.11
  }
}
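For consumers that do not use the SDK, the response shape can be modeled with a few TypeScript interfaces. This is a sketch inferred from the example response only: the element types of `conflicts` and `sequence_violations` are not shown in the example and are left as `unknown`, and treating `semantic_signals` as absent when drift is not requested is an assumption. The `dominantDimension` helper and the sample values are hypothetical.

```typescript
interface DriftScore {
  semantic_distance: number;
  sequence_deviation: number;
  frequency_anomaly: number;
  scope_expansion: number;
  aggregate: number;
  drift_detected: boolean;
}

interface SemanticSignals {
  drift_score: DriftScore;
  conflicts: unknown[]; // element shape not shown in the example
  sequence_violations: unknown[]; // element shape not shown in the example
  agent_drift_score: number;
}

interface VerifyResponse {
  decision: "ALLOW" | "DENY";
  mandate_id: string;
  event_id: string;
  semantic_signals?: SemanticSignals; // assumed absent without ?drift=true
}

type Dimension =
  | "semantic_distance"
  | "sequence_deviation"
  | "frequency_anomaly"
  | "scope_expansion";

const DIMENSIONS: Dimension[] = [
  "semantic_distance",
  "sequence_deviation",
  "frequency_anomaly",
  "scope_expansion",
];

// Returns the dimension with the highest raw score, which is useful when
// explaining to a principal why drift was flagged.
function dominantDimension(score: DriftScore): Dimension {
  return DIMENSIONS.reduce((best, d) => (score[d] > score[best] ? d : best));
}

// Hypothetical response in which drift was flagged.
const flagged: VerifyResponse = {
  decision: "ALLOW",
  mandate_id: "mnd_abc123",
  event_id: "evt_xyz789",
  semantic_signals: {
    drift_score: {
      semantic_distance: 0.9,
      sequence_deviation: 0.8,
      frequency_anomaly: 0.5,
      scope_expansion: 0.6,
      aggregate: 0.75,
      drift_detected: true,
    },
    conflicts: [],
    sequence_violations: [],
    agent_drift_score: 0.4,
  },
};

const flaggedScore = flagged.semantic_signals?.drift_score;
if (flaggedScore?.drift_detected) {
  console.warn("Drift driven mainly by:", dominantDimension(flaggedScore));
}
```

Note that the decision stays ALLOW even when `drift_detected` is true, since drift signals are advisory.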

SDK Usage

import { MandaitorClient } from "@mandaitor/sdk";

const client = new MandaitorClient({ apiKey: "..." });

const result = await client.verifyWithDrift(
  {
    delegate_subject_id: "agent:validator-v3",
    action: "construction.validation.approve",
    resource: "monco:project:proj_ABC/zone:EG/installation:stk_42",
  },
  { drift: true },
);

if (result.semantic_signals?.drift_score?.drift_detected) {
  console.warn("⚠️ Drift detected:", result.semantic_signals.drift_score);
  // Trigger governance workflow, notify principal, etc.
}

Drift Signals in Proof-of-Mandate VCs

When both ?pom=sd-jwt-vc and ?drift=true are requested, drift signals are included in the Proof-of-Mandate VC as selectively disclosable claims. This means:

  • The VC holder (delegate) can choose whether to reveal drift data to a verifier
  • Drift data is cryptographically bound to the verification event
  • Third-party auditors can verify drift claims without trusting the reporter

The following claims are added as SD-JWT disclosures:

  • drift_score — the full multi-dimensional drift score object
  • semantic_conflicts — any detected conflicts between actions
  • agent_drift_score — the cross-mandate aggregate drift score
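As an illustration of what a verifier receives, the sketch below decodes the disclosures of a compact SD-JWT and picks out the three drift claims. It relies only on the general SD-JWT layout (an issuer-signed JWT followed by `~`-separated disclosures, each a base64url-encoded JSON array of salt, claim name, and claim value); the `extractDriftClaims` helper is a hypothetical name, and the sketch assumes Node.js for `Buffer`. It deliberately does not verify the issuer signature or match disclosure digests, which a real verifier must do before trusting any claim.

```typescript
import { Buffer } from "node:buffer";

// Claim names listed in the documentation as SD-JWT disclosures.
const DRIFT_CLAIMS = new Set([
  "drift_score",
  "semantic_conflicts",
  "agent_drift_score",
]);

// Decodes (but does NOT cryptographically verify) the drift-related
// disclosures presented alongside a compact SD-JWT.
function extractDriftClaims(sdJwt: string): Record<string, unknown> {
  const parts = sdJwt.split("~");
  // parts[0] is the issuer-signed JWT; the final part is either a key
  // binding JWT or an empty string, so neither is a disclosure.
  const disclosures = parts.slice(1, -1);
  const claims: Record<string, unknown> = {};
  for (const disclosure of disclosures) {
    const decoded: unknown = JSON.parse(
      Buffer.from(disclosure, "base64url").toString("utf8"),
    );
    // Object-property disclosures have the form [salt, name, value].
    if (
      Array.isArray(decoded) &&
      decoded.length === 3 &&
      typeof decoded[1] === "string" &&
      DRIFT_CLAIMS.has(decoded[1])
    ) {
      claims[decoded[1]] = decoded[2];
    }
  }
  return claims;
}
```

If the holder withholds a drift disclosure, the corresponding key is simply missing from the returned object, which is how selective disclosure appears to the verifier.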

Important Limitations

  1. Advisory only: Drift signals never change the ALLOW/DENY decision
  2. No enforcement: Mandaitor does not block actions based on drift scores
  3. Fixed weights: During the experimental phase, drift dimension weights are fixed
  4. 1-hour window: Session windows are fixed at 1 hour
  5. Opt-in: Drift detection must be explicitly requested per verification call
  6. Experimental API: The response format and scoring algorithm may change