Drift Detection & Semantic Verification
Drift Detection and Semantic Verification are experimental features. They are provided for observability and governance purposes only. Drift signals are advisory and never affect the ALLOW/DENY verification decision. The API surface, scoring algorithms, and data formats may change without notice.
The Problem: Intent vs. Permission
Mandaitor's core verification answers a precise question: "Is this action within the delegate's permitted scope?" This is a necessary condition for trustworthy delegation, but it may not be sufficient.
Consider a mandate that allows an AI agent to perform construction.validation.* actions
on a project. The agent could:
- Inspect foundations, then approve structural elements — aligned with intent
- Approve every element without inspection — technically permitted, but drifting from intent
- Repeatedly re-inspect already-approved elements — permitted, but anomalous
All three scenarios pass the existing ALLOW/DENY check. Drift Detection adds a second, advisory signal layer that helps principals and governance teams detect when an agent's behavior pattern diverges from the originally intended purpose of the delegation.
How It Works
Intent Declaration
When creating a mandate, the principal can optionally declare an intent:
{
  "principal": { "type": "NATURAL_PERSON", "subject_id": "..." },
  "delegate": { "type": "AGENT", "subject_id": "agent:validator-v3" },
  "scope": {
    "actions": ["construction.validation.*"],
    "resources": ["monco:project:proj_ABC/*"],
    "effect": "ALLOW"
  },
  "intent": {
    "purpose": "Automated structural validation of Zone EG installations",
    "expected_clusters": ["validation", "inspection"],
    "drift_threshold": 0.7
  }
}
The intent object is entirely optional. Mandates without intent declarations continue
to work exactly as before — drift detection simply has less context to work with.
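As a rough illustration, a client could type and sanity-check the optional intent object before submitting a mandate. This is a sketch only: the `MandateIntent` interface and `validateIntent` helper are hypothetical names, not part of the Mandaitor SDK. The field names come from the example above, and the 0.0 to 1.0 bound on `drift_threshold` is an assumption based on the documented range of the aggregate drift score.

```typescript
// Hypothetical client-side type; field names mirror the JSON example above.
interface MandateIntent {
  purpose: string;
  expected_clusters: string[];
  drift_threshold: number;
}

// Returns a list of problems; an empty list means the intent looks well-formed.
// Assumption: drift_threshold shares the 0.0-1.0 range of the aggregate score.
function validateIntent(intent: MandateIntent | undefined): string[] {
  if (intent === undefined) return []; // the intent object is optional
  const problems: string[] = [];
  if (intent.purpose.trim() === "") {
    problems.push("purpose is empty");
  }
  if (!(intent.drift_threshold >= 0 && intent.drift_threshold <= 1)) {
    problems.push("drift_threshold must be between 0.0 and 1.0");
  }
  return problems;
}

const exampleIntent: MandateIntent = {
  purpose: "Automated structural validation of Zone EG installations",
  expected_clusters: ["validation", "inspection"],
  drift_threshold: 0.7,
};

console.log(validateIntent(exampleIntent)); // prints []
```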
Semantic Graph
Each taxonomy now includes a semantic graph that models relationships between actions beyond simple hierarchy. The graph captures:
| Relationship | Meaning | Example |
|---|---|---|
| IMPLIES | Action A logically leads to Action B | inspect → approve |
| CONFLICTS | Actions A and B should not co-occur | approve ↔ reject |
| ESCALATES_TO | Action A is a higher-authority version of B | flag → halt |
| PART_OF | Action A is a sub-step of Action B | measure → inspect |
| PRECEDES | Action A should happen before Action B | plan → execute |
These relationships are auto-inferred from the taxonomy's existing structure (parent actions, tags, risk levels) and can be manually refined by taxonomy contributors.
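One way to picture the graph is as a list of typed edges. The sketch below is illustrative only: the `Edge` shape and the `findConflicts` helper are hypothetical, since the actual graph format is not specified here. The edges reuse the examples from the table above.

```typescript
// The five relationship types listed in the table above.
type Relation = "IMPLIES" | "CONFLICTS" | "ESCALATES_TO" | "PART_OF" | "PRECEDES";

// Hypothetical edge shape; the real graph format may differ.
interface Edge {
  from: string;
  to: string;
  relation: Relation;
}

const edges: Edge[] = [
  { from: "inspect", to: "approve", relation: "IMPLIES" },
  { from: "approve", to: "reject", relation: "CONFLICTS" },
  { from: "flag", to: "halt", relation: "ESCALATES_TO" },
  { from: "measure", to: "inspect", relation: "PART_OF" },
  { from: "plan", to: "execute", relation: "PRECEDES" },
];

// CONFLICTS is symmetric, so only membership of both endpoints matters,
// not the direction in which the edge is stored.
function findConflicts(observed: string[], graph: Edge[]): Array<[string, string]> {
  const seen = new Set(observed);
  return graph
    .filter((e) => e.relation === "CONFLICTS" && seen.has(e.from) && seen.has(e.to))
    .map((e): [string, string] => [e.from, e.to]);
}

// A session containing both "approve" and "reject" yields one conflicting pair.
console.log(findConflicts(["inspect", "approve", "reject"], edges));
```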
Multi-Dimensional Drift Scoring
When drift detection is enabled (?drift=true), each verification request is scored
across four dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Semantic Distance | 0.4 | How far the current action is from the expected action clusters in the semantic graph |
| Sequence Deviation | 0.2 | Whether actions follow expected workflow patterns (e.g., inspect before approve) |
| Frequency Anomaly | 0.1 | Whether action frequencies deviate from historical baselines |
| Scope Expansion | 0.3 | Whether actions are creeping toward the edges of the permitted scope |
The weighted aggregate produces a score between 0.0 (no drift) and 1.0 (maximum drift).
When the aggregate exceeds the mandate's drift_threshold, the drift_detected flag
is set to true.
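The aggregation can be sketched as follows, using the weights from the table above. This assumes the aggregate is a plain weighted sum and that "exceeds" means strictly greater than the threshold; the actual scoring algorithm is experimental and may differ. The function names and the sample dimension values are made up for illustration.

```typescript
interface DriftDimensions {
  semantic_distance: number;
  sequence_deviation: number;
  frequency_anomaly: number;
  scope_expansion: number;
}

// Weights from the table above; they sum to 1.0.
const WEIGHTS: DriftDimensions = {
  semantic_distance: 0.4,
  sequence_deviation: 0.2,
  frequency_anomaly: 0.1,
  scope_expansion: 0.3,
};

// Assumption: the aggregate is a plain weighted sum of the four dimensions.
function aggregateDrift(d: DriftDimensions): number {
  return (
    WEIGHTS.semantic_distance * d.semantic_distance +
    WEIGHTS.sequence_deviation * d.sequence_deviation +
    WEIGHTS.frequency_anomaly * d.frequency_anomaly +
    WEIGHTS.scope_expansion * d.scope_expansion
  );
}

// Assumption: the flag is set only when the aggregate strictly exceeds the threshold.
function driftDetected(d: DriftDimensions, threshold: number): boolean {
  return aggregateDrift(d) > threshold;
}

// Illustrative values: 0.4*0.9 + 0.2*0.6 + 0.1*0.2 + 0.3*0.8 = 0.74
const sample: DriftDimensions = {
  semantic_distance: 0.9,
  sequence_deviation: 0.6,
  frequency_anomaly: 0.2,
  scope_expansion: 0.8,
};

console.log(aggregateDrift(sample).toFixed(2)); // prints 0.74
console.log(driftDetected(sample, 0.7)); // prints true
```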
Session Windows
Drift is computed over a sliding 1-hour session window. Each verification event is
recorded in the session, and the drift score reflects the behavioral pattern within that
window. Sessions are keyed by mandate_id + delegate_subject_id.
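A minimal in-memory sketch of such a window is shown below. It is not the service's implementation: the `SessionStore` class, the `|` key separator, and the choice to measure the window backwards from the newest event's timestamp are all assumptions made for illustration.

```typescript
const WINDOW_MS = 60 * 60 * 1000; // 1-hour window, as described above

interface SessionEvent {
  action: string;
  timestampMs: number;
}

class SessionStore {
  private sessions = new Map<string, SessionEvent[]>();

  // Sessions are keyed by mandate_id + delegate_subject_id.
  // The "|" separator is an assumption.
  private static key(mandateId: string, delegateSubjectId: string): string {
    return `${mandateId}|${delegateSubjectId}`;
  }

  // Records an event and returns the events still inside the window,
  // measured backwards from the new event's timestamp.
  record(mandateId: string, delegateSubjectId: string, event: SessionEvent): SessionEvent[] {
    const key = SessionStore.key(mandateId, delegateSubjectId);
    const cutoff = event.timestampMs - WINDOW_MS;
    const kept = (this.sessions.get(key) ?? []).filter((e) => e.timestampMs > cutoff);
    kept.push(event);
    this.sessions.set(key, kept);
    return kept;
  }
}

const store = new SessionStore();
const minute = 60 * 1000;
store.record("mnd_abc123", "agent:validator-v3", { action: "inspect", timestampMs: 0 });
store.record("mnd_abc123", "agent:validator-v3", { action: "approve", timestampMs: 30 * minute });
// The event at t=0 is more than an hour old by t=61min, so it drops out.
const current = store.record("mnd_abc123", "agent:validator-v3", { action: "approve", timestampMs: 61 * minute });
console.log(current.length); // prints 2
```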
Cross-Mandate Agent Drift
In addition to per-mandate drift, the system computes an aggregate agent drift score across all active mandates for a given delegate. This captures scenarios where an agent behaves normally within each individual mandate but exhibits concerning patterns when viewed holistically.
Using Drift Detection
Requesting Drift Signals
Add ?drift=true to any verification request:
curl -X POST "https://api.mandaitor.io/v1/verify?drift=true" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "delegate_subject_id": "agent:validator-v3",
    "action": "construction.validation.approve",
    "resource": "monco:project:proj_ABC/zone:EG/installation:stk_42"
  }'
Response with Drift Signals
{
  "decision": "ALLOW",
  "mandate_id": "mnd_abc123",
  "event_id": "evt_xyz789",
  "semantic_signals": {
    "drift_score": {
      "semantic_distance": 0.12,
      "sequence_deviation": 0.05,
      "frequency_anomaly": 0.02,
      "scope_expansion": 0.08,
      "aggregate": 0.08,
      "drift_detected": false
    },
    "conflicts": [],
    "sequence_violations": [],
    "agent_drift_score": 0.11
  }
}
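When reviewing a response like this, it can be useful to see which dimension scored highest. The helper below is a hypothetical sketch, not an SDK function; it compares the raw (unweighted) dimension values taken from the example response.

```typescript
type Dimension =
  | "semantic_distance"
  | "sequence_deviation"
  | "frequency_anomaly"
  | "scope_expansion";

const DIMENSIONS: Dimension[] = [
  "semantic_distance",
  "sequence_deviation",
  "frequency_anomaly",
  "scope_expansion",
];

// Returns the dimension with the highest raw score.
// Ties resolve to the dimension listed first in DIMENSIONS.
function dominantDimension(scores: Record<Dimension, number>): Dimension {
  return DIMENSIONS.reduce((best, d) => (scores[d] > scores[best] ? d : best));
}

// Dimension values from the example response above.
const responseScores: Record<Dimension, number> = {
  semantic_distance: 0.12,
  sequence_deviation: 0.05,
  frequency_anomaly: 0.02,
  scope_expansion: 0.08,
};

console.log(dominantDimension(responseScores)); // prints semantic_distance
```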
SDK Usage
import { MandaitorClient } from "@mandaitor/sdk";

const client = new MandaitorClient({ apiKey: "..." });

const result = await client.verifyWithDrift(
  {
    delegate_subject_id: "agent:validator-v3",
    action: "construction.validation.approve",
    resource: "monco:project:proj_ABC/zone:EG/installation:stk_42",
  },
  { drift: true },
);

if (result.semantic_signals?.drift_score?.drift_detected) {
  console.warn("⚠️ Drift detected:", result.semantic_signals.drift_score);
  // Trigger governance workflow, notify principal, etc.
}
Drift Signals in Proof-of-Mandate VCs
When both ?pom=sd-jwt-vc and ?drift=true are requested, drift signals are included
in the Proof-of-Mandate VC as selectively disclosable claims. This means:
- The VC holder (delegate) can choose whether to reveal drift data to a verifier
- Drift data is cryptographically bound to the verification event
- Third-party auditors can verify drift claims without trusting the reporter
The following claims are added as SD-JWT disclosures:
- drift_score: the full multi-dimensional drift score object
- semantic_conflicts: any detected conflicts between actions
- agent_drift_score: the cross-mandate aggregate drift score
Important Limitations
- Advisory only: Drift signals never change the ALLOW/DENY decision
- No enforcement: Mandaitor does not block actions based on drift scores
- Fixed weights: During the experimental phase, drift dimension weights are fixed
- 1-hour window: Session windows are fixed at 1 hour
- Opt-in: Drift detection must be explicitly requested per verification call
- Experimental API: The response format and scoring algorithm may change