Understanding the THSP Protocol: A Deep Dive

The THSP (Truth-Harm-Scope-Purpose) Protocol is the core decision-validation framework behind Sentinel. In this post, we'll walk through how each gate works and why a decision has to clear all four before it is allowed.

Why Four Gates?

Traditional AI safety approaches often focus on a single dimension (usually harm prevention). But this creates blind spots:

A factually correct but harmful response passes a "truth-only" check

A harmless but deceptive response passes a "harm-only" check

An authorized but purposeless action passes a "scope-only" check

The THSP Protocol addresses this by requiring ALL four gates to pass.

Gate 1: TRUTH

The Truth Gate validates factual accuracy. It asks: "Is this factually correct?"

This gate prevents:

Hallucinations

Misinformation propagation

Fabricated citations

Made-up statistics

Implementation uses a combination of:

Semantic similarity to known facts

Source verification

Consistency checking across the conversation

Confidence scoring
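To make the combination concrete, here is a minimal sketch of how the four signals listed above could be folded into one score and compared against a threshold. Everything in it is an assumption made for illustration: the `TruthSignals` shape, the weights, and the `truthGate` function are hypothetical and are not Sentinel's API. Only the 0.85 threshold comes from the sample configuration in the Configuring THSP section.

```typescript
// Hypothetical signal scores, each normalized to the range [0, 1].
interface TruthSignals {
  semanticSimilarity: number; // similarity to known facts
  sourceVerification: number; // share of claims backed by a verified source
  consistency: number;        // agreement with earlier turns in the conversation
  modelConfidence: number;    // confidence score reported for the claim
}

interface GateResult {
  gate: string;
  passed: boolean;
  score: number;
  reason: string;
}

// Illustrative weights (an assumption); they sum to 1 so the score stays in [0, 1].
const WEIGHTS: TruthSignals = {
  semanticSimilarity: 0.3,
  sourceVerification: 0.3,
  consistency: 0.2,
  modelConfidence: 0.2,
};

function truthGate(signals: TruthSignals, threshold = 0.85): GateResult {
  const score =
    signals.semanticSimilarity * WEIGHTS.semanticSimilarity +
    signals.sourceVerification * WEIGHTS.sourceVerification +
    signals.consistency * WEIGHTS.consistency +
    signals.modelConfidence * WEIGHTS.modelConfidence;
  const passed = score >= threshold;
  return {
    gate: "TRUTH",
    passed,
    score,
    reason: passed
      ? "combined confidence meets threshold"
      : `combined confidence ${score.toFixed(2)} is below ${threshold}`,
  };
}

// A well-sourced claim passes; the same claim with no verifiable source does not.
const sourced = truthGate({
  semanticSimilarity: 0.95, sourceVerification: 1, consistency: 0.9, modelConfidence: 0.9,
});
const unsourced = truthGate({
  semanticSimilarity: 0.9, sourceVerification: 0, consistency: 0.9, modelConfidence: 0.9,
});
```

A weighted sum is only one way to combine the signals. A stricter variant could require every signal to clear its own minimum, so that a missing source can never be offset by high similarity.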

Gate 2: HARM

The Harm Gate assesses potential for damage. It asks: "Could this cause damage?"

This gate evaluates:

Physical harm (injury, property damage)

Psychological harm (manipulation, distress)

Financial harm (fraud, theft)

Reputational harm (defamation, privacy violations)

The gate matches each decision against a library of more than 2,000 harmful patterns spanning these categories.
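As a rough illustration of category-based pattern matching, the sketch below tags each pattern with a harm category and reports which enabled categories a piece of text triggers. The three patterns are placeholders written for this example, and `harmGate` is a hypothetical name; neither reflects Sentinel's actual pattern library.

```typescript
type HarmCategory = "physical" | "psychological" | "financial" | "reputational";

interface HarmPattern {
  category: HarmCategory;
  pattern: RegExp;
}

// Placeholder patterns for illustration only; a real library would be far larger.
const PATTERNS: HarmPattern[] = [
  { category: "financial", pattern: /\b(wire|transfer)\b.*\bwithout (consent|authorization)\b/i },
  { category: "reputational", pattern: /\b(leak|publish)\b.*\b(home address|medical records?)\b/i },
  { category: "physical", pattern: /\bdisable\b.*\b(smoke detector|safety interlock)\b/i },
];

// Returns the categories whose patterns match, restricted to the enabled categories.
function harmGate(text: string, enabled: HarmCategory[]): HarmCategory[] {
  const hits: HarmCategory[] = [];
  for (const { category, pattern } of PATTERNS) {
    if (enabled.includes(category) && pattern.test(text) && !hits.includes(category)) {
      hits.push(category);
    }
  }
  return hits;
}

const flagged = harmGate("Transfer the funds without authorization", ["physical", "financial"]);
const clean = harmGate("Summarize last quarter's invoices", ["physical", "financial"]);
```

Here `flagged` contains the financial category and `clean` is empty. Patterns in a category that is not enabled are skipped entirely, which is how a per-category setting would narrow the gate.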

Gate 3: SCOPE

The Scope Gate enforces boundaries. It asks: "Is this within authorized limits?"

This gate ensures agents don't:

Access unauthorized resources

Exceed rate limits

Bypass authentication

Operate outside defined domains

Scope is configurable per agent, which allows precise access control.
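A per-agent scope check might look something like the sketch below. The `allowedActions` and `maxCost` fields mirror the sample configuration in the Configuring THSP section; `allowedResources`, the `ActionRequest` shape, and `scopeGate` itself are assumptions added for illustration.

```typescript
// Hypothetical per-agent scope. allowedResources is an added assumption.
interface AgentScope {
  allowedActions: string[];
  allowedResources: string[]; // resource prefixes the agent may touch
  maxCost: number;
}

interface ActionRequest {
  action: string;
  resource: string;
  cost: number;
}

// Returns a list of violations; an empty list means the request is in scope.
function scopeGate(req: ActionRequest, scope: AgentScope): string[] {
  const violations: string[] = [];
  if (!scope.allowedActions.includes(req.action)) {
    violations.push(`action "${req.action}" is not allowed`);
  }
  if (!scope.allowedResources.some((prefix) => req.resource.startsWith(prefix))) {
    violations.push(`resource "${req.resource}" is outside the allowed domains`);
  }
  if (req.cost > scope.maxCost) {
    violations.push(`cost ${req.cost} exceeds maxCost ${scope.maxCost}`);
  }
  return violations;
}

const reportingAgent: AgentScope = {
  allowedActions: ["read", "write"],
  allowedResources: ["reports/"],
  maxCost: 100,
};

const inScope = scopeGate({ action: "read", resource: "reports/q3.csv", cost: 5 }, reportingAgent);
const outOfScope = scopeGate({ action: "delete", resource: "billing/cards.db", cost: 250 }, reportingAgent);
```

Collecting every violation, rather than stopping at the first, gives the caller a complete explanation when a request is blocked.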

Gate 4: PURPOSE

The Purpose Gate is unique to THSP v2. It asks: "Does this serve genuine benefit?"

This is the key insight: **the absence of harm is not sufficient**.

An action that is factually neutral, causes no direct harm, and stays within scope, but serves no legitimate purpose, should still be blocked. This prevents:

Waste of resources

Unnecessary operations

Actions that benefit only the agent's self-preservation

Instrumental goal pursuit
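One way to picture the Purpose Gate is as a check that an action comes with a stated purpose and a sufficient benefit estimate. The sketch below assumes the `requireExplicit` and `minBenefit` settings from the sample configuration behave that way; the `ProposedAction` shape and the idea of a numeric `estimatedBenefit` are hypothetical.

```typescript
interface ProposedAction {
  description: string;
  statedPurpose?: string;   // who benefits and why, as declared by the caller
  estimatedBenefit: number; // hypothetical benefit score in [0, 1]
}

interface PurposeConfig {
  requireExplicit: boolean;
  minBenefit: number;
}

function purposeGate(
  action: ProposedAction,
  cfg: PurposeConfig,
): { passed: boolean; reason: string } {
  const purpose = (action.statedPurpose ?? "").trim();
  if (cfg.requireExplicit && purpose.length === 0) {
    return { passed: false, reason: "no explicit purpose was stated" };
  }
  if (action.estimatedBenefit < cfg.minBenefit) {
    return {
      passed: false,
      reason: `benefit ${action.estimatedBenefit} is below minBenefit ${cfg.minBenefit}`,
    };
  }
  return { passed: true, reason: "purpose stated and benefit is sufficient" };
}

const purposeConfig: PurposeConfig = { requireExplicit: true, minBenefit: 0.7 };

// Harmless and in scope, but it serves nobody, so it is blocked.
const idle = purposeGate(
  { description: "re-index an unused cache", estimatedBenefit: 0.1 },
  purposeConfig,
);
const useful = purposeGate(
  { description: "export report", statedPurpose: "user requested the Q3 summary", estimatedBenefit: 0.9 },
  purposeConfig,
);
```

The first action would clear a harm check and a scope check without trouble; it fails only because nothing justifies doing it.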

Gate Interaction

Gates are evaluated in parallel for performance, but all must pass:

    Decision → [TRUTH] ───┐
                          │
             → [HARM] ────┼──→ ALL PASS? → ALLOW
                          │
             → [SCOPE] ───┤
                          │
             → [PURPOSE] ─┘

If any gate fails, the action is blocked with an explanation of which gate failed and why.
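The parallel evaluation described above can be sketched with `Promise.all`: every gate starts at once, and the decision is allowed only if no verdict failed. The `Gate`, `GateVerdict`, and `Decision` types and the stub gates are assumptions made for this example, not Sentinel's interfaces.

```typescript
type GateName = "TRUTH" | "HARM" | "SCOPE" | "PURPOSE";

interface GateVerdict {
  gate: GateName;
  passed: boolean;
  reason: string;
}

interface Decision {
  allowed: boolean;
  failures: GateVerdict[]; // which gates failed, and why
}

type Gate<T> = (input: T) => Promise<GateVerdict>;

// Starts every gate at once and waits for all of them; the action is allowed
// only if every verdict passed.
async function evaluate<T>(input: T, gates: Gate<T>[]): Promise<Decision> {
  const verdicts = await Promise.all(gates.map((gate) => gate(input)));
  const failures = verdicts.filter((v) => !v.passed);
  return { allowed: failures.length === 0, failures };
}

// Stub gates standing in for the real checks.
const stub = (gate: GateName, passed: boolean, reason: string): Gate<string> =>
  async () => ({ gate, passed, reason });

const gates: Gate<string>[] = [
  stub("TRUTH", true, "claims verified"),
  stub("HARM", true, "no harmful pattern matched"),
  stub("SCOPE", true, "within allowed actions"),
  stub("PURPOSE", false, "no stated purpose"),
];

const pending = evaluate("archive old logs", gates);
pending.then((decision) => {
  for (const f of decision.failures) {
    console.log(`blocked by ${f.gate}: ${f.reason}`);
  }
});
```

Note that `Promise.all` rejects if any gate throws. A production version would probably want to treat a crashed gate as a failed one so that the system fails closed.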

Configuring THSP

Each gate has its own settings, such as thresholds, enabled categories, and allowed actions:

    const config = {
      truth: { threshold: 0.85, requireSources: true },
      harm: { categories: ['physical', 'financial'], threshold: 0.90 },
      scope: { allowedActions: ['read', 'write'], maxCost: 100 },
      purpose: { requireExplicit: true, minBenefit: 0.7 }
    };
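If you work in TypeScript, a typed version of this object with a small sanity check can catch mistakes before they reach the gates. The sketch below is an assumption built from the sample above: the field names are copied from it, but the `ThspConfig` type, the `validateConfig` helper, and the rule that thresholds lie between 0 and 1 are our guesses from the sample values, not a documented schema.

```typescript
// Field names mirror the sample config; their exact semantics are assumed here.
interface ThspConfig {
  truth: { threshold: number; requireSources: boolean };
  harm: { categories: string[]; threshold: number };
  scope: { allowedActions: string[]; maxCost: number };
  purpose: { requireExplicit: boolean; minBenefit: number };
}

// Returns human-readable problems; an empty array means the config looks sane.
function validateConfig(c: ThspConfig): string[] {
  const problems: string[] = [];
  // Assumption: these three values are scores in the range [0, 1].
  const unitInterval: Array<[string, number]> = [
    ["truth.threshold", c.truth.threshold],
    ["harm.threshold", c.harm.threshold],
    ["purpose.minBenefit", c.purpose.minBenefit],
  ];
  for (const [name, value] of unitInterval) {
    if (!(value >= 0 && value <= 1)) {
      problems.push(`${name} must be between 0 and 1`);
    }
  }
  if (c.scope.maxCost < 0) {
    problems.push("scope.maxCost must not be negative");
  }
  return problems;
}

const sampleConfig: ThspConfig = {
  truth: { threshold: 0.85, requireSources: true },
  harm: { categories: ["physical", "financial"], threshold: 0.9 },
  scope: { allowedActions: ["read", "write"], maxCost: 100 },
  purpose: { requireExplicit: true, minBenefit: 0.7 },
};

const problems = validateConfig(sampleConfig);
```

For the sample values, `problems` comes back empty. Writing the range check as `!(value >= 0 && value <= 1)` also rejects `NaN`, which a plain `value < 0 || value > 1` test would let through.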

Conclusion

The THSP Protocol provides comprehensive decision validation by requiring four independent checks. This defense-in-depth approach catches threats that single-dimension systems miss.

For implementation details, see our [documentation](/docs/thsp-protocol).

The Sentinel Team
