Trust-Gated Rate Limiting Implementation

Date: 2025-11-11
Phase: Production Hardening
Feature: PR #3 - Trust-Gated Rate Limiting
Status: ✅ Complete

Overview

Implemented dynamic rate limiting that adjusts message throughput based on peer trust classification. This provides adaptive DoS protection: strict limits for untrusted peers, generous limits for trusted partners.

Goals

Adaptive Security: Rate limits should reflect actual trust relationships
DoS Protection: Prevent untrusted peers from flooding the network
High Throughput: Enable fast communication with trusted partners
Dynamic Adjustment: Limits update automatically as trust changes
Backwards Compatibility: Work without a trust graph (fallback mode)

Implementation

Architecture

Trust Classes and Limits:

pub struct TrustGatedRateLimitConfig {
    pub isolated: RateLimitConfig,   // score < 0.1: 10 msg/sec, burst 2
    pub known: RateLimitConfig,      // score 0.1-0.4: 50 msg/sec, burst 10
    pub partner: RateLimitConfig,    // score 0.4-0.7: 100 msg/sec, burst 20
    pub federated: RateLimitConfig,  // score 0.7+: 200 msg/sec, burst 50
    pub refill_interval: Duration,   // Shared: 100ms
}
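The score thresholds and default limits in the comments above can be sketched as a self-contained module. This is a minimal sketch, not the icn-net definition: it assumes lower-bound-inclusive thresholds (so a score of exactly 0.7 is Federated, matching the "Direct 1.0 → Final 0.7" example below), and the `TrustClass::from_score` helper and the `messages_per_sec`/`burst` field names of RateLimitConfig are hypothetical.

```rust
use std::time::Duration;

/// Trust classes from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustClass {
    Isolated,
    Known,
    Partner,
    Federated,
}

impl TrustClass {
    /// Hypothetical helper: map a final trust score to its class.
    /// Thresholds are assumed lower-bound inclusive (0.7 is Federated).
    pub fn from_score(score: f64) -> Self {
        if score >= 0.7 {
            TrustClass::Federated
        } else if score >= 0.4 {
            TrustClass::Partner
        } else if score >= 0.1 {
            TrustClass::Known
        } else {
            TrustClass::Isolated
        }
    }
}

/// Field names are assumptions, not icn-net's actual definition.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RateLimitConfig {
    pub messages_per_sec: f64, // sustained refill rate
    pub burst: f64,            // bucket capacity
}

pub struct TrustGatedRateLimitConfig {
    pub isolated: RateLimitConfig,
    pub known: RateLimitConfig,
    pub partner: RateLimitConfig,
    pub federated: RateLimitConfig,
    pub refill_interval: Duration,
}

impl Default for TrustGatedRateLimitConfig {
    /// Defaults taken from the comments on the struct above.
    fn default() -> Self {
        let limits = |messages_per_sec: f64, burst: f64| RateLimitConfig { messages_per_sec, burst };
        Self {
            isolated: limits(10.0, 2.0),
            known: limits(50.0, 10.0),
            partner: limits(100.0, 20.0),
            federated: limits(200.0, 50.0),
            refill_interval: Duration::from_millis(100),
        }
    }
}
```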

Design Decisions:

20x Range: Federated peers get 20x more throughput than isolated (200 vs 10 msg/sec)
- Rationale: Strong incentive to build trust, severe throttling of untrusted peers
- Alternative considered: 10x range felt insufficient for DoS protection
Trust Score Mapping: TrustGraph computes final score as 70% direct + 30% transitive
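- Direct trust edges need adjustment: with no transitive contribution, final_score = 0.7 * direct_score, so reaching a desired final score requires direct_score = desired_final_score / 0.7
- Example: Direct 1.0 → Final 0.7 (Federated class)
- This initially tripped up our tests: scores weren't mapping to the expected classes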
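The mapping is easiest to check as arithmetic. A sketch of the stated 70/30 formula and its inverse, assuming final = 0.7 × direct + 0.3 × transitive with no further clamping or normalization, and assuming edge weights lie in [0, 1]; `final_score` and `direct_for_target` are hypothetical helpers, not TrustGraph's API.

```rust
/// Weights from the TrustGraph formula: 70% direct + 30% transitive.
const DIRECT_WEIGHT: f64 = 0.7;
const TRANSITIVE_WEIGHT: f64 = 0.3;

/// Hypothetical helper: final score as a weighted sum (no clamping assumed).
fn final_score(direct: f64, transitive: f64) -> f64 {
    DIRECT_WEIGHT * direct + TRANSITIVE_WEIGHT * transitive
}

/// Hypothetical helper: direct edge weight needed to hit `target` with no transitive trust.
/// Returns None when the target would need a weight above 1.0 (weights assumed in [0, 1]).
fn direct_for_target(target: f64) -> Option<f64> {
    let direct = target / DIRECT_WEIGHT;
    if direct <= 1.0 {
        Some(direct)
    } else {
        None
    }
}
```

Under these assumptions, a single direct edge with no transitive support tops out at a final score of 0.7, which is exactly the Federated threshold.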

Token Bucket Reset on Trust Change: Full capacity reset when class changes

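fn update_config(&mut self, new_capacity: f64, new_refill_rate: f64, new_trust_class: Option<TrustClass>) {
    if self.trust_class != new_trust_class {
        self.tokens = new_capacity;  // Full reset on class change
        self.last_refill = Instant::now();
    }
    // Adopt the new limits and remember the class for the next comparison
    self.capacity = new_capacity;
    self.refill_rate = new_refill_rate;
    self.trust_class = new_trust_class;
}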

Rationale: Immediate benefit for trust upgrades encourages good behavior
Alternative considered: Gradual refill, rejected because it delays the benefit and weakens the incentive
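A minimal token bucket showing the reset-on-class-change behavior end to end. This sketch assumes continuous refill from elapsed time (a simplification of the shared 100ms refill_interval), passes `now: Instant` explicitly so behavior is deterministic, and uses hypothetical field names; capping tokens to the new capacity when the class is unchanged is also an assumption.

```rust
use std::time::Instant;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustClass {
    Isolated,
    Known,
    Partner,
    Federated,
}

/// Sketch only: field names and continuous refill are assumptions.
pub struct TokenBucket {
    tokens: f64,
    capacity: f64,
    refill_rate: f64, // tokens per second
    last_refill: Instant,
    trust_class: Option<TrustClass>,
}

impl TokenBucket {
    pub fn new(capacity: f64, refill_rate: f64, trust_class: Option<TrustClass>, now: Instant) -> Self {
        Self { tokens: capacity, capacity, refill_rate, last_refill: now, trust_class }
    }

    /// Add tokens for the time elapsed since the last refill, capped at capacity.
    fn refill(&mut self, now: Instant) {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_rate).min(self.capacity);
        self.last_refill = now;
    }

    /// Consume one token for a message; false means "rate limited".
    pub fn try_consume(&mut self, now: Instant) -> bool {
        self.refill(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Apply new limits; a trust class change refills to the new capacity.
    pub fn update_config(
        &mut self,
        new_capacity: f64,
        new_refill_rate: f64,
        new_trust_class: Option<TrustClass>,
        now: Instant,
    ) {
        if self.trust_class != new_trust_class {
            self.tokens = new_capacity; // full reset on class change
            self.last_refill = now;
        } else {
            self.tokens = self.tokens.min(new_capacity); // same class: just cap (assumption)
        }
        self.capacity = new_capacity;
        self.refill_rate = new_refill_rate;
        self.trust_class = new_trust_class;
    }
}
```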

Per-Message Trust Lookup: Query trust graph on every message
- Rationale: Ensures rate limits reflect current trust state
- Cost: One async lock + hash lookup per message
- Optimization considered: Cache with TTL (deferred as premature)
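The per-message path (look up trust, classify, reset the bucket if the class changed, consume a token) can be sketched with the trust query abstracted as a closure. Assumptions: `Limiter`, `Bucket`, `classify`, and `burst_for` are hypothetical names, the lookup is synchronous here (the real one takes an async lock), and time-based refill is omitted to keep the focus on the lookup.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrustClass {
    Isolated,
    Known,
    Partner,
    Federated,
}

/// Assumed lower-bound-inclusive thresholds from the config table.
fn classify(score: f64) -> TrustClass {
    match score {
        s if s >= 0.7 => TrustClass::Federated,
        s if s >= 0.4 => TrustClass::Partner,
        s if s >= 0.1 => TrustClass::Known,
        _ => TrustClass::Isolated,
    }
}

/// Burst capacity per class (from the config comments).
fn burst_for(class: TrustClass) -> f64 {
    match class {
        TrustClass::Isolated => 2.0,
        TrustClass::Known => 10.0,
        TrustClass::Partner => 20.0,
        TrustClass::Federated => 50.0,
    }
}

/// Per-peer state; time-based refill is omitted in this sketch.
struct Bucket {
    tokens: f64,
    class: TrustClass,
}

/// Hypothetical limiter; `lookup` stands in for the per-message trust graph query.
struct Limiter<F> {
    lookup: F,
    buckets: HashMap<String, Bucket>,
}

impl<F: Fn(&str) -> f64> Limiter<F> {
    fn allow(&mut self, peer: &str) -> bool {
        // 1. Query current trust on every message.
        let class = classify((self.lookup)(peer));
        // 2. Create the bucket on first contact, full at the class's burst.
        let bucket = self
            .buckets
            .entry(peer.to_string())
            .or_insert(Bucket { tokens: burst_for(class), class });
        // 3. Class changed since the last message: reset to the new capacity.
        if bucket.class != class {
            bucket.tokens = burst_for(class);
            bucket.class = class;
        }
        // 4. Spend one token if available.
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Because the class is recomputed on every call, a trust change takes effect on the very next message; that is the property the per-message lookup buys in exchange for its lock and hash lookup.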

Key Components

icn-net/src/rate_limit.rs:

TrustGatedRateLimitConfig - Configuration for all trust classes
RateLimiter::new_trust_gated() - Constructor with trust graph integration
TokenBucket::update_config() - Dynamic reconfiguration on trust change
Token bucket tracks trust_class: Option<TrustClass> to detect changes

icn-core/src/supervisor.rs:

Creates shared TrustGraph from persistent store (~/.icn/trust/)
Passes to both GossipActor (for access control) and NetworkActor (for rate limiting)
Trust lookup closure bridges sync/async contexts with block_in_place

icn-net/src/actor.rs:

NetworkActor::spawn() accepts optional trust_graph parameter
Conditional construction: trust-gated if graph provided, fallback otherwise
Logs mode: "Trust-gated rate limiting enabled" vs "Using fallback rate limiting"

Challenges & Solutions

Challenge 1: Trust Score Calculation

Problem: Initial tests failed because trust scores didn't map to expected classes.

Investigation:

// Test setup (WRONG):
graph.add_edge(TrustEdge::new(alice, bob, 0.5)).unwrap();
// Expected: Partner class (0.4-0.7)
// Actual: Known class (final score: 0.5 * 0.7 = 0.35)

Root Cause: TrustGraph applies the 70% direct + 30% transitive formula. We were setting direct scores assuming a 1:1 mapping to final scores.

Solution: Adjusted test scores to account for formula:

// CORRECT:
graph.add_edge(TrustEdge::new(alice, bob, 0.7)).unwrap();
// Final score: 0.7 * 0.7 = 0.49 → Partner class ✓

Lesson: Always understand the scoring algorithm. Added comments in tests explaining the calculation.

Challenge 2: Token Bucket Behavior on Trust Change

Problem: Test expected 50 fresh tokens after trust upgrade (Known→Federated), but bucket only had remaining tokens from previous capacity.

First Attempt: Cap tokens to new capacity

self.tokens = self.tokens.min(new_capacity);  // WRONG

Result: No benefit for trust upgrade!

Second Attempt: Reset to full capacity

self.tokens = new_capacity;  // CORRECT

Result: Immediate 50 tokens after upgrade ✓

Rationale:

Trust upgrades should have immediate positive effect
Encourages cooperative behavior
Models real-world trust: "You've proven yourself, here's more access"

Challenge 3: Backwards Compatibility

Problem: Existing code uses NetworkActor::spawn() without a trust graph. Adding a required parameter would break all call sites.

Solution: Optional parameter + fallback mode

pub async fn spawn(
    // ... other params
    trust_graph: Option<Arc<RwLock<TrustGraph>>>,  // Optional!
) -> Result<NetworkHandle>

All existing tests pass None; the supervisor passes Some(trust_graph).

Fallback behavior: Uses RateLimitConfig::default() (100 msg/sec, burst 20) for all peers.
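The fallback decision reduces to a branch on the optional graph. In this sketch the `Option<Arc<RwLock<TrustGraph>>>` is replaced by an optional map from peer DID to final score (assumption), peers missing from the map are treated as score 0.0 and therefore Isolated (assumption), and `limits_for_peer` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RateLimitConfig {
    messages_per_sec: f64,
    burst: f64,
}

/// Fallback limits as documented for RateLimitConfig::default(): 100 msg/sec, burst 20.
const FALLBACK: RateLimitConfig = RateLimitConfig { messages_per_sec: 100.0, burst: 20.0 };

/// Hypothetical helper; `trust_scores` stands in for Option<Arc<RwLock<TrustGraph>>>.
fn limits_for_peer(trust_scores: Option<&HashMap<String, f64>>, peer: &str) -> RateLimitConfig {
    let scores = match trust_scores {
        Some(scores) => scores,
        None => return FALLBACK, // fallback mode: identical limits for every peer
    };
    // Peers absent from the graph are assumed to score 0.0 (Isolated).
    let score = scores.get(peer).copied().unwrap_or(0.0);
    let (messages_per_sec, burst) = if score >= 0.7 {
        (200.0, 50.0) // Federated
    } else if score >= 0.4 {
        (100.0, 20.0) // Partner
    } else if score >= 0.1 {
        (50.0, 10.0) // Known
    } else {
        (10.0, 2.0) // Isolated
    };
    RateLimitConfig { messages_per_sec, burst }
}
```

Since the fallback equals the Partner limits, call sites that pass None behave as if every peer were a Partner.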

Testing

Test Coverage

test_trust_gated_rate_limiting_different_classes():

Creates 4 peers with different trust levels
Verifies each gets correct burst capacity (2, 10, 20, 50)
Confirms rate limiting at expected thresholds

test_trust_gated_rate_limiting_trust_class_change():

Peer starts as Known (burst 10)
Consumes all tokens → rate limited
Trust upgraded to Federated
Immediately gets 50 fresh tokens ✓
Consumes all 50 → rate limited at new threshold

test_trust_gated_config_for_class():

Validates configuration mappings
Ensures all trust classes have correct limits

Integration: All 140+ existing tests pass after these changes:

Updated 4 integration test files to pass None for trust graph
Updated 1 unit test in icn-net/src/actor.rs
Fixed 3 test files to use new handle_message(&sender, msg) signature

Test Results

test result: ok. 19 passed; 0 failed; 3 ignored (icn-net)
test result: ok. 140+ passed; 0 failed (full suite)

After fixing the trust score calculations, all tests passed on the first run.

Design Patterns

Pattern 1: Trust-Gated Resources

Concept: Resource allocation based on peer trust classification.

Application:

Rate limiting (this PR)
Future: Bandwidth quotas, storage allocation, computation credits

Implementation:

let config = match trust_class {
    TrustClass::Isolated => &config.isolated,
    TrustClass::Known => &config.known,
    TrustClass::Partner => &config.partner,
    TrustClass::Federated => &config.federated,
};

Benefits:

Automatic adaptation to trust changes
Clear incentives for building trust
Severe limits for attacks without trust investment
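The match above generalizes to any per-class resource. A hypothetical `TrustTiered<T>` container (not part of icn-net) holds one value per class, so rate limits and future quotas such as bandwidth or storage could share the same lookup.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrustClass {
    Isolated,
    Known,
    Partner,
    Federated,
}

/// Hypothetical container: one value of any resource type per trust class.
struct TrustTiered<T> {
    isolated: T,
    known: T,
    partner: T,
    federated: T,
}

impl<T> TrustTiered<T> {
    /// Same dispatch as the match above, reusable for any T.
    fn get(&self, class: TrustClass) -> &T {
        match class {
            TrustClass::Isolated => &self.isolated,
            TrustClass::Known => &self.known,
            TrustClass::Partner => &self.partner,
            TrustClass::Federated => &self.federated,
        }
    }
}
```

For example, `TrustTiered { isolated: 2, known: 10, partner: 20, federated: 50 }` reproduces the burst sizes used by the rate limiter.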

Pattern 2: Shared Trust Graph

Concept: Single trust graph shared across multiple actors.

Implementation:

let trust_graph_handle = Arc::new(RwLock::new(trust_graph));
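// Pass to multiple consumers; each holds a clone of the same Arc
// (trust_lookup is the closure built from trust_graph_handle in supervisor.rs)
let gossip = GossipActor::spawn(did, trust_lookup);
let network = NetworkActor::spawn(..., Some(Arc::clone(&trust_graph_handle)));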

Benefits:

Single source of truth for trust data
Updates immediately visible to all actors
Persistent storage (survives restarts)

Tradeoffs:

RwLock contention possible (read-heavy workload mitigates)
Could optimize with per-actor caches + invalidation
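A dependency-free sketch of the shared-handle pattern. It substitutes `std::sync::RwLock` for the async lock and a `HashMap<String, f64>` of scores for TrustGraph (both assumptions; `score_of` and `set_score` are hypothetical helpers) to show that every clone of the Arc sees a write as soon as the write lock is released.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for TrustGraph: DID -> final score (assumption; the real type stores edges).
type ScoreMap = HashMap<String, f64>;

/// Read path: shared lock, many readers may hold it concurrently.
fn score_of(graph: &Arc<RwLock<ScoreMap>>, did: &str) -> f64 {
    let map = graph.read().unwrap();
    map.get(did).copied().unwrap_or(0.0)
}

/// Write path: exclusive lock, held only for the insert.
fn set_score(graph: &Arc<RwLock<ScoreMap>>, did: &str, score: f64) {
    graph.write().unwrap().insert(did.to_string(), score);
}
```

Readers block only while a writer holds the lock, which is why the read-heavy, write-light workload described above keeps contention low.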

Metrics & Observability

Existing Metric (from earlier work):

icn_network_messages_rate_limited_total - Total messages blocked

Future Additions (planned):

icn_network_rate_limited_by_class{class="isolated|known|partner|federated"} - Per-class blocking
icn_network_active_peers_by_class{class="..."} - Trust distribution
icn_network_trust_class_changes_total - Rate limit adjustments
icn_trust_graph_lookup_duration_seconds - Performance monitoring

Performance Considerations

Per-Message Overhead:

Trust graph lock acquisition: on the order of 1μs (RwLock read)
Trust score computation: on the order of 1μs (hash lookup + arithmetic)
Token bucket operations: ~100ns (in-memory arithmetic)

Total: ~1-2μs per message (negligible compared to network I/O)

Scaling:

Trust graph lock is read-heavy (per-message) vs write-light (trust updates)
RwLock allows concurrent readers, which suits this pattern
Could add per-actor caches if contention observed (not needed yet)

Memory:

One TokenBucket per active peer: ~128 bytes
Trust graph scales with trust edges: ~200 bytes per edge
Total: Dominated by peer count, not message volume

Security Analysis

Threat Model

Attack: Message Flooding

Without trust: Each attacker identity is limited to 10 msg/sec, burst 2 (Isolated class)
Impact: Limited to 5% (1/20) of a federated peer's sustained rate
Mitigation: Automatic, no operator intervention needed

Attack: Trust Grinding

Scenario: Attacker builds trust slowly to gain higher limits
Cost: Requires actual trust edges (not free)
Detection: Trust graph auditing (future work)

Attack: Peer Impersonation

Prevention: TLS certificate verification (existing)
Trust: Based on DID identity, not IP address

Defense in Depth

TLS Certificate Verification - Prevents DID spoofing
Trust-Gated Rate Limiting - Throttles untrusted peers (this PR)
QUIC Stream Limits - Bounds concurrent streams (existing)
Message Size Limits - Prevents memory exhaustion (existing)

Layered protection: Each layer operates independently, so a failure in one does not compromise the others.

Production Readiness

✅ Complete

Core implementation
Comprehensive testing (3 new tests)
Integration with supervisor
Backwards compatibility
Documentation (CLAUDE.md + CHANGELOG)

⏳ Future Work

Prometheus metrics for per-class rate limiting
Configuration via icn.toml (currently hardcoded defaults)
Trust graph metrics (edge count, score distribution)
Cache optimization if lock contention observed
Trust audit logs for security monitoring

Lessons Learned

Understand Dependent Algorithms: TrustGraph's 70/30 scoring required test score adjustments. Always verify assumptions about external components.
Test Trust Dynamics: Testing static trust is easy; testing trust changes is what caught the token bucket reset bug. Dynamic behavior matters.
Immediate Positive Feedback: Resetting tokens on trust upgrade creates strong incentive for good behavior. UX applies to automated systems too.
Optional Parameters for Evolution: Making trust_graph optional preserved all existing tests while enabling new functionality. Evolution-friendly APIs reduce churn.
Shared State Patterns: Arc<RwLock<T>> for shared trust graph works well for read-heavy workloads. Lock granularity matters less than read/write ratio.

References

Trust graph implementation: icn-trust/src/lib.rs
Rate limiting algorithm: Standard token bucket (capacity = burst size, refilled every refill_interval)
Trust score formula: 70% direct + 30% transitive (PageRank-inspired)

Commits

ac8203a - feat: Implement trust-gated rate limiting (PR #3)
1cad4fb - feat: Wire trust graph to network actor in supervisor
3644549 - docs: Document trust-gated rate limiting
035a524 - docs: Add trust-gated rate limiting to CHANGELOG

Lines Changed: ~350 additions, ~30 modifications across 9 files

Next Steps

Planned follow-up work:

Metrics - Per-class rate limiting metrics for observability
Configuration - Expose rate limit tuning via icn.toml
Trust Metrics - Monitor trust graph health and dynamics
Validation - Run daemon and observe trust-gated limiting in practice
Audit Logging - Security monitoring for trust manipulation

This completes the trust-gated rate limiting feature. The system now provides adaptive DoS protection that automatically adjusts to peer trustworthiness.