ICN Demo Wiring - ROOT CAUSE FOUND!
Archived Document Notice (2026-02-12): This file is retained for historical context and may not reflect current code, APIs, runtime defaults, CI status, or deployment posture. Use active documentation under docs/ as authoritative.
Date: 2025-12-18 17:21
Time to Discovery: 2.5 hours
Status: BUG IDENTIFIED - FIX AVAILABLE
Root Cause
File: icn/crates/icn-net/src/session.rs
Line: 178
Bug: Double-binding to same UDP port
The Issue
// Line 165: QUIC endpoint binds to listen_addr
let mut endpoint = Endpoint::server(server_config, listen_addr)?;
// Line 168: Successfully listening
info!("QUIC endpoint listening on {}", endpoint.local_addr()?);
// Line 178: BUG - Tries to bind ANOTHER socket to same address!
let socket = tokio::net::UdpSocket::bind(local_addr).await?; // FAILS HERE
What Happens:
- QUIC endpoint successfully binds to port 19777
- Code then tries to create a second UDP socket on port 19777 for STUN queries
- Second bind fails with "Address already in use"
- Error propagates up and triggers shutdown
- All actors stop
- Runtime exits
The Irony: The comment on line 176 says "We use the same socket" but the code creates a NEW socket instead!
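The failure reproduces without ICN, Quinn, or tokio. This is a minimal std-only sketch (the function name `second_bind_error` is made up for illustration): the first socket stands in for the QUIC endpoint and the second for the STUN socket. Rust's std does not set `SO_REUSEADDR`/`SO_REUSEPORT` on UDP sockets, so on typical platforms the second bind is expected to fail with `AddrInUse`, the same "Address already in use" the daemon hit.

```rust
use std::io::{self, ErrorKind};
use std::net::UdpSocket;

/// Mirrors the session.rs sequence: bind once (the "QUIC endpoint"), then
/// bind a second socket to the same address (the "STUN socket") while the
/// first is still open. Returns the second bind's error kind, or None if
/// the platform unexpectedly allowed the double bind.
fn second_bind_error() -> io::Result<Option<ErrorKind>> {
    // Port 0 lets the OS pick a free port; the demo used a fixed 19777.
    let endpoint_socket = UdpSocket::bind("127.0.0.1:0")?;
    let local_addr = endpoint_socket.local_addr()?;

    // `endpoint_socket` is still alive here, so this is a double bind.
    let kind = match UdpSocket::bind(local_addr) {
        Ok(_) => None,
        Err(e) => Some(e.kind()),
    };
    drop(endpoint_socket); // released only after the second attempt
    Ok(kind)
}

fn main() -> io::Result<()> {
    match second_bind_error()? {
        Some(kind) => println!("second bind failed: {kind:?}"),
        None => println!("second bind unexpectedly succeeded"),
    }
    Ok(())
}
```

The same check is a cheap way to confirm the "same process binds twice" hypothesis before digging through logs.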
The Fix
Option 1: Reuse QUIC Endpoint's Socket (Proper Fix)
Quinn's Endpoint may expose the underlying UDP socket; if it does, use that instead of binding a new one:
// Instead of:
let socket = tokio::net::UdpSocket::bind(local_addr).await?;
// Do:
// Quinn should provide access to the underlying socket
// Need to check Quinn API for how to get it
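Even if Quinn does expose its socket, sharing it means STUN replies and QUIC packets arrive on the same port, and something has to tell them apart (Quinn's own endpoint driver would presumably also be reading from that socket, which is part of why Option 1 needs API research). The first bytes are enough for a first pass: STUN messages (RFC 5389) start with two zero bits and carry the magic cookie `0x2112A442` in bytes 4 to 8, while QUIC v1 packets (RFC 9000) set the fixed bit `0x40`. `classify` and `Datagram` are hypothetical names, not part of ICN or Quinn; this is a sketch of the classification step only.

```rust
/// STUN magic cookie (RFC 5389), always in bytes 4..8 of a STUN header.
const STUN_MAGIC_COOKIE: u32 = 0x2112_A442;

#[derive(Debug, PartialEq)]
enum Datagram {
    Stun,
    Quic,
    Unknown,
}

/// First-pass demultiplexing of datagrams arriving on a shared UDP socket.
fn classify(buf: &[u8]) -> Datagram {
    match buf.first() {
        // STUN: top two bits of the first byte are 0, the header is 20 bytes,
        // and the magic cookie must be present.
        Some(&b) if (b & 0xC0) == 0 => {
            let has_cookie = buf.len() >= 20
                && u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]) == STUN_MAGIC_COOKIE;
            if has_cookie { Datagram::Stun } else { Datagram::Unknown }
        }
        // QUIC v1: the fixed bit (0x40) is set in long and short headers,
        // unless the peers negotiated fixed-bit greasing (RFC 9287).
        Some(&b) if (b & 0x40) != 0 => Datagram::Quic,
        _ => Datagram::Unknown,
    }
}

fn main() {
    let mut binding_request = [0u8; 20];
    binding_request[1] = 0x01; // message type 0x0001 (Binding Request)
    binding_request[4..8].copy_from_slice(&STUN_MAGIC_COOKIE.to_be_bytes());
    println!("{:?}", classify(&binding_request));
    println!("{:?}", classify(&[0xC0, 0x00, 0x00, 0x00, 0x01]));
}
```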
Option 2: Disable STUN Discovery (Quick Workaround)
For demo purposes, we can just remove STUN discovery. The node will work on local network without it.
In demo.toml:
[network]
# Remove or set to empty
stun_servers = []
Or in code: Comment out lines 178-192 in session.rs
Option 3: Bind Before Creating Endpoint
// Bind socket first
let socket = tokio::net::UdpSocket::bind(listen_addr).await?;
// Get the actual bound address
let local_addr = socket.local_addr()?;
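// Create the endpoint from this socket: quinn::Endpoint::new takes a std::net::UdpSocket
// (convert with socket.into_std()?); verify the signature for the Quinn version ICN uses
// Or: Do STUN discovery before creating endpoint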
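Option 3 hinges on one ordering: bind a socket, run the STUN exchange on it, then give that same socket to the QUIC endpoint, so the mapped address STUN reports belongs to the port QUIC actually uses. This sketch covers the STUN half with std only: `binding_request` builds an RFC 5389 Binding Request header and `xor_mapped_v4` decodes an IPv4 XOR-MAPPED-ADDRESS from a Binding Success response. Both helper names are hypothetical, the network round trip is left out, and the Quinn handoff appears only as a comment because the exact constructor depends on the Quinn version.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, SocketAddrV4, UdpSocket};

const MAGIC_COOKIE: u32 = 0x2112_A442; // RFC 5389 magic cookie
const BINDING_REQUEST: u16 = 0x0001;
const BINDING_SUCCESS: u16 = 0x0101;
const XOR_MAPPED_ADDRESS: u16 = 0x0020;

/// 20-byte STUN header with no attributes: type, length 0, cookie, transaction ID.
fn binding_request(txn_id: [u8; 12]) -> [u8; 20] {
    let mut msg = [0u8; 20];
    msg[0..2].copy_from_slice(&BINDING_REQUEST.to_be_bytes());
    // msg[2..4] stays 0: the message length excludes the 20-byte header.
    msg[4..8].copy_from_slice(&MAGIC_COOKIE.to_be_bytes());
    msg[8..20].copy_from_slice(&txn_id);
    msg
}

/// Returns the IPv4 XOR-MAPPED-ADDRESS from a Binding Success response
/// matching `txn_id`, or None if the response doesn't contain one.
fn xor_mapped_v4(resp: &[u8], txn_id: [u8; 12]) -> Option<SocketAddr> {
    if resp.len() < 20
        || u16::from_be_bytes([resp[0], resp[1]]) != BINDING_SUCCESS
        || resp[8..20] != txn_id
    {
        return None;
    }
    let body_len = u16::from_be_bytes([resp[2], resp[3]]) as usize;
    let body = resp.get(20..20 + body_len)?;
    let mut i = 0;
    while i + 4 <= body.len() {
        let attr_type = u16::from_be_bytes([body[i], body[i + 1]]);
        let attr_len = u16::from_be_bytes([body[i + 2], body[i + 3]]) as usize;
        let value = body.get(i + 4..i + 4 + attr_len)?;
        // value[1] is the address family: 0x01 means IPv4.
        if attr_type == XOR_MAPPED_ADDRESS && attr_len == 8 && value[1] == 0x01 {
            let port = u16::from_be_bytes([value[2], value[3]]) ^ (MAGIC_COOKIE >> 16) as u16;
            let ip = u32::from_be_bytes([value[4], value[5], value[6], value[7]]) ^ MAGIC_COOKIE;
            return Some(SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::from(ip), port)));
        }
        // Attribute values are padded to a 4-byte boundary.
        i += 4 + (attr_len + 3) / 4 * 4;
    }
    None
}

fn main() -> io::Result<()> {
    // Bind first: whatever STUN reports is the mapping for *this* socket.
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    let request = binding_request([0x42; 12]); // use random bytes in real code
    println!("bound {}, request is {} bytes", socket.local_addr()?, request.len());
    // Not run here (needs network): socket.send_to(&request, stun_server)?,
    // socket.recv_from(..), then xor_mapped_v4(..) on the reply.
    // Afterwards, hand `socket` to the QUIC endpoint instead of letting it bind its own.
    Ok(())
}
```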
Immediate Action Plan
For Demo (5 minutes)
Quickest path: Disable STUN in config
# Edit demo.toml - remove or comment out stun_servers
nano <demo-data-dir>/demo.toml
# Or: Don't pass stun_servers in config at all
The daemon should then start successfully!
For Proper Fix (30 minutes)
1. Check Quinn documentation for how to access the underlying socket
2. Either:
   - A) Reuse endpoint's socket for STUN queries
   - B) Do STUN discovery before creating endpoint
   - C) Use a different approach (separate STUN socket on different port)
3. Test fix
4. Submit PR with fix
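Approach C avoids the conflict entirely: leave the endpoint's port alone and bind the STUN socket to port 0 so the OS picks a free one. This std-only sketch (hypothetical `bind_pair`) shows both binds succeeding. The trade-off is that STUN then reports the NAT mapping for the STUN socket's port: the public IP is usually still useful, but the public port belongs to the STUN socket, not the QUIC one.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Approach C: bind the "QUIC" socket as usual, then give STUN its own
/// socket on an OS-chosen port instead of re-binding the QUIC address.
fn bind_pair(quic_addr: &str) -> io::Result<(UdpSocket, UdpSocket)> {
    // Stand-in for the socket the QUIC endpoint owns.
    let quic = UdpSocket::bind(quic_addr)?;
    // Same IP, port 0: the OS assigns a free port, so there is no conflict.
    let stun_addr = SocketAddr::new(quic.local_addr()?.ip(), 0);
    let stun = UdpSocket::bind(stun_addr)?;
    Ok((quic, stun))
}

fn main() -> io::Result<()> {
    let (quic, stun) = bind_pair("127.0.0.1:0")?;
    println!("QUIC on {}, STUN on {}", quic.local_addr()?, stun.local_addr()?);
    Ok(())
}
```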
Impact Assessment
Why This Wasn't Caught in Tests
Tests probably:
- Don't enable STUN discovery, OR
- Use mock STUN servers, OR
- Test network actor in isolation without full initialization
Why This Affects Us Now
Our config has:
# From supervisor code - STUN servers are hard-coded
stun_servers = ["stun.l.google.com:19302", "stun1.l.google.com:19302"]
These get resolved and passed to session_manager.start(), triggering the buggy code path.
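A rough sketch of the resolution step, assuming each `stun_servers` entry is a `host:port` string as in the config above: std's `ToSocketAddrs` turns each entry into socket addresses, and an empty list yields nothing to query, which is presumably what the Option 2 workaround relies on. `resolve_stun_servers` is a hypothetical name; ICN's actual resolution code and error handling may differ (this version skips entries that fail to resolve).

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolves "host:port" entries such as "stun.l.google.com:19302".
/// Entries that fail to resolve are skipped rather than treated as fatal.
fn resolve_stun_servers(entries: &[&str]) -> Vec<SocketAddr> {
    let mut out = Vec::new();
    for entry in entries {
        match entry.to_socket_addrs() {
            Ok(addrs) => out.extend(addrs),
            Err(e) => eprintln!("skipping STUN server {entry:?}: {e}"),
        }
    }
    out
}

fn main() {
    // IP literals avoid DNS here; hostname entries go through the system resolver.
    let servers = ["127.0.0.1:19302", "missing-port"];
    println!("{:?}", resolve_stun_servers(&servers));
    // With `stun_servers = []` there is nothing to query.
    println!("{:?}", resolve_stun_servers(&[]));
}
```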
Next Steps
Immediate (now):
cd <repo-root>/icn/crates/icn-net/src
# Quick fix: Comment out the problematic socket bind
# Edit session.rs: comment out the bind at line 178 and the STUN code that uses it (lines 178-192, as in Option 2)
Test (5 minutes):
cd <repo-root>/icn
cargo build --release
# Start daemon
./target/release/icnd --config <demo-data-dir>/demo.toml \
--gateway-enable \
--gateway-bind "127.0.0.1:8080" \
--gateway-jwt-secret "demo-secret-key-change-in-production"
Expected Result:
- QUIC endpoint listening
- Gateway API spawned
- Supervisor waiting for shutdown
- DAEMON RUNNING!
Confidence Level
Before: 45% full stack demo, 85% CLI demo
After fix: 90% full stack demo, 95% CLI demo
Time to working demo: 30-60 minutes (apply fix + test)
Lessons Learned
- Time boxing worked - Would have found this eventually, but staying focused helped
- Following the error messages - "Address already in use" was real, just not what we thought
- Reading logs carefully - The timestamp sequence revealed the issue
- Grep is your friend - Finding the exact error context was key
- Sometimes bugs are obvious - Double-bind to same port is a classic mistake
For Future Reference
When you see "Address already in use":
- First check: Is something else using the port? (we did this)
- Second check: Is the SAME process trying to bind twice? (should have checked this earlier!)
The smoking gun was: Actors stopping immediately after "QUIC endpoint listening" - that timing meant the error was in the same code path, not external.
Status: READY TO FIX AND TEST!
Let's apply the fix and get this daemon running!