Zenith Temporary Coverage Runner Setup
Date: 2026-03-22 Status: Ready to execute — 30 min ops task Context: coverage-ci-decision.md
Why Zenith
ci-runner (3.8GB RAM) is proven insufficient for full-workspace cargo-llvm-cov. Zenith (54GB RAM, Ryzen 7 7800X3D) has no capacity constraints and is immediately available. It hosts WSL2 Ubuntu 24.04, which can run a GitHub Actions self-hosted runner natively.
This is a temporary assignment. After Hyperion's RAM RMA completes, a proper always-on runner VM will be provisioned there and Zenith will be deregistered. The workflow label stays stable through that transition.
Step 1 — Get the registration token
From a browser, go to:
https://github.com/InterCooperative-Network/icn/settings/actions/runners/new
Select Linux / x64. Copy the registration token (starts with A, expires after 1 hour).
Also note the runner download URL from that page — it shows the current version (e.g. v2.321.0).
Step 2 — On Zenith, in WSL2 Ubuntu-24.04
# Create an isolated runner directory (not inside the ICN repo)
mkdir -p ~/actions-runner-coverage && cd ~/actions-runner-coverage
# Download the runner — use the URL from Step 1 (current version shown on GitHub)
# Example (check GitHub for latest version):
curl -o actions-runner-linux-x64.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-linux-x64-2.321.0.tar.gz
tar xzf ./actions-runner-linux-x64.tar.gz
# Configure with coverage label — this is the stable interface
./config.sh \
--url https://github.com/InterCooperative-Network/icn \
--token <REGISTRATION_TOKEN_FROM_STEP_1> \
--name "zentith-coverage" \
--labels "self-hosted,linux,x64,coverage,zentith"
When prompted for runner group: accept default (Default).
When prompted for work folder: accept default (_work).
Step 3 — Install Rust coverage tooling on Zenith WSL2
# Install or confirm pinned toolchain
rustup toolchain install 1.88.0
rustup component add llvm-tools-preview --toolchain 1.88.0
# Install cargo-llvm-cov
cargo +1.88.0 install cargo-llvm-cov --locked
# Verify
cargo +1.88.0 llvm-cov --version
# Expected: cargo-llvm-cov <version>
If rustup is not yet on Zenith WSL2:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env
Step 4 — Start the runner
cd ~/actions-runner-coverage
./run.sh
For a persistent background service (optional, survives WSL2 restart):
sudo ./svc.sh install
sudo ./svc.sh start
sudo ./svc.sh status
Verify it appears online:
https://github.com/InterCooperative-Network/icn/settings/actions/runners
Look for zentith-coverage with status Idle.
Step 5 — Validate before touching ci.yml
On Zenith WSL2, with the ICN repo cloned or accessible:
cd /path/to/icn/icn # Rust workspace root (contains Cargo.toml)
cargo +1.88.0 llvm-cov --workspace --lcov --output-path /tmp/lcov-validate.info 2>&1 | tail -20
ls -lh /tmp/lcov-validate.info
# Expected: file exists, > 0 bytes, runtime < 20 min
Do not update ci.yml until this completes cleanly.
Step 6 — Update ci.yml (minimal diff)
File: .github/workflows/ci.yml, the coverage: job.
Before:
coverage:
name: Test Coverage
needs: [changes]
if: needs.changes.outputs.docs_only != 'true'
timeout-minutes: 45
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Free disk space
uses: ./.github/actions/free-disk-space
with:
verbose: 'true'
aggressive: 'true'
- name: Install build tools
uses: ./.github/actions/install-build-tools
- uses: dtolnay/rust-toolchain@stable
- name: Install cargo-tarpaulin
run: cargo install cargo-tarpaulin --locked
- name: Pre-tarpaulin cleanup
run: |
rm -rf ~/.cargo/registry/cache || true
rm -rf ~/.cargo/git/db || true
df -h /
- name: Generate coverage
run: cargo tarpaulin --workspace --timeout 300 --out Xml --output-dir ./coverage
working-directory: ./icn
continue-on-error: ${{ env.GATE_RATCHET_PHASE_COVERAGE != 'blocking' }}
- name: Upload coverage to Codecov
if: hashFiles('./icn/coverage/cobertura.xml') != ''
uses: codecov/codecov-action@v5
with:
files: ./icn/coverage/cobertura.xml
fail_ci_if_error: false
After:
coverage:
name: Test Coverage
needs: [changes]
if: needs.changes.outputs.docs_only != 'true'
timeout-minutes: 30
runs-on: [self-hosted, linux, x64, coverage]
steps:
- uses: actions/checkout@v6
- name: Set up Rust toolchain
# Uses the pinned toolchain from rust-toolchain.toml (1.88.0)
run: rustup show
- name: Install cargo-llvm-cov
run: cargo install cargo-llvm-cov --locked
- name: Generate coverage
run: cargo llvm-cov --workspace --lcov --output-path ./coverage/lcov.info
working-directory: ./icn
continue-on-error: ${{ env.GATE_RATCHET_PHASE_COVERAGE != 'blocking' }}
- name: Upload coverage to Codecov
if: hashFiles('./icn/coverage/lcov.info') != ''
uses: codecov/codecov-action@v5
with:
files: ./icn/coverage/lcov.info
fail_ci_if_error: false
What changed:
runs-on:ubuntu-latest→[self-hosted, linux, x64, coverage]timeout-minutes: 45 → 30- Removed:
Free disk space,Install build tools,dtolnay/rust-toolchain@stable,Install cargo-tarpaulin,Pre-tarpaulin cleanup - Coverage tool: tarpaulin → llvm-cov
- Output format: Cobertura XML → LCOV (Codecov accepts both)
Create a PR from a chore/ branch:
git checkout -b chore/coverage-llvm-cov-zenith
# edit the file
git add .github/workflows/ci.yml
git commit -m "chore(ci): migrate coverage to llvm-cov on self-hosted coverage runner"
gh pr create --title "chore(ci): migrate coverage to llvm-cov on self-hosted coverage runner" \
--body "Switches the coverage job from tarpaulin on GitHub-hosted runners to cargo-llvm-cov on the self-hosted coverage runner (Zenith WSL2 temporarily, Hyperion post-RMA).
**Why:** ci-runner (3.8GB RAM) is insufficient for full-workspace instrumented builds. Zenith (54GB RAM) handles it without constraints.
**Gate:** coverage remains \`observational\` — this PR does not block merges.
**Validation:** Full workspace llvm-cov run completed on Zenith WSL2 before this PR was opened.
**Runner label:** \`[self-hosted, linux, x64, coverage]\` — stable across Zenith → Hyperion migration."
Rollback
If Zenith becomes unavailable before the PR merges:
- The workflow is unchanged (still on
ubuntu-latestwith tarpaulin) - No action needed
If Zenith goes offline after the PR merges and the coverage label has no active runner:
- Coverage jobs will queue and eventually time out
- Gate is
observational— no merges are blocked - Options: (a) restart Zenith WSL2 runner, (b) temporarily re-point label to a k3s-worker
If k3s-worker becomes the temporary fallback:
# On k3s-worker-1 (10.8.30.41, 15GB RAM, 28GB disk):
curl -fsSL https://sh.rustup.rs | sh -s -- -y --default-toolchain 1.88.0
source ~/.cargo/env
rustup component add llvm-tools-preview
cargo install cargo-llvm-cov --locked
# Register runner with same labels
mkdir ~/actions-runner-coverage && cd ~/actions-runner-coverage
# download + ./config.sh with same --labels as Zenith
Disk concern: if the _work directory fills (28GB can be tight for large instrumented builds), add a step to ci.yml to clean target/llvm-cov-target after the run.
Long-Term Migration to Hyperion (post-RMA)
When Hyperion returns with working RAM:
- Provision a runner VM on Hyperion — recommended spec: 8+ vCPU, 16GB RAM, 100GB disk
- Install Rust 1.88.0 + cargo-llvm-cov + sccache on the VM
- Register the runner with labels:
self-hosted,linux,x64,coverage,hyperion - Verify a full coverage run completes cleanly on Hyperion
- Deregister Zenith:
cd ~/actions-runner-coverage && ./config.sh remove --token <REMOVAL_TOKEN> - The
ci.ymlworkflow label[self-hosted, linux, x64, coverage]requires no change
The label is the stable interface. The machine behind it changes transparently.
Completed Execution (2026-03-22)
All steps were executed. Results:
Validation run (on Zentith WSL2, before ci.yml change):
- Duration: 11 minutes 9 seconds (22:50:44 → 22:01:29 local)
- Peak RAM: 7.2GB of 54GB — no swap used
- Output:
/tmp/lcov-validate.info, 288,731 LCOV lines, valid source file references - Exit code: 0
CI run (PR #1395, after ci.yml change):
- Runner:
zentith-coverage(self-hosted, Zenith WSL2) - Duration: 11 minutes 21 seconds
Generate coveragestep: successUpload coverage to Codecovstep: success — first successful Codecov upload in project history- PR: https://github.com/InterCooperative-Network/icn/pull/1395
Runner process:
- Started with:
nohup ./run.sh > /tmp/gh-runner.log 2>&1 & - systemd service approach was not used — hung in WSL2 during
svc.sh install - Runner is a background process tied to the WSL2 session
⚠️ Persistence note: The runner does NOT survive WSL2 restarts or Windows reboots. After a restart:
cd ~/actions-runner-coverage && nohup ./run.sh > /tmp/gh-runner.log 2>&1 &
To make it persistent, add to ~/.bashrc or set up a cron @reboot entry — the systemd approach hung and was skipped.
Success Condition — Achieved ✓
- Push to
maintriggers the coverage job onzentith-coverage - Job completes in 11 min 21 sec (well under 30 min limit)
lcov.infoproduced and uploaded to Codecov- First coverage data in Codecov dashboard for this project
Gate remains observational until a meaningful coverage baseline is established.