OS/3 microkernel — from-scratch, capability-secured, preemptive, ring-3, stable Rust
Find a file
Hannes Lehmann 37fcc8df8d fix(kernel): make the priorities self-check measure SELECTION, not race on execution
The A3 priorities POST flaked ~1/5 — but ONLY under heavy host load (full ci.sh); 0/20
in isolation. Root cause PROVEN, not guessed (a probe showed if_at_entry=0,
pre_enable_seq=0 — refuting the "interrupts enabled at entry" and "ran during spawn"
theories): the check recorded which thread's BODY ran first, but a dispatched thread
runs preemptible (initial RFLAGS.IF=1, sched.rs:247/442). When the HIGH-priority thread
is correctly selected first, a timer tick landing between its dispatch and its order
store triggers the (correct) deferred re-enqueue — briefly removing it from contention
so the LOW thread runs and records order 0. Selection was right; the test measured a
preemption-sensitive proxy. Under host load the dispatch->record window stretches, so
the tick lands far more often — hence load-only, ~1/5.

Fix — assert the SELECTION property directly and deterministically:
- new Smp::peek_ready(core): the id reschedule would take as its local candidate (first
  READY from the queue front, mirroring pop_ready) WITHOUT dequeuing. Host-tested
  (peek_ready_predicts_the_pick_without_consuming): idempotent, and reschedule takes
  exactly what peek predicted.
- priorities() now computes high-first by PEEKING that high is the next pick BEFORE
  enabling interrupts — immune to the artifact. It still runs both threads afterward to
  prove strict affinity (pinned, never stolen), but no longer asserts execution order.

Also folded in os3 review hygiene: reset the run-order statics per call, hold interrupts
off across BOTH spawns (close the restore-between-calls window), and fail explicitly if
either spawn returns None instead of silently passing.

Proof: 16/16 boots green under 6-CPU-hog host load (recreating the original trigger);
0/20 had ever failed in isolation. ci.sh green on all 5 lanes; sched host/loom tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 13:39:44 +02:00
crates fix(kernel): make the priorities self-check measure SELECTION, not race on execution 2026-06-28 13:39:44 +02:00
docs fix(kernel): ledger thread dimension measures SLOT conservation, not liveness 2026-06-27 22:07:23 +02:00
test test(kernel): boot-end cross-resource conservation ledger (ROADMAP #31 / audit AF5) 2026-06-26 21:59:27 +02:00
.gitignore chore: gitignore the .ergon/ review-tool directory 2026-06-25 09:34:35 +02:00
Cargo.lock rng: add os3-rng crate (xoshiro256** + hardware entropy) and SYS_RANDOM 2026-06-24 11:36:08 +02:00
Cargo.toml rng: add os3-rng crate (xoshiro256** + hardware entropy) and SYS_RANDOM 2026-06-24 11:36:08 +02:00
ci.sh feat(kernel): no-panic static audit (ROADMAP #30a) + trim the chaos fuzzer's boot cost 2026-06-25 20:34:23 +02:00
README.md docs: README entry point + standing test-discipline MUSTs (conservation/enforcement/condition) 2026-06-26 08:53:41 +02:00
rust-toolchain.toml Initial commit: OS/3 microkernel + session handover 2026-06-22 20:45:48 +02:00

OS/3

A from-scratch x86-64 microkernel in stable Rust (no nightly), UEFI-booted, capability-secured. Heavily commented. Mandate: rock solid — never goes down except on a real hardware fault.

Start here. This README is the entry point. Read the four docs under "The MUSTs" before changing anything — they are the standing, non-negotiable rules for this project. Everything else (subsystem notes, migration trackers, handovers) hangs off the roadmap.


The MUSTs (read these, in order)

Doc What it is — and the rules it binds you to
docs/VISION.md the goal: a tiny capability substrate that personalities (DOS/C64/BeOS-style) run on.
docs/PRINCIPLES.md the engineering principles + Definition of Done. #1 Rock solid (a kernel panic from any ring-3 action is a release blocker). #7 Prove the bug first (a fix lands with a test that failed before it; for concurrency, an exhaustive loom model). #10 Verify before you fix. #12 Honest reporting.
docs/TESTING.md how we test, and why a green suite still ships critical bugs. KVM is the gate (TCG can't enforce real-CPU semantics). The three tests our suite kept missingConservation (assert resources return to baseline — catches leaks), Enforcement (do the forbidden thing and assert it traps — not "the bit is set"), Condition forced (drive the hard case: high RAM, exhaustion, cross-core). Ends with the per-change checklist — work it before calling anything done.
docs/ROADMAP.md the feature backlog (what's done / what's next), grouped by layer.

The one-line version of "done": a change is working when the feature runs; it is done only when the invariant it could break is guarded by a test that failed before the fix — conservation balanced, protection enforced (not just configured), and the hard case forced. If you can't name the invariant and the test that guards it, it isn't done.


Build & test (the gate)

cargo test                     # host tests (pure-logic crates) + loom models — fast inner loop
RUSTFLAGS="--cfg loom" cargo test -p os3-sched --lib   # exhaustive concurrency models
bash ci.sh                     # THE GATE: host tests + KVM boot + the >4 GiB + tiny lanes
bash test/boot.sh kvm          # primary boot gate alone (exit 33 + every milestone marker)
bash test/boot.sh kvm-hugemem  # 16 GiB — forces high RAM / high CR3 (don't trust 6 GiB alone)
bash test/boot.sh tcg-bigmem   # generic-CPU >4 GiB (portability second opinion)

A boot passes only if QEMU exits 33 and every marker in ci.sh is present — a missing marker is a silently-skipped check, i.e. a failed boot. Never grep only for boot OK.


Layout

  • crates/kernel — the kernel binary (boot, paging, scheduler glue, syscalls, drivers seam).
  • crates/{vm,cap,sched,primitives,memory,rng,hal,acpi,task}pure, host-tested logic (page tables, capabilities, the scheduler policy, the direct-map seam, …). Bit-twiddling lives here with unit tests, not in the kernel — the kernel keeps only the thin unsafe.
  • docs/ — the MUSTs above, plus subsystem notes (SCHEDULER.md), migration trackers (MIGRATION_ARBITRARY_RAM*.md), and dated handovers.

How we work

Two parallel sessions share main (primitives/features and hardening/paging), each in its own git worktree, integrating via origin. Don't lane-cross into the other stream's files without coordinating; never clobber a dirty worktree. Land small, KVM-green commits.