The A3 priorities POST flaked ~1/5 — but ONLY under heavy host load (full ci.sh); 0/20 in isolation. Root cause PROVEN, not guessed (a probe showed if_at_entry=0, pre_enable_seq=0 — refuting the "interrupts enabled at entry" and "ran during spawn" theories): the check recorded which thread's BODY ran first, but a dispatched thread runs preemptible (initial RFLAGS.IF=1, sched.rs:247/442). When the HIGH-priority thread is correctly selected first, a timer tick landing between its dispatch and its order store triggers the (correct) deferred re-enqueue — briefly removing it from contention so the LOW thread runs and records order 0. Selection was right; the test measured a preemption-sensitive proxy. Under host load the dispatch->record window stretches, so the tick lands far more often — hence load-only, ~1/5. Fix — assert the SELECTION property directly and deterministically: - new Smp::peek_ready(core): the id reschedule would take as its local candidate (first READY from the queue front, mirroring pop_ready) WITHOUT dequeuing. Host-tested (peek_ready_predicts_the_pick_without_consuming): idempotent, and reschedule takes exactly what peek predicted. - priorities() now computes high-first by PEEKING that high is the next pick BEFORE enabling interrupts — immune to the artifact. It still runs both threads afterward to prove strict affinity (pinned, never stolen), but no longer asserts execution order. Also folded in os3 review hygiene: reset the run-order statics per call, hold interrupts off across BOTH spawns (close the restore-between-calls window), and fail explicitly if either spawn returns None instead of silently passing. Proof: 16/16 boots green under 6-CPU-hog host load (recreating the original trigger); 0/20 had ever failed in isolation. ci.sh green on all 5 lanes; sched host/loom tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|---|---|---|
| crates | ||
| docs | ||
| test | ||
| .gitignore | ||
| Cargo.lock | ||
| Cargo.toml | ||
| ci.sh | ||
| README.md | ||
| rust-toolchain.toml | ||
OS/3
A from-scratch x86-64 microkernel in stable Rust (no nightly), UEFI-booted, capability-secured. Heavily commented. Mandate: rock solid — never goes down except on a real hardware fault.
Start here. This README is the entry point. Read the four docs under "The MUSTs" before changing anything — they are the standing, non-negotiable rules for this project. Everything else (subsystem notes, migration trackers, handovers) hangs off the roadmap.
The MUSTs (read these, in order)
| Doc | What it is — and the rules it binds you to |
|---|---|
| docs/VISION.md | the goal: a tiny capability substrate that personalities (DOS/C64/BeOS-style) run on. |
| docs/PRINCIPLES.md | the engineering principles + Definition of Done. #1 Rock solid (a kernel panic from any ring-3 action is a release blocker). #7 Prove the bug first (a fix lands with a test that failed before it; for concurrency, an exhaustive loom model). #10 Verify before you fix. #12 Honest reporting. |
| docs/TESTING.md | how we test, and why a green suite still ships critical bugs. KVM is the gate (TCG can't enforce real-CPU semantics). The three tests our suite kept missing — Conservation (assert resources return to baseline — catches leaks), Enforcement (do the forbidden thing and assert it traps — not "the bit is set"), Condition forced (drive the hard case: high RAM, exhaustion, cross-core). Ends with the per-change checklist — work it before calling anything done. |
| docs/ROADMAP.md | the feature backlog (what's done / what's next), grouped by layer. |
The one-line version of "done": a change is working when the feature runs; it is done only when the invariant it could break is guarded by a test that failed before the fix — conservation balanced, protection enforced (not just configured), and the hard case forced. If you can't name the invariant and the test that guards it, it isn't done.
Build & test (the gate)
cargo test # host tests (pure-logic crates) + loom models — fast inner loop
RUSTFLAGS="--cfg loom" cargo test -p os3-sched --lib # exhaustive concurrency models
bash ci.sh # THE GATE: host tests + KVM boot + the >4 GiB + tiny lanes
bash test/boot.sh kvm # primary boot gate alone (exit 33 + every milestone marker)
bash test/boot.sh kvm-hugemem # 16 GiB — forces high RAM / high CR3 (don't trust 6 GiB alone)
bash test/boot.sh tcg-bigmem # generic-CPU >4 GiB (portability second opinion)
A boot passes only if QEMU exits 33 and every marker in ci.sh is present —
a missing marker is a silently-skipped check, i.e. a failed boot. Never grep only for
boot OK.
Layout
crates/kernel— the kernel binary (boot, paging, scheduler glue, syscalls, drivers seam).crates/{vm,cap,sched,primitives,memory,rng,hal,acpi,task}— pure, host-tested logic (page tables, capabilities, the scheduler policy, the direct-map seam, …). Bit-twiddling lives here with unit tests, not in the kernel — the kernel keeps only the thinunsafe.docs/— the MUSTs above, plus subsystem notes (SCHEDULER.md), migration trackers (MIGRATION_ARBITRARY_RAM*.md), and dated handovers.
How we work
Two parallel sessions share main (primitives/features and hardening/paging), each in
its own git worktree, integrating via origin. Don't lane-cross into the other
stream's files without coordinating; never clobber a dirty worktree. Land small,
KVM-green commits.