2026-05-01
Project Moirai - Bare-Metal Embodied Control
so i started a new project called Moirai. i've been really into the idea of running continuous-control RL policies on edge hardware without the bloat of standard ml frameworks.
basically, most ml frameworks are built for cloud gpus, not for constrained robots, and the inference latency you get on a small cpu is kinda crazy. so my goal is to get microsecond-level determinism on commodity cpus by bypassing floating-point matrix multiplication entirely.
right now i'm architecting an end-to-end pipeline to train and deploy 1.58-bit (ternary-weight, since log2(3) ≈ 1.58) reinforcement learning policies. the deployment layer is a custom bare-metal c++ inference runtime. by packing each ternary weight {-1, 0, +1} into a dense 2-bit field and using raw avx2 compiler intrinsics, i'm replacing heavy fp32 gemm operations with fast integer dot products.
the architecture has three strict constraints:
- Zero heap allocation: a statically allocated memory arena keeps malloc/free out of the control loop, so execution stays deterministic with no allocator latency spikes
- Cache residency: 2-bit packing shrinks weights 16x relative to fp32, so the active continuous-control policy can stay entirely in the cpu's l1/l2 cache
- RL-specific topologies: unlike general-purpose llm quantization engines, this runtime is specialized for fixed-shape mlps, with loops aggressively unrolled
the repo is strictly private while i make sure the quantization-aware training (qat) methodology is numerically stable. benchmarks and source code will be released publicly once the core engine is done.
back to the metal :))