Why MTEB Scores Don't Predict Performance on Your Data
Benchmark leaderboards measure generalization across tasks you'll never run. Evaluate embedding models on the only distribution that matters — yours.
Read more →ZeptonML extracts computational graphs from trained models and compiles them into minimal, deterministic binaries. No runtime. No framework. Just inference.
# Compile a torch.export model to a static binaryFROM scratchARG MODEL=model.pt2COPY ${MODEL} /model.pt2RUN zeptonml compile /model.pt2 -o /inferENTRYPOINT ["/infer"]
No Python, no PyTorch, no CUDA runtime. A single static binary that runs on bare metal.
Every allocation planned at compile time. No GC pauses, no fragmentation, no surprises.
One build step produces binaries for x86, ARM, any target. No toolchain gymnastics.
Static binary for embedded deployment. If it has a processor, it runs your model.
Benchmark leaderboards measure generalization across tasks you'll never run. Evaluate embedding models on the only distribution that matters — yours.
Read more →It is better to be vaguely right than exactly wrong.— Carveth Read, 1898