Compile once.
Deploy anywhere.

ZeptonML extracts computational graphs from trained models and compiles them into minimal, deterministic binaries. No runtime. No framework. Just inference.

Container size: 15 GB → 120 MB (125× smaller)
Cold start: 45 s → 8 ms (5,625× faster)
Memory: ~2 GB → 48 MB (42× lighter)
Dockerfile
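# Compile a torch.export model to a static binary
# "zeptonml-builder" is a placeholder: use any image that provides the zeptonml CLI
FROM zeptonml-builder AS build
ARG MODEL=model.pt2
COPY ${MODEL} /model.pt2
RUN zeptonml compile /model.pt2 -o /infer

# scratch has no shell, so the final stage only copies in the compiled binary
FROM scratch
COPY --from=build /infer /infer
ENTRYPOINT ["/infer"]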
01

Zero runtime

No Python, no PyTorch, no CUDA runtime. A single static binary that runs on bare metal.

02

Deterministic memory

Every allocation planned at compile time. No GC pauses, no fragmentation, no surprises.
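
To make "planned at compile time" concrete, here is a minimal sketch of static memory planning in general, not ZeptonML's actual planner, which this page does not describe. It assumes each intermediate buffer has a known size and a known lifetime (the first and last op that touches it) and assigns every buffer a fixed offset in one arena using greedy first-fit. The names `Buffer` and `plan_arena`, and the example sizes, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    name: str
    size: int   # bytes
    start: int  # index of the first op that uses the buffer
    end: int    # index of the last op that uses the buffer (inclusive)


def plan_arena(buffers):
    """Greedy first-fit offset assignment.

    Returns ({name: offset}, arena_size). Buffers whose lifetimes do not
    overlap may share the same bytes of the arena.
    """
    placed = []   # (offset, buffer) pairs that already have an offset
    offsets = {}
    # Placing large buffers first is a common heuristic, not a requirement.
    for buf in sorted(buffers, key=lambda b: (-b.size, b.name)):
        # Only buffers that are live at the same time can conflict.
        conflicts = sorted(
            (off, other.size)
            for off, other in placed
            if not (buf.end < other.start or other.end < buf.start)
        )
        offset = 0
        for off, size in conflicts:
            if offset + buf.size <= off:
                break  # the gap before this conflicting buffer is big enough
            offset = max(offset, off + size)
        offsets[buf.name] = offset
        placed.append((offset, buf))
    arena_size = max((offsets[b.name] + b.size for b in buffers), default=0)
    return offsets, arena_size


# Hypothetical four-buffer pipeline: op i reads one buffer and writes the next.
buffers = [
    Buffer("input", 400, 0, 1),
    Buffer("hidden1", 1000, 1, 2),
    Buffer("hidden2", 1000, 2, 3),
    Buffer("output", 100, 3, 4),
]
offsets, arena_size = plan_arena(buffers)
print(offsets, arena_size)
```

In this example the four buffers total 2,500 bytes but fit in a 2,000-byte arena, because `input` reuses the space of `hidden2` and `output` reuses the space of `hidden1`. Since every offset is fixed before the program runs, nothing needs to be allocated or freed during inference.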

03

Cross-compilation

One build step produces binaries for x86, ARM, any target. No toolchain gymnastics.

04

Edge ready

Static binary for embedded deployment. If it has a processor, it runs your model.

Why MTEB Scores Don't Predict Performance on Your Data

Benchmark leaderboards measure generalization across tasks you'll never run. Evaluate embedding models on the only distribution that matters — yours.

It is better to be vaguely right than exactly wrong.
— Carveth Read, 1898