Haimaker.ai

Inference Benchmarks

Performance metrics across hardware, software, and model configurations

Early Preview

We are working to certify more internal benchmarks for publication. If you're interested in providing hardware or have questions, email benchmarks@haimaker.ai.

Total Benchmarks: 21
GPU Models Tested: 1
Frameworks: 1

Recent Benchmarks

| Configuration | Model Org | Output TPS | Input TPS | Energy Cost (kWh/MT) |
|---|---|---:|---:|---:|
| NVIDIA A100-PCIE-40GB (1x) - Mistral-Nemo-Instruct | mistral | 3,541.62 | 6,567.89 | 0.01 |
| NVIDIA H100 80GB HBM3 (8x) - gpt-oss-120b | openai | 18,672.47 | 50,200.55 | 0.02 |
| NVIDIA H100 80GB HBM3 (8x) - llama-2-70b-hf | meta-llama | 668.64 | 855.76 | 0.79 |
| NVIDIA H100 80GB HBM3 (8x) - llama-3.3-70b-instruct | meta-llama | 9,219.60 | 16,108.82 | 0.06 |
| NVIDIA H200 NVL (2x) - mistral-nemo-instruct-2407 | mistralai | 12,204.48 | 47,690.47 | 0.01 |
| NVIDIA H200 NVL (2x) - qwen3-30b-a3b | qwen | 6,124.38 | 51,413.77 | 0.00 |
| NVIDIA H200 NVL (2x) - allam-7b-instruct-preview | humain-ai | 11,481.64 | 45,184.12 | 0.01 |
| NVIDIA H200 NVL (2x) - llama-2-70b-hf (50% Max Batch Token) | meta-llama | 4,620.81 | 8,844.22 | 0.03 |
| NVIDIA H200 NVL (2x) - llama-2-70b-hf | meta-llama | 5,012.77 | 10,466.05 | 0.03 |
| NVIDIA H200 NVL (2x) - gpt-oss-120b | openai | 3,166.06 | 11,929.37 | 0.01 |
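For readers who want to reproduce the energy column, a minimal sketch of how an energy-cost figure like this can be derived is shown below. It assumes "kWh/MT" means kilowatt-hours per million tokens and that power draw is measured as a steady-state average over the run; the function name and parameters are illustrative, not Haimaker.ai's actual methodology.

```python
def energy_cost_kwh_per_million_tokens(avg_power_watts: float,
                                       tokens_per_second: float) -> float:
    """Estimate kWh consumed per million tokens.

    Assumes a constant average power draw (in watts) across the whole
    benchmark run and a sustained token throughput (tokens/second).
    """
    # Time needed to process one million tokens, in seconds
    seconds_per_million_tokens = 1_000_000 / tokens_per_second
    # Energy in watt-seconds (joules), then convert to kWh
    # (1 kWh = 1000 W * 3600 s = 3,600,000 W·s)
    watt_seconds = avg_power_watts * seconds_per_million_tokens
    return watt_seconds / 3_600_000


# Example: a hypothetical 300 W average draw at ~3,500 output tokens/s
print(energy_cost_kwh_per_million_tokens(300, 3500))
```

Note that the reported numbers may also fold input tokens, idle periods, or full-system power into the measurement, so this sketch will not exactly reproduce the table above.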