Haimaker.ai

Inference Benchmarks

Performance metrics across hardware, software, and model configurations

Early Preview

We are working to certify more internal benchmarks for publication. If you're interested in providing hardware or have questions, email benchmarks@haimaker.ai.

Total Benchmarks: 21
GPU Models Tested: 1
Frameworks: 1

Recent Benchmarks

| Configuration | Model Org | Output TPS | Input TPS | Energy Cost (kWh/MT) |
|---|---|---:|---:|---:|
| NVIDIA A100-PCIE-40GB (1x) - Mistral-Nemo-Instruct | mistral | 3,541.62 | 6,567.89 | 0.01 |
| NVIDIA H100 80GB HBM3 (8x) - gpt-oss-120b | openai | 18,672.47 | 50,200.55 | 0.02 |
| NVIDIA H100 80GB HBM3 (8x) - llama-2-70b-hf | meta-llama | 668.64 | 855.76 | 0.79 |
| NVIDIA H100 80GB HBM3 (8x) - llama-3.3-70b-instruct | meta-llama | 9,219.60 | 16,108.82 | 0.06 |
| NVIDIA H200 NVL (2x) - mistral-nemo-instruct-2407 | mistralai | 12,204.48 | 47,690.47 | 0.01 |
| NVIDIA H200 NVL (2x) - qwen3-30b-a3b | qwen | 6,124.38 | 51,413.77 | 0.00 |
| NVIDIA H200 NVL (2x) - allam-7b-instruct-preview | humain-ai | 11,481.64 | 45,184.12 | 0.01 |
| NVIDIA H200 NVL (2x) - llama-2-70b-hf (50% Max Batch Token) | meta-llama | 4,620.81 | 8,844.22 | 0.03 |
| NVIDIA H200 NVL (2x) - llama-2-70b-hf | meta-llama | 5,012.77 | 10,466.05 | 0.03 |
| NVIDIA H200 NVL (2x) - gpt-oss-120b | openai | 3,166.06 | 11,929.37 | 0.01 |
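For readers who want to reproduce the energy column, a minimal sketch of how an energy-cost figure like this can be derived is shown below. It assumes "kWh/MT" means kilowatt-hours per million tokens and that power draw is measured as a steady-state average over the run; the function name and parameters are illustrative, not Haimaker.ai's actual methodology.

```python
def energy_cost_kwh_per_million_tokens(avg_power_watts: float,
                                       tokens_per_second: float) -> float:
    """Estimate kWh consumed per million tokens.

    Assumes a constant average power draw (in watts) across the whole
    benchmark run and a sustained token throughput (tokens/second).
    """
    # Time needed to process one million tokens, in seconds
    seconds_per_million_tokens = 1_000_000 / tokens_per_second
    # Energy in watt-seconds (joules), then convert to kWh
    # (1 kWh = 1000 W * 3600 s = 3,600,000 W·s)
    watt_seconds = avg_power_watts * seconds_per_million_tokens
    return watt_seconds / 3_600_000


# Example: a hypothetical 300 W average draw at ~3,500 output tokens/s
print(energy_cost_kwh_per_million_tokens(300, 3500))
```

Note that the reported numbers may also fold input tokens, idle periods, or full-system power into the measurement, so this sketch will not exactly reproduce the table above.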