llama3-java-hat Inference Benchmarks

Llama 3.2 1B Instruct (FP16) — Tokens/sec across HAT backends

These benchmarks are for fun only. Results are collected on GitHub Actions runners with variable performance characteristics. Do not use these numbers for serious performance comparisons. The OpenCL backend runs on PoCL (CPU-based OpenCL), not a real GPU — expect significantly lower throughput than actual GPU hardware.
GitHub Actions (PoCL OpenCL)
CPU-only baseline collected on GitHub Actions runners.
Google Cloud n1-standard-4 + T4 (Separate Graph)
Dedicated GPU runs collected from GCP T4 instances.
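For reference, a tokens/sec figure like the ones reported above is just generated tokens divided by wall-clock time. The sketch below shows one way to compute it in Java; the class and method names are illustrative assumptions, not the llama3-java-hat API.

```java
// Illustrative sketch: computing a tokens/sec benchmark figure.
// The names here are hypothetical, not part of llama3-java-hat.
public class TokensPerSec {
    // tokensGenerated: number of tokens produced during the timed run
    // elapsedNanos: wall-clock duration of the run, e.g. from System.nanoTime()
    static double tokensPerSecond(int tokensGenerated, long elapsedNanos) {
        return tokensGenerated / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // ... run generation here (placeholder) ...
        long elapsed = System.nanoTime() - start;

        // Worked example: 256 tokens in 8 seconds -> 32.0 tok/s
        System.out.println(tokensPerSecond(256, 8_000_000_000L));
    }
}
```

Because the runs above happen on shared CI runners, a single measurement like this is noisy; averaging over several generation passes gives a steadier number.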