golgotha ~/build>~/local/llama/bin/llama-bench --hf-repo unsloth/gemma-4-E4B-it-GGUF
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 23983 MiB):
  Device 0: NVIDIA GeForce RTX 5090 Laptop GPU, compute capability 12.0, VMM: yes, VRAM: 23983 MiB
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gemma4 E4B Q4_K - Medium       |   4.62 GiB |     7.52 B | CUDA       |  -1 |           pp512 |     9555.89 ± 101.42 |
| gemma4 E4B Q4_K - Medium       |   4.62 GiB |     7.52 B | CUDA       |  -1 |           tg128 |        165.64 ± 0.19 |

build: e3f542d8 (4236)
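The t/s column is a throughput figure. To read it as per-token latency, take the inverse. The sketch below does that arithmetic using the mean values copied from the table above. It ignores the ± spread, and the "implied time" is a derived number, not something llama-bench itself reports. The names `RESULTS`, `ms_per_token` and `total_seconds` are illustrative only.

```python
# Mean throughput copied from the llama-bench table above: test -> (tokens, t/s).
RESULTS = {
    "pp512": (512, 9555.89),  # prompt processing, 512 tokens
    "tg128": (128, 165.64),   # text generation, 128 tokens
}


def ms_per_token(tokens_per_second: float) -> float:
    """Average milliseconds per token at the given throughput (simple inverse)."""
    return 1000.0 / tokens_per_second


def total_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Wall time implied for n_tokens at the mean rate (ignores run-to-run spread)."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    for test, (n, tps) in RESULTS.items():
        print(f"{test}: {ms_per_token(tps):.3f} ms/token, "
              f"{total_seconds(n, tps):.3f} s for {n} tokens")
```

Running it prints roughly 0.105 ms/token for pp512 and about 6.037 ms/token for tg128. This shows why prompt processing, which is batched, is nearly two orders of magnitude faster per token than sequential generation.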