
AI Inference Performance Engineer

NVIDIA

US, CA, Santa Clara (via Workday, posted 2mo ago)

Skills & Keywords

LLM, Generative AI

Job Description

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining the industry’s performance standards across language models, video generation, and speech workloads. We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public accountability.

What You Will Be Doing:

Drive industry benchmark results: own the end-to-end optimization pipeline, imp
