Lead Machine Learning Engineer, Inference & Performance

Egen

Remote aijobs 1d ago

Skills & Keywords

Attention Mechanism, Batching, CUDA, Data Engineering, FlashAttention, GKE, GPU Architecture, Google Cloud, KV cache, Kubernetes, Profiling, Python

Job Description

Configure autoscaling for serving
Deploy LLMs in production
Handle mixed workloads gracefully
Identify training bottlenecks
Implement batching and quantization
Improve throughput per dollar
Manage KV cache strategies
Measure GPU utilization
Minimize latency
Operate models on GPU clusters
Optimize LLM inference
Optimize attention implementations
Profile training runs
Translate client requirements into AI architectures
Tune LLM serving throughput
