Skills & Keywords

Attention Mechanism, Batching, CUDA, Data Engineering, FlashAttention, GKE, GPU Architecture, Google Cloud, KV cache, Kubernetes, Profiling, Python

Job Description

Configure autoscaling for serving; Deploy LLM models in production; Handle mixed workloads gracefully; Identify training bottlenecks; Implement batching and quantization; Improve throughput per dollar; Manage KV cache strategies; Measure GPU utilization; Minimize latency; Operate models on GPU clusters; Optimize LLM inference; Optimize attention implementations; Profile training runs; Translate client requirements into AI architectures; Tune LLM serving throughput

Lead Machine Learning Engineer, Inference & Performance
