Skills & Keywords
Job Description
Configure autoscaling for serving; Deploy LLM models in production; Handle mixed workloads gracefully; Identify training bottlenecks; Implement batching and quantization; Improve throughput per dollar; Manage KV cache strategies; Measure GPU utilization; Minimize latency; Operate models on GPU clusters; Optimize LLM inference; Optimize attention implementations; Profile training runs; Translate client requirements into AI architectures; Tune LLM serving throughput
Similar Roles
IN_Senior Associate_AI/ML Developer_Application Technology_Advisory_Kolkata
PwC
Research Analyst, Center for Security and Emerging Technology (CSET), Walsh School of Foreign Service (SFS)
Center for Security and Emerging Technology (CSET)
Director of Machine Learning & Artificial Intelligence
Domino's
Senior Data Scientist, Rider New Products
Lyft