Skills & Keywords
Benchmarking, C++, Caching, Cloud Infrastructure Cost Optimization, Cloud infrastructure, Cloud infrastructure cost, Cost Optimization, Data Pipelines, Debugging, Distributed Systems, GPU Performance, GPU performance analysis
Job Description
Build benchmarking frameworks and performance dashboards for training and serving; Design and build ML training and inference efficiency systems; Develop ML tooling for debugging, profiling, optimization, and monitoring; Drive technical strategy for ML platform scalability, reliability, and cost efficiency; Improve GPU and resource utilization through scheduling, resource management, caching, and workload optimization; Lead cross-functional initiatives to improve ML engineer productivity; Optimize distribut