Skills & Keywords
A/B Testing; Data Evaluation; Experimentation; Information Retrieval; Language Models; Language Processing; Large Language Models; Machine Learning; Model Monitoring; Natural Language
Job Description
Analyze production logs for model failures; Build evaluation datasets; Build tooling for AI performance dashboards; Create rubrics and synthetic test cases; Design and run end-to-end experiments; Identify hallucinations and quality issues; Monitor AI performance in production; Productionize successful experiments; Track product and operational metrics; Validate improvements with engineers.