Full-time

Senior Staff Machine Learning Engineer, Data & Evaluation

This is a senior technical leadership role focused on the Data & Evaluation infrastructure that makes generative AI systems reliable at scale. You will be the technical authority on how Airbnb measures AI quality and improves its models through feedback loops. The role combines setting strategy with hands-on technical execution at a company where AI is a top priority.

Hiring company

Airbnb

San Francisco, California, United States · Posted 2 May 2026

The hiring side

About Airbnb

Airbnb is an enterprise-scale company in the technology and hospitality sectors, headquartered in San Francisco, California, United States.

Industry: Technology/Hospitality
Size: Enterprise
Location: San Francisco, California, United States

What they need

Requirements & Skills

Key Responsibilities

  • Define the overarching evaluation strategy and success metrics for GenAI systems
  • Architect and scale frameworks for golden sets, synthetic data, and automated regressions
  • Design the end-to-end data flywheel including instrumentation and feedback collection
  • Lead cross-functional quality initiatives to align product, ops, and engineering on performance standards
  • Productionize pipelines for model monitoring and continuous pre/post-deployment testing

Essential

  • Extensive expertise in evaluation methodologies, including metric design and alignment between offline and online metrics
  • Hands-on experience with generative AI orchestration, including retrieval-augmented generation (RAG), tool calling, and memory management
  • Proven track record of building complex data pipelines and quality systems
  • Experience with human-in-the-loop evaluation and labeling workflows
  • Ability to lead large-scale technical projects across multiple departments
  • Strong background in statistical analysis, A/B testing, and power analysis

Preferred

  • Experience with LLM-as-judge frameworks and synthetic data generation
  • Background in developing guardrails and bias detection for AI systems
  • Familiarity with dataset versioning and governance in a large-scale production environment

Key Skills

Machine Learning · Generative AI · LLM Fine-tuning · Data Engineering · Statistical Modeling · System Architecture · Python · RAG · Evaluation Frameworks

Perks

Benefits & perks

  • Opportunity to work on state-of-the-art AI technology
  • Collaborative environment with industry-leading engineers
  • Impactful work affecting a global community of millions
  • Culture of long-term innovation and technical excellence

Found via careers.airbnb.com
