Machine Learning Engineers – Recommender Systems

Location Santa Clara, California
Date Posted August 1, 2022
Category Engineering
Job Type Not Specified


The NVIDIA Merlin team is looking for Machine Learning Engineers, Data Scientists, and ML Infrastructure Engineers who are as passionate about recommender systems as we are to join our growing team. We're building the next generation of recommendation tools with the twin goals of accelerating ETL, training and inference on GPU and making recommenders easy to build and deploy at all stages of a company's RecSys journey. Building on our collective experience of the challenges we've faced getting recommender systems into production, over the past two years the team has built three open source libraries with the goal of making it easier for RecSys Engineers. We are continuing to innovate in this space. We've also competed in and won three recommendation challenges, including the ACM RecSys Challenge hosted by Twitter, and have had two conference papers accepted. We're out to win the hearts and minds of people who build recommender systems, and we want you on our team. Join us and help craft the future of RecSys!

What you'll be doing:

  • Defining and leading our strategy around integrating MLOps standard methodologies directly into Merlin libraries and framework.

  • Developing and improving our open source software (OSS) libraries like Merlin-Models, Merlin-Systems, Transformers4Rec, NVTabular and HugeCTR to enable MLOps practitioners to easily integrate the Merlin framework into their stack.

  • Creating examples and internal recommender solutions to demonstrate and validate our libraries, helping to improve the experience of our users.

  • Providing your expertise around the deployment and monitoring of RecSys models in production.

  • Evaluating and contributing to other OSS libraries like RAPIDS, MLflow, Airflow, OpenTelemetry, Feast, Milvus, and others to integrate and upstream our solutions so that they are available across commonly used recommender ecosystems.

  • Identifying, profiling, and understanding bottlenecks and performance issues at every stage of the recommendations pipeline and working with HPC Engineers to fix bottlenecks in GPU-based RecSys workflows.

What we're looking for:

  • 5+ years of experience as an MLOps or ML Engineer, preferably building and maintaining distributed systems.

  • 2+ years of experience deploying and monitoring ML systems at scale across a range of models and platforms.

  • First-hand experience deploying and maintaining deep learning recommender systems and services in production at scale.

  • Experience developing and maintaining software libraries, especially open source, following industry standard methodologies.

  • The ability to share and communicate your ideas clearly to the wider recommender systems community through blog posts, GitHub projects, and forums.

  • Excellent communication and interpersonal skills, along with the ability to work in a dynamic, product-oriented, distributed team.
