Offer summary

Qualifications:

10+ years in machine learning solutions and MLOps; experience with LLM APIs and serving; proficiency in Python and PyTorch; strong knowledge of Kubernetes and RDBMS; Master’s degree in Computer Science or a related field.

Key responsibilities:

Design and maintain machine learning data pipelines

Optimize deep learning models for revenue impact

Drive cost efficiency and performance improvements

Implement software engineering best practices

Transition prototypes to production-ready systems

Job description

About Lily AI:

Lily AI is a female-founded retail AI company empowering retailers and brands by bridging the gap between merchant-speak and customer-speak. Leveraging computer vision, natural language processing, machine learning, and vertical-specific large language models (LLMs), the Lily platform enhances customer shopping experiences by analyzing product catalogs and automatically enriching the assortment with the natural language consumers actually use when they search and shop. The platform then distributes the optimized data across a retailer’s entire ecosystem from their website to Google Ads and beyond, ultimately delivering upwards of 9-figure revenue lift through improved product attribution, enhanced discovery, increased traffic and higher conversion. Learn more at www.lily.ai.

Overview:

As a Staff Machine Learning Engineer, you will design and develop scalable platforms and services focused on driving business impact, while raising the bar for technical excellence. You will work with other machine learning scientists, engineers, and product managers to help define the ML roadmap, make key architectural decisions around MLOps, and contribute to the overall AI strategy of the company.

Your day-to-day will include:

Define, design, and maintain scalable Machine Learning data pipelines, training infrastructure, and inference systems.
Optimize, benchmark, and productionize deep learning models to extract high-value product attributes that drive revenue in onsite search and Google Ads offerings.
Drive cost efficiency and throughput improvements, owning relevant KPIs.
Promote and implement software engineering best practices across the team.
Shape and evolve the technical stack to meet emerging business and technical needs.
Transition research prototypes into robust, production-ready systems.
Deploy, monitor, and continuously improve models in production environments.
Optimize model performance, focusing on memory usage and latency.
Automate workflows by building efficient pipelines and orchestration frameworks.
Develop tools and shared libraries to boost team productivity and accelerate development.

What we consider critical for this role:

Experience:

10+ years building large-scale machine learning solutions and MLOps practices.
Working with LLM APIs and serving LLMs in-house at scale.

Technical Expertise:

Kubernetes, RDBMS, and API-driven development.
Model serving in low-latency, high-throughput use cases.
Observability, data pipeline design, service scaling, and cost optimization.

Code Quality:

Strong emphasis on code hygiene, including review, documentation, testing, and CI/CD practices.

Programming Skills:

Proficiency in Python and PyTorch.
Extensive experience with the scientific Python ecosystem.

Cloud Development:

Proficiency in cloud-native application development.

Mindset:

Action-oriented, with the ability to translate complex concepts into thoughtful, actionable iterations.

What will set you apart from other candidates:

Proficient in writing high-performance production code in Python, with experience using frameworks like PyTorch.
Excellent communication and interpersonal skills.
Experience with Azure.
Expertise in deep learning-based Computer Vision and NLP models.
Proficiency with tools for managing the ML lifecycle, such as MLflow and Kubeflow.
Proficiency with real-time serving and optimization tools for deep learning, including TFX, PyTorch JIT, TorchScript, and Seldon.
Master’s degree in Computer Science or a related field.

Details:

We are currently hiring in Canada, Latin America, and the US states listed below. Candidates must currently reside in one of these locations or be open to relocating:

Alabama
Arizona
California
Colorado
Connecticut
Florida
Georgia
Illinois
Indiana
Massachusetts
Minnesota
Nevada
New Jersey
New York
North Carolina
Oregon
Pennsylvania
Rhode Island
Tennessee
Texas
Utah
Virginia
Washington

Compensation is competitive and will be determined based on a combination of experience, seniority, internal and external equity, and location. For some context: this position in the US would pay between $140,000 and $220,000 USD per year, depending on experience and seniority. In other regions, compensation will be adjusted for local currency and local market rates. Lily AI's compensation policy is calculated with a focus on equity and ownership.

Required profile