Machine Learning Team Lead

Remote:

Full Remote

Experience:

Senior (5-10 years)

Work from:

Australia

Offer summary

Qualifications:

Strong experience in MLOps pipelines.
Proficiency in Python programming.
Hands-on experience with AWS services.
Leadership experience in technical teams.

Key responsibilities:

Lead MLOps efforts and scale infrastructure.
Mentor and grow a team of engineers.

Leonardo.Ai https://leonardo.ai/

11 - 50 Employees

Job description

Leonardo.Ai seeks a Machine Learning Team Lead to drive and scale our AI infrastructure.

At Leonardo.Ai, we are advancing our generative AI platform to empower millions, regardless of expertise, with intuitive tools for creating high-quality images and videos. Now part of the Canva family, we're ready to build a world-class R&D team to seamlessly integrate AI products, tools, and features, making creativity limitless for nearly a quarter of a billion users.

The Role:

As a Machine Learning Team Lead, you will lead our MLOps efforts, develop scalable infrastructure, and mentor a growing team of engineers. You will bridge the gap between research and production, ensuring robust deployment, monitoring, and optimisation of machine learning models to support the development of next-generation AI products. This is an opportunity to shape best practices, drive innovation, and contribute to Leonardo’s AI evolution.

What You'll Do:

Technical Leadership & Strategy

Define and implement best practices for MLOps infrastructure, cloud integration, and model deployment at scale.
Work closely with research scientists, software engineers, and data engineers to align technical strategies with business goals.
Stay ahead of the curve on emerging technologies and guide the team in adopting best-in-class tools and methodologies.

MLOps Infrastructure Development

Design, build, and maintain end-to-end machine learning pipelines, covering data ingestion, model training, deployment, monitoring, and retraining.
Develop reusable tools and frameworks to accelerate experimentation, deployment, and model versioning.
Integrate workflow automation tools such as ComfyUI nodes, optimising for performance and scalability.

Cloud & DevOps Integration

Oversee cloud infrastructure implementation and management, primarily in AWS (e.g., S3, EC2, SageMaker), using infrastructure-as-code tools like Terraform.
Establish robust CI/CD pipelines tailored for machine learning workflows to ensure smooth transitions from research to production.
Optimise resource allocation and manage cloud costs efficiently.

Data Engineering & Management

Develop and manage scalable ETL pipelines to process and store large datasets efficiently.
Automate data ingestion and transformation workflows while ensuring data integrity, security, and compliance.
Enhance data accessibility for research and product teams.

Model Deployment & Monitoring

Lead the deployment of machine learning models in production, focusing on scalability, performance, and reliability.
Implement robust monitoring solutions to track model performance, detect drift, and trigger retraining.
Utilise techniques like model quantisation, distillation, and caching to optimise inference.

Team Leadership & Growth

Lead, mentor, and grow a high-performing team of ML Engineers and MLOps specialists.
Foster a culture of innovation, ownership, and technical excellence.
Drive continuous learning and skill development within the team through mentorship, code reviews, and training initiatives.

Skills we like:

Strong experience building and managing MLOps pipelines using frameworks such as Kubeflow, MLflow, or similar.
Proficiency in Python, with expertise in writing high-performance, maintainable code.
Hands-on experience with AWS cloud services and infrastructure-as-code tools (Terraform, CloudFormation).
Deep understanding of Docker, Kubernetes, and container orchestration.
Strong grasp of CI/CD principles tailored for machine learning workflows.
Experience designing scalable ETL pipelines and working with both SQL and NoSQL databases.
Knowledge of monitoring tools such as Prometheus, Grafana, or CloudWatch.
Proven leadership experience, with a track record of mentoring and managing technical teams.

Nice-to-Have Skills:

Experience with distributed computing frameworks (Apache Spark, Dask, Ray).
Understanding of network configurations (proxies, SSH, NAT, VPN) and security best practices.
Familiarity with API integrations and model explainability techniques.
Hands-on experience with performance optimisation strategies like multi-threading and vectorisation.

What's in it for you?

A range of benefits to set you up for every success in and outside of work. Here's a taste of what's on offer:

Impact the future of AI
Reward package including equity - we want our success to be yours too
Inclusive parental leave policy that supports all parents & carers with 18 weeks paid leave
An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
Flexible leave options that empower you to be a force for good, let you take time to recharge, and support you personally, including remote working abroad
Support with your professional development
Fun and engaging company events, both virtual and in-person

20 days annual leave
Novated car leasing

We're committed to building a diverse, safe and inclusive environment where employees can be authentic and teams collaborate effectively to bring innovative ideas to life.