AI AND MACHINE LEARNING ENGINEER

Remote: Full Remote

Offer summary

Qualifications:

  • 2+ years in MLOps, DevOps, or backend engineering for AI workloads
  • Proficiency in DeepStream 7.x and containerization with Docker
  • Strong programming skills in Python and bash, with CI/CD scripting experience
  • Experience deploying and optimizing CNNs and LLMs in production environments

Key responsibilities:

  • Build and automate inference pipelines for computer vision models
  • Migrate and optimize Triton workloads to DeepStream with minimal downtime
  • Serve and optimize large language models using quantization and pruning techniques
  • Automate build/test/release processes and support model lifecycle management

Dicetek LLC (https://www.dicetek.net/)
1001 - 5000 Employees

Job description

We are looking for an AI and Machine Learning Engineer to join our Emerging Technologies team.

This is an offshore role; the detailed requirements are listed below.

Must Have

  • 2+ years in MLOps, DevOps, or backend engineering for AI workloads
  • DeepStream 7.x power user: pipelines, GStreamer plugins, nvdsanalytics, nvstreammux
  • Solid grasp of containerization (Docker) and GPU scheduling
  • Proven track record improving latency/throughput on NVIDIA GPUs (TensorRT, mixed precision, CUDA Toolkit)
  • Hands-on experience deploying YOLO or comparable CNNs in production
  • Experience self-hosting and serving LLMs (vLLM, TensorRT-LLM, or similar), plus quantization/pruning/distillation
  • Strong Python and bash; confidence with CI/CD scripting

Nice To Have

  • Exposure to cloud GPUs (AWS/GCP/Azure)
  • Experience with edge devices (Jetson, Xavier, Orin)
  • Performance profiling with Nsight Systems / DCGM
  • Knowledge of Triton Inference Server internals
  • Familiarity with distributed training (PyTorch DDP, DeepSpeed)
  • Basic front-end, REST, and gRPC API design skills

What You Will Do

  • Build & automate inference pipelines
      • Design, containerize, and deploy CV models (YOLO v8/v11, custom CNNs) with DeepStream 7.x, optimizing for lowest latency and highest throughput on NVIDIA GPUs.
      • Migrate existing Triton workloads to DeepStream with minimal downtime.
  • Serve and optimize large language models
      • Self-host Llama 3.2, Llama 4, and future LLMs/VLMs on the cluster using best-practice quantization, pruning, and distillation techniques.
      • Expose fast, reliable APIs and monitoring for downstream teams.
  • Continuous delivery & observability
      • Automate build/test/release steps and set up health metrics, logs, and alerts so models stay stable in production.
      • Allocate GPU resources efficiently across CV and LLM services.
  • Model lifecycle support (10–20% of time)
      • Assist data scientists with occasional fine-tuning or retraining runs and package models for production.
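To give candidates a feel for the "expose fast, reliable APIs and monitoring" responsibility, here is a minimal sketch of a health endpoint a model-serving process might expose, using only the Python standard library. The endpoint path, status fields, and model name are hypothetical illustrations, not part of this role's actual stack.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process status a model server might track.
STATUS = {"model": "yolo-v8", "state": "ready", "requests_served": 0}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve a JSON health snapshot; anything else is a 404.
        if self.path == "/health":
            body = json.dumps(STATUS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; a real server would log each request

def serve(port=0):
    """Start the server on a background thread; return (server, bound_port)."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

In production, richer metrics (GPU utilization, latency percentiles) would typically come from Prometheus or DCGM exporters rather than a hand-rolled JSON endpoint.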

Required profile

Spoken language(s):
English
