Requirements:
2+ years in MLOps, DevOps, or backend engineering for AI workloads
Proficient in DeepStream 7.x and containerization with Docker
Strong programming skills in Python and bash, with CI/CD scripting experience
Experience deploying and optimizing CNNs and LLMs in production environments.
Key responsibilities:
Build and automate inference pipelines for computer vision models
Migrate and optimize Triton workloads to DeepStream with minimal downtime
Serve and optimize large language models using quantization and pruning techniques
Automate build/test/release processes and support model lifecycle management.
Dicetek is a global IT solutions and services company, incorporated
in 2006 and headquartered in Singapore. We continue to expand
our global network while providing high value-added consulting
services that assist our clients in expanding their business
operations globally.
DICETEK has established offices in India, UAE, Singapore & the USA.
As a world-class company with a regional focus, we specialize in
providing IT Solutions, IT Consulting, and Professional Services
across verticals such as Banking & Financial Services, Telecom,
Government, Oil & Gas, Logistics, Supply Chain, Manufacturing,
and Sales Automation.
We have a solid performance and reputation in the technology
industry for providing excellent services to our clients.
Our values are represented by our integrity, thought leadership, and
commitment to maintaining a high level of excellence in the constantly
evolving world of information technology.
With more than 16 years in the industry, we have established a
successful track record of consulting services delivery across a
variety of technical roles in the public and private sectors.
Dicetek has a specialist team of IT Consultants who offer both
international experience and a deep understanding of the local
market.
To find the most up-to-date job opportunities, please review the link below.
https://dicetek.talentrecruit.com/career-page
Solid grasp of containerization (Docker) & GPU scheduling
Proven track record squeezing latency/throughput on NVIDIA GPUs (TensorRT, mixed precision, CUDA toolkit)
Hands-on experience deploying YOLO or comparable CNNs in production
Experience self-hosting and serving LLMs (vLLM, TensorRT-LLM, or similar), plus quantization/pruning/distillation
Strong Python & bash; confidence with CI/CD scripting
Nice To Have
Exposure to cloud GPUs (AWS / GCP / Azure)
Experience with edge devices (Jetson, Xavier, Orin)
Performance profiling with Nsight Systems / DCGM
Knowledge of Triton Inference Server internals
Familiarity with distributed training (PyTorch DDP, DeepSpeed)
Basic frontend/REST gRPC API design skills
What You Will Do
Build & automate inference pipelines
Design, containerize, and deploy CV models (YOLOv8/v11, custom CNNs) with DeepStream 7.x, optimizing for lowest latency and highest throughput on NVIDIA GPUs.
Migrate existing Triton workloads to DeepStream with minimal downtime.
Serve and optimize large language models
Self-host Llama 3.2, Llama 4, and future LLMs/VLMs on the cluster using best-practice quantization, pruning, and distillation techniques.
Expose fast, reliable APIs and monitoring for downstream teams.
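As a toy illustration of the quantization techniques this responsibility mentions, the sketch below shows symmetric post-training INT8 quantization in pure Python. It is for intuition only: the function names are illustrative, and real serving stacks such as vLLM or TensorRT-LLM implement far more sophisticated per-channel and activation-aware schemes.

```python
# Toy sketch of symmetric per-tensor INT8 quantization (illustrative only;
# not the vLLM or TensorRT-LLM implementation).

def quantize_int8(weights):
    """Map float weights to int8 values plus a per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.98, -1.27, 0.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Rounding error is bounded by half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, approx))
```

The key trade-off this illustrates: weights shrink 4x (float32 to int8), at the cost of a bounded rounding error per weight.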
Continuous delivery & observability
Automate build/test/release steps and set up health metrics, logs and alerts so models stay stable in production.
Allocate GPU resources efficiently across CV and LLM services.
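To make the "health metrics" part of this responsibility concrete, here is a minimal latency-tracking sketch. The class and method names are illustrative; in a real deployment these numbers would be exported to a monitoring stack (e.g. Prometheus dashboards, or DCGM for GPU-level counters) rather than held in memory.

```python
# Minimal sketch of per-request latency tracking for an inference service
# (names are illustrative, not from any specific monitoring library).
import math

class LatencyTracker:
    def __init__(self):
        self.samples_ms = []

    def observe(self, latency_ms):
        """Record one request's end-to-end latency in milliseconds."""
        self.samples_ms.append(latency_ms)

    def p95(self):
        """95th-percentile latency, a common SLO metric for inference APIs."""
        xs = sorted(self.samples_ms)
        idx = max(0, math.ceil(0.95 * len(xs)) - 1)
        return xs[idx]

tracker = LatencyTracker()
for ms in [10, 12, 11, 50, 9, 13, 11, 12, 10, 11]:
    tracker.observe(ms)
```

Tracking a tail percentile rather than the mean matters here: a single slow request (50 ms above) dominates p95 while barely moving the average, which is exactly the kind of regression alerting should catch.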
Model lifecycle support (10–20%)
Assist data scientists with occasional fine-tuning or retraining runs and package models for production.
Required profile
Spoken language(s):
English