Offer summary
Qualifications:
- Extensive experience in CUDA, C++, and Triton
- Proficiency in building inference stacks using ggml, vLLM, and DeepSpeed
Key responsibilities:
- Collaborate effectively with ML teams
- Optimize low-level primitives for efficient model execution
- Stay up-to-date with advancements in ML inference