AI Solutions Specialist

extra holidays - extra parental leave
Remote: Full Remote
Experience: Senior (5-10 years)

DDN Storage
Information Technology & Services, Scaleup
501 - 1000 Employees

Job description

Overview:

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

  

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC 

 

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments." – Marc Hamilton, VP, Solutions Architecture & Engineering, NVIDIA

  

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. 

  

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management. 

  

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage. 

Job Description:

As an AI Solutions Specialist at DDN, you will lead the technical development of cutting-edge AI, GPU virtualization, and high-performance computing (HPC) solutions. You will play a critical role in optimizing our storage and cluster environments to drive AI inferencing, GPU computing, and large-scale HPC systems to new heights. You will leverage your deep technical expertise in AI inference, GPU virtualization, and infrastructure optimization to enable seamless integration of our storage products with modern computing stacks.

 

Your work will impact our customers' ability to run AI-driven workloads and maximize performance across hybrid on-premises and cloud environments. You'll collaborate with cross-functional teams to innovate, drive strategic partnerships, and ensure the scalability and efficiency of AI and HPC solutions for some of the most demanding applications in the world.

 

Key Responsibilities:

  • Lead Innovation in GPU Virtualization & AI Workloads:
    Design, optimize, and implement advanced GPU virtualization solutions, including GPUDirect Storage integration, to enhance performance for AI inferencing and HPC workloads.
  • Optimize Large-Scale AI & HPC Infrastructures:
    Develop and deploy solutions that improve cluster utilization and optimize performance for AI and GPU-driven systems. Manage GPU clusters and related infrastructure to maximize availability, scalability, and efficiency.
  • AI Inference & Model Optimization:
    Drive the optimization of AI inference workloads using frameworks such as TensorFlow, PyTorch, and other industry-leading tools. Leverage expertise in CUDA to tune and accelerate AI models and workloads.
  • Hybrid Cloud Infrastructure Strategy:
    Architect, deploy, and optimize cloud-based and hybrid on-premises solutions for AI and HPC workloads. Ensure integration with cloud providers and bare-metal systems to deliver high-performance, scalable, and cost-effective solutions.
  • Drive Performance Improvement:
    Continually assess and optimize system configurations for AI inference and HPC workloads, driving significant performance improvements through specialized technologies such as RDMA, InfiniBand, and high-bandwidth interconnects.
  • Strategic Planning & Partnerships:
    Build and maintain relationships with key stakeholders, including cloud service providers, hardware manufacturers (e.g., NVIDIA), and customers, to stay ahead of industry trends and integrate best-in-class technologies into DDN's offerings.

Required Skills and Experience:

  • Extensive experience in optimizing AI inference and GPU-based workloads using frameworks such as TensorFlow, PyTorch, and CUDA. Strong understanding of GPU virtualization, including integration of technologies such as NVIDIA vGPU and GPUDirect.
  • Proven track record of managing large-scale HPC clusters, optimizing performance, and scaling workloads. Proficient in cluster management tools and optimizing infrastructure for AI-driven applications.
  • Expertise in deploying cloud-based solutions across hybrid environments (AWS, Azure, Google Cloud, etc.). Experience in managing and optimizing cloud-native infrastructure for real-time AI and HPC workloads.
  • Knowledge of RDMA, InfiniBand, high-bandwidth interconnects, and their impact on performance in distributed systems.
  • Extensive experience working with NVIDIA GPUs. Familiarity with NVIDIA's software stack and optimizations for AI/ML workloads.
  • Strong programming skills (e.g., Python, C++, CUDA) and experience with performance tuning for large-scale, complex systems. Experience optimizing LLM (Large Language Model) training and inference workloads.

Preferred Qualifications:

  • 5+ years of experience
  • BS or MS degree in Computer Science, Engineering, or related technical field.
  • Experience with distributed systems, containerization (Docker, Kubernetes), and orchestration.
  • Familiarity with machine learning and AI frameworks, and the ability to work with data science teams to optimize models.

DDN:

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.

 

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

  • Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
  • Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
  • Meet and greet with the wider team.

Our goal is to finish the main process in 2-3 weeks at most.

 

DataDirect Networks, Inc. is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.


Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Teamwork
  • Communication
  • Problem Solving
  • Prioritization
