
Operation and Maintenance Engineer-AI Tools

unlimited holidays - extra holidays - extra parental leave - long remote period allowed
Remote: Full Remote
Contract:
Experience: Mid-level (2-5 years)
Work from:

Offer summary

Qualifications:

Bachelor's degree in computer science or a related field, and 3+ years of experience in operations and maintenance.

Key responsibilities:

  • System maintenance and monitoring
  • Deployment and release, problem troubleshooting, performance optimization, security management
MyShell.ai Startup https://myshell.ai/
11 - 50 Employees

Job description

About MyShell

MyShell is revolutionizing the AI landscape by building an open ecosystem for AI-native apps. Our powerful platform and intuitive toolkit empower anyone to create, access, and benefit from AI-powered applications. Launched in April 2023, MyShell has quickly gained global traction, attracting a diverse community of creators and users.

Our team of talented individuals from top institutions like MIT, Princeton, and Oxford is committed to fostering innovation in a supportive and transparent work environment. With funding from leading VCs, MyShell is poised to reshape the future of AI, making it accessible and integral to everyone's daily life. Join us on this thrilling journey as we redefine what's possible with AI.

About the Role

We are seeking an experienced, highly skilled AI tools operations and maintenance engineer to join our team and take charge of operating and maintaining our AI capability tools (such as ControlNet). In this role, you will keep the AI tools running efficiently and stably while continuously improving system performance and the user experience.
 
Main responsibilities:
 
1. System maintenance and monitoring:
• Handle the day-to-day operation and maintenance of AI tools (such as ControlNet), including monitoring and maintaining servers, databases, and networks.
• Use monitoring tools to track system performance in real time, and promptly detect and resolve system failures and performance bottlenecks.
2. Deployment and release:
• Own the deployment and versioned release of AI graphics tools so that new features and fixes ship quickly and safely.
• Design and implement CI/CD (continuous integration and continuous deployment) pipelines and automate deployment tasks.
3. Problem troubleshooting and resolution:
• Handle urgent issues that arise while the AI graphics tools are running, diagnose faults, and optimize performance.
• Analyze logs and monitoring data, locate the root cause of problems, and propose solutions.
4. Performance optimization:
• Optimize the deployment architecture of AI tools to improve operational efficiency and stability.
• Develop and implement system performance tuning strategies to ensure stable operation under high load.
5. Security management:
• Ensure the security of AI tools by conducting regular security assessments and remediating vulnerabilities.
• Implement data protection and backup strategies to ensure data security and system recovery capabilities.
6. Collaboration and communication:
• Work closely with the development team, take part in the system design and optimization of AI tools, and provide operations and maintenance recommendations.
• Write and maintain operations and maintenance documentation to support knowledge sharing and continuity.
 
Qualifications:
 
1. Educational background:
• Bachelor's degree or above in computer science, software engineering, or related fields.
2. Work experience:
• 3+ years of experience as an operations and maintenance engineer or in a related role. Experience operating and maintaining AI graphics tools is preferred.
3. Skill requirements:
• Proficient with the Linux operating system and its command line.
• Familiar with common monitoring tools (e.g., Prometheus, Grafana, ELK) and automation tools for operations (e.g., Ansible, Puppet, Chef).
• Proficient in scripting languages (e.g., Python, shell) for automating operations and maintenance tasks.
• Familiar with cloud computing platforms (e.g., AWS, Azure, GCP) and their operational management.
• Familiar with the principles and applications of AI graphics tools (e.g., ControlNet); relevant project experience is preferred.
4. Other requirements:
• Possess good communication and teamwork skills, and be able to work effectively in a high-pressure environment.
• Have strong analytical skills and problem-solving abilities, and be able to handle complex problems independently.
• Have a strong sense of responsibility and initiative, and be committed to continuously learning and upgrading your skills.

 

Plus Points

  • Exceptional problem-solving abilities and strong communication skills.
  • Experience with AI or machine learning technologies and their integration into backend systems.
  • Contributions to open-source projects or a strong presence in the developer community.
  • Prior experience in a fast-paced startup environment.

What We Offer

  • Competitive salary and equity package, commensurate with experience and location.
  • Flexible working hours and a fully remote work environment, with the ability to collaborate effectively across time zones.
  • A dynamic and collaborative work environment that fosters innovation, growth, and professional development.
  • The opportunity to work on cutting-edge technologies and help shape the future of AI, transforming industries and making a global impact.

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s): English

Other Skills

  • Accountability
  • Verbal Communication Skills
  • Open Mindset
  • Analytical Skills
  • Teamwork
