DraftWise helps firms produce higher quality contracts in less time by empowering every attorney with instant access to the firm’s collective knowledge.
Our team combines world-class engineering with big-law experience to design a solution that puts the attorney in the driver's seat.
For more information or to schedule a demo, please contact our team at contact@draftwise.com.
DraftWise is revolutionizing the legal industry with an innovative knowledge platform that transforms how lawyers and legal professionals work. Our products unlock, for the first time, years of historic legal contracts negotiated by firms for their clients and seamlessly integrate generative AI into the workflows of transactional attorneys. By leveraging cutting-edge data management, machine learning, generative AI, and language analytics, we empower legal professionals to draft better, safer contracts with unprecedented efficiency.
Our founding team brings together former Palantir and Google engineers, alongside an experienced lawyer from a top global law firm. This diverse combination of expertise in engineering, product development, and customer success enables us to deliver industry-leading solutions. While DraftWise operates as a fully remote company, we also provide coworking opportunities in hubs across New York City and London, with plans to expand to additional cities as we grow density in new regions. We are committed to hiring the best talent globally, fostering a culture of respect, collaboration, and excellence.
Backed by Y Combinator and prominent venture capital funds, including a Series A funding round led by Index Ventures with participation from Earlybird Ventures (now Bek Ventures), DraftWise is already generating revenue from some of the most prestigious law firms in the world. Our clients include members of the Vault 10, Magic Circle, AMLAW 100, and other top-tier legal rankings. Our rapid growth spans multiple countries, with further expansions underway.
Following our recent fundraise, we are investing in critical functions and scaling our team to meet rising demand. Joining DraftWise means working alongside a highly talented team to deliver outstanding results for our customers and making a tangible impact on the world’s most influential law firms.
What we value
Strong communication skills in an open environment.
The ability to work independently and make informed decisions with minimal supervision.
Interest in working in a dynamic environment with evolving objectives.
A commitment to autonomy, ownership, and delivering high-quality solutions.
Openness to giving and receiving constructive feedback.
About This Role
We are seeking a Machine Learning Operations (ML Ops) Engineer to contribute to the development of cutting-edge AI solutions for legal contract analysis. You will be part of our growing ML team, working closely with our ML, NLP, and backend engineers.
Key Responsibilities
Improving and maintaining our legal evaluation platform:
Drive continuous enhancements to our state-of-the-art legal evaluation platform by developing innovative evaluation frameworks and ensuring its robust, scalable, and precise operation.
Collaborate closely with cross-functional teams to create new methodologies that empower our legal experts to rigorously assess LLM output quality while ensuring the platform remains adaptive to evolving challenges and technological advances.
Establishing a Continuous Auto-Evaluation Framework:
Develop and deploy an automated system that continuously monitors and validates our LLM outputs against stringent quality thresholds.
Incorporate real-time performance metrics, anomaly detection, and proactive remediation triggers, ensuring that any dip in response quality is promptly identified and corrected.
Integrate feedback loops that sustain our commitment to excellence, thereby ensuring our platform maintains consistently high standards.
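To give a flavor of this work, the monitoring loop above can be sketched roughly as follows. This is a minimal illustration only; the class name, threshold, and window size are hypothetical, not DraftWise's actual framework:

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class AutoEvaluator:
    """Track a rolling window of LLM output quality scores and flag
    any dip below a quality threshold for remediation."""
    quality_threshold: float = 0.8
    window: int = 5
    scores: list = field(default_factory=list)

    def record(self, score):
        """Record a score; return True when remediation should trigger."""
        self.scores.append(score)
        # Rolling mean over the most recent `window` scores.
        return mean(self.scores[-self.window:]) < self.quality_threshold

evaluator = AutoEvaluator()
alerts = [evaluator.record(s) for s in (0.9, 0.95, 0.6, 0.5, 0.4)]
# alerts == [False, False, False, True, True]: the rolling mean only
# drops below 0.8 after several consecutive low-quality responses.
```

A production version would replace the hand-fed scores with automated judges and wire the alert into a remediation trigger, but the shape of the loop is the same.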
Designing a Load Balancing Framework for LLM API Calls:
Architect and deploy a dynamic load balancing system that intelligently distributes API calls across multiple LLM endpoints.
Implement robust fallback mechanisms that automatically reroute traffic when an LLM instance is down or has reached its usage quota.
Integrate real-time health monitoring and performance analytics to assess endpoint availability and adjust routing in real time.
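The fallback behavior described above might be organized along these lines. This is a sketch under stated assumptions: the endpoint names and the simulated outage are illustrative, and real health checks would be richer than catching an exception:

```python
import random

class LLMLoadBalancer:
    """Sketch: distribute calls across LLM endpoints, falling back
    when an endpoint is down or over quota (names are hypothetical)."""

    def __init__(self, endpoints):
        # endpoints: mapping of name -> callable(prompt) -> response text
        self.endpoints = dict(endpoints)
        self.healthy = set(endpoints)

    def call(self, prompt):
        # Spread load by trying healthy endpoints in random order.
        for name in random.sample(sorted(self.healthy), len(self.healthy)):
            try:
                return self.endpoints[name](prompt)
            except Exception:
                # Mark the endpoint unhealthy and fall back to the next.
                self.healthy.discard(name)
        raise RuntimeError("all LLM endpoints are unavailable")

def down(prompt):
    raise ConnectionError("usage quota exceeded")  # simulated outage

balancer = LLMLoadBalancer({"primary": down, "fallback": lambda p: f"ok: {p}"})
result = balancer.call("hello")  # reroutes to the healthy endpoint
# result == "ok: hello"
```

In practice the "unhealthy" set would be refreshed by periodic health probes rather than marked permanently, and routing would weight endpoints by latency and remaining quota.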
Deploying and Maintaining Proprietary LLM Models with Training and Fine-Tuning Support:
Deploy and scale in-house LLM models to ensure reliable and high-performance operations tailored to our unique legal evaluation needs.
Establish automated pipelines that streamline the training, fine-tuning, and deployment processes, ensuring continuous model improvement and adaptation.
Collaborate closely with ML and NLP teams to integrate the latest techniques, rigorous evaluation metrics, and industry best practices into the training workflow.
Implement proactive monitoring and maintenance protocols to quickly identify and address performance bottlenecks or operational issues.
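A training-to-deployment pipeline with a quality gate, as described above, can be reduced to a simple pattern. All stage names and the threshold here are purely illustrative:

```python
def run_pipeline(stages, artifact):
    """Run ordered pipeline stages; each stage takes and returns the
    artifact dict. A stage that sets 'failed' halts before deployment."""
    for name, stage in stages:
        artifact = stage(artifact)
        if artifact.get("failed"):
            return f"stopped at {name}"
    return "deployed"

def fine_tune(artifact):
    artifact["quality"] = 0.9  # pretend fine-tuning raised eval quality
    return artifact

def quality_gate(artifact):
    # Block deployment if evaluation quality is below the bar.
    artifact["failed"] = artifact["quality"] < 0.85
    return artifact

result = run_pipeline(
    [("fine_tune", fine_tune), ("quality_gate", quality_gate)], {}
)
# result == "deployed"
```

The real pipelines would run on an orchestrator with versioned model artifacts, but the gate-before-deploy structure is the core idea.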
About You
We're looking for teammates who possess:
Proven expertise in Python with hands-on experience using Flask, along with a willingness to explore and adapt to new frameworks when necessary.
A robust background in deploying, maintaining, and fine-tuning ML models, with a focus on developing automated evaluation and load balancing frameworks.
Demonstrated experience in working with large language models (LLMs) and establishing continuous auto-evaluation pipelines. For example, you may have built systems that monitor key performance metrics (e.g., latency, quality scores) in real time, or used an open-source framework such as DeepEval or Phoenix.
At least 2 years of relevant experience, ideally within startup environments, showcasing a resourceful and proactive approach to addressing evolving challenges.
The ability to work closely with ML, NLP, and backend engineers to drive cutting-edge AI solutions tailored for legal contract analysis.
Basic knowledge of HTML/JS is a plus (not required), broadening your contributions in a full-stack development context.
A customer-centric approach coupled with strong analytical skills to ensure the platform's robust, scalable, and precise operation.
What We Offer
Remote-first work style: Work anywhere in Europe.
Meaningful equity plan, giving you a stake in our growth.
Competitive salary to reward your expertise.
Private medical care to support your well-being.
A new laptop and a work-from-home stipend for necessary accessories.
Unlimited PTO and sick leave to prioritize work-life balance.
The opportunity to shape our company and disrupt the legal world.
Required profile
Experience
Industry:
Computer Software / SaaS
Spoken language(s):
English