Sr. Data Scientist

unlimited holidays - extra holidays - extra parental leave - long remote period allowed
Remote: 
Full Remote
Experience: 
Mid-level (2-5 years)

Job description

 

Job Title: Sr. Data Scientist

Experience: 6+ Years

Location: Mumbai/Pune (Hybrid)


Company Overview:

Cacti is a "Talent-as-a-Service (TaaS)" marketplace platform that helps bring together global talent seekers, top talent, and skill enhancers under one brand. We also provide Managed Services in the Legal, Legal–Tech, and Tech domains. We democratize access to global opportunities for talented professionals and empower organizations with transparent, outcome-based solutions. We have worked with clients across diverse sectors in the areas of legal and business solutions, contracts management, finance, technology, and analytics.


Roles and Responsibilities:
  • As a core member of the NLP team, you will research, prototype, develop, deploy, and scale innovative ML/DL solutions in collaboration with legal experts and Product Management teams.
  • Develop predictive models on large-scale datasets to address business problems, leveraging advanced statistical modelling, machine learning, or data mining techniques.
  • Design and implement infrastructure for orchestrating end-to-end machine learning lifecycles.
  • Set up processes to monitor and continually improve the efficiency and performance of models.
  • Carry out software development, including algorithm implementation, optimization, performance profiling, integration into production systems, testing, and documentation.
  • Write high-quality production code as you build and maintain robust, scalable machine learning systems.
  • Program primarily in Python, using efficient algorithms and software design patterns.
  • Scale and improve the performance of natural language systems in production.



Requirements
  • Python Programming: Strong proficiency in Python, as it is the primary language for our application.
  • API Integration: Skills in integrating external APIs, specifically the OpenAI API, for making queries and retrieving responses.
  • LangChain Proficiency: Experience with LangChain libraries and the ability to use them for document chunking and invoking LLMs. Familiarity with the LangChain framework and its components.
  • Document Processing: Expertise in processing and handling documents, especially in breaking them into chunks or segments based on the requirements of the application.
  • Embedding Techniques: Knowledge of embedding techniques such as word embeddings, sentence embeddings, or document embeddings for representing chunks of documents as vectors.
  • Vector Database Management: Experience working with vector databases (Qdrant/Elasticsearch/FAISS/ChromaDB/Pinecone) for storage and retrieval of document embeddings.
  • Strong problem-solving skills with an emphasis on product development.
  • Experience creating and using advanced machine learning algorithms and statistics: regression, simulation, scenario analysis, modelling, clustering, decision trees, neural networks, etc.
  • Knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world advantages/drawbacks.
  • Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.) and experience with applications.
  • Experience with text extraction from Images/Documents/PDFs.
  • Experience with deep learning architectures such as LSTMs and Transformers.
  • Experience with cutting-edge deep learning–based NLP models such as BERT.
  • Experience with deep learning NLP toolkits such as Hugging Face, spaCy, etc.
  • Experience with deep learning frameworks such as TensorFlow and PyTorch.
  • Experience with Agile and Scrum.
  • Experience with proprietary and open-source LLM training (OpenAI, LLaMA, Falcon, Google), creating datasets, and working with embeddings and PEFT/LoRA models.
  • Experience with Reinforcement Learning (RLHF/RLAIF model training) and knowledge of RL algorithms (PPO) will be a big plus.


Benefits
  • Flexible working hours and remote working options
  • Competitive salary and bonus incentives
  • Health insurance, medical incentives, and travel incentives
  • Professional development and mentorship programs
  • Opportunity to work with global clients


Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s):
English

Other Skills

  • Analytical Skills
  • Collaboration
