This is a remote position.
We are looking for a naturally curious Data Engineer with strong interpersonal skills to design, build, and maintain end-to-end ETL/ELT data pipelines. We will rely on you to build data products that surface valuable business insights. You should be highly analytical, with a knack for math and statistics; critical thinking and problem-solving skills are essential for interpreting data. We want to see a passion for machine learning and research. Your goal will be to help our company analyze trends and make better decisions.
Computer science: Apply the principles of artificial intelligence, database systems, human/computer interaction, numerical analysis, and software engineering.
Programming: Write computer programs and analyze large datasets to uncover answers to complex problems. You should be comfortable writing code in a variety of languages such as Java, R, Python, and SQL.
Machine learning: Implement algorithms and statistical models to enable a computer to automatically learn from data.
Statistical analysis: Identify patterns in data, including a keen sense for pattern and anomaly detection.
Business intuition: Connect with stakeholders to gain a full understanding of the problems they’re looking to solve.
Analytical thinking: Find analytical solutions to abstract business issues.
Design, build and maintain ETL/ELT pipelines.
Monitor data and models regularly, and consistently perform quality checks.
Transform data into scalable, performant data models that will be consumed by data analysts and data scientists.
Design, build and update foundational analytics data models that simplify analyses across the products.
Implement best practices to ensure high quality, robust data.
Critical thinking: Apply objective analysis of facts before coming to a conclusion.
Minimum of 3 years (2 with an advanced degree) of experience working in an engineering role. Suitable previous roles include Data Engineer, Machine Learning Engineer, MLOps Engineer, etc.
3+ years of experience with Python (including major machine learning and deployment libraries such as NumPy, pandas, scikit-learn, TensorFlow, and Flask).
3+ years of experience with SQL.
3+ years of experience with Apache Kafka.
3+ years of experience with Apache Spark.
3+ years of experience with web scraping (you should be familiar with major libraries such as requests, Beautiful Soup, lxml, Selenium, and Scrapy).
3+ years of experience with unit testing and integration testing.
2+ years of experience deploying machine learning models in a cloud computing environment such as AWS, GCP, or Azure.
2+ years of experience building out data pipelines from scratch in a highly distributed and fault-tolerant manner.
2+ years of experience with Docker.
2+ years of experience with Git.
2+ years of experience with CI/CD.
Knowledge of REST APIs
Knowledge of data structures and algorithms
Solid oral and written communication skills, especially around analytical concepts and methods.
Strong math skills (probability, statistics, linear algebra).
Knowledge of data visualization tools (such as Tableau, Power BI, etc.).
Flexible WFH/in-office hours
Health Insurance