Job Title:
Airflow Orchestration and Ingestion Engineer (Cloud Migration)
Location:
Remote
Contract – 6 months
Overview:
We are looking for an experienced Airflow Orchestration and Ingestion Engineer to support the migration of legacy data workflows from Apache Oozie to Apache Airflow as part of a broader transition from an on-premises Hadoop environment to a modern cloud data platform built on Databricks and AWS. This is a critical role focused on reengineering data pipeline orchestration, automation, and deployment within a cloud-native framework.
Key Responsibilities:
Workflow Migration:
Convert Oozie workflows to Airflow DAGs using Python (see the sketch following this list).
Build reusable, modular Airflow pipelines for ingestion, transformation, and scheduling.
Ensure each migrated workflow maps one-to-one to its Oozie counterpart without disrupting business processes.
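For illustration, a minimal sketch of the kind of conversion this role involves: a two-action Oozie workflow re-expressed as an Airflow DAG. The DAG id, schedule, and script paths are hypothetical placeholders, not project specifics.

```python
# Minimal sketch: a two-action Oozie workflow re-expressed as an Airflow DAG.
# DAG id, schedule, and script paths are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="legacy_ingest_migrated",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # mirrors the Oozie coordinator frequency
    catchup=False,
) as dag:
    # Each Oozie <action> becomes one Airflow task.
    extract = BashOperator(
        task_id="extract",
        bash_command="sh /opt/jobs/extract.sh",  # hypothetical script path
    )
    load = BashOperator(
        task_id="load",
        bash_command="sh /opt/jobs/load.sh",     # hypothetical script path
    )

    # The Oozie <ok to="..."> transition becomes an explicit task dependency.
    extract >> load
```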
Cloud Data Platform Transition:
Work with engineering teams to migrate Hadoop workloads to Databricks on AWS.
Leverage Airflow to orchestrate data pipelines across AWS services (S3, EMR, Glue, Redshift).
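As one hedged illustration of this kind of cross-service orchestration using Airflow's Amazon provider package: wait for a file to land in S3, then trigger a Glue transformation job. The bucket, key, job, and connection names below are assumptions for the sketch.

```python
# Sketch: cross-service orchestration with the Amazon provider package.
# Waits for a daily file in S3, then runs a pre-defined Glue job.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="s3_to_glue_example",               # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="raw-landing-bucket",      # hypothetical bucket
        bucket_key="daily/{{ ds }}/data.csv",  # templated with the run date
        aws_conn_id="aws_default",
    )
    transform = GlueJobOperator(
        task_id="transform",
        job_name="daily_transform",            # assumes an existing Glue job
        aws_conn_id="aws_default",
    )

    wait_for_file >> transform
```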
Pipeline Optimization:
Optimize pipeline throughput and latency within the AWS ecosystem.
Integrate Airflow with Databricks for transformation and analytics tasks.
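A minimal sketch of the Airflow-to-Databricks integration using the Databricks provider; the job id and connection name are hypothetical.

```python
# Sketch: triggering a pre-defined Databricks job from Airflow and polling
# it to completion. Job id and connection name are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="databricks_transform_example",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_transform = DatabricksRunNowOperator(
        task_id="run_transform",
        databricks_conn_id="databricks_default",
        job_id=1234,                        # hypothetical Databricks job id
    )
```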
Monitoring & Error Handling:
Implement retry logic, exception handling, and alerting in Airflow.
Set up observability tools like CloudWatch, Prometheus, or Airflow’s native monitoring.
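A minimal sketch of the retry and alerting configuration this responsibility covers; the callback body is a hypothetical stand-in for a real alert hook (Slack, SNS, PagerDuty, etc.).

```python
# Sketch: retries with backoff plus a failure-alert callback via default_args.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    # Hypothetical hook point: replace with a Slack, SNS, or PagerDuty call.
    print(f"Task {context['task_instance'].task_id} failed")


default_args = {
    "retries": 3,                         # retry transient failures
    "retry_delay": timedelta(minutes=5),  # back off between attempts
    "on_failure_callback": notify_on_failure,
}

with DAG(
    dag_id="monitored_pipeline_example",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    flaky_step = PythonOperator(
        task_id="flaky_step",
        python_callable=lambda: None,     # stand-in for real pipeline logic
    )
```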
Collaboration & Documentation:
Collaborate with data architects, DevOps, and cloud teams.
Document orchestration logic, best practices, and the migration process.
CI/CD and Infrastructure Automation:
Develop CI/CD pipelines using Jenkins and Terraform.
Automate DAG deployment and infrastructure provisioning via IaC.
Integrate validation steps in deployment workflows.
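One common form the validation step takes is a DAG import test run in CI before deployment; a minimal sketch, assuming pytest and a configured DAGs folder:

```python
# Sketch: CI validation step that fails the build if any DAG file fails to
# import (syntax errors, missing dependencies, cycles). Runnable under pytest.
from airflow.models import DagBag


def test_dags_import_cleanly():
    # DagBag parses every file in the configured DAGs folder and records
    # import errors per file; an empty dict means all DAGs loaded cleanly.
    dag_bag = DagBag(include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import failures: {dag_bag.import_errors}"
```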
Required Skills and Experience:
Strong expertise in Apache Airflow, including complex DAG design and orchestration.
Prior experience with Apache Oozie and workflow migration.
Proficiency in Python for Airflow DAG development.
Hands-on experience with Hadoop ecosystems (e.g., HDFS, Hive, Spark).
Knowledge of CI/CD tools such as Jenkins and Infrastructure as Code (IaC) with Terraform.
Experience with Databricks (preferably on AWS) and big data orchestration.
Solid understanding of AWS services: S3, EMR, Glue, Lambda, Redshift, IAM.
Familiarity with container technologies such as Docker and Kubernetes.
Preferred Qualifications:
Experience with large-scale cloud migrations, especially Hadoop-to-Databricks.
Proficiency in Spark / PySpark for big data transformation.
AWS or Databricks certifications are a plus.
Familiarity with Git and workflow monitoring platforms.