Job Title:
Airflow Orchestration and Ingestion Engineer (Cloud Migration)
Location:
Remote
Contract – 6 months
Overview:
We are looking for an experienced Airflow Orchestration and Ingestion Engineer to support the migration of legacy data workflows from Apache Oozie to Apache Airflow as part of a broader transition from an on-premises Hadoop environment to a modern cloud data platform built on Databricks and AWS. This is a critical role focused on reengineering data pipeline orchestration, automation, and deployment within a cloud-native framework.
Key Responsibilities:
Workflow Migration:
Convert Oozie workflows to Airflow DAGs using Python (see the sketch following this list).
Build reusable, modular Airflow pipelines for ingestion, transformation, and scheduling.
Ensure each migrated workflow maps one-to-one to its Oozie counterpart without disrupting business processes.
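For illustration, a minimal sketch of the kind of conversion this role involves: a two-action Oozie workflow re-expressed as an Airflow DAG. The DAG id, schedule, and script paths are hypothetical placeholders, not project specifics.

```python
# Minimal sketch: a two-action Oozie workflow re-expressed as an Airflow DAG.
# DAG id, schedule, and script paths are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="legacy_ingest_migrated",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # mirrors the Oozie coordinator frequency
    catchup=False,
) as dag:
    # Each Oozie <action> becomes one Airflow task.
    extract = BashOperator(
        task_id="extract",
        bash_command="sh /opt/jobs/extract.sh",  # hypothetical script path
    )
    load = BashOperator(
        task_id="load",
        bash_command="sh /opt/jobs/load.sh",     # hypothetical script path
    )

    # The Oozie <ok to="..."> transition becomes an explicit task dependency.
    extract >> load
```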
Cloud Data Platform Transition:
Work with engineering teams to migrate Hadoop workloads to Databricks on AWS.
Leverage Airflow to orchestrate data pipelines across AWS services (S3, EMR, Glue, Redshift).
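As one hedged illustration of this kind of cross-service orchestration using Airflow's Amazon provider package: wait for a file to land in S3, then trigger a Glue transformation job. The bucket, key, job, and connection names below are assumptions for the sketch.

```python
# Sketch: cross-service orchestration with the Amazon provider package.
# Waits for a daily file in S3, then runs a pre-defined Glue job.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="s3_to_glue_example",               # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="raw-landing-bucket",      # hypothetical bucket
        bucket_key="daily/{{ ds }}/data.csv",  # templated with the run date
        aws_conn_id="aws_default",
    )
    transform = GlueJobOperator(
        task_id="transform",
        job_name="daily_transform",            # assumes an existing Glue job
        aws_conn_id="aws_default",
    )

    wait_for_file >> transform
```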
Pipeline Optimization:
Optimize pipeline throughput and latency within the AWS ecosystem.
Integrate Airflow with Databricks for transformation and analytics tasks.
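A minimal sketch of the Airflow-to-Databricks integration using the Databricks provider; the job id and connection name are hypothetical.

```python
# Sketch: triggering a pre-defined Databricks job from Airflow and polling
# it to completion. Job id and connection name are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="databricks_transform_example",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_transform = DatabricksRunNowOperator(
        task_id="run_transform",
        databricks_conn_id="databricks_default",
        job_id=1234,                        # hypothetical Databricks job id
    )
```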
Monitoring & Error Handling:
Implement retry logic, exception handling, and alerting in Airflow.
Set up observability tools like CloudWatch, Prometheus, or Airflow’s native monitoring.
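A minimal sketch of the retry and alerting configuration this responsibility covers; the callback body is a hypothetical stand-in for a real alert hook (Slack, SNS, PagerDuty, etc.).

```python
# Sketch: retries with backoff plus a failure-alert callback via default_args.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    # Hypothetical hook point: replace with a Slack, SNS, or PagerDuty call.
    print(f"Task {context['task_instance'].task_id} failed")


default_args = {
    "retries": 3,                         # retry transient failures
    "retry_delay": timedelta(minutes=5),  # back off between attempts
    "on_failure_callback": notify_on_failure,
}

with DAG(
    dag_id="monitored_pipeline_example",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    flaky_step = PythonOperator(
        task_id="flaky_step",
        python_callable=lambda: None,     # stand-in for real pipeline logic
    )
```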
Collaboration & Documentation:
Collaborate with data architects, DevOps, and cloud teams.
Document orchestration logic, best practices, and the migration process.
CI/CD and Infrastructure Automation:
Develop CI/CD pipelines using Jenkins and Terraform.
Automate DAG deployment and infrastructure provisioning via IaC.
Integrate validation steps in deployment workflows.
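One common form the validation step takes is a DAG import test run in CI before deployment; a minimal sketch, assuming pytest and a configured DAGs folder:

```python
# Sketch: CI validation step that fails the build if any DAG file fails to
# import (syntax errors, missing dependencies, cycles). Runnable under pytest.
from airflow.models import DagBag


def test_dags_import_cleanly():
    # DagBag parses every file in the configured DAGs folder and records
    # import errors per file; an empty dict means all DAGs loaded cleanly.
    dag_bag = DagBag(include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import failures: {dag_bag.import_errors}"
```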
Required Skills and Experience:
Strong expertise in Apache Airflow, including complex DAG design and orchestration.
Prior experience with Apache Oozie and workflow migration.
Proficiency in Python for Airflow DAG development.
Hands-on experience with Hadoop ecosystems (e.g., HDFS, Hive, Spark).
Knowledge of CI/CD tools such as Jenkins and Infrastructure as Code (IaC) with Terraform.
Experience with Databricks (preferably on AWS) and big data orchestration.
Solid understanding of AWS services: S3, EMR, Glue, Lambda, Redshift, IAM.
Familiarity with container technologies such as Docker and Kubernetes.
Preferred Qualifications:
Experience with large-scale cloud migrations, especially Hadoop-to-Databricks.
Proficiency in Spark / PySpark for big data transformation.
AWS or Databricks certifications are a plus.
Familiarity with Git and workflow monitoring platforms.