Data Engineer

Remote: Full Remote

Offer summary

Qualifications:

  • 2-4 years of experience in building ETL or ELT pipelines using Databricks or Snowflake.
  • Proficient in Python, particularly with libraries like pandas and PySpark.
  • Basic knowledge of SQL and experience with structured schemas.
  • Familiarity with Git and CI/CD processes.

Key responsibilities:

  • Build and schedule Python parsers to extract structured JSON from various document formats.
  • Develop and maintain data pipelines for ERP and ticketing systems.
  • Implement text-embedding or LLM-based entity extraction to enhance document feeds.
  • Write unit tests and data-quality checks to ensure pipeline reliability.

Offshorly - Mobile First Websites | Enterprise Solutions | Digital Outsourcing http://www.offshorly.com
11 - 50 Employees

Job description

What The Engineer Will Actually Do

  • P1 | Build and schedule Python parsers that extract structured JSON from PowerPoint, PDF, and Excel documents, then land the data in Databricks Bronze → Silver tables (see the parser sketch after this list).
  • P1 | Develop/maintain simple Auto Loader or Fivetran pipelines for ERP and ticketing systems.
  • P2 | Add basic text‑embedding or LLM‑based entity extraction (LangChain or open‑source transformers) to enrich the document feed.
  • P3 | Write unit tests and lightweight data‑quality checks (Great Expectations) so parsing errors do not break the pipeline.
  • P3 | Produce concise handover docs for our future data architect.
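
As a rough illustration of the P1 parsing task, the sketch below pulls slide text out of a .pptx into JSON-serialisable records with python-pptx. The input file name and the bronze.slide_text table referenced in the comments are hypothetical, and the Databricks write is indicated only in a comment.

```python
# Minimal sketch (assumptions noted): extract slide text from a .pptx into
# structured JSON records using python-pptx. File and table names are
# illustrative, not part of the actual project.
import json
from pathlib import Path

from pptx import Presentation  # pip install python-pptx


def parse_pptx(path: str) -> list[dict]:
    """Return one JSON-serialisable record per slide."""
    prs = Presentation(path)
    records = []
    for idx, slide in enumerate(prs.slides, start=1):
        texts = [
            shape.text_frame.text
            for shape in slide.shapes
            if shape.has_text_frame and shape.text_frame.text.strip()
        ]
        records.append(
            {
                "source_file": Path(path).name,
                "slide_number": idx,
                "text_blocks": texts,
            }
        )
    return records


if __name__ == "__main__":
    records = parse_pptx("quarterly_review.pptx")  # hypothetical input file
    print(json.dumps(records[:2], indent=2))
    # In a Databricks job, the same records would typically be appended to a
    # Bronze Delta table, e.g.:
    # spark.createDataFrame(records).write.mode("append").saveAsTable("bronze.slide_text")
```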

Skill Set

Must‑have (core):

  • 2‑4 years building ETL or ELT pipelines with Databricks or Snowflake (Delta/Parquet, Spark SQL, Airflow or similar); see the Bronze → Silver sketch after this list.
  • Solid Python (pandas, PySpark) and experience parsing Office files with libraries such as python‑pptx, openpyxl, pdfplumber, or PyPDF.
  • Basic SQL tuning and ability to work with structured schemas.
  • Git and CI/CD familiarity.
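
To give a feel for the Databricks side of the must-haves, here is a minimal Bronze → Silver promotion sketch in PySpark; the table names (bronze.slide_text, silver.documents) and columns are assumptions carried over from the parser sketch above, not this project's real schema.

```python
# Illustrative Bronze -> Silver promotion in PySpark. Table and column
# names are hypothetical; run inside Databricks (where `spark` already
# exists) or create a local session as below.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.table("bronze.slide_text")

silver = (
    bronze
    .withColumn("text", F.concat_ws("\n", F.col("text_blocks")))  # flatten text blocks
    .filter(F.length("text") > 0)                                  # drop empty slides
    .dropDuplicates(["source_file", "slide_number"])
    .withColumn("ingested_at", F.current_timestamp())
    .select("source_file", "slide_number", "text", "ingested_at")
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver.documents")
```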

Nice‑to‑have (bonus):

  • Exposure to LangChain, Hugging Face Transformers, or any LLM inference workflow.
  • Experience adding embeddings to tables for downstream ML or search (see the sketch after this list).
  • Great Expectations or similar data‑quality tooling.
  • Familiarity with Unity Catalog or Snowflake RBAC concepts.
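
As a small illustration of the embeddings bonus skill, the sketch below adds sentence embeddings to a pandas frame with the sentence-transformers library (one possible Hugging Face based workflow); the model name and sample rows are assumptions.

```python
# Illustrative only: attach sentence embeddings to parsed document rows.
# Model choice and sample data are assumptions.
import pandas as pd
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

df = pd.DataFrame(
    {
        "doc_id": [1, 2],
        "text": ["Invoice for Q3 hosting services", "Support ticket: VPN outage"],
    }
)

model = SentenceTransformer("all-MiniLM-L6-v2")
df["embedding"] = model.encode(df["text"].tolist()).tolist()  # one vector per row

# Downstream, these vectors could be written to a Delta table column for
# semantic search or as ML features.
print(df.head())
```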

Required profile


Spoken language(s): English
