Data Engineer

Remote: Full Remote

Offer summary

Qualifications:

  • 2-4 years of experience in building ETL or ELT pipelines using Databricks or Snowflake.
  • Proficient in Python, particularly with libraries like pandas and PySpark.
  • Basic knowledge of SQL and experience with structured schemas.
  • Familiarity with Git and CI/CD processes.

Key responsibilities:

  • Build and schedule Python parsers to extract structured JSON from various document formats.
  • Develop and maintain data pipelines for ERP and ticketing systems.
  • Implement text-embedding or LLM-based entity extraction to enhance document feeds.
  • Write unit tests and data-quality checks to ensure pipeline reliability.

Offshorly - Mobile First Websites | Enterprise Solutions | Digital Outsourcing http://www.offshorly.com
11 - 50 Employees

Job description

What The Engineer Will Actually Do

  • P1 | Build and schedule Python parsers that extract structured JSON from PowerPoint, PDF, and Excel documents, then land the data in Databricks Bronze → Silver tables (see the parser sketch after this list).
  • P1 | Develop/maintain simple Auto Loader or Fivetran pipelines for ERP and ticketing systems.
  • P2 | Add basic text‑embedding or LLM‑based entity extraction (LangChain or open‑source transformers) to enrich the document feed.
  • P3 | Write unit tests and lightweight data‑quality checks (Great Expectations) so parsing errors do not break the pipeline.
  • P3 | Produce concise handover docs for our future data architect.
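
As a rough illustration of the P1 parsing task, the sketch below pulls slide text out of a .pptx into JSON-serialisable records with python-pptx. The input file name and the bronze.slide_text table referenced in the comments are hypothetical, and the Databricks write is indicated only in a comment.

```python
# Minimal sketch (assumptions noted): extract slide text from a .pptx into
# structured JSON records using python-pptx. File and table names are
# illustrative, not part of the actual project.
import json
from pathlib import Path

from pptx import Presentation  # pip install python-pptx


def parse_pptx(path: str) -> list[dict]:
    """Return one JSON-serialisable record per slide."""
    prs = Presentation(path)
    records = []
    for idx, slide in enumerate(prs.slides, start=1):
        texts = [
            shape.text_frame.text
            for shape in slide.shapes
            if shape.has_text_frame and shape.text_frame.text.strip()
        ]
        records.append(
            {
                "source_file": Path(path).name,
                "slide_number": idx,
                "text_blocks": texts,
            }
        )
    return records


if __name__ == "__main__":
    records = parse_pptx("quarterly_review.pptx")  # hypothetical input file
    print(json.dumps(records[:2], indent=2))
    # In a Databricks job, the same records would typically be appended to a
    # Bronze Delta table, e.g.:
    # spark.createDataFrame(records).write.mode("append").saveAsTable("bronze.slide_text")
```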

Skill Set

Must‑have (core):

  • 2‑4 years building ETL or ELT pipelines with Databricks or Snowflake (Delta/Parquet, Spark SQL, Airflow or similar); see the Bronze → Silver sketch after this list.
  • Solid Python (pandas, PySpark) and experience parsing Office files with libraries such as python‑pptx, openpyxl, pdfplumber, or PyPDF.
  • Basic SQL tuning and ability to work with structured schemas.
  • Git and CI/CD familiarity.
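
To give a feel for the Databricks side of the must-haves, here is a minimal Bronze → Silver promotion sketch in PySpark; the table names (bronze.slide_text, silver.documents) and columns are assumptions carried over from the parser sketch above, not this project's real schema.

```python
# Illustrative Bronze -> Silver promotion in PySpark. Table and column
# names are hypothetical; run inside Databricks (where `spark` already
# exists) or create a local session as below.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.table("bronze.slide_text")

silver = (
    bronze
    .withColumn("text", F.concat_ws("\n", F.col("text_blocks")))  # flatten text blocks
    .filter(F.length("text") > 0)                                  # drop empty slides
    .dropDuplicates(["source_file", "slide_number"])
    .withColumn("ingested_at", F.current_timestamp())
    .select("source_file", "slide_number", "text", "ingested_at")
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver.documents")
```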

Nice‑to‑have (bonus):

  • Exposure to LangChain, Hugging Face Transformers, or any LLM inference workflow.
  • Experience adding embeddings to tables for downstream ML or search (see the sketch after this list).
  • Great Expectations or similar data‑quality tooling.
  • Familiarity with Unity Catalog or Snowflake RBAC concepts.
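
As a small illustration of the embeddings bonus skill, the sketch below adds sentence embeddings to a pandas frame with the sentence-transformers library (one possible Hugging Face based workflow); the model name and sample rows are assumptions.

```python
# Illustrative only: attach sentence embeddings to parsed document rows.
# Model choice and sample data are assumptions.
import pandas as pd
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

df = pd.DataFrame(
    {
        "doc_id": [1, 2],
        "text": ["Invoice for Q3 hosting services", "Support ticket: VPN outage"],
    }
)

model = SentenceTransformer("all-MiniLM-L6-v2")
df["embedding"] = model.encode(df["text"].tolist()).tolist()  # one vector per row

# Downstream, these vectors could be written to a Delta table column for
# semantic search or as ML features.
print(df.head())
```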

Required profile


Spoken language(s): English
