2-4 years of experience in building ETL or ELT pipelines using Databricks or Snowflake.
Proficient in Python, particularly with libraries like pandas and PySpark.
Basic knowledge of SQL and experience with structured schemas.
Familiarity with Git and CI/CD processes.
Key responsibilities:
Build and schedule Python parsers to extract structured JSON from various document formats.
Develop and maintain data pipelines for ERP and ticketing systems.
Implement text-embedding or LLM-based entity extraction to enhance document feeds.
Write unit tests and data-quality checks to ensure pipeline reliability.
Offshorly - Mobile First Websites | Enterprise Solutions | Digital Outsourcing
http://www.offshorly.com
11 - 50 employees
About Offshorly - Mobile First Websites | Enterprise Solutions | Digital Outsourcing
Web | Mobile Apps | Software Development
We are a digital agency that combines cost-effective and outsourced digital teams with experienced British management you can trust.
Offshorly has been operating in the UK and Manila since 2014 and has grown to a team of over 40 developers and designers with a wide range of industry experience.
We offer two core services - dedicated digital teams working solely for your business and fully managed project work.
Key points:
Dedicated teams with a history of delivering for startups and established businesses
Management and development team with extensive web and web apps experience
A multitude of E-commerce experience across many platforms
We are transparent in everything we do, so you see progress at all stages
We pass on offshore savings to you without compromising on the deliverable
Flexible and clear monthly models, meaning you pay for exactly the resources you need
Specialties:
Full technical understanding of multiple mobile, web and web service technologies.
Project delivery, full project cycle experience, UX, wireframing and scoping, budget, schedule and resource control. Desktop, iOS, Android.
We believe in the transformative power of mobile, particularly in the developing world. To that end, we dedicate 10% of our team's time to local social projects in the Philippines.
P1 | Build and schedule Python parsers that extract structured JSON from PowerPoint, PDF, and Excel documents, then land the data in Databricks Bronze → Silver tables.
P1 | Develop and maintain simple Auto Loader or Fivetran pipelines for ERP and ticketing systems.
P2 | Add basic text‑embedding or LLM‑based entity extraction (LangChain or open‑source transformers) to enrich the document feed.
P3 | Write unit tests and lightweight data‑quality checks (Great Expectations) so parsing errors do not break the pipeline.
P3 | Produce concise handover docs for our future data architect.
Skill Set
Must‑have (core):
2‑4 years building ETL or ELT pipelines with Databricks or Snowflake (Delta/Parquet, Spark SQL, Airflow or similar).
Solid Python (pandas, PySpark) and experience parsing Office files with libraries such as python‑pptx, openpyxl, pdfplumber, or PyPDF.
Basic SQL tuning and ability to work with structured schemas.
Git and CI/CD familiarity.
Nice‑to‑have (bonus):
Exposure to LangChain, Hugging Face Transformers, or any LLM inference workflow.
Experience adding embeddings to tables for downstream ML or search.
Great Expectations or similar data‑quality tooling.
Familiarity with Unity Catalog or Snowflake RBAC concepts.
Required profile
Experience
Spoken language(s):
English