At Cummins, we empower everyone to grow their careers through meaningful work, building inclusive and equitable teams, coaching, development and opportunities to make a difference. Across our entire organization, you'll find engineers, developers, and technicians who are innovating, designing, testing, and building. You'll also find accountants and marketers, as well as manufacturing, quality and supply chain specialists, who are working with technology that's just as innovative and advanced.
From your first day at Cummins, we’re focused on understanding your talents, current skills and future goals – and creating a plan to get you there. Your journey begins with planning your development and connecting to diverse experiences designed to spur innovation. From our internships to our senior leadership roles, we attract, hire and reward the best and brightest from around the world and look to them for new ideas and fresh perspectives. Learn more about #LifeAtCummins at cummins.com/careers.
Please note: although the GPP mentions Remote, this is a Hybrid role.
Key Responsibilities
Implement and automate deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
Continuously monitor and troubleshoot data quality and integrity issues.
Implement data governance processes and methods for managing metadata, access, and retention for internal and external users.
Develop reliable, efficient, scalable, and quality data pipelines with monitoring and alert mechanisms using ETL/ELT tools or scripting languages.
Develop physical data models and implement data storage architectures as per design guidelines.
Analyze complex data elements and systems, data flow, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
Participate in testing and troubleshooting of data pipelines.
Develop and operate large-scale data storage and processing solutions using distributed and cloud-based platforms (e.g., Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB).
Use agile development practices, such as DevOps, Scrum, Kanban, and continuous improvement cycles, for data-driven applications.
Responsibilities
Qualifications:
College, university, or equivalent degree in a relevant technical discipline, or relevant equivalent experience required.
This position may require licensing for compliance with export controls or sanctions regulations.
Competencies
System Requirements Engineering: Translate stakeholder needs into verifiable requirements and establish acceptance criteria.
Collaborates: Build partnerships and work collaboratively with others to meet shared objectives.
Communicates Effectively: Develop and deliver multi-mode communications that convey a clear understanding of the unique needs of different audiences.
Customer Focus: Build strong customer relationships and deliver customer-centric solutions.
Decision Quality: Make good and timely decisions that keep the organization moving forward.
Data Extraction: Perform ETL activities from various sources and transform them for consumption by downstream applications and users.
Programming: Create, write, and test computer code, test scripts, and build scripts using industry standards and tools.
Quality Assurance Metrics: Apply measurement science to assess whether a solution meets its intended outcomes.
Solution Documentation: Document information and solutions based on knowledge gained during product development activities.
Solution Validation Testing: Validate configuration item changes or solutions using best practices.
Data Quality: Identify, understand, and correct flaws in data to support effective information governance.
Problem Solving: Solve problems using systematic analysis processes and industry-standard methodologies.
Values Differences: Recognize the value that different perspectives and cultures bring to an organization.
Qualifications
Skills and Experience Needed:
Must-Have:
3-5 years of experience in data engineering with a strong background in Azure Databricks and Scala/Python.
Hands-on experience with Spark (Scala/PySpark) and SQL.
Experience with Spark Streaming, Spark internals, and query optimization.
Proficiency in Azure Cloud Services.
Agile Development experience.
Unit Testing of ETL.
Experience creating ETL pipelines with ML model integration.
Knowledge of Big Data storage strategies (optimization and performance).
Critical problem-solving skills.
Basic understanding of Data Models (SQL/NoSQL) including Delta Lake or Lakehouse.
Quick learner.
Nice-to-Have:
Understanding of the ML lifecycle.
Exposure to Big Data open source technologies.
Experience with Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka.
SQL query language proficiency.
Experience with clustered compute cloud-based implementations.
Familiarity with developing applications requiring large file movement for a cloud-based environment.
Exposure to Agile software development.
Experience building analytical solutions.
Exposure to IoT technology.
Work Schedule: Most of the work will be with stakeholders in the US, with 2-3 hours of overlap with EST working hours as needed.
Job: Systems/Information Technology
Organization: Cummins Inc.
Role Category: Remote
Job Type: Exempt - Experienced
ReqID: 2409179
Relocation Package: Yes
Required profile
Level of experience: Mid-level (2-5 years)
Spoken language(s): English