TOC Lead Systems Engineer
Position: TOC Lead Systems Engineer
Location: 100% REMOTE
Job Type: 12 Month Contract to Hire
Job Description: Technology Operations Center (TOC) Monitoring Lead
This role is responsible for helping lead and develop the team and processes that support the First-to-Know capability of the Technical Operations Center and requires coordination and multi-tasking skills to ensure maximum service availability of our systems. TOC Engineers and analysts monitor the performance and capacity of enterprise-wide systems using a variety of tools to identify hardware, software, and environmental alerts. This team provides eyes-on-glass monitoring of Client systems and will investigate, verify, report, communicate and escalate any issues.
• Mentor and lead a technical team of monitoring engineers, analysts, and Technology Operations Center (TOC) operators
• Provide technical guidance for directing, maintaining, and monitoring the health and wellness of all critical infrastructure and applications
• Focused on the development of people, process, and technology required to provide 24x7x365 proactive monitoring services to the organization
• Help develop and implement Event Management through the integration of Dynatrace and ServiceNow
• Oversee daily operations of Monitoring and Event Management of Infrastructure and Application performance monitoring for the enterprise
• Use monitoring tools, including Dynatrace, to identify and diagnose complex problems and factors affecting system performance
• Develop runbooks in support of monitoring and alerting events
• Monitor all SLAs for the TOC and provide timely reports
• Perform supervisory and staff management duties as assigned
• Prepare or participate in developing specific TOC performance reports
• Ensure daily operational readiness by managing and performing daily health checks of critical systems
• Lead kickoff meetings with various teams to determine what is needed to perform monitoring.
• Create accurate process documents that can be followed to perform various tasks within the TOC.
• Provide technical/management leadership on major tasks or technology assignments
• Assist leadership in establishing goals and plans that meet company and department objectives
• Author communications that are sent out to the Client user community and Client leadership about outages, upgrades, IT challenges, etc.
• Candidates will also work with the technical teams to write up outage summaries and lessons learned reports for senior management to understand the impact to the Client community and corrections to avoid future occurrences.
• This team provides 24x7x365 support to the Client customer community.
This role will require shift work. The Client Technology Operations Center is a 24/7 operation, and members are asked to be flexible in providing coverage outside of their normal shift hours when the need arises. The position is full-time and can be performed fully remotely.
Additional responsibilities include:
• Provide eyes-on-glass monitoring using tools such as Dynatrace, Splunk, SCOM, ITM, and SolarWinds
• Investigate and verify alerts and reported issues
• Escalate issues to the Tier 2 operations team when necessary
• Access devices and analyze graphing
• Review device logs, documentation, and analysis
• Perform real-time monitoring of vital systems
• Provide general event management and communication management support for the TOC
• Support a 24x7 system monitoring service to proactively identify and assess problems before the customer reports them
• Support response time by ensuring that system information, contact information, and processes are in place to coordinate the necessary IT response to system problems
• Rely on your teammates and be an active collaborator and participant within the group
• Provide event management and support to service owners and IT managers
• Author reports, prepare data for status/findings presentations, prepare flowcharts and draft process documents for team activities.
• Communicate an honest interpretation of data to all stakeholders; support and facilitate open communication between all stakeholders.
Required Qualifications:
• Bachelor's degree, preferably in an IT discipline, or equivalent IT work experience
• 10+ years of NOC/TOC experience in progressively responsible roles
• At least five (2) years in a direct supervisory capacity, with experience managing a NOC/TOC supporting a large enterprise-level operation
• Experience in analysis, implementation and evaluation of current IT processes and workflows to help the organization run efficiently and effectively
• Excellent organizational, leadership, and communication skills
• Able to manage multiple priorities while balancing urgent requests with shifting timelines and deliverables
• Experience with enterprise dashboards and monitoring tools
• Experienced in Event Management
• Able to accurately interpret various metrics from monitoring tools.
• Strong analytical skills and able to collate and interpret data from various sources.
• Strong communicator with a natural aptitude for dealing with people
Desired Qualifications:
• Familiarity with developing and implementing mature operations center processes, tooling and automation
• Working knowledge of IT Infrastructure Operations (such as systems and network administration, security, various tools, etc.)
• Experienced in Event Management with ServiceNow
• Dynatrace Monitoring experience preferred
• Experience with some or all of the following monitoring and reporting tools: Splunk, Dynatrace, SCOM, ITM, SolarWinds, ServiceNow
• Experience working in Incident Response as an Incident Resolver
• ITIL certified or equivalent experience