Key Responsibilities
Infrastructure Management
● Design, implement, and maintain cloud infrastructure on AWS and Azure using infrastructure as code (primarily Terraform)
● Manage and optimize Kubernetes clusters for production workloads
● Implement and maintain CI/CD pipelines for automated testing and deployment
● Collaborate with development teams to implement containerization strategies
● Monitor and optimize system performance, capacity, and availability
● Implement and maintain robust logging and monitoring solutions
Security & Compliance
● Implement and maintain security controls to meet FedRAMP, NIST 800-53 Rev 5, and NIST 800-171 requirements
● Participate in security assessments and remediation efforts
● Implement and maintain security baseline configurations
● Conduct regular security reviews of infrastructure and applications
● Document and maintain security procedures and policies
Disaster Recovery & Business Continuity
● Develop, implement, and regularly test disaster recovery procedures
● Maintain and update business continuity plans
● Implement automated backup and recovery solutions
● Conduct simulated disaster recovery exercises
● Ensure data replication and redundancy across multiple regions
● Develop and maintain runbooks for critical system recoveries
Operations & Support
● Participate in an on-call rotation (1 week every 4-6 weeks)
● Troubleshoot and resolve complex infrastructure issues
● Respond to and remediate security incidents
● Maintain comprehensive documentation of systems and processes
● Use Jira for task management, incident tracking, and workflow automation
● Provide mentorship and guidance to junior team members
Qualifications
Required Skills & Experience
● 3+ years of experience in DevOps, Site Reliability Engineering, or similar roles
● Expert-level knowledge of AWS services including S3, EC2, EKS, ALB, FSx, WorkSpaces, Directory Service, ECS, Fargate, RDS, and Lambda
● Proficient with Azure services (equivalent to AWS services mentioned above)
● Advanced knowledge of Terraform for infrastructure as code
● Deep understanding of Kubernetes administration and architecture
● Strong experience with Git version control and CI/CD pipelines
● Experienced with containerization technologies (Docker, Kubernetes)
● Familiarity with FedRAMP, NIST 800-53 Rev 5, and NIST 800-171 requirements
● Experience implementing and maintaining security controls for cloud environments
● Experience implementing and testing disaster recovery procedures
● Strong documentation skills and experience with Jira
● Excellent verbal and written communication skills
● Ability to work independently and as part of a team
● Strong problem-solving skills and the ability to work under pressure
Required Certifications (At least one of the following)
● AWS Certified DevOps Engineer – Professional
● AWS Certified Solutions Architect – Professional
● Certified Kubernetes Administrator (CKA)
● HashiCorp Certified: Terraform Associate
Preferred Qualifications
● Experience with Adobe Connect and/or Adobe Learning Manager
● Experience with eLearning platforms or learning management systems
● Experience with PaaS and SaaS offerings
● Experience with network security and firewall configuration
● Experience with database administration (SQL and NoSQL)
● Experience with scripting languages (Python, Bash, PowerShell)
● Experience with configuration management tools (Ansible, Chef, Puppet)
● Experience with log aggregation and analysis tools
● Experience with monitoring tools (Prometheus, Grafana, CloudWatch)
Additional Requirements
● Must be a U.S. citizen
● Must reside in the United States, preferably on the East Coast
● Ability to obtain and maintain security clearances if required
Work Environment
● 100% remote position
● Flexible work schedule with core hours
● Occasional off-hours work required for maintenance windows
● On-call rotation (1 week every 4-6 weeks)
● Occasional travel may be required for team meetings or training
● Medical, dental, and vision coverage, plus additional benefits
Compensation: $100-130k