Job Description
Senior Site Reliability Engineer
Position Summary:
We are seeking a Senior Site Reliability Engineer who will be responsible for improving and
maintaining software development, test, and live infrastructure and services. The ideal candidate
will be self-motivated and articulate, have experience with Linux and other *NIX derivatives,
and be comfortable working in a fast-paced software development environment.
Responsibilities:
- Support Change Healthcare Analytics Cloud, a mission-critical platform, in production and
  development environments
- Identify and drive improvements in infrastructure and system reliability, performance, monitoring,
  and overall stability of the platform
- Perform capacity planning and demand forecasting to meet system demand, identify performance
  bottlenecks, and devise tuning improvements
- Build tools and automation that eliminate repetitive tasks and prevent incidents
- Participate in 24x7 operational support and on-call rotation shifts
Qualifications:
- Minimum of 6 years of experience supporting production applications and systems
- Experience supporting, analyzing, and troubleshooting large-scale, distributed, mission-critical
  systems
- Systematic problem-solving approach and a strong sense of ownership to drive problems to
  resolution
- Experience configuring and managing web servers (Apache, Tomcat, Nginx) and RESTful web
  service applications
- Strong knowledge of Linux systems administration and architecture
- Experience with configuring, managing, and supporting AWS environments
- Proficiency with Amazon Web Services (AWS) offerings such as EC2, EBS, ELB, S3, Route 53, RDS,
  Redshift, and EMR in a highly available and scalable production environment
- Experience with continuous integration and deployment automation tools such as Jenkins,
  Harness, AWS CloudFormation, Salt, Puppet, Chef, or Ansible
- Experience with SQL (MySQL, PostgreSQL)
- Experience with open-source technologies (Kafka, Hadoop, HBase, ZooKeeper, Oozie)
- Network knowledge (TCP/IP, UDP, DNS, load balancing); prior network administration
  experience is a plus
- Scripting experience with Shell, Python, or Ruby
- Experience documenting processes, systems, environments, and runbook procedures
- Experience with source control tools such as Git/GitHub