Lead Site Reliability Engineer
Kuala Lumpur, W.P. Kuala Lumpur
Hybrid | Direct hire | Job ID 7573 | Posted Dec 6, 2024
JOB DESCRIPTION
Lead Site Reliability Engineer
Kuala Lumpur, Malaysia
About Horizontal: Established in the US in 2003, Horizontal solves complex challenges across two distinct businesses: Horizontal Digital and Horizontal Talent. We are consistently recognized as a top workplace and one of the fastest-growing private companies. Horizontal Talent specializes in staffing for the IT, Digital & Creative, and Business & Strategy markets. We have global offices in the US, UAE, India, and Malaysia.
Job Description: We are seeking a Senior Site Reliability Engineer (SRE) with a strong technical foundation in software development within an enterprise environment. The ideal candidate will be responsible for designing, developing, testing, delivering, and supporting large-scale data pipelines utilizing big data technologies such as Elasticsearch, Logstash, Kibana, Kafka, and more. Additionally, the successful candidate will demonstrate a passion for automation and will play a key role in automating incident resolution and enhancing the reliability aspects of our products and services.
Responsibilities:
- Take ownership of incidents across designated clusters and automate incident resolution.
- Analyze and enhance reliability aspects of products and services, including post-incident reviews and reduction of toil.
- Automate repetitive tasks that have no enduring value, across various tools.
- Provide innovative solutions and remove roadblocks, demonstrating out-of-the-box thinking across a diverse landscape of tools and environments.
- Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Exhibit a strong interest in large-scale distributed systems.
- Contribute to incident response efforts and perform on-call duties.
- Meet key metrics including Automation Rate, Automation Success Rate, Mean Time to Detect, Mean Time to Repair, and SLIs defined based on business needs.
Requirements:
- Bachelor’s/Master’s degree in Engineering, Computer Science, IT, or equivalent experience.
- Minimum of 10 years of experience in Site Reliability Engineering or as a DevOps engineer.
- Experience working in an Agile-driven team.
- Strong interpersonal skills, a customer-centric attitude, and the ability to work in a culturally diverse environment.
- Demonstrated ability to collaborate with local and remote teams in different time zones.
- Extensive background in providing maintenance and support for deployed systems.
- Strong critical thinking and problem-solving skills.
Compulsory Technical Experience and Knowledge:
- Proficient in Unix commands.
- Strong understanding of cloud concepts, DevOps standard methodologies, and CI/CD pipelines.
- Extensive technical expertise in Java full-stack / Python development.
- Proficiency in CI/CD tools.
- Advanced deployment experience with Ansible, Helm, and ArgoCD.
- Experience with microservices development and an understanding of auto-scaling.
- Experience with REST API integration and design.
- Key technologies: Ansible, Elasticsearch and Logstash (ELK stack), Kafka, Kubernetes, Python, Unix, CI/CD, collaboration tools (Confluence, Jira), REST APIs.
Preferred Technical Skills:
- Familiarity with building application logging solutions and log management.
- Platform knowledge: RHEL, OpenShift.
- Pipeline: Jenkins, CloudBees.
The above description is not designed to cover or contain a comprehensive listing of activities, duties, or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.
Horizontal is proud to be an Equal Opportunity and Affirmative Action Employer.
We seek to provide employment opportunities to talented, qualified candidates regardless of race, color, sex/gender including gender identity and/or expression, national origin, religion, sexual orientation, disability, marital status, citizen status, veteran status, or any other protected classification under federal, state or local law.
In addition, Horizontal will provide reasonable accommodations for qualified individuals with disabilities. If you need to request a reasonable accommodation in order to complete the application or interview process, please contact us.
All applicants must be legally authorized to work in the country of employment.