Vacancy title:
Senior Site Reliability Engineer - Observability
Jobs at:
Cellulant Corporation
Deadline of this Job:
23 December 2022
Summary
Date Posted: Monday, December 12, 2022, Base Salary: Not Disclosed
JOB DETAILS:
Job Description:
As a member of the Observability team, you will be responsible for maintaining existing insights and creating new ones using data from various sources. You will take a data-first approach to solving problems and implement solutions using your software development knowledge. You will be expected to champion automation within the team, from the deployment of your own code to identifying opportunities for end-to-end automation in event and incident management.
Core Responsibilities:
• You will build, scale, and manage our observability stack across our multi-tenant infrastructure, including our observability tooling clusters, logging pipelines, and system telemetry data.
• You will actively engage with our developers and help them improve the monitoring of their services.
• You will actively drive initiatives towards better system design and the implementation of new technologies.
• You will develop new capabilities on our observability platforms by incorporating additional data types such as clickstream data and frontend user interactions.
• You will drive key initiatives in modern observability concepts such as SLIs, SLOs, error budgets, distributed tracing, and canonical logging.
• You will collaborate with architects, leads, and managers to foster a data-driven culture based on observability and reliability.
• You will be responsible for building machine learning capabilities into the observability systems to enhance signal and reduce noise.
• You will participate in the observability on-call rotation to resolve issues affecting the observability systems and to support other technology teams in investigations during major incidents.
Qualifications & Experience:
Must have experience:
• Familiarity with programming language concepts (Go, Java, Ruby, Python, JavaScript).
• Experience with cloud infrastructure and services, especially AWS.
• Experience defining, measuring, and improving Reliability Metrics (SLO/SLI), Observability (Monitoring, Logging, and Tracing solutions), Operations Processes (Incident and Problem Management), and Operations Toil Reduction through Automation.
• Experience in programming and scripting, with knowledge of Python, MySQL, Bash, Terraform, Ansible, GitLab, and other scripting/automation tools.
• Experience with distributed systems in a production operations environment.
• Multitasking ability and effective oral and written communication skills.
Experience that will count in your favor:
• Good understanding of AWS services (Glue and Athena, Amazon CloudWatch, QuickSight), Kubernetes (EKS), Elasticsearch/OpenSearch, New Relic or similar observability tools, Zabbix, Grafana, PagerDuty.
• Proven use of AI (ML/DL) for data management and analysis.
• Solid experience in Software Development and/or Systems Administration.
Qualifications:
• Bachelor’s degree in an appropriate field of study, such as computer science, engineering, information technology, statistics, or a related discipline.
Skills:
• Programming and Scripting Skills - PHP, Python, Bash, Perl, Java.
• Systems Administration.
• Data Analytics and Visualization.
Personal attributes:
• Teamwork
• Initiative
• Willingness to learn
• Ability to represent the team in autonomous squads.
Job Experience: No Requirements
Work Hours: 8
Level of Education: Bachelor Degree
Job application procedure
Use the link(s) below to apply on company website.
Senior Site Reliability Engineer - Observability