About the Role:
As a System Analyst (Service Monitoring), you'll be a critical part of our Technology Operations team, ensuring the reliability and performance of our platform. You'll be on the front lines, proactively identifying and resolving issues, keeping our services running smoothly for our users. This role offers a fantastic opportunity to gain hands-on experience with modern monitoring tools and contribute directly to the success of a growing company.
Responsibilities:
- Proactively monitor infrastructure health, performance, capacity, and response times, as well as service success and failure rates, using monitoring tools.
- Ensure all batch jobs are triggered successfully, and troubleshoot them as needed.
- Support changes to the production environment to ensure a smooth transition.
- Mitigate incident impact through actions such as reproducing use cases and enabling maintenance mode.
- Follow Standard Operating Procedures (SOPs) for monitoring and incident management.
- Document all issues, resolutions, and actions taken in our ticketing system.
- Escalate issues to the relevant teams, ensuring seamless communication and collaboration.
- Communicate effectively with internal and external stakeholders, keeping them informed of service status and ongoing incidents.
- Automate routine checks and tasks using scripts and automation tools.
- Contribute to the development and improvement of our monitoring and alerting systems.
- Participate in on-call rotations on a 24/7/365 shift pattern (12-hour shifts, 4 on, 3 off).
Qualifications:
- Passion for technology and a desire to learn and grow in a fast-paced environment.
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 0-3 years of experience in NOC operations, IT support, or a related role.
- Familiarity with monitoring tools (e.g., Grafana, Datadog) and automation tools (e.g., Jenkins, Rundeck).
- A strong sense of responsibility and urgency in resolving issues.
- Excellent communication and collaboration skills, both written and verbal.
- Ability to work effectively under pressure and prioritize tasks in a fast-paced environment.
- A proactive and positive attitude, with a strong sense of ownership.
- Experience with cloud platforms (e.g., AWS) is a plus.