Design, develop, and maintain CI/CD pipelines using GitLab CI/CD to support a robust and efficient software delivery lifecycle.
Manage and maintain infrastructure on cloud platforms (especially GCP) using Infrastructure as Code (IaC) practices with tools such as Terraform.
Deploy and operate applications on Kubernetes (GKE), including configuration for autoscaling, security, and cost optimization.
Use tools such as Argo CD to implement GitOps and automate deployments across multiple environments.
Build and manage monitoring and alerting systems using observability tools such as Prometheus, Grafana, and Cloud Monitoring (formerly Stackdriver).
Collaborate with Development, QA, and Security teams to ensure systems are performant, secure, and auditable.
Manage and maintain database systems such as PostgreSQL, MySQL, and MongoDB with a focus on performance, availability, and backup strategies.
Investigate and resolve production issues related to system performance, reliability, and scalability.
Contribute to the design of Disaster Recovery and High Availability strategies.
Advocate for and lead the adoption of DevOps best practices within the team, and provide technical mentorship to peers.
Qualifications
Minimum of 3 years of experience in DevOps, SRE, or Infrastructure Engineering.
Strong expertise in cloud platforms (e.g., GCP, Huawei Cloud), including services such as networking (VPC), databases (Cloud SQL), load balancing, and IAM.
Proficient in containerization and orchestration, including Docker and Kubernetes (both workload and cluster operations), and experienced with Kubernetes ecosystem tools such as Helm and Kustomize.
Experience in Infrastructure as Code (IaC) using tools like Terraform or Pulumi.
Hands-on experience with CI/CD pipelines (e.g., GitLab CI/CD) and GitOps practices (e.g., Argo CD).
Experience with observability and monitoring tools such as Prometheus, Grafana, and cloud-native monitoring solutions (e.g., Google Cloud Monitoring).
Solid understanding of system security, including IAM, secret management (e.g., HashiCorp Vault), network policies, authentication/authorization (OAuth2, OIDC), and access control (RBAC/ACL), as well as integration with Identity Providers (IdP).
Strong Linux administration skills, including shell scripting (bash/sh) for automation and troubleshooting.
Experience designing and managing service mesh architectures (e.g., Istio), including traffic management, mTLS, and policy enforcement.
Experience managing both relational and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB).
Proven experience operating large-scale, highly available, and scalable systems, particularly in microservices architectures.
Strong communication skills and ability to collaborate effectively with cross-functional teams (Dev, QA, Product, Security).