SKY ICT PCL

Lead DevOps Engineer / Senior DevOps Engineer

Fresher
  • Posted 17 hours ago

Job Description

Senior technical leader driving a cloud-native, provider-agnostic infrastructure strategy across hybrid environments (Proxmox VE on-premises, AWS/Azure as primary cloud, GCP as minority cloud). The role balances technical direction with people leadership, ensuring systems are portable, resilient, secure, cost-optimized, and developer-friendly.

Responsibilities:

Infrastructure Leadership

  • Architect hybrid infrastructure spanning on-premises and multi-cloud.
  • Define provider-agnostic standards to avoid lock-in.
  • Own IaC strategy with Terraform multi-provider modules.
  • Lead disaster recovery, high availability (HA), and SLA/SLO governance.

On-Premises Infrastructure (Proxmox VE)

  • Manage Proxmox clusters, HA groups, VM/LXC provisioning.
  • Govern storage (ZFS, Ceph, NFS/iSCSI) and SDN.
  • Integrate with IaC/GitOps workflows.
  • Operate Proxmox Backup Server and enforce RTO/RPO targets.
  • Use Proxmox as the compute layer for Kubernetes clusters.
  • Monitor via Prometheus/Grafana and enforce CIS security baselines.

Cloud-Native Platform Engineering

  • Standardize Kubernetes runtime across environments.
  • Drive GitOps-first delivery (ArgoCD/Flux).
  • Use Helm/Kustomize for packaging.
  • Adopt OpenTelemetry for observability.
  • Enforce service mesh (Istio, Cilium, Linkerd).
  • Apply policy-as-code (OPA, Kyverno).
  • Design provider-agnostic CI/CD pipelines.

Multi-Cloud Strategy

  • Select providers based on workload/cost, not inertia.
  • Manage GCP for analytics/Kubernetes.
  • Onboard additional providers (Alibaba, OCI, Hetzner, etc.) seamlessly.
  • Enforce cross-cloud networking via WireGuard/Tailscale.
  • Centralize identity federation via OIDC/SAML.

Internal Developer Platform (IDP)

  • Own IDP as a product with roadmap and SLAs.
  • Provide self-service provisioning across environments.
  • Enable ephemeral environments on demand.
  • Maintain service catalog/developer portal (Backstage, Port.io).
  • Enforce RBAC/policies via OPA/Kyverno.
  • Measure DevEx via DORA metrics and adoption rates.

FinOps & Cost Governance

  • Treat cloud spend as an engineering concern.
  • Use provider-agnostic FinOps tooling (Kubecost, OpenCost).
  • Apply per-provider governance (AWS/Azure RIs, GCP CUDs, Proxmox TCO).
  • Integrate billing APIs into unified dashboards.

People Management & Leadership

  • Lead platform engineers and organize them around cloud-native teams.
  • Build skills in open-source tech (Kubernetes, Terraform, Prometheus).
  • Develop FinOps and DevEx champions.
  • Advocate ROI of cloud-native investment.

Strategic & Cross-Functional

  • Own infrastructure roadmap.
  • Make workload placement decisions based on cost, latency, and compliance.
  • Partner with Security/Compliance/Engineering for unified governance.

Qualifications:

Required Skills

  • Cloud-Native Platform: Kubernetes, Helm, Kustomize, GitOps.
  • IaC & Automation: Terraform, Ansible, Packer, Crossplane.
  • Networking/Service Mesh: Istio, Cilium, Linkerd, WireGuard, OVS.
  • Observability: OpenTelemetry, Prometheus, Grafana, Jaeger, ELK.
  • Policy & Security: OPA, Kyverno, Vault, DevSecOps.
  • On-Prem (Proxmox VE): Cluster mgmt, HA, ZFS, Ceph, PBS, SDN.
  • Cloud Providers: AWS/Azure (primary), GCP (minority).
  • IDP Tools: Backstage, Port.io, Humanitec.
  • CI/CD: GitLab CI, GitHub Actions, ArgoCD, Tekton.
  • FinOps: Kubecost, OpenCost, CloudHealth.
  • Languages/Scripting: Python, Bash, Go, Terraform HCL, TypeScript.
  • Leadership: Team management, roadmap planning, stakeholder communication.

Plus Skills

  • Experience integrating non-hyperscaler clouds (Alibaba, OCI, Hetzner, Cloudflare, Huawei, IBM).
  • Ability to rapidly onboard providers into cloud-native stack.

Nice to Have – AI-Powered Automation

  • AI-assisted IaC generation (Terraform Copilot, Pulumi AI).
  • AIOps for observability (Grafana ML, Elastic AIOps).
  • AI-powered runbook automation (LangChain, OpenAI).
  • ChatOps with AI for infra queries and cost summaries.
  • Autonomous infra agents for natural language provisioning.

Key KPIs

  • Cloud-native adoption rate (% workloads on Kubernetes).
  • IDP adoption rate (% teams using self-service).
  • Developer onboarding velocity.
  • Deployment frequency & lead time (DORA metrics).
  • Proxmox cluster availability ≥99.9%.
  • Provider portability score (% workloads redeployable cross-cloud).
  • Multi-cloud + on-prem cost savings.
  • MTTR for production incidents.

Job ID: 150599023