HPC Solution Architect

2-5 years
a month ago
Job Description

XENOptics Limited is a cutting-edge research and development company with partners and offices worldwide, and a member of the XENON Technology Group.

The Role

We are seeking an experienced and talented engineer to join our Professional Services team. This role involves close collaboration with XENOPTICS architects and engineers, customers and partners to design, install, configure, integrate, test, deploy, manage, maintain, support, and troubleshoot our systems and solutions.

  • Analyse challenging customer requirements and develop leading-edge solutions in HPC, storage, networking, data management and protection, and public/private/hybrid cloud
  • Integrate XENOPTICS & XENOPTICS partner products into a variety of environments
  • Execute integration, deployment, and migration projects
  • Develop customer-facing tools and documentation
  • Support and maintain our solutions and solve complex issues in the field
  • Strong presentation and verbal communication skills
  • Strong understanding of infrastructure components and how they are tracked and managed in various systems
  • Support escalations with onsite consulting management

Ideal Profile

We require a candidate with strong knowledge of Linux system administration, compute/network/storage/data management, HPC/clustering solutions, virtualisation and containerisation technologies, as well as strong support, troubleshooting and consultative skills. AWS and Azure technical certifications will be highly regarded.

Required Skills & Experience

  • Linux cluster design, configuration, management
  • HPC cluster management solutions: Bright Cluster Manager, OpenHPC, etc.
  • AI and deep learning solutions and frameworks: GPU-accelerated systems, TensorFlow, Theano, Torch, etc.
  • Job management systems: PBS, Slurm, LSF, Torque, SGE, etc.
  • Virtualisation technologies: KVM, VMware, etc.
  • Monitoring tools: Zabbix, Ganglia, Nagios, Icinga
  • Storage technologies and filesystems: Ceph, Lustre, Spectrum Scale/GPFS, Weka, BeeGFS, NFS
  • Storage configuration, management, backup, archive, tiering
  • Scripting and programming: bash, Python, PHP, Perl, C, C++, Fortran, etc.
  • Cloud environments (AWS, Azure, GCP, Oracle): architecture, configuration, management, cloud bursting, cost control
  • Linux distros in various flavours (Red Hat, Rocky, CentOS, Ubuntu, Debian, SUSE, etc.); Windows and Windows Server environments
  • Container technologies (especially Docker, Kubernetes): full life cycle from designing and building containers to deployment, management, upgrades
  • Configuration management tools (e.g. Ansible): infrastructure as code, documented and reproducible environments
  • Networking: InfiniBand and Ethernet, modern switch operating systems (Cumulus, SONiC, etc.)
  • More than 5 years of experience in Linux/Unix administration and support
  • 3+ years of experience with problem resolution in large scale production environments
  • More than 2 years of experience with direct professional/managed services, customer support, monitoring and troubleshooting solutions through remote access
  • Ability to travel domestically and internationally when required
  • Must be able to empathise with all partners and customers and take the time to fully understand their points of view, needs and problems to help deliver results
  • Computer Science or Engineering degree desirable

If you're looking to join a highly innovative Australian company and tackle big challenges, we would like to meet you.

What's on Offer

  • Relocation to Chiang Mai, Thailand
  • Visa and work permit are provided
  • A role that offers a breadth of learning opportunities

Industry

Other

Skills

C
PHP
AWS
C++
Remote Access
NFS
LSF
HPC
Spectrum Scale/GPFS
BeeGFS
SGE
GPU accelerated systems
Linux/Unix administration
containerisation
public/private/hybrid cloud
direct professional/managed services
Torch
Slurm
PBS
