HPC Cluster management:
• Administration of the HPC cluster for Computer Aided Engineering (CAE) and the render cluster
• Maintenance of in-house shell scripts
• Investigation of failed computations, problem determination, incident resolution, system support, and coordination with the vendor
• L1/L2 support on the HPC cluster for the customer
• Maintain the applications running on the cluster
• Manage network aspects (DNS, DHCP, internet access, …) with the Network Team
• Perform daily monitoring and ensure cluster high availability
• Manage patching and upgrades of the managed environment
• Monitor regular backups and ensure cluster high availability
• Establish long-term, centralized management of the environment
• Collaborate with other technical teams when required
Provide support when necessary for the customer’s project:
• HPC cluster migration to AWS Cloud
Support the customer when needed on the following (out of maintenance scope):