Our client's Enterprise Research Infrastructure and Services (ERIS) group is immediately seeking a Senior Scientific Computing Systems Administrator. This position works within the multidisciplinary Scientific Computing and Big Data teams that architect, build, maintain and support the scientific computing and analytics systems, IDEA (Integrated Data Environment for Analytics) and the HPC platforms, for the research mission. The role is responsible for building and administering our Linux High Performance Computing (HPC) ecosystem and covers every aspect of managing the systems: Linux, the hardware and its network, deployment and patch automation, and technical troubleshooting. The role is challenging and varied, requiring technical, interpersonal and problem-solving abilities.
Using your knowledge of research infrastructure, high-throughput networked Linux systems and scientific computing tools, you will develop and maintain the infrastructure needed to support high-throughput data analyses and to further research, including the translational and clinical studies needed to move findings from the bench to the bedside. You will be responsible for operating the computing cluster and assisting the user community in its use. The cluster service includes Linux remote desktop capability, 6000 CPU cores for batch processing, GPGPU-equipped systems for machine learning, Intel Xeon Phi coprocessors and numerous scientific applications. Key technical aspects of the role are Linux system administration, scientific software installation and troubleshooting performance issues relating to software, networking or server hardware. Experience using comparable computing clusters is essential. In the course of expanding and improving the service capability, you will interface with commercial hardware and software vendors, select and deploy new technologies and create how-to guides for end users. Working with the Information Security and Privacy teams, you will ensure systems and procedures adhere to organizational security standards, values and HIPAA guidelines. The environment is managed with industry-standard tools for job scheduling, hardware monitoring and software configuration management.
location: Somerville, Massachusetts
job type: Contract
salary: $55 - 65 per hour
work hours: 9am to 5pm
- Cluster and Systems Administration: Manage and administer production systems used by researchers and Research Centers
- Analyze the results of server monitoring and implement changes to improve performance, processing and utilization. Propose, maintain and enforce policies, practices and security procedures.
- Develop and maintain system documentation as well as user-facing knowledge base articles and how-to guides.
- Evaluate, select and deploy hardware and/or cloud solutions for research scientific computing. This includes CPU- and GPU-based compute, high-speed networking and data storage.
- Analyze and resolve customer and technical problems, including cluster scheduling parameter tuning, memory/CPU contention, and scientific application compilation and run-time issues. Troubleshoot scheduler submission problems.
- Configure job scheduling parameters for equitable resource sharing and optimum throughput
- Provide break/fix support, setup/installation support, escalation support, and solutions support
- Create and close tasks/tickets and work orders within the enterprise service tracking tool using established standards
- Develop, publish and maintain knowledge base articles and documentation on system features, best practices and usage how-tos, as well as training and reference materials for the community, using the ERIS wiki and knowledge management tools.
- Inventory and track HPC computer-related equipment.
- Perform field work within the corporate datacenter.
- Maintain department service standards, with attention to personal/behavioral, staff teamwork and customer-staff interaction guidelines.
- BA/BS/engineering degree or an equivalent combination of skills and experience required. Advanced degree in engineering or a related scientific discipline preferred.
- Minimum of 5 years' experience managing/administering Linux server environments. Experience within scientific computing preferred.
- Strong verbal and written communication skills, including the ability to write clear technical documentation.
- Excellent customer support skills.
- 3+ years of experience using computational clusters, HPC and/or grid computing environments.
- Demonstrated interest in the fields of medical informatics, health sciences and research.
- RHEL certifications a plus.
- Experience with at least one cluster scheduling system (LSF, Grid Engine, Slurm).
- Experience with server deployment technologies (Kickstart, PXE, IPMI).
- Experience with one or more scripting technologies (shell scripting, Ruby or Python).
- A combination of education and experience may be substituted for the requirements above.
Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.