We are looking for a passionate and experienced Senior Site Reliability Engineer to join our team and play a crucial role in ensuring our cloud platform's security, reliability, scalability, and operational excellence.
location: Irvine, California
job type: Permanent
salary: $150,000 - $180,000 per year
work hours: 8am to 5pm
education: Bachelor's
responsibilities:
- Serve as a technical subject matter expert for implementing and operating microservices on Kubernetes cloud-based platforms.
- Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the multi-cloud platform.
- Perform load tests and chaos tests to ensure the scalability and reliability of microservices.
- Build observability for microservices and cloud platforms like AWS, OCI, Azure, and GCP.
- Write and execute disaster recovery plans in collaboration with the development and DevOps teams.
- Analyze and resolve production risks caused by insufficient resources, such as node groups, CPU, memory, HPA scheduling, and JVM pre-warming.
- Write and maintain scripts for automation using languages like Python, Go, or Bash.
- Define and maintain the KPIs (SLA/SLO/SLI) for all cloud microservices with development teams to better understand the business.
- Create and maintain technical documentation, including architecture diagrams, design documents, and standard operating procedures.
- Ensure adherence to security and compliance standards, including ISO 27001, SOC 2, and GDPR.
- Lead incident response efforts to troubleshoot and resolve production issues quickly.
- Perform post-incident analysis to identify root causes and potential workarounds or solutions.
- Assist with product/technology selection, including the implementation of proofs of concept (POCs).
- Be flexible and open to changing processes and tools.
- Mentor and train less senior members of the team.
- Participate in an on-call rotation and provide support after work hours and on weekends.
qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience as a Site Reliability Engineer.
- Proficiency in programming and scripting languages like Java, Python, Bash, or PowerShell.
- Hands-on experience in SRE, DevOps, cloud operations, and cloud security best practices.
- Strong knowledge of security technologies, including Identity and Access Management, Network Security, Application Security, and Data Protection.
- Strong problem-solving and analytical skills, with the ability to work independently and as part of a team.
- Experience in developing and maintaining technical documentation and implementing compliance requirements.
- Expert-level cloud certifications, including AWS Solutions Architect Professional, Azure Solutions Architect Expert, and GCP Professional Cloud Architect.
- Experience with container orchestration technologies (e.g., Kubernetes).
skills:
- Kubernetes
- Linux
- Docker
- AWS
- Troubleshooting Code/Configuration
- Communication clarity, collaboration, problem solving
Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact HRsupport@randstadusa.com.
Pay offered to a successful candidate will be based on several factors, including the candidate's education, work experience, work location, specific job duties, and certifications. In addition, Randstad Digital offers a comprehensive benefits package, including medical, prescription, dental, vision, AD&D, and life insurance offerings, short-term disability, and a 401(k) plan (all benefits are based on eligibility).
This posting is open for thirty (30) days.
Qualified applicants in San Francisco with criminal histories will be considered for employment in accordance with the San Francisco Fair Chance Ordinance.
Qualified applicants with arrest or conviction records will be considered for employment in accordance with the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.
We will consider for employment all qualified Applicants, including those with criminal histories, in a manner consistent with the requirements of applicable state and local laws, including the City of Los Angeles' Fair Chance Initiative for Hiring Ordinance.