job summary:
Randstad Digital is hiring and we're looking for someone like YOU to join our team! If you are seeking a new opportunity, looking to grow in your career, or know someone who is, we want to hear from you! Take a look at the opportunity below, or feel free to visit RandstadUSA.com to view and apply.

location: Iselin, New Jersey
job type: Contract
salary: $100.78 - $105.78 per hour
work hours: 8am to 5pm
education: Bachelor's

responsibilities:
We are looking for a highly skilled Senior Site Reliability and Operations Engineer (SRE) with extensive experience implementing Kubernetes-based distributed caching solutions. This role requires a robust foundation in software development, infrastructure automation, reliability engineering, and large enterprise-scale implementations. The candidate will be responsible for designing, implementing, and maintaining high-performance distributed systems, ensuring reliability, scalability, and efficiency.

qualifications:
Required Skills & Qualifications:

- Robust experience in Kubernetes (OpenShift and on-prem/cloud clusters).

- Understanding of programming languages like Java, Go, or Python.

- Experience with containerization technologies (Docker, Helm, etc.).

- Robust knowledge of CI/CD pipelines (Jenkins, ArgoCD, GitHub Actions, Harness).

- Hands-on experience with observability tools (Prometheus, Grafana, Loki, Jaeger).

- Understanding of networking, service meshes (Istio/Linkerd), and security best practices in Kubernetes.

- Experience with multi-cluster and hybrid cloud Kubernetes deployments.

skills:
Development & Implementation:

- Design, develop, and optimize distributed caching and compute grid solutions on Kubernetes/OpenShift.

- Understanding of microservices and containerized workloads using Kubernetes, Docker, and Helm.

- Implement high-throughput compute grid solutions using Apache Ignite, GridGain, Coherence, or similar technologies.

- Optimize application performance by leveraging caching strategies, load balancing, and efficient data distribution.

Site Reliability Engineering (SRE):

- Ensure high availability, scalability, and reliability of distributed systems.

- Implement observability, logging, and monitoring using tools like Splunk, Prometheus, Grafana, ELK, or OpenTelemetry.

- Automate infrastructure provisioning and deployments using Ansible and Helm charts.

- Understanding of CI/CD pipelines for seamless software deployment.

- Troubleshoot and resolve incidents related to the platform, infrastructure, and distributed caching and compute grids, ensuring minimal downtime.

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact HRsupport@randstadusa.com.

Pay offered to a successful candidate will be based on several factors, including the candidate's education, work experience, work location, specific job duties, and certifications. In addition, Randstad Digital offers a comprehensive benefits package, including medical, prescription, dental, vision, AD&D, and life insurance offerings, short-term disability, and a 401(k) plan (all benefits are based on eligibility).

This posting is open for thirty (30) days.

senior site reliability and operations engineer.
