job summary:
We are seeking a senior-level technical candidate to support production systems, with a strong emphasis on Site Reliability Engineering (SRE) principles, incident and problem management, change/release processes, and observability maturity. The role requires deep collaboration with development and business teams to understand application functionality and drive operational excellence.

You will be responsible for transforming and maturing global support services, promoting adoption of core toolsets, and strengthening partnerships across technology and business. The environment includes large-scale, Tier 1 applications (e.g., IIS/.NET/SQL), multi-tier web hosting, clustering, and load balancing. You will ensure compliance with corporate policies on security, documentation, audit, and change control.

This role also includes leading a team of onshore and offshore engineers to maintain system availability, performance, and reliability through automation, monitoring, and continuous improvement.

In this role, you will:

Lead complex, high-impact initiatives including systems consultation and SRE strategy implementation.
Drive observability improvements by identifying gaps in monitoring, logging, and tracing across platforms.
Collaborate with engineering teams to define SLIs, SLOs, and error budgets.
Automate operational tasks and incident response workflows using modern programming languages (e.g., Python, Go, Bash).
Design and implement scalable, resilient systems using infrastructure-as-code and CI/CD pipelines.
Conduct root cause analyses and postmortems to improve system reliability.
Consult on technical changes and enhancements with a focus on performance, scalability, and fault tolerance.
Partner with architects and engineers to align with enterprise strategies and ensure secure, maintainable solutions.
Lead and mentor a distributed team, fostering a culture of continuous learning and operational excellence.

Required Qualifications:

5+ years of Systems Engineering or Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
5+ years of experience in SRE, platform engineering, or production support roles.
5+ years of experience performing engineering and support tasks on Linux/Unix and Windows servers.
Experience with observability tools such as Prometheus, Grafana, AppDynamics (AppD), or Splunk.
2 years of experience programming in one or more languages such as Python, Java, or Go.
1+ years of experience with cloud technologies.

Desired Qualifications:

Strong understanding of distributed systems, cloud platforms (OpenShift, Azure, GCP), and container orchestration (Kubernetes).
Familiarity with CI/CD workflows, version control systems, and infrastructure-as-code tools (e.g., Terraform, Ansible).
Experience with ThousandEyes and BigPanda.
Proven ability to identify and remediate gaps in system observability and performance.
Excellent problem-solving skills and ability to lead cross-functional teams.

location: Charlotte, North Carolina
job type: Contract
salary: $70 - $78 per hour
work hours: 8am to 5pm
education: Associate degree

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.

At Randstad Digital, we welcome people of all abilities and want to ensure that our hiring and interview process meets the needs of all applicants. If you require a reasonable accommodation to make your application or interview experience a great one, please contact HRsupport@randstadusa.com.

Pay offered to a successful candidate will be based on several factors including the candidate's education, work experience, work location, specific job duties, certifications, etc. In addition, Randstad Digital offers a comprehensive benefits package, including: medical, prescription, dental, vision, AD&D, and life insurance offerings, short-term disability, and a 401K plan (all benefits are based on eligibility).

This posting is open for thirty (30) days.

job title: Lead Site Reliability Engineering (SRE)
