We're looking for a Site Reliability Operations (SRO) Specialist to join our Media Operations team. As one of our SRO Specialists, you will be responsible for ensuring production services are operated in a consistent and scalable manner, delivering a high-quality experience to millions of viewers across digital platforms, mobile applications, websites, and third-party distribution partners. You will work closely with Tier 1 Media Control Center (MCC) Operations to develop procedures and technical documentation that establish tasks, monitoring patterns, and troubleshooting approaches. The SRO is a Tier 2-3 role.
We're looking for a technical self-starter who is interested in progressive approaches to operating world-class media and digital services in production, and who can partner effectively with product and engineering teams to deliver reliable, high-quality content experiences to customers.
location: Phoenix, Arizona
job type: Contract
salary: $30.00 - $42.50 per hour
work hours: 9am to 5pm
- Overall responsibility for Production Operations of content services, applications, and events, including event monitoring, incident management, and infrastructure, network, and cloud services management
- Create and maintain response plays across a variety of incident management and monitoring tools, such as Blameless, PagerDuty, ServiceNow, Dataminr, Datadog, and New Relic
- Drive system enhancements and overall reliability through Dev, SRE, and other functions to prevent future incidents and improve system resiliency and quality; own the post-mortem/incident analysis process with involved teams to identify root causes and remediation tasks, flag areas of concern, and drive them through to resolution
- Develop and report key operational metrics to drive improvements over time
- Lead the creation and ongoing maintenance of documentation, including operational runbooks that support monitoring and remediation activities, and provide guidance and enablement to Tier 1 Operations teams
- Support marquee events and day-to-day offerings by preparing documentation to guide event readiness, and participate in day-of operational coverage
- Participate in the development and implementation of Digital Operations processes and SOPs, and identify opportunities to automate operational tasks, incident remediation, and scaling activities
- Carry out production changes to Digital systems, infrastructure, products, and platforms in support of event and release activities
- Act as the escalation point of contact for the Tier 1 Operations team, triaging and mitigating incidents as needed
- Work closely with the information security team to ensure security requirements are effectively met
- Experience level: Experienced
- Minimum 4 years of experience
- Education: Bachelor's
- incident management
- site reliability operations
Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.