Staff Site Reliability Engineer

  • location: Nashua, NH (remote)
  • type: Permanent
  • salary: $100,000 - $150,000 per year

job description

job summary:
As a Staff Site Reliability Engineer you will:

  • Assist with defining a roadmap for all engineering teams to utilize fully automated, self-service, highly scalable, cost-efficient, observable, auditable and reliable infrastructure services as standard practice
  • Work on the execution of this roadmap across the engineering organization, collaborating with SREs and senior engineers across engineering while also performing hands-on work on the most critical challenges
  • Provide expert technical guidance and ongoing engineering design review to teams planning and implementing large migrations, service-oriented architecture, broad architectural shifts, and capacity growth
  • Build a metrics-driven operational culture standardizing our practices for SLO definition and review as well as for logging, monitoring, alerting, and on-call practices
  • Make iterative improvements to blameless incident management processes, root cause analyses, outage prevention, and service recovery strategies across the engineering organization
  • Partner closely with security, quality, and product teams to achieve high-priority security, privacy, compliance, reliability, and business-continuity objectives on our overall roadmap
  • Propose and drive large improvements to production systems that have a significant impact on our business and engineering teams
  • Mentor and coach engineers to be curious and effective at discovering and solving technical challenges
  • Participate in SRE 24/7/365 on-call rotation

You'll be successful if:

  • You have proven experience (7-10 years) demonstrating hands-on technical leadership and business impact in combining software engineering skills with systems engineering skills to solve complex automation and reliability challenges
  • You have deep technical experience with various cloud providers, containerization technologies, automated deployment frameworks, orchestration frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture
  • You have the skills to implement load, stress, performance and reliability testing standards at scale to improve service, platform and infrastructure resiliency
  • You promote openness, diversity of opinions and inclusive discussions at all times to evaluate a wide variety of ideas and perspectives in solving challenging problems
  • You demonstrate clear decision making and good trade-offs in complex situations comprising multiple opinions, needs, teams, technologies, cloud providers, and architectural settings
  • Multi-cloud experience (AWS, GCP, and Azure)
  • CDN experience is very desirable
  • General AWS expertise, including IAM, networking, security, and architecture, is a must
  • You communicate effectively with stakeholders ranging from executives to junior engineers across the breadth and depth of the engineering organization
  • You exemplify high accountability, integrity, and resilience to maintain focus on both big-picture goals and the milestones to get there
  • You enable the engineering organization to innovate and deliver with greater speed and safety
  • Proficiency in more than one programming language or infrastructure automation tool including any of: Python, Java, Bash, Terraform, Chef, or similar
  • Monitoring expertise (any of DataDog, New Relic, Nagios, Honeycomb, or similar)
  • Experience with the ELK stack for centralized logging
  • You proactively examine all systems, tools, processes, and architectures with an open mind and make recommendations on scale, reliability, availability, and automation
 
location: Nashua, New Hampshire
job type: Permanent
salary: $100,000 - $150,000 per year
work hours: 8am to 5pm
education: Bachelor's degree
 
qualifications:
  • Experience level: Experienced
  • Education: Bachelor's degree
 
skills:
  • Reliability
  • Python (4 years of experience is preferred)
  • Cloud

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
