Sr Site Reliability Engineer

  • location: Boston, MA
  • type: Contract
  • salary: $55 - $65 per hour

job description

job summary:
We aim to break down walls between development and operations, and we participate in finding and building solutions that enable teams to deliver software updates in a way that is highly stable and operationally sound. We are strongly invested in the AWS Cloud, infrastructure-as-code, and monitoring-as-code. We favor the practical and pragmatic over the ideal, including finding right-sized solutions. We are anticipatory and forward-looking, reliable, and have a bias toward taking action. We understand that without our customers our efforts are worthless, and that operational changes are likely to have a direct impact on user experience. We understand that uptime is paramount, and we work backwards from there.

location: Boston, Massachusetts
job type: Contract
salary: $55 - $65 per hour
work hours: 9am to 5pm
education: Bachelors
Essential Accountabilities:


  • Listening to the needs of our teams, learning how they work best, and delivering solutions
  • Collaborating with product teams and technical principals to prioritize our efforts.
  • Stay current on industry trends; conceive and present to management ways to improve current practices, to improve our standing in the marketplace, and remain on the cutting edge of technology.
  • Take ownership of a project, drive it forward, "sell" it to other teams inside the company as a solution for a given problem, and work with teams to drive adoption.
  • If you see an opportunity to solve a problem or otherwise make something better, take the initiative.
  • Mentor team members; foster growth by setting high-reaching goals and providing support as needed to achieve them.

  • Hands-on design, understanding, and troubleshooting of highly-distributed, large-scale production systems - both modern and legacy, monolithic and micro.
  • Co-ownership, with the development teams, of reliability, uptime, capacity, and performance.
  • Ensuring the repeatability, traceability, and transparency of our infrastructure automation.
  • Identifying highest-impact opportunities to optimize existing systems; ensuring "right-sized" and cost-optimized solutions in consideration of technical and business constraints.
  • System design consulting for teams seeking to leverage or improve their production infrastructure.
  • Anticipate, build, and plan capacity for upcoming product/feature launches.
  • Working with application teams and product principals to fully operationalize software/systems projects (including security requirements).
  • Being part of an on-call rotation shared with the rest of the team. (The better we do at the things above, the quieter the rotation is!)
  • Client is a polyglot organization. Being "conversational" in JavaScript/TypeScript, Python, PHP, Ruby, Golang, Java, Bash, Markdown, reStructuredText, HCL, JSON, YAML, and TOML would be valuable. Must be fluent in 2-3 of them.
  • Must have the skills of a senior (or higher) level software application engineer.
  • Must have the skills of a senior (or higher) level cloud operations engineer.
  • Ability to translate knowledge and ideas into written documentation and one-pagers.
  • Excellent presentation and communication skills.
  • Mastery of AWS services (IAM, EC2, S3, EBS/EFS, ELB/ALB, Auto Scaling, RDS and replication techniques, VPC, Subnets, Elastic IP, Route 53, CloudWatch, CloudFront, Lambda, CloudFormation, ECS, SNS, ElastiCache).
  • Expertise in container/container-fleet-orchestration technologies (Kubernetes, ECS, Docker).
  • Expertise integrating continuous-integration and continuous-delivery (CI/CD) software development lifecycles into one or more applications (using Jenkins, CircleCI, Travis CI, or other modern CI tools).
  • Expertise in infrastructure automation technologies (e.g., Terraform, CloudFormation).
  • Expertise with Lean/Agile deployment processes (e.g., blue/green, zero downtime, canary, and DNS strategies).
  • Significant experience troubleshooting interactions among concurrent and distributed systems.
  • Cloud database operations and deployment experience (e.g., RDS MySQL/Postgres/Aurora), and caching operations and deployment experience (e.g., Memcached, Redis).
  • Ability to design and manage escalation response plans - from monitoring, to reaction/response/remediation, to retrospection/post-mortem in culturally-aligned (proactive, customer focused, collaborative, proven-with-data) ways.
  • Familiarity with site and infrastructure monitoring systems (e.g., CloudWatch, Datadog, New Relic, Sumo Logic, ThousandEyes).
  • Cloud and container-native Linux administration/build/management skills (e.g., AMIs, Packer).
  • Strong problem-solving, root cause understanding, and systems engineering skills.
  • Expertise with software development lifecycle branching and distributed source code management systems (e.g., Git/Mercurial, Git-Flow, GitHub-Flow).
  • B.S. Degree in Computer Science (or related technical field, or equivalent industry experience).
  • A non-trivial background in open source is a HUGE plus.

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
