Sr Site Reliability Engineer

  • location: Boston, MA
  • type: Contract
  • salary: $55 - $65 per hour

job description

job summary:
We aim to break down walls between development and operations, and to find and build solutions that enable teams to deliver software updates in a way that is highly stable and operationally sound. We are strongly invested in the AWS Cloud, infrastructure-as-code, and monitoring-as-code. We favor the practical and pragmatic over the ideal, including finding right-sized solutions. We are anticipatory and forward-looking, reliable, and have a bias toward taking action. We understand that without our customers our efforts are worthless, and that operational changes are likely to have a direct impact on user experience. We understand that uptime is paramount, and we work backwards from there.

 
location: Boston, Massachusetts
job type: Contract
salary: $55 - $65 per hour
work hours: 9am to 5pm
education: Bachelors
 
responsibilities:
Essential Accountabilities:

Leadership:

  • Listen to the needs of our teams, learn how they work best, and deliver solutions.
  • Collaborate with product teams and technical principals to prioritize our efforts.
  • Stay current on industry trends; conceive and present to management ways to improve current practices, strengthen our standing in the marketplace, and remain on the cutting edge of technology.
  • Take ownership of a project, drive it forward, "sell" it to other teams inside the company as a solution to a given problem, and work with teams to drive adoption.
  • Take the initiative when you see an opportunity to solve a problem or otherwise make something better.
  • Mentor team members; foster growth by setting high-reaching goals and providing support as needed to achieve them.
Technical:

  • Hands-on design, understanding, and troubleshooting of highly distributed, large-scale production systems, both modern and legacy, monolithic and microservice-based.
  • Co-ownership with the development teams over reliability, uptime, capacity, and performance.
  • Ensuring the repeatability, traceability, and transparency of our infrastructure automation.
  • Identifying highest-impact opportunities to optimize existing systems; ensuring "right-sized" and cost-optimized solutions in consideration of technical and business constraints.
  • System design consulting for teams seeking to leverage or improve their production infrastructure.
  • Anticipate, build, and plan capacity for upcoming product/feature launches.
  • Working with application teams and product principals to fully operationalize software/systems projects (including security requirements).
  • Taking part in an on-call rotation shared with the rest of the team. (The better we do at the things above, the quieter the rotation is!)
 
qualifications:
  • The client is a polyglot organization. Being "conversational" in JavaScript/TypeScript, Python, PHP, Ruby, Golang, Java, Bash, Markdown, reStructuredText, HCL, JSON, YAML, and TOML would be valuable. Must be fluent in 2-3 of them.
  • Must have the skills of a senior (or higher) level software application engineer.
  • Must have the skills of a senior (or higher) level cloud operations engineer.
  • Ability to translate knowledge and ideas into written documentation and one-pagers.
  • Excellent presentation and communication skills.
  • Mastery of AWS services (IAM, EC2, S3, EBS/EFS, ELB/ALB, AutoScaling, RDS and replication techniques, VPC, Subnets, Elastic IP, Route53, CloudWatch, CloudFront, Lambda, CloudFormation, ECS, SNS, ElastiCache).
  • Expertise in container/container-fleet-orchestration technologies (Kubernetes, ECS, Docker).
  • Expertise in integrating continuous-integration and continuous-delivery (CI/CD) software development lifecycles into one or more applications (using Jenkins, CircleCI, Travis CI, or other modern CI tools).
  • Expertise in infrastructure automation technologies (e.g., Terraform, CloudFormation).
  • Expertise with Lean/Agile deployment processes (e.g., blue/green, zero downtime, canary, and DNS strategies).
  • Significant experience troubleshooting interactions among concurrent and distributed systems.
 
skills:
  • Cloud database operations and deployment experience (e.g., RDS MySQL/Postgres/Aurora), plus caching operations and deployment experience (e.g., Memcached, Redis).
  • Ability to design and manage escalation response plans - from monitoring, to reaction/response/remediation, to retrospection/post-mortem in culturally-aligned (proactive, customer focused, collaborative, proven-with-data) ways.
  • Familiarity with site and infrastructure monitoring systems (e.g., CloudWatch, Datadog, New Relic, Sumo Logic, ThousandEyes).
  • Cloud and container-native Linux administration/build/management skills (e.g., AMIs, Packer).
  • Strong problem-solving, root cause understanding, and systems engineering skills.
  • Expertise with software development lifecycle branching and distributed source code management systems (e.g., Git/Mercurial, Git-Flow, GitHub-Flow).
  • B.S. degree in Computer Science or a related technical field, or equivalent industry experience.
  • A non-trivial background in open source is a HUGE plus.

Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
