Site Reliability Engineer

Arbor Education

£65,000

Arbor Education, City of Westminster

  • Full time
  • Permanent
  • Remote working

Posted 2 weeks ago, 2 May

Closing date: not specified

Job ref: e8a42ed1b4384f74937f263d857e0874

Full Job Description

We are looking for an enthusiastic and proactive Site Reliability Engineer to join our SRE team and help us provide world-class resilience and performance across the platform. The remit of the role is to advise on all aspects of site reliability, including availability, scalability, observability and capacity planning. It's a broad and exciting role, so we're looking for someone who is up for a challenge. If you're an energetic and collaborative Site Reliability Engineer, this is the role for you.

Core responsibilities

  • Proactively monitor and analyse platform performance.

  • Collaborate with engineering teams to address performance bottlenecks and ensure scalability.

  • Assist engineering teams with implementing and reviewing SLOs

  • Continually improve observability through monitoring, alerting and dashboards, using tools such as Datadog or Prometheus.

  • Work with other teams to ensure that observability is effective and provides full coverage.

  • Ensure the service is highly available and resilient

  • Champion best practices in design for high availability

  • Devise runbooks and run game sessions to test our disaster recovery (DR) plan, high availability (HA) and backups

  • Conduct assessments of capacity and plan for scaling to meet current and future business needs.

  • Work closely with the Head of Platform Engineering and Head of SRE to strategize and implement scalable solutions.

  • Work closely with the Platform team, feature teams, 2nd line support and other stakeholders to ensure a good level of service for our customers and to embed SRE practices.

  • Act as a key player in incident response and troubleshooting, ensuring rapid resolution and minimising downtime.

  • Participate in blameless postmortems to identify root cause and corrective actions

  • Develop and maintain playbooks and documentation

Skills and experience

  • Experience in performance monitoring and analysis

  • Capacity planning experience

  • Scripting and automation skills, with experience in relevant technologies.

  • Experience with Infrastructure as Code, in particular, Terraform

  • Understanding of relational database technologies and their cloud versions (e.g. AWS Aurora)

  • Experience with messaging and distributed asynchronous workloads

  • Experience with nginx or similar technologies

  • Familiarity with SRE processes.

  • Awareness of DevOps principles such as the Three Ways and the Five Ideals.


Bonus skills

  • Experience with other database technologies and cloud platforms.

  • Past experience with enterprise solutions running at scale

  • Familiarity with kanban and agile development processes

  • Experience with containerisation, for example Docker

  • Familiarity with software best practices such as Refactoring, Clean Code, Domain-Driven Design and Test-Driven Development.

At Arbor, we're on a mission to transform the way schools work for the better.

You've probably seen the headlines. Heavy workloads, constant change, admin pressure on teachers and staff at every level… sometimes it feels like this is just part and parcel of school life today. But it doesn't have to be this way.

We passionately believe that there's a better way to work. And it starts by giving everyone the right tools and technology for the job.

We're building a platform and products we believe in - as well as a strong, diverse team of experienced specialists, ex-teachers and Edtech engineers passionate about making a difference to the sector.

Ultimately, we're here to help our schools and trusts stress a little less, and focus on what matters most - improving the lives of teachers and outcomes of students everywhere.

We offer the chance to work alongside a team of hard-working, passionate people in a role where you'll see the impact of your work every day. We also offer:
  • A dedicated wellbeing team who champion initiatives such as mindfulness, lunch n learns, manager training, mental health first aid training and much more!

  • 32 days holiday (plus Bank Holidays). This is made up of 25 days annual leave plus 7 extra company-wide days given over Easter, Summer & Christmas

  • Private Bupa Dental Insurance

  • Enhanced maternity and adoption pay (20 weeks full pay) and paternity pay (6 weeks full pay)

  • 5 free return-to-work maternity coaching sessions, helping you adapt to this exciting new time of life!

  • Access to services such as Calm, Bippit (financial wellbeing coaching) and Health Assured (Employee assistance programme)

  • All of our roles champion flexible working and we are happy to discuss what this means to you!

  • Social committees that plan team, office and company-wide events to bring people together and celebrate success

  • Dedicated professional development training budget (CPD courses, upskilling resources, professional memberships, etc.)

  • Volunteer with a charity of your choice for a day each year

  • Dog friendly offices!


Interview process

  • Phone screen

  • 1st stage

  • 2nd stage

We are committed to a fair and comfortable recruitment process, so if you require any reasonable adjustments during your application or interview process, please reach out to a member of the team at careers@arbor-education.com.

Our commitment is also backed by our partnership with the neurodiversity consultancy Lexxic, who provide us with training, support and advice.