Site Reliability Engineer

Rockset

Posted 2 weeks ago, 14 Apr

Closing date: not specified

Job Ref: 8faa9cf02f03408ba2f257471b8556e9

Full Job Description

As a site reliability engineer, you will be responsible for the automation, stability, security, configuration, monitoring, alerting, and capacity planning of Rockset's network, systems, and infrastructure. You will also build tools that help the rest of the engineering team be more productive, including the ones that Rockset engineers use to deploy and manage their services. You will have a foundational impact on shaping the team and the systems we create. The on-call pager is shared by most of the engineering team, not just SRE.
Our infrastructure is hosted entirely in Amazon Web Services. We use a variety of homegrown, open-source, and commercial tools, including Kubernetes, Docker, Kafka, Zookeeper, Prometheus, Grafana, Salt, Terraform, Phacility, and Buildkite. We try to deploy new code to our production environment twice a week, but as an SRE you can expect to make production changes on a daily basis.
You should expect to collaborate with all other engineering teams to develop solutions that meet reliability, security, and business requirements. Lastly, you will diagnose, triage, and build solutions for complex technical issues at scale.

At Rockset, we've built the real-time analytics database for the world's data applications. Our team and technology come from a rich heritage, rooted in the experience of building massive-scale data systems at the world's leading companies, and we created Rockset to make those kinds of powerful data platforms available to real-time application developers everywhere. We are creating a world where developers can go effortlessly from complex data sets to fast, interactive applications and analysis.

Passionate about distributed systems, database technologies, and highly scalable services

Poised under fire and willing to share an on-call rotation with the rest of the team

A self-starter who thrives in a fast-paced environment

Willing to learn new skills and technologies

Attentive to detail and comfortable with ambiguity

Bachelor's or Master's degree in Computer Science or a related field, or relevant work experience

Experience as an SRE for 3+ years

Experience building and operating public-facing 24x7 web applications at scale

Experience working with cloud infrastructure and patterns (AWS preferred)

Strong programming skills in a scripting language (Python, Ruby, Bash)

Experience with Kubernetes, Mesos, Swarm, or similar container orchestration tools

Experience with Terraform, Salt, Chef, Packer, or similar configuration management tools

Experience with Grafana, Prometheus, Datadog, or similar monitoring tools

We're a fast-growing company that values curiosity, diversity, and open-mindedness. You will solve interesting problems, surrounded by exceptional people, while making customers happy. We work hard, but also take our personal lives and experiences seriously. Our investors include Greylock Partners and Sequoia Capital. We are headquartered in San Mateo, CA, with offices in Boston, MA, and London, UK.