Staff Site Reliability Engineer

Remote | United States

As the world’s leading provider of cloud-based software and technology solutions delivered by managed service providers (MSPs), Datto believes there is no limit to what small and medium businesses can achieve with the right technology. Datto offers Unified Continuity, Networking, and Business Management solutions and has created a one-of-a-kind ecosystem of MSP partners. These partners provide Datto solutions to over one million businesses across the globe. Since its founding in 2007, Datto has won awards each year for its rapid growth, product excellence, superior technical support, and outstanding workplace. With headquarters in Norwalk, Connecticut, Datto has global offices in the United Kingdom, Netherlands, Denmark, Germany, Canada, Australia, China, and Singapore.

The Datto Remote Management and Monitoring (RMM) team enhances and maintains the software powering the remote management and monitoring cloud service, which is delivered around the globe from the AWS platform. In a nutshell, RMM provides a central support desk with the tools to audit, manage, monitor, and support the distributed devices of its customers. It achieves this by installing an agent on Windows, macOS, Linux, iOS, and Android devices that communicates in real time with our cloud service in AWS. It’s both agent-based and agentless: if a device needs management, we aim to support it.

The successful candidate will join the SRE team responsible for developing and maintaining the applications and architecture that underpin our RMM Platform, with a focus on optimising, scaling and securing the platform. You’ll be responsible for …

  • Analyzing production logs, alerts, and metrics in order to identify potential issues and implement service improvements
  • Driving product observability improvements through monitoring, alerting, and application of software development best practices
  • Identifying creative ways to break products, uncovering and reporting defects, and validating that systems and solutions are operating as intended
  • Participating in incident retrospectives and performing root cause analyses

About You:

Being smart with an ability to get things done is key. This typically comes with years of experience but don’t let that stop you talking to us if you’re light on years but heavy on innovation - and you can demonstrate it. 

You should be comfortable working with the following:

  • Cloud technologies, deploying stateless microservices in a containerised environment, and working with serverless deployments
  • AWS managed services, such as DynamoDB, RDS, SQS, Kinesis, and Lambda
  • Java 8+, Groovy, Kotlin, Spring Boot
  • Working with both relational and non-relational data stores
  • Consuming REST APIs, working with web caches, and optimizing load-balanced setups

Bonus skills and experience:

  • Experience with APM and observability technologies such as Datadog, New Relic, Dynatrace, and Grafana
  • Troubleshooting and optimizing JVM-based deployments and web applications
  • Infrastructure-as-code solutions, such as CloudFormation or Terraform
  • Elasticsearch
  • Contributions to open-source software
  • Tech-related blog or GitHub account details

At Datto, we believe our employees are our greatest asset and offer all full-time employees a wide-ranging benefits package. View a summary here: Datto Benefits

By submitting an application, you acknowledge we will process your data to consider you for the position you apply for and for other open positions within our company for which you may be suited. We collect and store your data following our Recruiting Privacy Practices.

Datto is an equal opportunity employer.

Note: We are looking only for candidates willing to join us directly as W-2 employees (no third-party candidates).
