Manager, Site Reliability Engineering

Albany | Boston | Norwalk | Rochester

Site Reliability Engineering (SRE) Manager - SaaS 

As the world’s leading provider of cloud-based software and technology solutions delivered by managed service providers (MSPs), Datto believes there is no limit to what small and medium businesses can achieve with the right technology. Datto offers Unified Continuity, Networking, and Business Management solutions and has created a one-of-a-kind ecosystem of MSP partners. These partners provide Datto solutions to over one million businesses across the globe. Since its founding in 2007, Datto has won awards each year for its rapid growth, product excellence, superior technical support, and for fostering an outstanding workplace. With headquarters in Norwalk, Connecticut, Datto has global offices in the United Kingdom, Netherlands, Denmark, Germany, Canada, Australia, China, and Singapore. Learn more at datto.com.

About You

More than someone who checks every box, we’re looking for people who are excited to work and grow at Datto. If that's you, we hope you apply for the role!

You enjoy teamwork

You come with new ideas and a unique point of view. You look forward to collaborating with a diverse team. You eagerly seek and give help. Transparency tops your list of values, and you contribute to a culture of respect and inclusion.

You’re inquisitive

Inquisitive and focused, you see every challenge as an opportunity. You would rather create the future than wait for it.

You’re customer-focused and take pride in your work

You pay extra attention to detail in all you do. You care about the work you provide to customers and how it reflects on yourself and Datto. When you find or see something wrong, you attempt to resolve it. You look for opportunities to better not only yourself but also those around you. You aim to be the best that you can be and always do the right thing.

What You’ll Do

As a Site Reliability Engineering Manager at Datto you will lead the SaaS SRE team, working closely with Software Development, QE, Infrastructure, and Product Support/Problem Management teams to ensure SaaS Protection is reliable, highly available, and performs well at scale. Ideally you’re a player/coach: someone who can participate in SaaS SRE initiatives at a technical level, lead the efforts of the team, and represent SRE with stakeholders and management.

 

Your job function and responsibilities include:

  • Manage and grow a high-functioning team of site reliability engineers responsible for automation, orchestration, monitoring, and other relevant activities to support the SaaS Protection service for Datto
  • Serve as a quality and reliability ambassador as part of an Agile software development team
  • Define and implement overall charter and strategy for Site Reliability for SaaS
  • Plan, manage, and communicate project tasks, timelines, and status
  • Collaborate with SRE teams and leadership across the broader engineering organization to define and set the overall vision of SRE, agree on common processes and tools, and share best practices
  • Drive product reliability improvements through monitoring, alerting, and application of software development best practices
  • Define and implement strategies for achieving high service availability/reliability, fault tolerance, and performance/scalability
  • Maintain open communication with Engineering and Product teams around system performance and reliability
  • Educate QA, Software Development, and Product teams around performance and scalability best practices, metrics, and guidelines
  • Manage the SRE on-call rotation, regularly identifying opportunities for improvement
  • Suggest and drive efforts to improve testing processes and methodology, metrics collection and success measurement, test coverage, and product reliability
  • Troubleshoot complex issues quickly and effectively; continually improve processes and reliability based on post-mortem analysis
  • Other duties as assigned by Management

About You:

  • A Bachelor’s degree in Computer Science, Management Information Systems or Software Engineering; or equivalent work experience
  • Experience managing/leading a team of Site Reliability engineers
  • 5+ years of hands-on experience with performance, scalability, and reliability testing techniques, tools, and best practices
  • Familiarity with success measurement (SLI/SLO/SLA)
  • Experience with software build, package, configuration, and release management tools (e.g., GitLab, Jenkins, Ansible, Salt, Puppet)
  • Proficient with Linux, MySQL, and shell scripting
  • Familiar with containerization (Docker/Kubernetes) concepts
  • Knowledge of infrastructure (networking, hypervisors, storage, security, etc.) - experience working with a private cloud is a plus
  • Excellent problem-solving skills, and the ability to troubleshoot complex issues quickly and effectively
  • Able to find opportunities for improvement and tackle them without external direction
  • Ability to “think outside of the box” and find creative solutions to operational problems
  • Dedication to collaboration, “teaching others to fish”-style knowledge sharing, and cross-training
  • Excellent communication skills
  • Ability to operate in a fast-paced environment
  • Self-motivated and willing to learn

Bonus Points:

  • Familiarity with object-oriented programming languages and concepts (Python, Java, Golang, etc.) and exposure to writing automated tests with common test frameworks such as pytest
  • Experience with data visualization tools such as Kibana and Grafana
  • Experience with metrics collection, time series queries, middleware such as Telegraf, and backends such as OpenTSDB or Prometheus
  • Exposure to chaos engineering tools and load testing techniques at scale

Note: We are looking only for candidates willing to join us directly as W2 employees (No 3rd party candidates)

 

More About Datto

  • Datto, the world’s leading provider of IT solutions delivered through managed service providers, is looking for a Sr. Software Engineer to join a growing team. Datto is a creative company at its core and is an exciting and dynamic workplace. We're 100% focused on our managed service provider partners and believe that with the right technology, managed service providers can change how businesses around the world operate. Datto provides data protection, business continuity, networking, business management, and file backup and sync products that empower and protect the clients of our 14,000+ partners. We're headquartered in Norwalk, Connecticut and have 22 offices worldwide. You will report to the Manager of Software Engineering.

 

At Datto, we believe our employees are our greatest asset and offer all full-time employees a wide-ranging benefits package, including:

  • Comprehensive health-care benefits
  • Free lunch every Friday
  • Flexible working hours
  • Unlimited paid time off
  • Free food, drinks, and fresh organic fruit
  • Charity match program
  • And more!

 

By submitting an application, you acknowledge we will process your data to consider you for the position you apply for and for other open positions within our company for which you may be suited. We collect and store your data following our Recruiting Privacy Practices.

Datto is an equal opportunity employer.


 
