Remote Senior/Staff Site Reliability Engineer Job at Wonders

January 30, 2024

Job Description


About Wonders

At Wonders, we build products that delight restaurant managers and offload the operational burden of running the business. By enabling frictionless connection between restaurants and their customers, we enhance the experience for everyone.

As a company built by technologists and former restaurant operators, Wonders takes customer empathy seriously. We obsess over placing our customers first and working backwards, and fundamentally believe that when our customers succeed, we succeed.

Our metrics are strong

  • Wonders has achieved significant growth over the past two years, growing annual recurring revenue by almost 4x to $46 million and quadrupling our customer base of mom-and-pop restaurants.

Some of our teams

The Voice Platform team enables restaurants to handle customer phone calls no matter how many customers are calling in at the same time. This allows these mom-and-pop businesses to scale past their staffing limits and fulfill their customers’ orders on time, every time.

The Payments team provides payments solutions and enables restaurants to capture more revenue.

The Analytics Platform team powers the infrastructure used to make data accessible and useful across a variety of cross-functional teams at Wonders.

Role:

  • Working in a fast-paced, high-growth startup environment where you will always be learning
  • Building the foundations of the SRE organization as one of the first site reliability engineers
  • Influencing new designs and architecture, best practices, and standards for supporting and improving technology platforms
  • Defining and implementing internal service level indicators and objectives
  • Defining external SLAs based on data from the above
  • Architecting self-healing, scalable services and reducing toil via automation
  • Participating in our on-call rotation to resolve incidents and contributing to incident management, including a blame-free postmortem culture
  • Designing and implementing incident management improvements

About you:

  • 5+ years of engineering experience with a focus on improving the reliability and scalability of systems
  • 2+ years of specific SRE experience working with metrics, traces, and logs
  • Experience in automation via infrastructure as code
  • Strong experience in AWS, Terraform, and Kubernetes
  • Experience with container orchestration and management
  • Experience with APM tools, e.g., Grafana Cloud or similar technology like New Relic, Datadog, or Dynatrace
  • Experience with log management tools such as Grafana Loki or similar technology like Fluentd/Fluent Bit, Splunk, or ELK
  • Deep understanding of cloud and microservice architecture
  • History of building meaningful solutions in a cloud computing environment like AWS, GCP, etc.

Bonus points:

  • BS / MS / PhD in Computer Science or related field
  • AWS certification
  • Passion for the restaurant industry or restaurant technology
  • Ability to architect Terraform infrastructure for AWS from scratch
  • Experience building products from 0 to 1 in an unstructured environment
  • Familiarity with, or desire to learn, our tech stack, which includes but is not limited to Rust, Go, AWS, Terraform, K8s, and GitHub Actions


