This job was posted over 90 days ago and may no longer be available.

Lead Site Reliability Engineer - Remote

At StyleSeat, our mission is to help people look and feel their best. We are on the path to achieving this mission by being the go-to marketplace for consumers to discover, book, and pay for beauty and grooming services (hair stylists, colorists, nail artists, estheticians, barbers, etc.). We are also the premier solution for all independent professionals in the industry to run and grow their businesses. We have powered over 120 million appointments booked and $10B in revenue for small businesses and are on the path to much more.

As a Lead Site Reliability Engineer at StyleSeat, you will have a rare opportunity to join a startup empowering small business owners across the country to be more successful doing what they love. Our team believes in discipline, hustle, and supporting each other to make a positive impact on our community and our bottom line. We believe that diversity makes us stronger and helps us better support our customers. Our community is the center of everything we do.

About the Role:

As a Lead Site Reliability Engineer at StyleSeat, you will be the first member of a newly formed Site Reliability team and will partner with the Senior Director of Engineering to define and build the structure of this greenfield opportunity. You will lead a fast-growing operations team responsible for a highly scalable, cloud-hosted, service-oriented infrastructure. You will be responsible for maintaining the stability and health of our critical systems and for working toward 99.999% uptime. You will work collaboratively with key members of the Engineering organization to design and build infrastructure components and operational processes that improve the robustness and scale of our application services.

Required Skills:

* 10+ years of experience in Linux Systems Engineering/Site Reliability/DevOps/Build & Release/Tooling
* Strong familiarity with multiple AWS services - EC2, S3, RDS, ElastiCache, ECS/EKS
* Experience performing Root-Cause Analysis (RCA) of service and system failures
* Experience with Application Performance Monitoring (APM) tools (e.g. New Relic, AppDynamics) and with resolving performance bottlenecks on live production systems
* Experience managing container orchestration systems (e.g. AWS ECS/EKS, Kubernetes)
* Experience with Infrastructure-as-Code (IaC) and with automating infrastructure configuration and service deployments (e.g. Terraform, AWS CLI/API)
* Experience supporting different mechanisms for code execution and/or deployment (e.g. Docker, Lambda)
* Ability to script various operational utilities using Python
* Experience load-profiling critical system components and services
* Experience creating and maintaining operations documents and procedures

Nice-to-have:

* Software engineering background to keep automation well-factored and under automated test whenever possible
* MySQL operations and performance tuning
* Knowledge of security best practices and standards (e.g. X.509, PKCS, AWS IAM, AES)
* Experience with Node.js, Elasticsearch, RabbitMQ

Posted: Jan. 18, 2020
