Site Reliability Engineer

Full Time
United States
Job description

Alianza is seeking an experienced Site Reliability Engineer. This is a senior engineering position for someone with experience in the DevOps/SRE space. He or she will focus on and be responsible for the availability, scalability, and security of the Alianza platform. Alianza is on a mission to become the world’s best communications platform, scaling our platform to tens of millions of users with 99.999% uptime and unlimited horizontal scale – all while remaining easy to use and fully secure. You will be deeply involved in platform design and architecture, and will work with other teams to create a well-rounded and full-featured platform. You will work closely with Engineering and other DevOps engineers to define processes, policies, tooling, and standards. The qualified candidate will have experience working as an SRE or equivalent on a large, highly scalable software/SaaS platform with industry-leading availability.

Key Duties and Responsibilities:

  • Drive Platform Reliability
    • Combine software and systems engineering skills to help design and operate large-scale, distributed, fault-tolerant systems
    • Work with Architecture and Engineering to ensure all core domains and technologies are built for exceptional site reliability and scalability
    • Monitor platform performance, uptime, and scale
    • Work with InfoSec to ensure security is built into everything we do and that Alianza is architected properly for maximum security and privacy
    • Define and document overall platform architecture – the platform must be capable of supporting 20+ million users with 99.999% availability
    • Work with teams to ensure each team is clear on their domain architecture as it relates to high-availability, scalability, and security – hold engineers accountable to a high bar and drive overall HA across the platform
  • Drive a Culture of Reliability
    • Lead by example and coach engineers and teams where needed to prioritize availability and resiliency work and to take pride in their platform uptime
    • Push HA and scalability initiatives and evangelize across the company the importance of HA, scale, and security to Alianza’s long-term success
  • Own Platform Observability & Successful Deployment
    • Define, build, and maintain all core observability standards and tooling – alerting, monitoring, logging, and dashboarding
    • Work with teams to ensure high-quality CI/CD processes and pipelines, and ensure teams have smart release standards and processes that maximize platform availability
  • Drive Processes, Tools and Metrics
    • Work with Systems Architecture, Engineering, and DevOps to ensure we are using the right technologies, have the right processes, and are tracking the right metrics for platform HA and scale

Qualifications:

  • 3+ years of experience as a software engineer
  • 3+ years of experience as an SRE or DevOps engineer, or equivalent experience
  • 5+ years of experience in distributed systems, storage systems, or databases
  • Experience designing, analyzing, and troubleshooting large-scale distributed systems
  • Excellent communication skills and a sense of ownership, with a systematic problem-solving approach
  • Experience and expertise with public cloud providers, cloud-native software best practices, containers and databases
  • Experience with infrastructure as code tools – CloudFormation or Terraform strongly preferred
  • Significant experience with monitoring, logging, and alerting tools – Prometheus and Grafana strongly preferred
  • Must have a strong sense of urgency and a continual improvement mentality
  • Strong leadership competencies with the ability to influence key stakeholders in a matrix organization
  • Experience architecting and designing world-class, highly scalable, and highly available software/SaaS platforms
  • Must have ready solutions available for availability, scalability, security, and observability challenges
  • Must be able to synthesize many competing priorities, understand the business impact of each, and make the correct prioritization decisions – and then defend those decisions
  • Must have a strong sense of ownership – owning all things within the engineering team and executing them in a consistent and professional manner without direct supervision or micro-management
