Staff Site Reliability Engineer I

Posted 17 days ago

Job details

Company	Remote
Category	IT
Workload	100%
Home office	100% remote
Benefits	Flexible working hours, varied training opportunities, profit sharing
Location	Remote

Job description

What this job can offer you

As a Staff SRE at Remote, you will own the technical direction of our SRE platform, shaping its architecture, reliability strategy, and long-term evolution. This is a leadership role as much as a technical one: you'll drive platform-wide initiatives, set the reliability bar for engineering teams across the organisation, and be a force multiplier for the engineers around you.

A key part of this role is identifying and leading opportunities to leverage AI: from reducing operational toil to enabling engineering teams to build, ship, and operate software more effectively. You'll work with a high degree of autonomy, translating technical risks into business impact and aligning with Engineering Managers, Team Leads, and Product teams to ensure reliability and engineering efficiency are built into everything we do.

What you bring

Technical

8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering
Deep expertise in Kubernetes: operating, designing, and scaling production clusters
Proven experience designing and managing cloud infrastructure on AWS (or other cloud providers) at scale
Strong infrastructure-as-code practice with Terraform
Experience defining and operating reliability frameworks: SLOs, SLIs, error budgets, alerting strategies
Solid observability background: Datadog, Grafana/Prometheus, or similar
Proficiency with CI/CD platforms (GitLab CI, GitHub Actions, or similar) and deployment automation
Comfortable with Bash and scripting for automation; broader programming skills are a plus
Experience with container tooling (Docker) and the broader ecosystem around it
Curiosity and practical experience applying AI tools to infrastructure, operations, or developer tooling: whether through AI-assisted automation, LLM-powered workflows, or intelligent observability

Leadership & behavioural

Proven track record of driving platform-wide technical initiatives and influencing engineering direction without formal authority
Strong communicator: able to tailor messaging to technical and non-technical audiences, write clearly, and align stakeholders across teams
Self-directed: able to identify what needs attention, define the path forward, and execute with minimal supervision
Experience mentoring senior engineers and creating space for others to lead and grow
Comfortable navigating ambiguity, translating vague requirements into concrete solutions
Approaches technical problems with a business lens and understands the cost and value of engineering decisions

Nice to have

Excellent communication and interpersonal skills
Holistic debugging skills
Security knowledge and capabilities from both a defensive and an offensive standpoint

Key Responsibilities

Own the technical direction of Remote's SRE/Platform domain, including its architecture, tooling, and long-term roadmap
Define and drive the reliability strategy across the platform: SLOs/SLIs, error budgets, observability, and incident management maturity
Lead complex, cross-team infrastructure initiatives from discovery through delivery, delegating effectively and keeping projects aligned with business goals
Identify and lead AI enablement initiatives across the engineering organisation, exploring where AI can reduce operational overhead, accelerate development workflows, improve incident response, and unlock new capabilities for engineering teams
Drive AI-powered automation for platform operations: intelligent alerting, automated incident triage, self-healing infrastructure, and AI-assisted runbooks, reducing toil and freeing engineers to focus on higher-leverage work
Contribute to capacity planning and cost-efficiency of Remote's infrastructure
Mentor senior engineers, raising the technical bar through code reviews, design feedback, and hands-on guidance
Collaborate with the Security team on platform hardening, threat mitigation, and compliance
Be a steward of engineering quality across the SRE team, championing best practices, managing technical debt deliberately, and raising standards over time
Contribute to hiring, onboarding, and continuously improving how the SRE team operates

Benefits

work from anywhere
flexible paid time off
flexible working hours (we are async)
16 weeks paid parental leave
mental health support services
stock options
learning budget
home office budget & IT equipment
budget for local in-person social events or co-working spaces

Apply

Apply directly on Remote's website.
