Deel

Observability Specialist

Posted yesterday

Job details

Company: Deel
Category: IT
Workload: 100%
Home office: 100% remote
Location: Remote

Job description

Key Responsibilities

  • Design, implement, and maintain scalable observability solutions for cloud-native environments

  • Own monitoring across AWS and Kubernetes (EKS) environments, covering clusters and workloads

  • Operate and maintain self-hosted monitoring stacks (e.g., Prometheus, Grafana, Mimir, Loki, Tempo)

  • Manage and optimize Datadog (metrics, logs, APM, alerts, cost monitoring)

  • Improve observability architecture to support high availability, scalability, and fault tolerance

  • Implement monitoring cost optimization strategies (log/trace sampling, retention policies, storage optimization)

  • Automate observability infrastructure using Infrastructure as Code (Terraform, Helm, etc.)

  • Integrate monitoring and alerting into CI/CD pipelines (GitHub Actions is an advantage)

  • Support capacity planning and performance tuning initiatives

  • Collaborate with DevOps, SRE, and Engineering teams to embed observability best practices

  • Drive continuous improvement of monitoring standards, tooling, and reliability practices

Required Skills & Experience

  • 5+ years of hands-on experience in monitoring / observability engineering within cloud-native environments

  • Strong experience with AWS services

  • 5+ years of hands-on experience working with Kubernetes

  • Solid knowledge of Kubernetes monitoring for clusters and workloads, including metrics, logs, traces, alerting, SLOs, SLIs, and dashboards

  • Proven experience operating and maintaining self-hosted monitoring stacks (Prometheus, Grafana, Mimir, Loki, Tempo are an advantage)

  • Experience designing or improving observability architectures at scale

  • Experience with Datadog (metrics, logs, APM, alerts, and cost monitoring)

  • Strong understanding of high availability, scalability, and fault-tolerant architectures

  • Experience with monitoring cost optimization, including log and trace sampling strategies and storage and retention optimization

  • Ability to automate monitoring tasks using Infrastructure as Code and scripting (Terraform, Helm, etc.)

  • Familiarity with CI/CD pipelines and integrating monitoring into deployment workflows (GitHub Actions is an advantage)

  • Experience with capacity planning and performance tuning

Soft Skills

  • Strong problem-solving and analytical skills

  • Ability to work independently and take ownership of complex systems

  • Good communication skills and the ability to collaborate with DevOps, SRE, and other teams

  • Proactive mindset with a focus on continuous improvement

Benefits

  • Stock grant opportunities dependent on your role, employment status and location
  • Additional perks and benefits based on your employment status and country
  • The flexibility of remote work, including optional WeWork access

Apply

Apply directly on Deel's website.