Site Reliability Engineer (SRE)
Infrastructure · Frankfurt, Germany (Hybrid) · Full-time
About the role
We own our hardware. As our SRE, you keep the GPUs serving, the deploys safe, and the on-call calm. You'll work closest to the metal — our inference runs on owned servers across six regions (Germany, Switzerland, California, Dubai, London and Singapore), with our European hub in Frankfurt.
What you'll do
- Run and harden our self-hosted infrastructure — containers, networking, GPU nodes.
- Build CI/CD and zero-downtime deploys for the model and the apps.
- Own monitoring, alerting and incident response; make on-call rare.
- Tune the trade-off between cost and reliability on hardware we own.
Why it matters
Our promises about privacy and a free tier rest on infrastructure we control. You keep that foundation solid.
Requirements
What we're looking for
- 4+ years in SRE / DevOps / platform roles.
- Strong Linux, Docker, and networking fundamentals.
- Comfortable with infrastructure-as-code and CI/CD pipelines.
- Calm during incidents; you write the runbook afterward.
Nice to have
- GPU / CUDA host operations.
- Experience running bare metal, not only cloud.