Senior Site Reliability Engineer – Growth
Kraken.com
📋 Job Description
Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.
What makes us different?
Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.
Before you apply, please read the Kraken Culture page (https://www.kraken.com/culture) to learn more about our internal culture, values, and mission. We also expect candidates to familiarize themselves with the Kraken app. Learn how to create a Kraken account here: https://support.kraken.com/hc/en-us/articles/226090548-How-to-create-an-account-on-Kraken.
As a fully remote company, we have Krakenites in 70+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security (https://blog.kraken.com/crypto-education/security-at-kraken), crypto education (https://blog.kraken.com/category/crypto-education), and world-class client support (https://blog.kraken.com/crypto-education/support-at-kraken) through products like Kraken Pro (https://pro.kraken.com/), Desktop (https://www.kraken.com/desktop), Wallet (https://www.kraken.com/wallet), and Kraken Futures (https://www.kraken.com/features/futures).
Become a Krakenite and build the future of crypto!
PROOF OF WORK
THE TEAM
This role is fully remote. We consider candidates across the Americas (PST–EST–LATAM timezones).
Our Growth business unit is responsible for building and scaling the experiences that grow Kraken's user base, spanning the Onboarding, Acquire, and Engage teams. As part of this team, you will help ensure the reliability, scalability, and performance of the systems that power our growth initiatives.
As a Senior Site Reliability Engineer, you will partner with development teams to manage infrastructure, improve CI/CD pipelines, and support operational excellence across Growth. You will bring your expertise in infrastructure, monitoring, and automation to ensure our services are performant, resilient, and continuously improving.
THE OPPORTUNITY
– Manage and support infrastructure for Growth teams, including Nomad and the rest of the HashiStack, databases, and other underlying systems
– Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
– Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments
– Participate in incident response and post-incident reviews to improve system resilience
– Consult with teams on performance, monitoring, and alerting best practices
– Build tooling, automation, and dashboards to improve observability and empower development teams
– Collaborate with developers, QA, and product managers to streamline development and release cycles
– Support a fully distributed team operating across multiple timezones
SKILLS YOU SHOULD HODL
– 5+ years in a DevOps or SRE role
– Strong experience managing infrastructure with Consul, Vault, and Terraform
– Proficiency with databases (SQL and NoSQL) and experience operating them in production
– Proficiency with Git version control and CI/CD configuration
– Deep understanding of monitoring and alerting systems, preferably Prometheus and Grafana
– Ability to debug complex issues involving distributed systems, networks, and Linux operating systems
– Experience with containerization and orchestration (Docker and Nomad; Kubernetes a plus)
– Strong scripting skills (e.g., Bash, Python, or Go)
– Self-starter with the ability to thrive while working independently and remotely in a fast-paced environment
– Ability to collaborate effectively with multiple teams and switch context across projects
– Interest in security and consideration of the security implications of development and operational decisions
NICE TO HAVES
– Experience with benchmarking, performance tuning, and identifying system bottlenecks
– Familiarity with incident management best practices and tooling
– Interest in lower-level programming languages such as Rust
– Experience integrating with APIs (GitLab, Jira, Slack)
– Background working with distributed systems and technologies (Kafka, gRPC, Redis, etc.)
– Passion for building reliable, user-facing systems that scale
Unless a specific application deadline is stated in the job posting, applications are accepted on an ongoing basis.
Please note, applicants are permitted to redact or remove information on their resume that identifies age, date of birth, or dates of attendance at or graduation from an educational institution.