Job Description
<p><h3>Overview</h3><p><b>Principal Platform Engineer, Reliability and Observability</b></p><p>Ncounter is hiring a senior Platform Engineer to own reliability and observability across a mission-critical trading platform. This is a deeply technical role focused on keeping complex, distributed systems stable, measurable, and predictable under real-time load. You will work directly on shared platform services that underpin trading and research workloads, where latency, partial failure, and blind spots in monitoring are not tolerated.</p><p>Observability is a core engineering concern here, not a bolt-on toolset. You will design and operate metrics, logging, tracing, and alerting pipelines that ingest high-volume telemetry, expose system behaviour under stress, and materially reduce operational risk. The role blends production engineering, platform tooling, automation, and reliability-led architecture, with direct ownership of systems running at scale.</p><h3>Responsibilities</h3><ul><li>Owning reliability and observability for shared platform services in Linux and Kubernetes environments</li><li>Designing and operating high-throughput metrics, logging, and tracing pipelines for real-time systems</li><li>Hardening services against latency degradation, cascading failure, and outages using reliability engineering principles</li><li>Reducing operational toil through automation, GitOps workflows, and platform tooling</li><li>Improving on-call signal quality through alert design, runbooks, and post-incident learning</li><li>Partnering with engineers to bake observability and resilience into services by default</li></ul><h3>Core Technical Background</h3><ul><li>Strong experience in SRE, production engineering, or platform reliability with ownership of live systems</li><li>Deep Linux systems knowledge, debugging, and performance tuning</li><li>Software engineering with Python or Go, plus solid Git and CI/CD experience</li><li>Hands-on expertise with observability stacks 
covering metrics, logs, traces, and alerting</li><li>Experience operating systems at scale, including HA, DR, and incident response</li></ul><h3>Nice to Have</h3><ul><li>Infrastructure automation with Terraform or Ansible</li></ul><p>This is a role for engineers who enjoy understanding how systems really behave under pressure and who want to own reliability as a first-class engineering problem. If you like solving hard platform problems where observability directly drives system correctness, this is worth a conversation.</p></p>