Site Reliability Engineer

Location: Remote - Chicago, United States

The Role

***THIS ROLE CAN BE FULLY REMOTE***

This role sits within the Site Reliability Engineering team and is part of the wider Crisp Technical Services business unit. Our suite of SaaS, distributed systems and product integrations helps our internal stakeholders run their critical business operations and, in turn, provides customers with industry-leading threat detection technology products. You’ll play a key role in forming a new area within Crisp that aims to drive operational excellence and customer focus into the operation of our SaaS-hosted application suite.

As a Site Reliability Engineer, you’ll combine your software and systems engineering expertise in coding, system design, integration, deployment and ongoing maintenance to help build and run our industry-leading SaaS solution. You will help build a key new Site Reliability Engineering (SRE) function within Crisp, implementing best-practice monitoring, processes and tooling focussed on uptime, performance, and reduction of toil. You will work closely with engineering teams in Development and Delivery to uphold contracted Service Level Objectives (SLOs), and you will be tasked with ensuring that our internally and externally available systems have reliability and uptime appropriate to user needs.

The makeup of our systems is changing rapidly, and you’ll play a key part in driving this forward. We’re moving towards a modern DevOps landscape with technologies such as Docker, Infrastructure as Code (IaC) and microservices.

You will help drive organisational change within Crisp, focussing on the core principles of Site Reliability Engineering: the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of our SaaS services.

This role specifically supports the team outside normal UK operational hours. It follows the SEA operating hours of 19:00 - 03:30, and you will work with the team to establish a 24/7 business operation. On-call duty is required on a rota basis to cover non-business days.

Requirements

  • Build strong, collaborative relationships, acting as the glue between in-house customer-facing support and delivery teams, product management, and engineering (R&D) teams
  • Create frameworks to continually improve:
      • Observability through logging, monitoring, and alerting
      • Capacity analytics and demand management
      • Dashboards, internal and external status pages
      • CI / CD pipelines and release processes
      • Automation of manual processes, tooling and IaC, including security checks and break-glass procedures
      • Ownership of cross-cutting implementations such as logging and metrics infrastructure
      • Team processes, driving down technical debt
      • Triage, response, and recovery times
      • Disaster Recovery models and planning
  • Reduce toil (work that is largely manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as our services grow), maximising engineering capacity
  • Bring expertise and a streetwise perspective to problem solving, reduction of complexity and reliability patterns
  • Play a key role in change management and the delivery pipeline into production, ensuring the safety, predictability, repeatability, and auditability of all build and deploy processes
  • Proactively manage delivery of key SLOs covering detection / early warning and self-healing
  • Own all monitoring and alerting applications, services and infrastructure
  • Act as a key stakeholder in reducing technical debt across our products by contributing to the tech debt backlogs of our R&D teams

Essential Experience

· Elasticsearch

· Operations - networking, firewalls, monitoring

· SQL - Intermediate or above

· Cloud & Platform Security

· Automation & Scalability

· Business Communications

· Incident Management

· Performance Engineering

· Cloud Knowledge

· Infrastructure as Code (IaC)

Desirable Experience

· Google Cloud Certifications

· Prior SRE Experience

· Graph Database Knowledge (Neo4j)

· Release and Deployment Tooling

· Social Media API Experience

Benefits

Our rewards are as unique as our culture, and we want to attract the best people and retain them. Not only will we make your development a priority, but you will also be joining a fantastic team of like-minded people who work together to achieve a shared vision.

  • Market competitive salary based on your skills and experience.
  • Discretionary bonus scheme / commission scheme, with payment based on revenue generated from sales leads.
  • 33 days holiday including Bank Holidays (20 days in the US).
  • Critical Illness insurance.
  • Life Insurance Cover.
  • Healthcare Cash Plan / Healthcare, dental and vision plan.
  • An attractive pension / 401(k) retirement plan.
  • Cycle to Work Scheme.
  • Employee perks schemes offering discounts, rewards, giveaways and more.
  • Subsidised gym membership.
  • Mental health wellbeing portal and access to an in-house clinical psychologist.
  • Support and provision of supplies to facilitate home working.
  • Flexible working opportunities.

About Crisp:

Crisp stops toxic, harmful and fake online content from damaging kids, enterprises, social platforms and society as a whole. This content takes many forms: fake news, terror propaganda, child grooming, hate speech, disinformation, false rumours, threats and infodemics. It spreads virally and at scale across closed social media groups and messaging apps, going undetected by traditional monitoring tools.

By combining Artificial and Human Intelligence, Crisp’s Extended Intelligence delivers 24/7/365 protection by continually fighting the weaponisation of social communications, whatever the source, whatever the language and whatever the online harm.

Crisp tracks and understands what ‘Bad Actors’ are saying in order to build accurate profiles, and uses predictive and behavioural analytics to identify trends and uncover new, previously unknown harms. Crisp’s proprietary technology then scans the web continuously, capturing billions of pieces of data every week.

Crisp currently protects over $4 trillion of aggregate market capitalisation across our customer base. This demonstrates both the value and uniqueness of our service and the trust our customers place in us to protect them from reputational risk.

Statement:

This work meets the requirements in respect of exempted questions under the Rehabilitation of Offenders Act 1974. Any applicants who are offered work for this organisation will be subject to an enhanced check from the Disclosure and Barring Service (DBS). This will include details of cautions, reprimands or final warnings, as well as convictions. A criminal record will not automatically bar a person from successfully taking up this post.