
Emergency Support

A reliable and experienced operations team ready to tackle your emergency infrastructure issues and get your business back up and running.

What Is “Emergency Support”?

Having the same team that makes changes to your setup also respond to its alerts and incidents is a solid practice. Such support is often required around the clock, so an emergency response service, with people available to act on any alert that arises, is a necessary part of any DevOps team.

Your needs are unique, so we can customize an emergency support solution built around:

  • Expected first-response times (dependent on priority/severity)

  • Mutually agreed coverage times (from a set period each day through to 24/7 coverage)

  • A well-documented escalation path with clear responsibilities
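As an illustration only, first-response targets that depend on severity could be written down and checked as in the sketch below. The severity names and time limits are hypothetical placeholders, not our actual terms; real values are agreed with each client.

```python
from datetime import datetime, timedelta

# Hypothetical first-response targets per severity level.
# These numbers are placeholders for illustration only.
FIRST_RESPONSE_TARGETS = {
    "critical": timedelta(minutes=15),
    "high": timedelta(hours=1),
    "low": timedelta(hours=8),
}


def first_response_deadline(raised_at: datetime, severity: str) -> datetime:
    """Return the latest time by which a first response is due."""
    return raised_at + FIRST_RESPONSE_TARGETS[severity]


def is_breached(raised_at: datetime, severity: str, responded_at: datetime) -> bool:
    """Return True if the first response came after the agreed deadline."""
    return responded_at > first_response_deadline(raised_at, severity)
```

A response that arrives exactly on the deadline counts as on time in this sketch; whether that matches a given agreement is something to settle when the targets are defined.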

Our Approach To Troubleshooting

Drawing on our experience, we follow our own tried-and-tested troubleshooting guidelines to identify, solve, and document an issue:

  • Identify the problem

    • Gather information from log files and error messages

    • Determine recent changes to narrow the scope of the problem

  • Establish a plan of action to resolve the problem and implement the solution

  • Document findings, actions, and outcomes

    • Post-mortem documentation

      • Establish a theory of probable cause

        • Question the obvious

        • Consider multiple approaches, including top-to-bottom or bottom-to-top

      • Test the theory to determine the cause

      • Implement preventive measures

Once service is restored, we strongly recommend taking action to prevent the problem from recurring. Any change made during service restoration should be followed up with a pull request (PR) so that the code stays in sync with the environment and infrastructure drift is avoided.
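To make the idea of drift concrete, here is a minimal sketch that compares the settings declared in code with the settings observed in the environment. The function name, the flat dictionary layout, and the example keys are assumptions made for illustration; in practice this comparison is normally done by your infrastructure-as-code tooling.

```python
def find_drift(declared: dict, observed: dict) -> dict:
    """Return every key whose value differs, mapped to (declared, observed).

    A key that is missing on one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(declared.keys() | observed.keys()):
        declared_value = declared.get(key)
        observed_value = observed.get(key)
        if declared_value != observed_value:
            drift[key] = (declared_value, observed_value)
    return drift


# Hypothetical example: a hotfix resized an instance and enabled a debug flag.
declared = {"instance_type": "m5.large", "replicas": 3}
observed = {"instance_type": "m5.xlarge", "replicas": 3, "debug": True}
changes = find_drift(declared, observed)
```

Each entry in `changes` is a candidate for the follow-up PR: either the code is updated to match the emergency change, or the environment is rolled back to match the code.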

How Can We Help?

  • In the event that we need to escalate an incident, we can work closely with your existing support teams to resolve any issue, no matter how complex

  • We can assist with root cause analysis and implementing a fix for permanent remediation, ensuring that your systems are up and running smoothly

  • To better understand your specific needs, we can review the logs from your alerting tool for the last 3 to 6 months to see which types of alerts are being generated

  • We will work closely with you to understand your current monitoring setup, how alerts are set up and actioned, and the targets of those alerts

  • We will help define and refine SLA times and the criteria that affect severity and prioritization levels
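As a rough sketch of the alert review mentioned above, the snippet below counts alerts by name and severity over a recent window. The record fields (`name`, `severity`, `raised_at`) are assumed for illustration; the actual export format depends on your alerting tool.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_alerts(alerts: list, now: datetime, months: int = 6) -> Counter:
    """Count alerts per (name, severity) raised in roughly the last `months` months.

    A month is approximated as 30 days, which is enough for a first overview.
    """
    cutoff = now - timedelta(days=30 * months)
    return Counter(
        (alert["name"], alert["severity"])
        for alert in alerts
        if alert["raised_at"] >= cutoff
    )


# Hypothetical sample records, standing in for an export from an alerting tool.
sample_alerts = [
    {"name": "disk_full", "severity": "high", "raised_at": datetime(2024, 5, 1)},
    {"name": "disk_full", "severity": "high", "raised_at": datetime(2024, 4, 12)},
    {"name": "cert_expiry", "severity": "low", "raised_at": datetime(2024, 3, 3)},
    {"name": "disk_full", "severity": "high", "raised_at": datetime(2023, 1, 1)},
]
summary = summarize_alerts(sample_alerts, now=datetime(2024, 6, 1))
```

Sorting the result with `summary.most_common()` puts the noisiest alerts first, which is usually where a discussion about severity levels and SLA times starts.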

Ultimately, our aim is for you and your team to have peace of mind knowing that your emergency technical issues will be addressed quickly and efficiently, allowing you to focus on your business.