
What Is “Emergency Support”?
Having the same team that makes changes to your setup also respond to alerts and incidents is a solid practice. Such support is often required around the clock, which means that an emergency response service, with people available to act on any alert that arises, is a necessary part of any DevOps team.
Your needs are unique, so we define a customized emergency support solution built around:
Expected first-response times (dependent on priority/severity)
Mutually agreed coverage times (from a set period each day through to 24/7 coverage)
A well-documented escalation path with clear responsibilities
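As an illustration, the three elements above could be captured in a small policy object. This is a minimal sketch only: the `SupportPolicy` class, the severity names, the response times, the coverage window, and the escalation contacts are all hypothetical placeholders, not actual contract terms.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class SupportPolicy:
    """Hypothetical support agreement; every value used below is illustrative."""

    first_response: dict   # severity name -> timedelta allowed before first response
    coverage_start: time   # start of the daily coverage window
    coverage_end: time     # end of the window; equal to the start means 24/7
    escalation_path: tuple # contacts in the order they are escalated to

    def is_covered(self, moment: datetime) -> bool:
        """Return True if the moment falls inside the agreed coverage window."""
        if self.coverage_start == self.coverage_end:
            return True  # assumption: identical start and end means 24/7 coverage
        now = moment.time()
        if self.coverage_start < self.coverage_end:
            return self.coverage_start <= now < self.coverage_end
        # The window wraps past midnight, e.g. 20:00 to 08:00.
        return now >= self.coverage_start or now < self.coverage_end

    def response_deadline(self, severity: str, raised_at: datetime) -> datetime:
        """Latest acceptable time for the first response to an alert."""
        return raised_at + self.first_response[severity]

    def escalate_to(self, level: int) -> str:
        """Contact for a zero-based escalation level, capped at the last entry."""
        return self.escalation_path[min(level, len(self.escalation_path) - 1)]


# Example values are assumptions chosen for the sketch.
EXAMPLE_POLICY = SupportPolicy(
    first_response={
        "critical": timedelta(minutes=15),
        "high": timedelta(hours=1),
        "low": timedelta(hours=8),
    },
    coverage_start=time(8, 0),
    coverage_end=time(20, 0),
    escalation_path=("on-call engineer", "team lead", "client contact"),
)
```

Writing the agreement down in a structured form like this makes it straightforward to check whether an alert arrived inside the coverage window and when its first response is due.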
Our Approach To Troubleshooting
Drawing on our experience, we follow our own tried-and-tested troubleshooting guidelines to identify, solve, and document an issue:
Identify the problem
Gather information from log files and error messages
Determine recent changes to narrow the scope of the problem
Establish a theory of probable cause
Question the obvious
Consider multiple approaches, including top-to-bottom or bottom-to-top
Test the theory to determine the cause
Establish a plan of action to resolve the problem and implement the solution
Implement preventive measures
Document findings, actions, and outcomes
Produce post-mortem documentation
Once service is restored, we highly recommend taking action to prevent the problem from recurring. Any change made during service restoration should be followed up with a pull request (PR) so that the code stays in sync with the environment and infrastructure drift is avoided.
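To make the idea of drift concrete, the sketch below compares the settings declared in code with the settings observed in the live environment and lists every difference. It is a simplified illustration: the `find_drift` function, the `ABSENT` marker, and the example settings are hypothetical, and a real setup would rely on its infrastructure-as-code tooling for this comparison.

```python
ABSENT = "<absent>"  # marker for a setting that exists on only one side


def find_drift(declared: dict, live: dict) -> dict:
    """Return {setting: (declared_value, live_value)} for every mismatch."""
    drift = {}
    for key in sorted(declared.keys() | live.keys()):
        want = declared.get(key, ABSENT)
        have = live.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: an emergency fix resized an instance and enabled
# debug logging by hand, but the code was never updated to match.
declared_settings = {"instance_type": "t3.medium", "min_replicas": 2}
live_settings = {"instance_type": "t3.large", "min_replicas": 2, "debug_logging": True}

for setting, (want, have) in find_drift(declared_settings, live_settings).items():
    print(f"{setting}: code says {want!r}, environment has {have!r}")
```

Each reported mismatch is a candidate for the follow-up PR: either the code is updated to reflect the emergency change, or the change is reverted in the environment.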
How Can We Help?
In the event that we need to escalate an incident, we can work closely with your existing support teams to resolve any issue, no matter how complex
We can assist with root cause analysis and implementing a fix for permanent remediation, ensuring that your systems are up and running smoothly
To better understand your specific needs, we can review the logs from your alerting tool for the last 3-6 months to see which types of alerts are being generated
We will work closely with you to understand your current monitoring setup, how alerts are set up and actioned, and the targets of those alerts
We will help define and refine SLA times and the criteria that affect severity and prioritization levels
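As a rough illustration of the alert review mentioned above, the following sketch counts alerts by name and severity over a lookback window. The record fields (`name`, `severity`, `fired_at`), the sample data, and the 180-day default are assumptions made for the example; an actual export from your alerting tool will have its own format.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize_alerts(alerts, now, lookback_days=180):
    """Count alerts per (name, severity) fired within the lookback window."""
    cutoff = now - timedelta(days=lookback_days)
    return Counter(
        (alert["name"], alert["severity"])
        for alert in alerts
        if alert["fired_at"] >= cutoff
    )


# Made-up sample records standing in for an alerting tool export.
SAMPLE_ALERTS = [
    {"name": "disk_usage_high", "severity": "high", "fired_at": datetime(2024, 5, 2, 3, 15)},
    {"name": "disk_usage_high", "severity": "high", "fired_at": datetime(2024, 5, 9, 4, 40)},
    {"name": "api_latency", "severity": "critical", "fired_at": datetime(2024, 4, 20, 14, 5)},
    # Older than the lookback window, so it is ignored.
    {"name": "api_latency", "severity": "critical", "fired_at": datetime(2023, 6, 1, 9, 0)},
]

summary = summarize_alerts(SAMPLE_ALERTS, now=datetime(2024, 6, 1))
for (name, severity), count in summary.most_common():
    print(f"{name} ({severity}): {count}")
```

A summary like this shows which alerts fire most often and at which severity, which is a useful starting point when agreeing on SLA times and prioritization levels.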
Ultimately, our aim is for you and your team to have peace of mind knowing that your emergency technical issues will be addressed quickly and efficiently, allowing you to focus on your business.