
Monitoring

In DevOps, proper monitoring is proactive, not just reactive. It surfaces ways to improve your applications before any problems show up. Monitoring also watches the tools themselves and highlights areas that may need more automation, which in turn helps improve the DevOps toolchain.

The Challenge

Ineffective monitoring carries risks. According to Dun & Bradstreet, approximately half of all Fortune 500 companies experience a minimum of 1.6 hours of downtime per week. That works out to approximately 83 hours of downtime over a year. With this in mind, if a site averages around $6,000 in revenue per hour, the downtime will cost the company roughly $500,000 per year.
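The arithmetic behind these figures can be checked with a short sketch. It uses the 1.6 hours per week and $6,000 per hour quoted above; the 52-week year and the function names are assumptions made for illustration.

```python
# Figures quoted in the text above.
HOURS_PER_WEEK = 1.6
REVENUE_PER_HOUR = 6000

# Assumption: a 52-week year.
WEEKS_PER_YEAR = 52


def annual_downtime_hours(hours_per_week, weeks_per_year=WEEKS_PER_YEAR):
    """Hours of downtime accumulated over one year."""
    return hours_per_week * weeks_per_year


def annual_downtime_cost(hours_per_week, revenue_per_hour):
    """Revenue lost per year to downtime at a constant hourly rate."""
    return annual_downtime_hours(hours_per_week) * revenue_per_hour


if __name__ == "__main__":
    hours = annual_downtime_hours(HOURS_PER_WEEK)
    cost = annual_downtime_cost(HOURS_PER_WEEK, REVENUE_PER_HOUR)
    print(f"{hours:.1f} hours/year, ${cost:,.0f}/year")
```

Under these assumptions the result is about 83.2 hours and about $499,200 per year. Swapping in your own hourly revenue gives a rough estimate for your site.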

Monitoring plays a major role in avoiding this downtime, and in minimizing it when an issue does occur.


The Benefits

Using a monitoring service can help your business immediately identify issues and the servers on which they arose. Your IT team won’t have to spend time locating the cause, sifting through data, or running manual tests to figure out the issues. They can stay focused on what’s important: your app. In addition, monitoring can:

  • Identify gaps in the environment that could result in crashes, such as databases running out of disk space.

  • Identify potential risks of outages in an environment, such as nodes running out of CPU and memory, or applications being unable to handle the number of requests.
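As a rough illustration of the disk-space gap mentioned above, the following minimal sketch checks how full a filesystem is and compares it against a threshold. The 90% threshold, the default path, and the function names are assumptions for illustration only. A real monitoring system would collect this metric continuously instead of on demand.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of disk space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_disk_low(used_percent, threshold_percent=90.0):
    """True when usage has reached the (assumed) alerting threshold."""
    return used_percent >= threshold_percent


if __name__ == "__main__":
    used = disk_usage_percent("/")
    status = "ALERT" if is_disk_low(used) else "ok"
    print(f"disk usage: {used:.1f}% [{status}]")
```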


At stack.io we can:

  • Set up alert notifications for key infrastructure including:

    • Basic system-level metrics (CPU, disk, memory, load)

    • Database metrics

    • Proxy / load balancer / certificate expiration metrics

  • Design and implement a monitoring system based on best practices

  • Push alerts to a preferred company Slack channel / PagerDuty / email / etc.

  • Monitor your application execution using application performance monitoring tools
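To make the certificate expiration metric from the list above more concrete, here is a minimal sketch of how the days remaining on a TLS certificate might be computed. It assumes the expiry date arrives in the `notAfter` string format that Python's `ssl.SSLSocket.getpeercert()` returns. The 30-day warning window and the function names are hypothetical, not a description of any particular monitoring setup.

```python
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Days left before a certificate expires.

    `not_after` uses the certificate time format, e.g. "Jan 31 00:00:00 2030 GMT".
    """
    if now is None:
        now = datetime.now(timezone.utc)
    expiry_ts = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    return (expiry_ts - now.timestamp()) / SECONDS_PER_DAY


def needs_renewal(days_left, warn_days=30):
    """True when the certificate is inside the (assumed) warning window."""
    return days_left <= warn_days


if __name__ == "__main__":
    reference = datetime(2030, 1, 1, tzinfo=timezone.utc)
    left = days_until_expiry("Jan 31 00:00:00 2030 GMT", now=reference)
    print(f"{left:.0f} days left, renew soon: {needs_renewal(left)}")
```

An alert on this value, fired well before it reaches zero, is one way to avoid the expired-certificate downtime described in the maturity scale.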





DevOps Maturity

Where does your setup fit on our DevOps maturity scale?

+ Poor

  • We have no monitoring setup.

  • We only know about things failing when the website goes down.

  • Our TLS certificates expire on us all the time, causing downtime.

+ Fair

  • We monitor one or two endpoints.

  • We have monitoring that lets us know when there is an outage.

  • When an alert fires, we can solve the issue after a detailed investigation.

  • My email inbox has 150k unread emails from our monitoring system.

  • There are so many alerts that we’ve started ignoring them, and not all of them are actually useful or meaningful.

  • Checking alerts is painful and we actively avoid doing it if there’s another task that needs to be done.

+ Good

  • We have alerts for most metrics and systems, though occasionally our alerts miss something.

  • We have dashboards for all of our systems, so we can view their status at a glance, but we are missing some metrics or applications (or have no application performance monitoring at all).

+ Great

  • Our alerts catch issues before they happen.
  • We have alerts set up for monitoring our costs.
  • We have alerts set up for broken pipelines.
  • We have alerts on high-privilege access to monitor security.
  • We have great dashboards of both our application and system metrics for all of our systems and it’s easy to make new ones.