Grafana as Yet Another Tool for Technical Monitoring of Software Products We Build

This article in the Logicify Monitoring Tools series covers Grafana, software we use in both internal and external projects to visualize and analyze data. It may interest CTOs, developers, DevOps engineers, system administrators, project managers and anyone else interested in monitoring.

Definition of Grafana

Grafana is an open-source platform for data visualization, monitoring and analysis. Paired with Graylog, it forms part of our two-sided system for monitoring user behavior and performance. Grafana lets users create dashboards made of panels, each showing specific metrics over a set time frame. Dashboards are versatile and can be tailored to a specific project or to particular development or business needs.

At Logicify, we mostly use Grafana with Elasticsearch and InfluxDB, but it supports a variety of other data sources (Prometheus, MySQL and PostgreSQL, to name just a few). For each data source, Grafana provides a dedicated query editor with its own syntax.

Grafana Notions

  • A panel is the basic visualization building block; it displays the selected metrics. Grafana supports graph, singlestat, table, heatmap and free-text panels, and it integrates with official and community-built plugins (such as a world map or a clock) and apps that can be visualized too. Each panel can be customized in style and format, and all panels can be dragged, dropped, resized and rearranged.
  • A dashboard is a set of individual panels arranged on a grid, together with a set of variables (such as server, application and sensor name). By changing the variables, you switch the data a dashboard displays (for instance, between two separate servers). All dashboards can be customized and sliced and diced to suit the user's needs.
    Grafana has a large community of contributors and users, so there is a large ecosystem of ready-made dashboards for different data types and sources.
  • Dashboards can use annotations to mark events across panels. An annotation is added by a custom query to Elasticsearch and appears as a vertical red line on the graph. Hovering over an annotation shows the event description and tags, for instance to track when the server responds with a 5xx error code or when the system restarts. This makes it easy to correlate a specific event in time with its consequences in the application and to investigate system behaviour.
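
To illustrate what such an annotation query yields, here is a minimal Python sketch that turns raw log records into annotation events (time, text, tags) for 5xx responses and restarts. The record fields (`timestamp`, `status`, `event`, `source`) are hypothetical; in Grafana itself, this selection is done by the Elasticsearch query configured for the annotation, not by application code.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Annotation:
    """One vertical marker on a panel: when it happened, its text, and tags for filtering."""
    time_ms: int
    text: str
    tags: List[str] = field(default_factory=list)


def to_annotation(record: dict) -> Optional[Annotation]:
    """Map a log record to an annotation, or return None if it should not be marked.

    Hypothetical record fields: timestamp (epoch ms), status (HTTP code),
    event (e.g. "restart") and source (host name).
    """
    status = record.get("status")
    source = record.get("source", "unknown")
    if status is not None and 500 <= status <= 599:
        return Annotation(record["timestamp"], f"HTTP {status} on {source}", ["5xx", source])
    if record.get("event") == "restart":
        return Annotation(record["timestamp"], f"Restart of {source}", ["restart", source])
    return None


def build_annotations(records: Iterable[dict]) -> List[Annotation]:
    """Keep only the records worth annotating, ordered by time."""
    annotations = [a for a in (to_annotation(r) for r in records) if a is not None]
    return sorted(annotations, key=lambda a: a.time_ms)
```

Tags matter because Grafana can filter annotations by tag, so 5xx markers and restart markers can be shown or hidden independently.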

Logicify Best Practices with Grafana

Grafana in Internal Projects

For our internal IoT project (office weather monitoring solution), we connected Grafana to InfluxDB, a time series database, to visualize the changes in office weather parameters and react to them accordingly. A set of sensors measure temperature, humidity, atmospheric pressure and CO2 level in every zone of our Kherson office; these parameters are collected and visualized with Grafana graphs on a large kitchen monitor and online.

Grafana Dashboard with Logicify Office Zones
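
Sensor readings reach InfluxDB as points written in its line protocol: a measurement name, tags, fields and a timestamp. The Python sketch below formats one reading that way, assuming a hypothetical `office_weather` measurement tagged by zone; the measurement and field names in our actual project may differ.

```python
import time
from typing import Dict, Optional


def _escape(value: str) -> str:
    """Escape commas, equals signs and spaces, as line protocol requires in tag keys/values and field keys."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(zone: str, readings: Dict[str, float],
                     timestamp_ns: Optional[int] = None) -> str:
    """Format one set of sensor readings as a single InfluxDB line-protocol point.

    The "office_weather" measurement and "zone" tag are hypothetical names.
    Every reading is written as a float field; the timestamp is in nanoseconds.
    """
    if not readings:
        raise ValueError("a point needs at least one field")
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    # Sorted keys keep the output deterministic.
    fields = ",".join(f"{_escape(k)}={float(v)}" for k, v in sorted(readings.items()))
    return f"office_weather,zone={_escape(zone)} {fields} {timestamp_ns}"
```

Such a line can be posted to InfluxDB's HTTP write API or passed to a client library; Grafana then queries the measurement and filters or groups it by the `zone` tag.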

This way, we keep constant track of air quality, and our Office Manager reacts to changes: opening windows if the CO2 level is too high and turning the AC and humidifiers on and off.

Grafana Dashboard with Office Weather Parameters

Through Grafana-displayed time series graphs and annotations, we analysed trends in office weather changes over months and seasons. We also used the tool to visualize some useful widgets and pieces of information (weather forecast, currency exchange rates, internal calendars) on a large kitchen monitor.

How to Use Grafana in Custom Web Apps

Grafana + Graylog

We use Graylog to store and manage web application logs and to monitor application performance in both development and production. Grafana is the tool that “translates” the logs stored in Graylog into visual form for analytics and system monitoring. For one of our ongoing projects, Grafana effectively serves as a UI for the web application's load and performance as well as its customer flow. Graylog and Grafana exist independently of each other, and we built no complex custom integration to connect them. Since Graylog stores all log data in Elasticsearch, one of Grafana’s data sources, we simply point Grafana at the Elasticsearch index where the logs are stored.
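
Under the hood, a Grafana panel on this data source comes down to an Elasticsearch search with a time-range filter and a date histogram. The Python sketch below builds a simplified search body of that kind for a request-count graph. It is not the exact query Grafana generates; it assumes Graylog's `timestamp` field, and `fixed_interval` requires Elasticsearch 7.2 or later (older versions use `interval`).

```python
from typing import Any, Dict


def request_count_query(start_ms: int, end_ms: int, lucene: str = "*",
                        interval: str = "1m") -> Dict[str, Any]:
    """Build an Elasticsearch search body counting log messages per time bucket.

    `lucene` is an illustrative filter (e.g. on a source or stream field).
    """
    if end_ms <= start_ms:
        raise ValueError("end_ms must be after start_ms")
    return {
        "size": 0,  # only the aggregation is needed, not the documents
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {"gte": start_ms, "lte": end_ms,
                                             "format": "epoch_millis"}}},
                    {"query_string": {"query": lucene}},
                ]
            }
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": "timestamp",
                    "fixed_interval": interval,
                    "min_doc_count": 0,  # keep empty buckets so gaps show as zero
                    "extended_bounds": {"min": start_ms, "max": end_ms},
                }
            }
        },
    }
```

The body would be sent to the `_search` endpoint of Graylog's indices (for example `graylog_*`, assuming the default index prefix); each bucket's `doc_count` becomes one point on the graph.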

What metrics can be visualized in Grafana for a web application

Plain-text logs and error notifications are of little interest to Grafana, as its main purpose is to visualize data in graphs, charts and tables. We wrote a custom Django module that collects the data we want to track for every web or worker request and response processed. This is not just the success/failure status but a set of structured fields (both general and project-specific), such as:

  • app version
  • unique ID of every request
  • response time and status
  • error code (if any)
  • IP address from where the request was sent
  • user info (e-mail, username for registered users, role, permissions)
  • device, etc.
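
A custom Django module of this kind can take the form of a middleware. The sketch below uses Django's callable-class middleware pattern (`get_response` in `__init__`, one `__call__` per request) to build a record with some of the fields listed above. It is a hypothetical illustration, not our module: the field names, the `analytics` logger name and `APP_VERSION` are assumptions, the HTTP status stands in for a project-specific error code, and role and permissions are omitted for brevity.

```python
import logging
import time
import uuid
from typing import Any, Callable, Dict

logger = logging.getLogger("analytics")  # hypothetical logger name
APP_VERSION = "1.0.0"  # hypothetical; a real project would read it from settings


class AnalyticsMiddleware:
    """Django-style middleware that emits one structured analytics record per request."""

    def __init__(self, get_response: Callable[[Any], Any]) -> None:
        self.get_response = get_response

    def __call__(self, request: Any) -> Any:
        request_id = str(uuid.uuid4())  # unique ID of every request
        started = time.perf_counter()
        response = self.get_response(request)
        elapsed_ms = (time.perf_counter() - started) * 1000

        user = getattr(request, "user", None)
        registered = user is not None and user.is_authenticated
        status = response.status_code
        record: Dict[str, Any] = {
            "app_version": APP_VERSION,
            "request_id": request_id,
            "path": request.path,
            "status": status,
            "response_time_ms": round(elapsed_ms, 3),
            # Stand-in for a project-specific error code: the HTTP status on failure.
            "error_code": status if status >= 400 else None,
            "ip": request.META.get("REMOTE_ADDR"),
            "user_email": user.email if registered else None,
            "username": user.get_username() if registered else None,
            "device": request.META.get("HTTP_USER_AGENT"),
        }
        # Hand the record to logging; a handler attached to this logger can ship it.
        logger.info("request processed", extra={"analytics": record})
        return response
```

In a Django project, the class would be listed in the `MIDDLEWARE` setting under whatever dotted path the module lives at.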

Django pushes these custom structured analytics records into Graylog, which stores them in a separate stream. Although native Graylog dashboards could visualize this data, they are not as good-looking as Grafana’s, so we have Grafana read and visualize the analytics data instead. This way, we keep track of application performance and load both in real time and retrospectively.

Graylog-Grafana Architecture Diagram
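
Graylog ingests structured messages in GELF (Graylog Extended Log Format), where custom fields are prefixed with an underscore. The Python sketch below wraps one analytics record as a GELF 1.1 message and sends it to a GELF UDP input. The transport is an assumption, not a description of our module: the host, port and field names are hypothetical, and a production setup might use a logging handler or a TCP input instead.

```python
import json
import re
import socket
import time
import zlib
from typing import Any, Dict, Optional, Tuple

_FIELD_NAME = re.compile(r"^[\w.\-]+$")  # characters GELF allows in field names


def to_gelf(record: Dict[str, Any], host: str, short_message: str = "analytics",
            timestamp: Optional[float] = None) -> Dict[str, Any]:
    """Wrap a structured record as a GELF 1.1 message.

    Every record key becomes an additional field prefixed with "_"; "_id" is
    reserved by the spec, so an "id" key is rejected.
    """
    message: Dict[str, Any] = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,  # seconds
    }
    for key, value in record.items():
        if key == "id" or not _FIELD_NAME.match(key):
            raise ValueError(f"invalid GELF additional field name: {key!r}")
        message[f"_{key}"] = value
    return message


def send_gelf_udp(message: Dict[str, Any],
                  address: Tuple[str, int] = ("localhost", 12201)) -> None:
    """Send one zlib-compressed GELF message to a GELF UDP input.

    12201 is the conventional GELF port; messages too large for one datagram
    would need GELF chunking, which this sketch omits.
    """
    payload = zlib.compress(json.dumps(message).encode("utf-8"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

On the Graylog side, a stream rule matching one of these fields would route the records into their own stream.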

Grafana as a debugging tool

Primarily, Grafana dashboards help us debug the application. If an end customer reports a problem, Grafana lets us distinguish errors on the customer or server side from real bugs or loopholes in the application logic. We track all web requests initiated within a given time slot by the customer (identified by e-mail address), by app admins and by the application itself, and find the culprit by elimination.
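
The elimination step amounts to splitting the requests in a time slot by who initiated them. A minimal Python sketch, assuming hypothetical record fields `time`, `user_email` and `role`, and treating requests without a user as initiated by the application itself:

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List


def requests_by_initiator(records: Iterable[dict], start: datetime, end: datetime,
                          customer_email: str) -> Dict[str, List[dict]]:
    """Group requests in [start, end) into customer, admin and application buckets.

    Requests from other users are ignored; records without a user e-mail are
    attributed to the application (e.g. background workers).
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for r in records:
        if not (start <= r["time"] < end):
            continue
        email = r.get("user_email")
        if email == customer_email:
            groups["customer"].append(r)
        elif r.get("role") == "admin":
            groups["admin"].append(r)
        elif email is None:
            groups["application"].append(r)
    return dict(groups)
```

Comparing the statuses in each group then shows whether a failure follows the customer's own actions, an admin change or the application's background work.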

We also debug and fix issues when we notice an anomaly on the dashboard with application load and performance graphs. The following Grafana graph shows response times to web requests over a certain time frame. For web requests, we can track the maximum, minimum and average response time. If a request took too long to process, we can zoom into that part of the graph and investigate the issue.

Example of Grafana Graph with Response Time to Web Requests
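
Grafana computes these per-interval values itself (the Elasticsearch query editor offers Min, Max and Average metrics). The Python sketch below only shows the arithmetic behind such a min/max/average series, given hypothetical `(timestamp_ms, response_time_ms)` samples:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def response_time_stats(samples: Iterable[Tuple[int, float]],
                        bucket_ms: int) -> Dict[int, Dict[str, float]]:
    """Aggregate (timestamp_ms, response_time_ms) samples into fixed time buckets.

    Returns {bucket_start_ms: {"min": ..., "max": ..., "avg": ...}} ordered by time.
    """
    if bucket_ms <= 0:
        raise ValueError("bucket_ms must be positive")
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_ms].append(value)  # align to bucket start
    return {
        start: {"min": min(v), "max": max(v), "avg": sum(v) / len(v)}
        for start, v in sorted(buckets.items())
    }
```

A bucket whose `max` sits far above its `avg` points to a single slow request rather than a general slowdown, which is exactly the case worth zooming into.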

Another graph shows system load over a set time frame and is useful for tracking traffic. If we see an unusual spike in activity, e.g. during non-business hours or on weekends, we investigate it. It could be caused, for instance, by Google crawlers indexing the website content or by evil bots scanning our system for vulnerabilities. Again, each case is investigated and addressed accordingly.

Example of Grafana Graph with Application Load
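
Spotting such spikes by eye can also be expressed as a simple rule. A minimal Python sketch that flags hourly request counts outside business hours that exceed a multiple of the median; the 3x factor and the 9:00 to 18:00 window are hypothetical defaults, not thresholds we actually use:

```python
from datetime import datetime
from statistics import median
from typing import Dict, List


def off_hours_spikes(counts: Dict[datetime, int], factor: float = 3.0,
                     business_hours: range = range(9, 18)) -> List[datetime]:
    """Return hourly buckets that fall on weekends or outside business hours
    and whose request count exceeds `factor` times the median of all buckets."""
    if not counts:
        return []
    baseline = median(counts.values())
    flagged = []
    for hour, count in sorted(counts.items()):
        off_hours = hour.weekday() >= 5 or hour.hour not in business_hours
        if off_hours and count > factor * baseline:
            flagged.append(hour)
    return flagged
```

The median is used as the baseline because, unlike the mean, it is barely moved by the spikes themselves.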

Grafana has a built-in alerting engine that can send notifications (e.g. by email or Slack) based on conditional rules. We do not use this feature, as all our notifications are configured on the Graylog side. However, some performance issues only become visible at runtime, e.g. an unusually long response time to a web request. We would not receive a Graylog notification about this, yet the anomaly would be clearly visible in a Grafana graph. So the two tools go hand in hand when we learn about an issue: we check Grafana to understand at a high level what happened and why, then dig deeper in Graylog using the specific ID of a request.
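
The hand-off from Grafana to Graylog can be sketched as turning the slow requests behind an anomaly into Graylog searches. A minimal Python example, assuming hypothetical `request_id` and `response_time_ms` fields and Graylog's `field:"value"` search syntax:

```python
from typing import Iterable, List


def slow_request_queries(records: Iterable[dict], threshold_ms: float) -> List[str]:
    """Build one Graylog search query per request slower than `threshold_ms`.

    Results are ordered slowest first, so the worst cases are inspected first.
    """
    slow = [r for r in records if r["response_time_ms"] > threshold_ms]
    slow.sort(key=lambda r: r["response_time_ms"], reverse=True)
    return [f'request_id:"{r["request_id"]}"' for r in slow]
```

Each query pulls up every log message carrying that request ID, which is where the detailed investigation continues.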

Unlike Graylog, which we use for apps both in development and in production, we use Grafana only for apps in production. The one exception is performance testing of an app still under development: we emulate system load with JMeter, then check the Grafana dashboards to see how the system responds.

Grafana as a business analytics tool

Apart from performance tracking and debugging, Grafana dashboards are a powerful tool for informed business decisions. When set up properly (preferably in tandem with Google Analytics), Grafana can visualize custom analytics on user behavior in the system as pie charts, time bar graphs and other graphics. Based on these, product stakeholders can make decisions about further scaling the application, adding or removing functionality and improving the customer journey.

Example of Grafana Dashboard with User Behavior in an eCommerce App

Since this dashboard is more business-oriented, developers use it internally as a supplementary tool to keep abreast of customer flow in the eCommerce application: signups, logins and orders placed within a set time interval.

Here are a couple of real project cases where Grafana helped improve the usability of a web app.

  • Via Grafana, we regularly monitor the status of recurring orders in the system and single out the failed ones. These orders are subscription-based: they are generated in the system each month, and money is automatically withdrawn from the customers’ bank accounts. Sometimes payments fail (insufficient funds or a refusal by the financial institution), so system admins check the case and contact the clients to re-generate the order manually. This way, no order is left behind, and both clients and vendors are satisfied.

  • Using Grafana-generated reports for an eCommerce app, we found that a large percentage of new clients abandoned the Checkout page even though they already had products in their carts. Google Analytics reports backed up this finding, so we analyzed the checkout procedure step by step and improved it: users can now complete an order in just a couple of mouse clicks. This increased the conversion rate and, consequently, the vendor’s profit.
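
Both cases boil down to simple filters and ratios over the analytics data. The Python sketch below shows one of each: selecting failed recurring orders for a billing month, and computing the share of users lost between consecutive funnel steps. The `Order` fields, status values and step names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass
class Order:
    order_id: str
    billing_date: date
    recurring: bool
    payment_status: str  # hypothetical values: "paid", "failed", "pending"


def failed_recurring_orders(orders: Iterable[Order], year: int, month: int) -> List[Order]:
    """Recurring orders billed in the given month whose payment failed."""
    return [
        o for o in orders
        if o.recurring
        and o.payment_status == "failed"
        and (o.billing_date.year, o.billing_date.month) == (year, month)
    ]


def funnel_drop_off(step_counts: List[Tuple[str, int]]) -> Dict[str, float]:
    """Share of users lost between consecutive funnel steps.

    `step_counts` is ordered, e.g. [("cart", n1), ("checkout", n2), ("order", n3)];
    each "a -> b" key maps to 1 - n_b / n_a (0.0 when step a has no users).
    """
    result: Dict[str, float] = {}
    for (prev_name, prev_count), (name, count) in zip(step_counts, step_counts[1:]):
        key = f"{prev_name} -> {name}"
        result[key] = 0.0 if prev_count == 0 else 1 - count / prev_count
    return result
```

A high value for the step that ends at checkout is the kind of signal that prompted the step-by-step review described above.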

Grafana is an important component of the Logicify monitoring system for both internal and external projects. It is open-source software with a large and active community of contributors, but what we like most about it is its flexibility: it supports multiple data sources and makes dashboards and panels easy to customize.
