Friday, September 27, 2013

Classifying metrics

To create meaningful data representations of metrics that are useful to a business, it is important to identify the metrics that matter most. This holds for any type of data representation, be it a classic dashboard, an interactive data environment or a data sonification. Those particular metrics are not always obvious, and companies often invest considerable effort in identifying the metrics that matter most for their business. It is not unusual for those metrics to be hidden in the raw data the company produces during its everyday processes, which means they can only be revealed through calculation. Such a calculation could be the difference between two correlating signals, or the amount by which a particular signal exceeds its standard variance. One can differentiate between metrics that have immediate relevance, such as the availability of the company's services, and metrics that matter in the long run and are examined retrospectively. The data sonification project "Listening to the Heart of Business" focuses on live metrics that can potentially have an immediate impact on the business.
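As a rough, hypothetical sketch of what such a derived calculation might look like (the function names, window size and threshold below are illustrative and not taken from DataShaka's actual pipeline):

import statistics

def deviation_points(signal, window=30, tolerance=2.0):
    """Yield (timestamp, value) pairs where a signal exceeds its usual variance.

    'signal' is a list of (timestamp, value) tuples; 'window' and 'tolerance'
    are illustrative parameters, not actual DataShaka settings.
    """
    values = [v for _, v in signal]
    for i in range(window, len(signal)):
        recent = values[i - window:i]
        mean = statistics.mean(recent)
        stdev = statistics.stdev(recent)
        timestamp, value = signal[i]
        # Flag the point if it strays more than 'tolerance' standard
        # deviations from the recent mean.
        if stdev > 0 and abs(value - mean) > tolerance * stdev:
            yield timestamp, value

def signal_difference(signal_a, signal_b):
    """Pointwise difference between two correlating signals sampled at the same times."""
    return [(t, a - b) for (t, a), (_, b) in zip(signal_a, signal_b)]

Either function turns raw data the company already produces into a metric that can actually be watched, or listened to.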

From a DataShaka perspective, there is a large number of metrics that have to be monitored constantly. DataShaka is a data unification platform: it harvests data from different sources on behalf of its clients, unifies that data, then stores and delivers it. Many data files are constantly harvested, processed, unified and validated on a cloud machine. Furthermore, a Microsoft Azure powered storage platform named DISQ (Dynamic Intelligent Storage Query) stores and delivers that data. As all these processes run constantly in the background and are vital for keeping the heartbeat of the business alive, it is important to know whether everything is running smoothly or whether problems are occurring, and where those problems come from.

Below is a list identifying the specific metrics that matter most for DataShaka, from a business perspective as well as a developer's perspective:

The metrics that matter
  • Number of Data Files
    • processing
    • stuck
  • Cloud Machine Statistics
    • CPU
    • Memory
    • Network
    • Free Disc Space
  • Query Response Speed
    • duration
    • failure
  • UDPs (Unified Data Points)
    • uploaded
    • downloaded
  • Steps processing Data Files
    • failed
    • completed
  • User Login
  • Users logged in
  • Data file process kicked off

All these business metrics are structurally time series data. Additionally, each time series point (TSP) for these metrics contains some pieces of context.
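A minimal sketch of what such a point could look like in code (the field names and example context are purely illustrative, not DataShaka's actual schema):

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TimeSeriesPoint:
    """One time series point (TSP): a timestamped value plus its context."""
    time: datetime    # when the measurement or event occurred
    context: dict     # what the value represents, e.g. {"metric": "cpu", "machine": "worker-1"}
    value: float      # the measured value (0 or 1 for simple events)

# Illustrative points: one continuous metric sample and one simple event.
cpu_sample = TimeSeriesPoint(datetime.now(timezone.utc), {"metric": "cpu"}, 73.5)
login_event = TimeSeriesPoint(datetime.now(timezone.utc), {"metric": "user_login"}, 1)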

The first classification that can be made on those metrics is to differentiate between metrics that are essentially just events and only communicate that something particular has happened, and continuous metrics where the actual numbers are relevant. There are also metrics, however, that are essentially events, but where the event carries a particular value that is very relevant. Consequently, there are three categories these metrics can be classified into:
  • Binary Event Metrics
  • Complex Event Metrics
  • Continuous Metrics
Looking at classic sonification techniques, this could be a possible way to apply sound to each type of metric:

Binary Event Metrics => Auditory Icons
Complex Event Metrics => Earcons
Continuous Metrics => Parameter Mapping
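A minimal sketch of how this mapping could be wired up in code, assuming hypothetical placeholder playback functions (a real implementation would drive an audio engine instead of printing):

from enum import Enum

class MetricCategory(Enum):
    BINARY_EVENT = "binary_event"      # something happened (yes/no)
    COMPLEX_EVENT = "complex_event"    # an event that carries a relevant value
    CONTINUOUS = "continuous"          # a stream where the numbers themselves matter

# Placeholder playback functions, purely for illustration.
def play_auditory_icon(context):
    print("auditory icon for", context)

def play_earcon(context, value):
    print("earcon for", context, "shaped by", value)

def update_parameter_mapping(context, value):
    print("mapping", value, "of", context, "onto a sound parameter")

def sonify(category, context, value):
    """Route a metric to a sonification technique based on its category."""
    if category is MetricCategory.BINARY_EVENT:
        play_auditory_icon(context)
    elif category is MetricCategory.COMPLEX_EVENT:
        play_earcon(context, value)
    else:
        update_parameter_mapping(context, value)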

An explanation of sonification techniques can be found in a previous blog post here.

In every case, each metric is a point in time that carries a particular value, whether it is a constantly changing metric (such as the CPU usage of a cloud machine) or a simple event, where the value is binary and only switches between 0 and 1. All these metrics additionally carry context, which is essentially what their value represents (query response time, CPU, etc.).

This particular way of looking at data is consistent with DataShaka's data ontology TCSV, which describes a time-based, content-agnostic and context-driven data representation. This data ontology and its relation to the sonification project will be discussed further in future posts.

Taking the metrics that matter identified above and applying them to the three classes just defined, they can be structured the following way (a small code sketch of this classification follows the list):
  • Binary Event Metrics (Auditory Icon)
    • Query Response Speed
      • failure
    • Steps processing Data Files
      • failed
      • completed
    • User Login
  • Complex Event Metrics (Earcon)
    • Query Response Speed
      • duration (Time)
    • UDPs (Unified Data Points)
      • uploaded (Amount)
      • downloaded (Amount)
    • Data file process kicked off (Size of File)
  • Continuous Metrics (Parameter Mapping)
    • Number of Data Files
      • processing
      • stuck
    • CPU/Memory/Network/etc
    • Free Disc Space
    • Users logged in
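Encoded as a simple lookup (the metric keys are just illustrative names derived from the list above), this classification could then drive the routing of each incoming metric to its sonification technique:

# Illustrative metric keys -> category; in practice this lookup would feed
# the category into the sonification routing sketched earlier.
METRIC_CATEGORIES = {
    # Binary event metrics -> auditory icons
    "query_failure": "binary_event",
    "processing_step_failed": "binary_event",
    "processing_step_completed": "binary_event",
    "user_login": "binary_event",
    # Complex event metrics -> earcons
    "query_duration": "complex_event",
    "udps_uploaded": "complex_event",
    "udps_downloaded": "complex_event",
    "data_file_process_kicked_off": "complex_event",
    # Continuous metrics -> parameter mapping
    "data_files_processing": "continuous",
    "data_files_stuck": "continuous",
    "cpu": "continuous",
    "memory": "continuous",
    "network": "continuous",
    "free_disc_space": "continuous",
    "users_logged_in": "continuous",
}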
