Rain
Condensing Timeseries Monitoring Data


About

Timeseries data are prevalent in large-scale computing centers. Systems often capture sampled metrics of performance, utilization, and even sensor data like temperature. These streams are used for monitoring, placement, optimization, and more: for example, task assignment algorithms use computer utilization to determine where to place jobs.

We propose a framework to manage massive data-center timeseries streams that are lengthy and bursty in nature. To effectively model and compress such bursty data, we propose a two-pass modeling approach. In the first pass, the incoming data streams are decomposed into multiple sub-streams with different sparsity and burstiness. In the second pass, the sub-streams are fed into appropriate models for summarization and compression. Such a framework can support several practical improvements to data center efficiency:

  • Improve alerting and placement algorithms by predicting future usage. Even a short-term view of future usage can be valuable for decision-making. By playing the model forward, we can obtain this data.
  • Reduce the storage requirement of streams through compression. By storing only the model parameters and occasional original data points, potentially large timeseries data can be effectively summarized.
  • Detect potential anomalies or alert conditions. Dramatic changes in model parameters can be predictors of problems or abnormalities.
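
To make the two-pass idea and the storage benefit concrete, here is a minimal Python sketch. It is not Rain's implementation and not the Cypress algorithm: as a simplified stand-in, pass one uses a running median to separate a smooth trend from bursty spikes, and pass two keeps only a downsampled trend plus the original values at spike positions. The function names (`decompose`, `summarize`, `reconstruct`) and the parameters (`window`, `spike_threshold`, `step`) are hypothetical choices for illustration.

```python
from statistics import median


def decompose(stream, window=5, spike_threshold=3.0):
    """Pass one (illustrative): split a stream into a smooth trend and sparse spikes."""
    half = window // 2
    trend = []
    for i in range(len(stream)):
        lo, hi = max(0, i - half), min(len(stream), i + half + 1)
        # A running median is barely affected by short bursts.
        trend.append(median(stream[lo:hi]))
    # Keep the original value wherever the stream departs sharply from the trend.
    spikes = {
        i: x
        for i, (x, t) in enumerate(zip(stream, trend))
        if abs(x - t) > spike_threshold
    }
    return trend, spikes


def summarize(trend, spikes, step=4):
    """Pass two (illustrative): store every step-th trend sample plus the spikes."""
    return {"n": len(trend), "step": step, "samples": trend[::step], "spikes": spikes}


def reconstruct(summary):
    """Rebuild an approximate stream by linear interpolation, then restore spikes."""
    n, step, samples = summary["n"], summary["step"], summary["samples"]
    out = []
    for i in range(n):
        k = i // step
        left = samples[k]
        right = samples[min(k + 1, len(samples) - 1)]
        out.append(left + (right - left) * ((i % step) / step))
    for i, value in summary["spikes"].items():
        out[i] = value  # occasional original data points are stored exactly
    return out
```

For a mostly flat stream with one burst, the summary holds a handful of trend samples and a single spike rather than every point. A real deployment would replace the downsampling step with the model best suited to each sub-stream.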

Our framework incorporates several existing algorithms from the literature, including Cypress, SPIRIT, and Kalman filters. We have applied our framework to large data streams collected from a production cluster. Our focus is to evaluate how effectively the two-pass framework, which fuses these techniques, manages large timeseries streams from datacenters.
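
As a rough illustration of how a Kalman filter can serve both the prediction and the anomaly-detection goals, the following Python sketch runs a scalar filter over one stream. It assumes a simple random-walk state model, which is our assumption for illustration and not necessarily the model Rain uses. The function name `kalman_1d` and the parameters `process_var`, `measurement_var`, and `alert_sigmas` are hypothetical.

```python
def kalman_1d(stream, process_var=1e-2, measurement_var=1.0, alert_sigmas=4.0):
    """Scalar Kalman filter (random-walk model, an assumption for this sketch).

    Returns one-step-ahead forecasts and the indices whose observations fell
    far outside what the filter expected.
    """
    estimate, variance = stream[0], 1.0
    forecasts, alerts = [], []
    for t, observed in enumerate(stream[1:], start=1):
        # Predict: under a random walk the forecast is the previous estimate,
        # and uncertainty grows by the process variance.
        variance += process_var
        forecasts.append(estimate)

        # Innovation: the gap between the observation and the forecast.
        innovation = observed - estimate
        innovation_var = variance + measurement_var
        if abs(innovation) > alert_sigmas * innovation_var ** 0.5:
            alerts.append(t)  # candidate anomaly or alert condition

        # Update: blend the forecast and the observation using the Kalman gain.
        gain = variance / innovation_var
        estimate += gain * innovation
        variance *= 1.0 - gain
    return forecasts, alerts
```

Calling `kalman_1d` on a utilization stream yields a forecast for each time step after the first, which a placement or alerting policy could consult, plus a list of time indices worth inspecting. Tuning `alert_sigmas` trades off missed bursts against false alarms.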

Demo

Click here for a demo

The demo linked above shows an example of Rain's output after an analysis has been run.

Source

Rain is hosted on GitHub. The tool is open source and available under a BSD license. We are also providing the datasets described in the KDD 2012 paper to the community (see the data directory). See README.md for additional information.

Download Rain on GitHub

To get in touch with the Rain team, email us at contact@rainmon.com.