RainMon

Condensing Timeseries Monitoring Data

Clips

Identify anomalous machines with a single scatterplot

Get a picture of overall cluster trends with hidden variables mined from monitoring data

Drill down to individual machines and compare metrics with custom zoomable plots

Get a snapshot of activity across machines with a heatmap that is linked to the timeseries plots

About

Timeseries data are prevalent in large-scale computing centers. Systems often capture sampled metrics of performance, utilization, and even sensor data like temperature. These streams are used for monitoring, placement, optimization, and more.

RainMon is a framework for managing massive data-center timeseries streams, which tend to be lengthy and bursty. It takes a multi-stage modeling approach. In the first phase, the incoming data streams are decomposed into "smooth" and "spiky" components. In the second phase, the streams are summarized into a compact set that can be visualized and understood. In the third phase, predictions are made about the future state of the system. Such a framework has the potential to enable several practical improvements in data center efficiency:

Detect potential anomalies or alert conditions. Dramatic changes in model parameters can be predictors of problems or abnormalities.
Reduce the storage requirements of streams through compression. By storing only the model parameters and occasional original data points, potentially large timeseries can be summarized effectively.
Improve alerting and placement algorithms by predicting future usage. Even a short-term view of future usage can be valuable for decision-making, and running the model forward in time provides exactly that view.
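The first phase, splitting a stream into "smooth" and "spiky" parts, can be illustrated with a minimal Python sketch. This is not Cypress, which builds multi-scale compressed trickles. Here a centered rolling median stands in for the smooth component, and any residual larger than `k` times the median absolute residual is kept as a spike. The functions `rolling_median`, `decompose`, and `reconstruct` and the parameters `window` and `k` are hypothetical names chosen for illustration.

```python
import statistics


def rolling_median(series, window=5):
    """Centered rolling median; the window is truncated at the edges."""
    half = window // 2
    n = len(series)
    return [
        statistics.median(series[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def decompose(series, window=5, k=3.0):
    """Split a series into a smooth part and a sparse spiky part (sketch).

    Returns (smooth, spikes): smooth has the same length as series, and
    spikes maps index -> residual for points whose residual exceeds
    k times the median absolute residual.
    """
    smooth = rolling_median(series, window)
    residual = [x - s for x, s in zip(series, smooth)]
    # Robust scale of "normal" fluctuation; guard against a zero scale.
    scale = statistics.median(abs(r) for r in residual) or 1e-9
    spikes = {i: r for i, r in enumerate(residual) if abs(r) > k * scale}
    return smooth, spikes


def reconstruct(smooth, spikes):
    """Approximate the original series from its two components."""
    return [s + spikes.get(i, 0.0) for i, s in enumerate(smooth)]


if __name__ == "__main__":
    cpu = [10.0] * 20
    cpu[10] = 60.0  # a single burst
    smooth, spikes = decompose(cpu)
    print(spikes)  # {10: 50.0}
```

The spike dictionary is a natural source of candidate alerts, and because it is sparse, storing it alongside a coarse version of the smooth part hints at how such a split supports compression.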

The framework incorporates several existing algorithms from the literature, including Cypress¹, SPIRIT², and Kalman filters³. RainMon has been applied to large data streams collected from production clusters to detect real anomalies.
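SPIRIT summarizes many correlated streams with a few hidden variables, which fits the second (summarization) phase. Below is a minimal pure-Python sketch of SPIRIT's per-tick weight-tracking update as described by Papadimitriou et al.²: each hidden variable is a projection onto a slowly adapting weight vector, updated with an exponential forgetting factor, and its contribution is subtracted before the next one is tracked. It assumes a fixed number of hidden variables `k`; the full algorithm also adapts `k` using energy thresholds, which is omitted. `SpiritTracker`, `lam`, and `init_energy` are illustrative names, not RainMon's code.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


class SpiritTracker:
    """Fixed-k sketch of SPIRIT hidden-variable tracking (k <= n_streams).

    Each update takes one value per stream (a vector x) and returns k
    hidden variables y_i, the projections of x onto weight vectors w_i.
    lam is the exponential forgetting factor.
    """

    def __init__(self, n_streams, k=1, lam=0.96, init_energy=0.01):
        self.k = k
        self.lam = lam
        # w_i starts as the i-th unit vector; d_i is a running energy estimate.
        self.w = [[1.0 if j == i else 0.0 for j in range(n_streams)]
                  for i in range(k)]
        self.d = [init_energy] * k

    def update(self, x):
        """Consume one time tick and return the k hidden variables."""
        x = [float(v) for v in x]
        hidden = []
        for i in range(self.k):
            w = self.w[i]
            y = dot(w, x)  # projection onto w_i
            self.d[i] = self.lam * self.d[i] + y * y
            err = [xj - y * wj for xj, wj in zip(x, w)]
            w = [wj + (y / self.d[i]) * ej for wj, ej in zip(w, err)]
            self.w[i] = w
            # Deflate: remove this component before tracking the next one.
            x = [xj - y * wj for xj, wj in zip(x, w)]
            hidden.append(y)
        return hidden


if __name__ == "__main__":
    tracker = SpiritTracker(n_streams=3, k=1)
    for t in range(500):
        a = math.sin(0.1 * t) + 2.0
        hidden = tracker.update([a, 2.0 * a, -a])
    print(tracker.w[0], hidden)
```

For three perfectly correlated streams such as [a, 2a, -a], a single hidden variable captures everything: `tracker.w[0]` converges toward the unit vector along [1, 2, -1], and the hidden variable approaches √6·a.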

[1] Galen Reeves, Jie Liu, Suman Nath, and Feng Zhao. Cypress: Managing Massive Time Series Streams with Multi-Scale Compressed Trickles. In Proc. VLDB'09, pages 97-108.

[2] Spiros Papadimitriou, Jimeng Sun, and Christos Faloutsos. Streaming Pattern Discovery in Multiple Time-Series. In Proc. VLDB'05, pages 697-708.

[3] Sam Roweis and Zoubin Ghahramani. A Unifying Review of Linear Gaussian Models. Neural Computation, 11(2):305-345, February 1999.

Demo

The demo shows an example of RainMon's output after an analysis has been run.

Source

RainMon is hosted on GitHub. The tool is open source and available under a BSD license. Additionally, we are providing the CMU.net dataset described in the RainMon KDD 2012 paper to the community (see the data directory). See the guide for additional information.

Download RainMon

Guide

A copy of the user guide is available on this site. For the most up-to-date version, clone the repository and look in the doc directory.

The RainMon team is:

Yoshihisa Abe
Vishnu Boddeti
Kai Ren
Ilari Shafer

Contributors:

Lennart Liberg

Email us at contact@rainmon.com to get in touch.