Network Ram (NRAM) for Clusters
Kei Imada, Liam Packer, Tia Newhall
The increasing popularity of big data in the modern era of high-performance computing has driven demand for larger memory pools to process massive amounts of information. A cluster computer — a collection of computers, or nodes, connected by a fast network — is one way to provide such vast memory pools. Although clusters are less expensive than most other kinds of parallel computers, they often suffer imbalances in resource usage, which can significantly slow parallel programs. Network RAM (NRAM) ameliorates this problem by presenting the cluster's memory as a collective pool, allowing applications on one node to access memory on other nodes with little additional overhead.
Many NRAM implementations reserve a fixed portion of each node's memory to store data from remote nodes, but this fixed reservation leads to system slowdowns when a node's own workload becomes memory intensive.
To solve this problem, we investigated whether memory-intensive programs could run faster on a cluster using a dynamic-reservation NRAM implementation whose reserved memory grows and shrinks in response to the memory needs of node workloads.
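A dynamic reservation of the kind described above can be pictured as a simple feedback policy. The following Python sketch is illustrative only: the thresholds, step size, and function name are assumptions for exposition, not the implementation studied in this work.

```python
# Hypothetical sketch of a dynamic NRAM reservation policy: the slice of
# local memory lent to remote nodes grows when local memory pressure is
# low and shrinks when the local workload needs the memory back.
# All thresholds and names here are illustrative assumptions.

def adjust_reservation(reserved_mb, free_mb, total_mb,
                       low_water=0.25, high_water=0.60,
                       step_mb=64, min_mb=0, max_mb=1024):
    """Return a new reservation size based on local memory pressure."""
    free_frac = free_mb / total_mb
    if free_frac > high_water:
        # Plenty of free local memory: lend more to remote nodes.
        reserved_mb = min(max_mb, reserved_mb + step_mb)
    elif free_frac < low_water:
        # Local workload is memory intensive: reclaim reserved memory.
        reserved_mb = max(min_mb, reserved_mb - step_mb)
    return reserved_mb
```

In this sketch, a node periodically re-evaluates its reservation so that remote data storage never crowds out a memory-intensive local workload, in contrast to a fixed reservation.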
We took a two-step approach: we first identified system characteristics that predict high and low node memory usage, and then used them in an adaptive NRAM system that avoids the problems of fixed memory reservations.
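To make the first step concrete, predicting node memory usage from system characteristics might look like the following Python sketch. The chosen features (free-memory fraction, page-fault rate), the smoothing window, and the weighted score are assumptions for illustration; the actual characteristics and statistical model used in this work may differ.

```python
# Illustrative sketch of classifying node memory pressure from recent
# system measurements. The features and the simple weighted score are
# assumptions for exposition, not this work's actual model.

def predict_high_usage(samples, threshold=0.5):
    """samples: list of (free_frac, fault_rate) observations, oldest first.
    Returns True if the node is predicted to enter high memory usage."""
    if not samples:
        return False
    # Average over a short window to smooth out transient spikes.
    window = samples[-3:]
    avg_free = sum(f for f, _ in window) / len(window)
    avg_faults = sum(r for _, r in window) / len(window)
    # Low free memory and a high page-fault rate both indicate pressure;
    # combine them into a single score in [0, 1].
    score = (1.0 - avg_free) * 0.5 + min(avg_faults / 100.0, 1.0) * 0.5
    return score > threshold
```

A prediction of high usage would trigger the adaptive system to shrink the node's NRAM reservation before the local workload is slowed.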
We found a statistical model to be a promising approach: our results show that the system can take countermeasures that ameliorate slowdowns during periods of high memory usage.