Nswap2L: Heterogeneous Swapping for Cluster Computers
Tia Newhall and E. Ryerson Lehman-Borer
Most supercomputers today are cluster computers. A cluster is built by combining many smaller computers, called nodes, and interconnecting them with a fast network so that they can communicate easily. Large jobs can be completed quickly by breaking them into small parts and assigning a part to each node. Unfortunately, it is difficult to keep the workload balanced across all the nodes: often some nodes use all of their main memory (RAM) while other nodes in the network have plenty of free RAM. Typically, when a computer fills its RAM, it swaps some of the contents of RAM out to a hard disk. This is convenient because hard disk space is much cheaper and more abundant than RAM. The drawback is that hard disks are much slower. Nswap provides an alternative method of swapping. Rather than sending excess memory to the hard disk, Nswap finds a node in the network whose RAM has free space and sends the memory to be held there. Since transferring data over the network is much faster than accessing a disk, this is a faster solution.
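The swap-out decision described above can be illustrated with a minimal sketch. It is not the Nswap implementation, which is a Linux kernel module; the names `Node`, `pick_node`, and `swap_out` are hypothetical, and the rule of choosing the peer with the most free pages is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A hypothetical cluster node that lends idle RAM to its peers."""
    name: str
    free_pages: int
    remote_cache: Dict[Tuple[str, int], bytes] = field(default_factory=dict)


def pick_node(peers: List[Node]) -> Optional[Node]:
    """Return the peer with the most free pages, or None if all are full.

    The "most free pages" rule is an assumed policy, not taken from Nswap.
    """
    candidates = [n for n in peers if n.free_pages > 0]
    return max(candidates, key=lambda n: n.free_pages) if candidates else None


def swap_out(owner: str, page_no: int, data: bytes,
             peers: List[Node], disk: Dict[Tuple[str, int], bytes]) -> str:
    """Place one page in a peer's idle RAM, falling back to disk.

    Returns the name of the node that holds the page, or "disk".
    """
    target = pick_node(peers)
    if target is None:
        # No peer has free RAM, so use the slower hard disk.
        disk[(owner, page_no)] = data
        return "disk"
    target.remote_cache[(owner, page_no)] = data
    target.free_pages -= 1
    return target.name


if __name__ == "__main__":
    peers = [Node("node-a", 0), Node("node-b", 2)]
    disk: Dict[Tuple[str, int], bytes] = {}
    print(swap_out("node-c", 7, b"page contents", peers, disk))
```

In this sketch the disk is used only when no peer has free RAM, which mirrors the idea that network RAM is the faster choice whenever it is available.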
Nswap2L builds on Nswap and chooses whether to swap memory to network RAM, hard disk, flash drive, or some other storage device. Although a prototype of Nswap2L has been built, there is not yet a fully implemented version, and some design decisions remain to be made. Several new features were tested for potential use in Nswap2L: DMIO, discard, and multi-threading. DMIO would allow Nswap2L to easily add, remove, and swap to many devices. Discard is a Linux kernel feature that notifies a device driver when memory it is storing is no longer needed. Writing a multi-threaded device driver allows the driver to run on multiple CPUs simultaneously. The results indicate that all three features can feasibly be implemented in the full version of Nswap2L.
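As a rough illustration of the two-level idea, the sketch below presents a single swap interface on top of several backing devices, lets devices be added at run time, and frees a slot when a page is discarded. All names (`BackingDevice`, `TwoLevelSwap`) and the per-device `cost` values are hypothetical, and the "cheapest device with free space" placement rule is an assumption; the actual Nswap2L placement policy is one of the open design decisions and is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BackingDevice:
    """A hypothetical backing store (network RAM, flash, disk, ...)."""
    name: str
    capacity: int   # number of page slots
    cost: float     # assumed relative cost of one page access
    slots: Dict[int, bytes] = field(default_factory=dict)

    def has_room(self) -> bool:
        return len(self.slots) < self.capacity


class TwoLevelSwap:
    """Presents one swap device and places each page on an underlying device."""

    def __init__(self) -> None:
        self.devices: List[BackingDevice] = []
        self.location: Dict[int, BackingDevice] = {}  # page number -> device

    def add_device(self, dev: BackingDevice) -> None:
        self.devices.append(dev)

    def write(self, page_no: int, data: bytes) -> str:
        """Store a page on the cheapest device with room (assumed policy)."""
        self.discard(page_no)  # drop any stale copy of this page first
        usable = [d for d in self.devices if d.has_room()]
        if not usable:
            raise RuntimeError("no backing device has free space")
        dev = min(usable, key=lambda d: d.cost)
        dev.slots[page_no] = data
        self.location[page_no] = dev
        return dev.name

    def read(self, page_no: int) -> bytes:
        return self.location[page_no].slots[page_no]

    def discard(self, page_no: int) -> None:
        """Free the slot of a page that is no longer needed."""
        dev = self.location.pop(page_no, None)
        if dev is not None:
            del dev.slots[page_no]


if __name__ == "__main__":
    swap = TwoLevelSwap()
    swap.add_device(BackingDevice("disk", capacity=10, cost=100.0))
    swap.add_device(BackingDevice("network-ram", capacity=1, cost=1.0))
    print(swap.write(1, b"a"))  # network RAM has room and is cheapest
    print(swap.write(2, b"b"))  # network RAM is full, so disk is used
    swap.discard(1)             # page 1 is no longer needed
    print(swap.write(3, b"c"))  # the freed network RAM slot is reused
```

The discard call matters in this sketch because, without it, the single network RAM slot would stay occupied by a page that is no longer needed and later writes would go to the slower disk.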
Literature cited: Tia Newhall and Douglas Woos. Proceedings of the IEEE Cluster Conference, Austin, TX, September 2011.