Open Source Management Tools to Watch: Ganglia
March 28, 2006, 3:19 pm. Originally posted in InfoWorld, Open Sources.
In his previous guest editorial, GroundWork CEO Ranga Rangachari promised to point out some specific open source IT monitoring and management tools that he thinks enterprises should keep an eye on. Today, he points readers toward the Ganglia project.
The use of commodity components to build clusters is one of the hottest growth areas in enterprise computing today. Since the Beowulf project began more than 10 years ago, companies like Penguin Computing (whose CTO, Donald Becker, pioneered Beowulf) have carried clustering beyond its research and HPC origins and made it highly consumable for the enterprise.
So how does clustering — and tying together “clusters of clusters” across geographic boundaries — impact the systems monitoring requirements that the typical enterprise has?
Matt Massie is the original author of the Ganglia Monitoring System, an open source cluster monitoring technology started within UC Berkeley’s Millennium Clustered Computing Labs. Ganglia has quietly gathered a lot of momentum for its cluster monitoring capabilities: it has been downloaded more than 110,000 times from 145 countries and currently has dozens of contributing developers.
“The biggest difference between computational clusters and run-of-the-mill IT systems management is that with clusters you’re looking at parallel systems where local failures have a wide effect,” said Massie. “With systems like mail, you tend to have redundancy, so if one server goes down, the load goes to another, and you don’t have a job failure. That’s not the case with clustering, where there’s so much interconnectivity between the nodes, and if one fails, you tend to lose the job. When there’s a failure in a cluster, which can happen on a daily or hourly basis if you’re running thousands of machines, you need to know about it right away.”
Massie and his colleagues at Millennium built Ganglia to be far more lightweight and easier to use than the SNMP-based tools then available. Ganglia relies on a multicast-based “listen/announce” protocol: hosts announce their state to a multicast group, and only hosts subscribed to that group receive the packets. This makes it much easier to monitor groups of machines, over any geographic distance, as they are brought up or taken down.
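To make the listen/announce idea concrete, here is a minimal sketch of how a silent node gets noticed. It is an in-memory simulation, not Ganglia's gmond code. The `Listener` class, the `announce` helper (which stands in for a single multicast send), the host names and the 20-second timeout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Listener:
    """Hypothetical subscriber that remembers when it last heard from each host."""

    timeout: float  # seconds of silence before a host is reported as down
    last_heard: dict[str, float] = field(default_factory=dict)

    def receive(self, host: str, timestamp: float) -> None:
        # Every announcement refreshes the sender's entry; a host that was
        # never configured anywhere appears the first time it announces.
        self.last_heard[host] = timestamp

    def down_hosts(self, now: float) -> list[str]:
        # A host silent for longer than the timeout is assumed to have failed.
        return sorted(
            host for host, seen in self.last_heard.items() if now - seen > self.timeout
        )


def announce(group: list[Listener], host: str, timestamp: float) -> None:
    # Stand-in for one multicast send: every subscriber in the group receives it.
    for listener in group:
        listener.receive(host, timestamp)


# Two subscribers; node02 stops announcing after t=10 s.
a, b = Listener(timeout=20.0), Listener(timeout=20.0)
for t in (0.0, 10.0, 20.0, 30.0):
    announce([a, b], "node01", t)
    if t <= 10.0:
        announce([a, b], "node02", t)

print(a.down_hosts(now=35.0))  # ['node02']
print(b.down_hosts(now=35.0))  # ['node02']
```

In this sketch no central poller asks each host for its status. Every subscriber hears every announcement, so all subscribers build the same view of the group. Adding a machine only requires it to start announcing, and a failed machine shows up as soon as its silence exceeds the timeout.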
“The low-level monitor in the cluster can use multicast or unicast, and then it distributes the clusters’ statuses using XML,” said Massie. “So you can have your local monitoring domains, and then link them all together by pulling the XML streams together.”
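Massie's "pull the XML streams together" step can be sketched in a few lines. The sketch is loosely modeled on what Ganglia's gmetad aggregator does, but it is not gmetad's code. The sample streams are heavily trimmed and hypothetical, and both the `GRID` wrapper element and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed per-cluster streams. Real gmond output carries
# many more attributes and a METRIC element per metric for every host.
ALPHA = """<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="alpha">
    <HOST NAME="a1"/>
    <HOST NAME="a2"/>
  </CLUSTER>
</GANGLIA_XML>"""

BETA = """<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="beta">
    <HOST NAME="b1"/>
  </CLUSTER>
</GANGLIA_XML>"""


def merge_streams(grid_name: str, streams: list[str]) -> ET.Element:
    """Pull several cluster XML streams together under one wrapper element."""
    grid = ET.Element("GRID", NAME=grid_name)  # wrapper name is an assumption
    for text in streams:
        for cluster in ET.fromstring(text).iter("CLUSTER"):
            grid.append(cluster)  # each local domain keeps its own subtree
    return grid


def hosts_per_cluster(grid: ET.Element) -> dict[str, int]:
    # A grid-wide view computed from the merged tree.
    return {c.get("NAME"): len(c.findall("HOST")) for c in grid.iter("CLUSTER")}


campus = merge_streams("campus", [ALPHA, BETA])
print(hosts_per_cluster(campus))  # {'alpha': 2, 'beta': 1}
```

Each local domain stays self-contained, so linking another cluster (or another site) means adding one more stream to the list. The clusters themselves need no changes.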
Ganglia is already pervasive in the high-performance computing community, and a simple web search turns up hundreds of websites pulling Ganglia feeds. The Grid 3 project used Ganglia to monitor a computational grid of more than 2,000 CPUs at 25 sites across the United States and Korea. Many commercial vendors, such as Dell, bundle Ganglia into their business solutions software.
Because Ganglia's XML feeds are easy to parse, it also plugs readily into other types of monitoring tools.
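As an example of that kind of integration, the sketch below parses a Ganglia-style XML dump and flags overloaded hosts for some other tool to act on. `load_one` and `cpu_num` are standard Ganglia metric names, but the sample document is trimmed and hypothetical. The `WARNING` output line is an invented hand-off format. `fetch_xml` assumes that gmond listens on its default TCP port 8649 and writes a full dump before closing the connection; it has not been tested against a live daemon here.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample; real dumps include many more attributes and metrics.
SAMPLE = """<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="alpha">
    <HOST NAME="a1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float"/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16"/>
    </HOST>
    <HOST NAME="a2">
      <METRIC NAME="load_one" VAL="7.90" TYPE="float"/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def fetch_xml(host: str = "localhost", port: int = 8649, timeout: float = 5.0) -> str:
    """Read a full XML dump, assuming gmond writes it and closes on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def host_metrics(xml_text: str) -> dict[str, dict[str, str]]:
    # Map each host name to {metric name: raw value string}.
    root = ET.fromstring(xml_text)
    return {
        host.get("NAME"): {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        for host in root.iter("HOST")
    }


def over_threshold(metrics, metric: str = "load_one", limit: float = 4.0) -> list[str]:
    # Hosts whose numeric value for `metric` exceeds `limit`.
    return sorted(
        name for name, values in metrics.items()
        if metric in values and float(values[metric]) > limit
    )


metrics = host_metrics(SAMPLE)  # or host_metrics(fetch_xml()) against a live gmond
for name in over_threshold(metrics):
    # Hypothetical hand-off line for some other monitoring tool.
    print(f"WARNING {name} load_one={metrics[name]['load_one']}")
```

Run against the sample, this prints `WARNING a2 load_one=7.90`. The parsing step is the same whether the XML comes from a single cluster or from a merged, multi-cluster feed.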
As large companies increasingly build server farms of thousands of machines, new server management and monitoring requirements are surfacing. On the monitoring side, polling breaks down beyond a few hundred servers, so you have to move to an interrupt-driven architecture built on “listen/announce” protocols. Beyond the technical issues, traditional monitoring solutions are also prohibitively expensive when you roll out hundreds of servers at a time.
For these reasons, Ganglia will be a very interesting technology to keep an eye on as clustering continues to become more pervasive in enterprise IT.