Power Supply Management: The IT Blind Spot
As part of building out my testing infrastructure, I've become more involved with system-level management tools and technologies. This effort has proven to be generally useful for overall resource management purposes (and particularly useful for resolving the various heat-related problems that have cropped up), but there are also some significant blind spots in the current crop of hardware management solutions. At the top of the list are system power supplies, which are currently treated as little more than opaque black boxes when it comes to manageability. However, there's hope on the horizon that this will change soon.
As it stands right now, I can use a variety of software packages to tap into a motherboard's sensor chips and gather voltage readings for the CPU and peripheral buses, the temperature of the CPU(s) and chassis, and even the rotational speeds of the various cooling fans. I can then make this information available through multiple management channels. Similarly, almost all modern hard drives and disk controllers provide a variety of statistics and sensor readings via the industry-standard S.M.A.R.T. extensions. I can even get some kinds of environmental readings from some video cards through vendor-specific interfaces.
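To give a sense of how accessible the motherboard readings are, here is a minimal sketch that assumes a Linux host exposing its sensor chips through the kernel's hwmon sysfs tree (raw integers in millivolts, millidegrees Celsius, and RPM). The function name and the shape of the returned dictionary are my own choices for illustration, and other operating systems or management packages will look different.

```python
from pathlib import Path

# Linux hwmon files hold raw integers: millivolts, millidegrees Celsius, RPM.
_SCALE = {"in": ("V", 1000.0), "temp": ("C", 1000.0), "fan": ("RPM", 1.0)}


def read_hwmon(root="/sys/class/hwmon"):
    """Return {chip: {sensor: (value, unit)}} for each readable sensor.

    Hypothetical helper for illustration; 'root' is a parameter so the
    function can be pointed at a different directory for testing.
    """
    readings = {}
    root = Path(root)
    if not root.is_dir():
        return readings  # not Linux, or no sensor driver loaded
    for chip_dir in sorted(root.iterdir()):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.is_file() else "unknown"
        key = f"{chip_dir.name} ({chip})"  # keeps same-named chips distinct
        sensors = {}
        for f in sorted(chip_dir.glob("*_input")):
            label = f.name[: -len("_input")]    # e.g. "temp1"
            kind = label.rstrip("0123456789")   # e.g. "temp"
            if kind not in _SCALE:
                continue  # skip sensor types this sketch does not handle
            try:
                raw = int(f.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but unreadable right now
            unit, divisor = _SCALE[kind]
            sensors[label] = (raw / divisor, unit)
        if sensors:
            readings[key] = sensors
    return readings
```

Calling `read_hwmon()` with no arguments on a machine with loaded sensor drivers returns voltages, temperatures, and fan speeds for the motherboard chips. Nothing in that output describes the power supply itself, which is exactly the gap.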
On the other hand, none of my system power supplies have any kind of management interface whatsoever. I have no idea if they're loaded beyond their recommended or rated capacities, if an embedded fan is sputtering or failing, or if power conditions might be causing system-wide failures. In short, I have no way of knowing when a power supply is dying or about to die. Given the critical role that power supplies play in basic system operations—not to mention the role they play in secondary elements like power utilization and temperature—there's no good excuse for this absence.
This is especially true considering there's already a specification for standardized power supply management. The Power Supply Management Interface (PSMI) Design Guide version 2.12 was published by the Server System Infrastructure (SSI) consortium back in September of 2005, and it documents a wide range of management information that can be exchanged between a host computer and its power supplies across a local SMBus connection.
Specifically, PSMI documents signaling methods for publishing a tremendous range of sensor data, including input and output voltage levels, multiple fan readings, multiple thermal readings, and a variety of failure codes and diagnostic signals. PSMI also allows for sending control signals back to the power supply from the host, thereby allowing the system operator to do things like turn the fan speed up or down through software controls. It even has signals that help with failover management. This is a pretty comprehensive spec. Better yet, SSI already has most of the big-name vendors as members, and near as I can tell the spec itself has zero royalty restrictions on nonmembers.
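To make the payoff concrete, here is a rough sketch of the kind of host-side check I'd like to be able to run once a power supply reports its own readings. It assumes some lower layer has already decoded per-rail voltage and current and the fan speed from the supply. The `RailReading` type, the function names, and the thresholds are all hypothetical and are not taken from the PSMI design guide.

```python
from dataclasses import dataclass


@dataclass
class RailReading:
    """One output rail as reported by the supply (hypothetical structure)."""
    name: str      # e.g. "+12V"
    volts: float
    amps: float


def load_fraction(rails, rated_watts):
    """Total output power across all rails as a fraction of rated capacity."""
    total_watts = sum(r.volts * r.amps for r in rails)
    return total_watts / rated_watts


def check_supply(rails, rated_watts, fan_rpm, warn_load=0.8, min_fan_rpm=500):
    """Return a list of warning strings; the thresholds are illustrative only."""
    warnings = []
    load = load_fraction(rails, rated_watts)
    if load > 1.0:
        warnings.append(f"overloaded: {load:.0%} of rated capacity")
    elif load > warn_load:
        warnings.append(f"high load: {load:.0%} of rated capacity")
    if fan_rpm < min_fan_rpm:
        warnings.append(f"fan slow or failing: {fan_rpm} RPM")
    return warnings
```

With readings like these available, overload and fan-failure alerts for the power supply could feed into the same management channels that already carry motherboard and disk data.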
Indeed, the only problem I can find with it is that there don't appear to be very many power supplies that actually support it. In fact, I couldn't find any compliant products in my searches. Update: I have been informed via email that HiPro has four models in their pedestal-chassis series (these appear to be OEM power supplies, but I'm not sure), and that Delta Electronics also has a few models for Intel's dual-core Bensley platform.
This should be a no-brainer, but it looks like the end-user community is going to have to force vendors to actually develop and offer PSMI-compliant power supplies. If the community is going to be taken seriously about the issues of power and temperature management, buyers need to start adding PSMI to their purchase checklists, hectoring their sales reps about the technology, and rewarding the vendors that support it. I know I will.