Monthly Archives: July 2017

My Perfect Green Data Centre (9) – The Perfect Green Data Centre

In this series, I’ve tried to unload everything that I see as being wrong about the way we think about, design and build data centres. I’ve said that my perfect green data centre is not a data centre at all, but consists of solar panels, with computers mounted on them, installed on people’s roofs. A second-best is the ultimate air containment monocoque. And, as the least good, a bodge job for colocation.

But, at a higher level, nothing I’ve said addresses the fundamental problem, which is that computers turn electricity into heat.

To some extent this will always be the case, but it needn't be the case to the extent that it is. Nearly all servers use NMOS chips, and NMOS chews power. There's nothing to be done about this: the basic unit of all digital technology is the transistor, and NMOS requires a transistor and a resistor to create the most basic logical circuit (an inverter); more complex functions likewise require resistors. Resistors dissipate unwanted electricity by turning it into heat. So NMOS chips necessarily turn electricity into heat.

But the computer chips that power your phone are not NMOS – if they were, your phone's battery would last minutes rather than hours and the phone would burn a hole in your pocket. The chips in your phone are CMOS, and CMOS consumes an order of magnitude less power.

This is because CMOS chips use almost no resistors. For every transistor/resistor pair in NMOS, there are two transistors in CMOS. Those transistors only draw power at the instant a state changes, whereas the resistors draw power nearly half the time. (In an NMOS inverter, whenever the input is 1 and the output is 0, the resistor is conducting. Assuming the inverter spends, on average, equal amounts of time in each state, the resistor is pumping out heat half the time. In a CMOS inverter, one transistor is on and the other is off, neither is dissipating heat, and so the average power drawn is close to zero. The same reasoning applies to more complex functions.)
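
To put rough numbers on that, here's a back-of-the-envelope sketch in Python. All of the component values (supply voltage, pull-up resistor, load capacitance, switching rate) are illustrative assumptions rather than figures from any datasheet; the point is only the order-of-magnitude gap between a resistor that conducts half the time and a capacitance that is charged only on a state change.

```python
# Back-of-the-envelope comparison of inverter power draw (illustrative numbers only).
V = 5.0      # supply voltage, volts (assumed)
R = 10e3     # NMOS pull-up resistor, ohms (assumed)
C = 10e-15   # CMOS output load capacitance, farads (assumed)
f = 1e6      # average switching rate, hertz (assumed)

# NMOS: the resistor conducts whenever the output is low, i.e. roughly half the time.
p_nmos = 0.5 * V**2 / R

# CMOS: power is drawn only while switching, charging and discharging C.
p_cmos = C * V**2 * f

print(f"NMOS inverter (static):  {p_nmos * 1e6:8.2f} uW")
print(f"CMOS inverter (dynamic): {p_cmos * 1e6:8.2f} uW")
```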

As transistors take up more space than resistors, and are more complex to fabricate, CMOS packs fewer logical functions into a given area. Because the construction is more complex, the yield – the proportion of correctly functioning chips coming off the production line – is lower, so CMOS chips are more expensive. But the power consumption is much, much lower.

Conversely, NMOS chips pack more computing into less space, and are cheaper than CMOS chips of the same die size. Hence their ubiquity, especially in server-grade equipment. The downside is that resistors turn electricity into heat and transistors largely don't, so NMOS chips dissipate a great deal more power.

But the limiting factor in solar power is the insolation: the amount of solar energy that reaches the ground. This, for any given part of the world on any given day, is fixed. Even if we recover 100% of that solar energy – and we never will – the 1,350W per square meter that comes from the sun in the tropics will power at most two or three off-the-shelf commodity NMOS servers. Those two or three servers will occupy much less than a square meter.

But that same 1,350W will power maybe a dozen CMOS servers. And the resultant extra density more than compensates for the fact that CMOS chips tend to be less powerful than NMOS chips.
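
As a rough sketch of that arithmetic (the per-server wattages are my assumptions, not measured figures):

```python
insolation_w_per_m2 = 1350.0   # solar energy per square metre in the tropics, as above
nmos_server_w = 450.0          # assumed draw of an off-the-shelf commodity server
cmos_server_w = 110.0          # assumed draw of a low-power, mobile-class server

print(int(insolation_w_per_m2 // nmos_server_w))   # 3 servers per square metre
print(int(insolation_w_per_m2 // cmos_server_w))   # 12 servers per square metre
```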

In other words, there's an imbalance. NMOS consumes solar energy faster than we can generate it; CMOS uses solar energy at roughly the same rate per square meter as we can generate it. And because we're using that energy to power state changes – which is what matters – rather than to preserve states – which shouldn't take any energy at all – we can pack far more computing into each square meter than we can with NMOS.

So the perfect green data centre consists of PV panels on people’s roofs, with CMOS servers attached, connected by wifi towers or fibre as appropriate, and with data replicated at time zones +8 and -8 hours away (or +120 and -120 degrees of longitude) so that computing follows the clock.
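
A minimal sketch of the follow-the-clock routing, assuming three hypothetical replica sites spaced roughly 120 degrees of longitude apart: work goes to whichever site is currently closest to local solar noon.

```python
from datetime import datetime, timezone

# Three hypothetical replica farms, roughly 120 degrees of longitude apart.
FARMS = {"asia": 114.0, "europe-africa": 0.0, "americas": -120.0}

def local_solar_hour(longitude_deg, now_utc):
    """Approximate local solar time: the sun moves 15 degrees of longitude per hour."""
    return (now_utc.hour + now_utc.minute / 60 + longitude_deg / 15.0) % 24

def active_farm(now_utc=None):
    """Route work to whichever farm is currently closest to local solar noon."""
    now_utc = now_utc or datetime.now(timezone.utc)
    return min(FARMS, key=lambda name: abs(local_solar_hour(FARMS[name], now_utc) - 12))

print(active_farm())
```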

Or, in other words, the perfect green data centre consists of our planet Earth itself, shorn of the ugly concrete boxes that turn electricity into heat. Who's up for building it? I'd be delighted to hear from you – chris.maden [at] cpmc.com.hk.


My Perfect Green Data Centre (8) – Colocation: The Best I Can Do

I've seen some amazing items of equipment in colocation data centres. The winner has to be a drum printer, but I've seen passive 4-way hubs, any number of free-standing modems and more desktops than I care to remember – and that's before we get to the tape drives, optical jukeboxes and voice-recording equipment. And that's only the digital stuff. The analogue equipment for telephony comes in shapes and sizes of which Heath Robinson would be proud.

So, although pre-mounting equipment in trays and having robots insert those trays into a sealed monocoque of the type I suggested in the last post may work 90% of the time, the other 10% will kill any attempt at robotics. The range of things that goes into a colocation data centre is just too vast for anything but the current row-and-aisle, human-access data centre. And having untrained IT guys clambering around a vertical chamber, laden with heavy and expensive equipment, is a non-starter.

As to robots, forget it. They're too expensive to be cost-effective. And human incursion is binary: one either designs for it or one doesn't. If there are likely to be even a few incursions a year, one must design for it, and one then bears the costs of both the robots and the infrastructure needed to deal with humans.

So what is there to do? Quite a lot. They're all little things, and most of them have to do not so much with the engineering as with the dynamics of how colocation providers interact with their clients.

On the engineering: insulate your building, deploy geothermal piling, install hot- or cold-aisle containment, use evaporative cooling – but also look at the load. It is much more energy-efficient to cool three 4kW racks than to cool a single 10kW rack and two 1kW racks, yet clients arrange their IT in the latter way all the time. Put intelligent PDUs on each and every rack, monitor the heck out of the whole white space with temperature sensors, and re-arrange your clients' equipment to balance the temperature. Yes, there are constraints on cable length and demands of proximity, but even within these, the equipment layout can be optimized.
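
A minimal sketch of that balancing idea, with made-up PDU readings: flag any rack whose draw strays far from the room average, so its contents can be redistributed.

```python
# Made-up PDU readings in kW; in practice these would come from the intelligent PDUs.
rack_kw = {"A01": 10.2, "A02": 1.1, "A03": 0.9}

average = sum(rack_kw.values()) / len(rack_kw)
for rack, kw in sorted(rack_kw.items()):
    # Flag anything more than 25% away from the room average as a re-balancing candidate.
    if abs(kw - average) > 0.25 * average:
        print(f"{rack}: {kw:.1f} kW vs room average {average:.1f} kW (candidate for re-balancing)")
```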

To do this, the contracts need to be restructured to reward the clients for good behaviour. Most colocation contracts I've seen (and I've seen many), rather than rewarding the client for good behaviour, reward the data centre operator for the client's bad behaviour. It does not have to be this way. Colo providers sell space, electricity, network bandwidth and time. Break the electrical component down into base IT load (which is fixed) and cooling (which can almost always be reduced), and you incentivise the client to work with you to reduce their heat load.
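
A sketch of what that split might look like on an invoice (tariff and meter readings are invented for illustration):

```python
# Invented figures for one month, for one client's racks.
it_kwh = 12_000        # metered IT load, from the rack PDUs
cooling_kwh = 6_000    # metered or apportioned cooling energy for the same racks
tariff = 0.15          # currency units per kWh (assumed)

base_charge = it_kwh * tariff          # fixed by the size of the client's estate
cooling_charge = cooling_kwh * tariff  # the part the client can help to reduce

print(f"IT: {base_charge:,.2f}  Cooling: {cooling_charge:,.2f}")
```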

Don't allow your clients to use cages. I've mapped out the topology of enough networks by looking at the backs of racks to know that it can be done – though I had keys, so didn't have to peer through the grate. But the bigger point is that all IT estates consist of the same stuff – servers, storage and network – and unless you know what the hardware does and what the IP addresses are (yes, they're supposed to be labeled…), knowing that the estate consists of servers, storage and network is like knowing that a building consists of bricks, wood and metal. Furthermore, with virtualization technologies, the hardware is so abstracted from the network topology as to be almost irrelevant. So cages serve almost no useful purpose in securing the estate. Yet they are terrible for data centres: they screw up the airflow, stressing the cooling system. If anything, they offer a false sense of security: the real threat is at the end of the wire, not some guy photographing your racks.

And anyway, cold air containment systems provide most of the perceived security advantages of cages.

And then there are the Network and Security Operations Centres. Here's the next-generation NOC/SOC:

[Figure: Phone]

I'm not advertising or advocating the Huawei Mate 9 – the picture is just to make the point: the technology already exists to monitor all systems and alert people when unexpected things happen. It can tell whether a human is in the white space, check that against whether there should be a human in the white space, and which part of the white space that human should be in. SNMP is hardly new, and any half-decent DCIM will send text messages, e-mails and the like if it thinks a component is failing. All those flashy monitors and screens in the NOC and SOC serve no useful engineering purpose.
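
A minimal sketch of that alerting loop. `read_white_space_temp` and `send_text_alert` are hypothetical stand-ins for whatever your DCIM exposes (an SNMP poll, an SMS gateway); the point is that the logic fits in a few lines and needs no video wall.

```python
# Hypothetical hooks: read_white_space_temp polls a sensor, send_text_alert reaches a person.
HIGH_TEMP_C = 40.0   # ASHRAE A3 allowable ceiling, used here as the alarm threshold

def check_and_alert(read_white_space_temp, send_text_alert):
    temp_c = read_white_space_temp()
    if temp_c > HIGH_TEMP_C:
        send_text_alert(f"White space at {temp_c:.1f} C; investigate")

# Wire it up with dummy functions to see it work:
check_and_alert(lambda: 42.3, print)
```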

And then there are the little things. Use LED lighting, not fluorescent tubes, and switch the lights off if they're not needed. Stay away from CFCs if you have DX units. If you're using evaporative cooling, there's no chilled water and so no need for a raised floor – so don't install one. I think there's no need for any permanent staff in a data centre but, if you must have people there hard at it doing nothing most of the time, give them LED lighting and efficient air-con, too.

Plant a few trees.

And finally, source clean power. If you must offset, offset, but at least put a solar panel on the roof. If nothing else, it can keep the staff cool and the lights on.


My Perfect Green Data Centre (8) – A Robotic Cloud Data Centre

Having concluded in the previous post that the best data centre is no data centre, in this and the next post I’ll go for the second-best options for a cloud service and colocation data centres.

The reason I distinguish between cloud and colocation is that the IT in a cloud data centre is much more homogeneous than that in a colo data centre. Where colocation tenants tend to pack their racks with all manner of stuff, cloud operators and internet service providers tend to buy thousands of identical servers with identical disks, and identical network equipment. This opens up possibilities.

The first is to keep humans out. Humans bring in dust, heat, moisture and clumsiness. We present a security risk. And we add a lot less to computing than we like to think.

If we are to keep humans out, what happens when things break? A simple answer, and one that I suspect will be the most cost-effective over the life-span of a cloud data centre, is "nothing." Given the low cost of computing power, fixing a single broken server, disk or motherboard is hardly going to make a difference. That computer manufacturers no longer even quote the Mean Time Between Failures – a measure of the reliability of mechanical things – suggests that out of, say, 5,000 servers, the overwhelming majority will still be working just fine after five years.
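
If you want to estimate the attrition for yourself, the arithmetic is a one-liner; the annualised failure rate below is an assumption you would replace with whatever you believe about your own hardware.

```python
import math

servers = 5000
annual_failure_rate = 0.02   # assumed: 2% of servers fail per year (plug in your own figure)
years = 5

# Simple exponential survival model: no repairs, failures independent of one another.
surviving = servers * math.exp(-annual_failure_rate * years)
print(f"~{surviving:.0f} of {servers} servers still running after {years} years")
```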

If, however, we are going to insist on fixing things, how about robots?

I was told, five years ago when I first thought of robotics in data centres, that robots in data centres would never work. For a colocation data centre, I came to agree. The best we could do would be to build a pair of robotic hands, remotely operated by a human. This would keep the dust, heat and moisture out, but instead of having a weak, clumsy human bouncing off the racks, we'd have a strong, clumsy robot. On top of that, for remote, robotic hands to work, they need to be tactile, and to this day, tactile robotics costs some serious money.

But my friend Mark Hailes had a suggestion: for a cloud data centre, mount all the equipment in standard-sized trays, each with two power inputs and two fibre ports. Rather than have a fully-fledged robot, use existing warehouse technology to insert and remove these trays into and from the racks. There could be standard-sized trays in 1U, 2U and 4U heights to accommodate different servers.

My friend Richard Couzens then inspired a further refinement, which is to have a rotunda rather than aisles and rows. This minimises the distances the robots have to move, and therefore reduces the number of moving parts, which ought to improve the reliability of the whole.

Finally, with robots doing the work, there is no particular need to stop at 42U: the computers can be stacked high.

Put these together, and we get a very different layout from that of a conventional data centre:

[Figures: Pod 1 and Pod 2]

If that looks familiar, it’s because it’s the same layout I suggested here to take advantage of natural convection currents. So here’s the whole story.

Before operations, the tower will be populated with racks which are designed to receive trays. The racks will be fully wired with power and data. Human beings (as a group of humans can work more quickly than a single robot) will fill up the trays with computers and the racks with trays. When everything’s been plugged in, the humans leave. The entire tower is then sealed, the dirty air pumped out, and clean air pumped in. The tower will use evaporative cooling; that will be switched on and the computers powered up.

We only supply DC power. There is no AC in the tower, and consequently no high voltages. The robot (which is more of a dumb waiter than a cyborg) can be powered using compressed air or DC motors – that’s something for a robotics guru.

The tower is designed to run for many years without human incursion. People outside the data centre pack computer equipment into trays and insert the trays into an airlock at the bottom of the tower, from where a robot picks each tray up, transports it to its final location, and inserts it. If something in a tray breaks, the robot retrieves the tray and delivers it to the airlock, where a person picks it up and takes it away for repair or disposal.
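
As a toy sketch of that swap-out workflow (the slot addressing scheme and the airlock interface are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Tray:
    slot: str        # e.g. "ring2-level7-bay3" (hypothetical addressing scheme)
    healthy: bool    # as reported by the monitoring system

def sweep(trays, deliver_to_airlock):
    """Retrieve every failed tray and hand it to the airlock for humans to collect."""
    for tray in trays:
        if not tray.healthy:
            deliver_to_airlock(tray.slot)

# Example wiring with dummy data and a dummy airlock handler:
sweep([Tray("ring2-level7-bay3", False), Tray("ring2-level7-bay4", True)], print)
```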

When it’s time to replace the entire estate, the tower is depowered and humans can empty it out and build a new estate.

The robot uses the central core of the tower. All it has to do is move up, down, around in circles, and backwards and forwards. These are simple operations but, nonetheless, mechanical things can break and do require preventative maintenance. When not in use, the robot therefore docks in an area adjacent to the airlock, so it can be inspected without humans entering the tower. As a last resort, if the robot breaks when in motion, or if something at the back of the racks breaks, a human can get in, but must wear a suit and breathing apparatus. That may sound extreme, but the tower will be hot and windy.

It may also be filled with non-breathable gas. I’m told that helium has much better thermal qualities than natural air, and filling the tower with helium would not only keep humans out, but would also obviate the need for fire protection. The lights (LED lighting, of course) would be off unless there’s a need for them to be on, and a couple of moveable cameras can provide eyes for security and when things go wrong.

Each tower would be self-contained. Multiple towers could be built to increase capacity. Exactly how close they could be, I don’t know. But this is a design that achieves a very high density of computer power in a very small area: that makes it suitable for places where land is expensive.

Last but not least, in this most aesthetically-challenged of industries, these towers would look really cool. So: Google, Facebook, Microsoft: run with it!


My Perfect Green Data Centre (8) – Solar Meta-Topology

So far I’ve discussed the topology and engineering within a data centre. In this post, I take a big step back to produce a blueprint for any global internet provider such that, by combining follow-the-clock computing with a much lower-density arrangement, we get to a much lower carbon footprint.

I will now unpack that in reverse order.

Let's start with the IT load. In an early post, I cited the ASHRAE A3 recommended thermal guidelines, which I have assumed ever since. But that's not the whole story. Getting hold of the ASHRAE standards has become rather difficult as, like everyone else, they've become selfish with the information and want people to pay. However, the link here guided me to the paper Clarification to ASHRAE Thermal Guidelines, which I believe is in the public domain, and which shows the full picture: although 18-27C is the recommended operating temperature range and <60% the recommended humidity, the allowable range for A3 is 5-40C / <85% (and, for A4, 5-45C / <90%). Ask my laptop: computers can work in high heat and humidity and, if they're solid state, can last for many years. Put a motherboard built to the ASHRAE A4 standard in the middle of a room in the tropics, aim a fan at it, and it will run almost indefinitely.
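
For reference, here are those envelopes expressed as a small check you could run against intake-sensor readings (the dictionary simply restates the figures in this paragraph):

```python
# Temperature/humidity envelopes quoted above: recommended vs A3/A4 allowable.
ENVELOPES = {
    "recommended": {"temp_c": (18, 27), "rh_max": 60},
    "A3 allowable": {"temp_c": (5, 40), "rh_max": 85},
    "A4 allowable": {"temp_c": (5, 45), "rh_max": 90},
}

def classify(temp_c, rh_percent):
    """Return the envelopes that a given intake condition falls within."""
    return [name for name, env in ENVELOPES.items()
            if env["temp_c"][0] <= temp_c <= env["temp_c"][1] and rh_percent <= env["rh_max"]]

print(classify(35, 70))   # -> ['A3 allowable', 'A4 allowable']
```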

However, in a data centre, we don’t have a single motherboard in the middle of a room. We have many motherboards. This leads to lots of heat being generated in a relatively small area, and the problem of cooling in data centres is not caused by the computers per se, but rather by our insistence on packing lots of computers into a small space. The more we spread the computers out, the easier they become to cool. If we spread them out sufficiently, and if they’re built to withstand a high intake temperature, all we need to do is extract the hot output air.

The reason we pack them in is that, in the past, it was held to be important that the computers were close to the people they served. This meant that data centres were built in or near urban areas where land is expensive, so the extra cost of an elaborate cooling system was justified on the basis that the computers could be packed in densely.

But things have changed. From a security standpoint, the farther away the data centre is from urban areas, the better. From the point of view of land-cost, likewise. And it is now far cheaper to run fibre to a rural area than to buy land in an urban area.

Next, let's take 1,350W/m2 of insolation as a physical constraint. I suspect that we'll never do much better than 50% recovery, so we can generate, say, 650W/m2. One of the problems I've been tackling in the last two posts has been that our load has a much higher density: if we're stacking ten servers in a rack, then for every 0.6m2 rack we need about 6m2 of solar panel. Allowing for the fact that the racks themselves are spread out, I concluded that every m2 of white space needs 2.5m2 of solar capture (whether PV or CSP).
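
The arithmetic behind those figures, as a sketch (the per-server draw is my assumption, chosen to be consistent with the numbers in this post):

```python
usable_w_per_m2 = 650.0    # ~50% recovery of 1,350 W/m2, as above
server_w = 400.0           # assumed draw of one commodity server
servers_per_rack = 10
rack_footprint_m2 = 0.6

panel_m2 = servers_per_rack * server_w / usable_w_per_m2
print(f"{panel_m2:.1f} m2 of panel for every {rack_footprint_m2} m2 rack")   # ~6.2 m2
```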

Here’s my proposal.

The power from a single panel, 650W, is enough to power a mid-range motherboard with a few disks. So, rather than having a huge array of panels over there powering a whole bunch of low- to medium-density computing over here, put the computing where the sun shines. Assuming the motherboard is built to ASHRAE A4, the problem is not intake temperature, but extracting the heat. To address this, we design the motherboard such that all the hot stuff is at the top, and we put a couple of fans at the bottom to blow the heat out. We protect the motherboard from rain by enclosing it (hopefully in some low-footprint material).

[Figure: Panel with Computer]

Okay, no prizes for the artwork. The rhomboid-ey thing is the solar panel, the computer’s strapped to the back, and the arrows indicate cool(ish) air coming in and hot air being blown out.

Or, perhaps, we have compute-panels and disk-panels, arranged in some repeating matrix:

[Figure: Comps and Disks]

Each panel-plus-motherboard is in principle self-contained. However – a problem for a mathematical topologist – the panels should be interconnected in such a way that a passing cloud doesn't send a dead zone zooming across the array. The entire array, and the weather, should be monitored so that, when it gets cloudy, selected computers can be de-powered depending on the extent of the darkness.
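
A toy sketch of the de-powering decision, with invented telemetry: panels whose output drops below a threshold (because a cloud is overhead) have their attached computers shed first.

```python
# Invented per-panel output readings in watts; real values would come from the array telemetry.
panel_output_w = {"r1c1": 610, "r1c2": 180, "r1c3": 90, "r2c1": 640}

SHED_BELOW_W = 250   # assumed: below this, the attached board cannot run at full load

to_shed = [panel for panel, watts in panel_output_w.items() if watts < SHED_BELOW_W]
print("de-power computers on panels:", to_shed)
```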

At night, instead of using batteries, de-power the entire solar-compute farm, and off-load the computing to the next solar-compute farm to the west. In the morning, computing will arrive from the sol-comp farm to the east.

Now let me push it further. In a world with starving people, it makes little sense to replace tracts of agricultural land with solar arrays when there are millions of small villages across the tropics that already have roofs. Put a few solar-compute panels on each roof, and interconnect them by sticking a wifi tower in the village. Pay the village or villagers not with rent, but by providing each house with a battery, some LED lights so that the children can study at night, and an induction cooker so that villagers don't cut down trees for firewood. Recruit a couple of villagers and put a NOC/SOC on their tablets to give them responsibility for their own village's solar-compute array. Make this part of a training and recruitment program so that those recruits have a future that goes beyond installing the solar-compute panels and fixing them when they break.

If computing is going to follow the clock, sooner or later it's going to fall into an ocean. Most oceans have land to the north or south, but the Pacific is a particularly large hop. In the north, it's possible to string data centres along the west coast of North America and the east coast of Eurasia; in the south, the Polynesian archipelago could keep the bits and bytes crunching.

That covers the edges. For the ocean itself, with the oil industry in its death throes (what a shame they don’t use all that money and power to switch business models to renewables rather than sticking with the cook-the-planet model), quite a lot of large floating structures – oil rigs, supertankers and the like – will be available on the cheap. They could form a string of permanently moored data centres that run off some combination of wave, wind and solar power.

Does this sound more like science fiction than engineering? Perhaps. But nothing in the above is beyond the reach of today’s technology and engineering: we can design motherboards, chips and disks that run reliably in hot and humid places; PV technology is already almost there; the underlying control systems, network connectivity and ability to shunt data are also already there. What’s lacking is not the engineering or the technology, but the will. Google, Amazon, Facebook, where are you?

Not on this blog. So, in the next post, I'll take all the bits and put them together in a more conventional, less frightening way.


My Perfect Green Data Centre (7) – Concentrated Solar Power

In the previous post, I came to the sad conclusion that harnessing sunlight with PV panels is not feasible as a reliable, green power source for a 24/7 data centre. It is at best an auxiliary power source, or a boost to the primary source during daylight hours.

This is because nature is not very good at storing electricity. But nature is good at storing other types of energy, and is very good at storing heat. And this is where Concentrated Solar Power comes in. There's a detailed wiki article here; the short story is that CSP uses multiple mirrors to reflect the sun's heat onto a small point. That point heats molten salts, which in turn super-heat water to drive a conventional steam turbine.

The downside is that where PV has a single conversion – a photon knocks an electron free – CSP has multiple conversions – from solar radiation to heat to kinetic motion which drives a turbine and finally, via induction in an alternator, to electricity. Hence, although each of these steps can be very efficient, the theoretical limit on efficiency is going to be lower. However, as even domestic solar water heaters are 70% efficient, we’re starting at a much higher baseline than PV.

The joy of this is that, rather than battle nature by trying to store electricity, CSP opens the possibility of storing the heated molten salts in a tank for release during the dark hours of the night. Insulating a tank is much easier than persuading molecules to host itinerant electrons. And, as I keep saying, there's more to greenness than efficiency. Building a tank to contain hot stuff requires far fewer nasty chemicals than batteries do. In addition, the tank's lifetime is more or less indefinite, whereas batteries die.

For all that promise, the overall viability of CSP is still up in the air. Predicted efficiencies for CSP are below the theoretical ones, and the costs are 2-3 times those of PV panels (albeit PV panels without the battery packs). So this is a technology that's still not quite ready for the big time. But because the energy is stored as heat, CSP offers the promise of 24/7 solar power without the horrors of batteries.

The wiki article appears to be two or three years out of date. More up-to-date information is on the National Renewable Energy Laboratory website. This suggests another problem, which is that the power densities from CSP just aren't in the same league as those from PV panels. The site has an index of projects here, and most CSP plants cover large areas.

And this comes back to the nature of solar radiation: even if we recover 60-70% of the insolated energy, and even if we find ways of storing solar energy overnight that are 80% efficient, we still need a rainy-day buffer. Put this all together, and the 1,350W/m2 diminishes fast. 70% × 80% = 56%. Add a rainy-day buffer of a day or two, and we're down to about 30%. That's roughly 400W/m2. A typical data centre will consume 4kW per rack and, after allowing for the usual factors, each rack takes about 4-5m2 of space – so say 1kW/m2. That means for every 1m2 of IT load, we need 2.5m2 of solar capture. That's a 2.5:1 ratio at best; in practice, it's going to be much worse.
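
Here is that arithmetic laid out as a sketch, using the same round numbers as above:

```python
insolation = 1350.0                  # W per square metre
overall_recovery = 0.30              # 70% capture x 80% storage, then a rainy-day buffer
usable_w_per_m2 = insolation * overall_recovery      # ~400 W per square metre

rack_kw, rack_area_m2 = 4.0, 4.0     # a typical rack and its share of floor space (4-5 m2)
it_w_per_m2 = rack_kw * 1000 / rack_area_m2          # ~1 kW per square metre

print(f"{it_w_per_m2 / usable_w_per_m2:.1f} m2 of solar capture per m2 of IT")   # ~2.5
```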

Hence we find ourselves on a trajectory back to offsetting.

But before we go there, perhaps we're solving the wrong problem. And that's for the next post.
