When a server loses power, even for a split second, it can damage the hard drive.
True, very true...that's why you fit a UPS to bridge the gap between mains failure and backup generation.
Manchester workmen were blamed for knocking out hosting business UKFast's data centre today, after it seems some hapless bod cut a cable with a pickaxe. According to the Brit firm's status page, the problem arose this morning at 11am, affecting its MANOC 5 data centre. The incoming mains supply was lost to the site and …
"Gas mains are no more impervious to pickaxes than power cables are."
But you are very, very unlikely to lose both at the same time. They are entirely diverse services. The gas only needs to be there when the electricity isn't. As I would have thought was obvious.
Hence companies like eBay, Google, FedEx, Bank of America, Walmart, Coca Cola, etc. etc. use Bloom Energy servers instead of wasting power via online UPS conversion and keeping batteries charged.
This post has been deleted by its author
>> There's nothing especially "clean" about Bloom, it's just a fuel cell that runs on gas, conceptually the same as a standard gas or diesel genset.
Well there is - fuel cells emit far less CO2 than a typical gas generation plant and near-zero levels of the other common related pollutants such as NOx, SOx and VOCs.
>> Produces 735-849 lbs of CO2 output per MWh.
As opposed to typical values of 2,117 lbs/MWh for current coal generation and 1,314 lbs/MWh for existing gas power plants - so a much lower CO2 output for fuel cells.
"As opposed to typical values of 2,117 lbs/MWh for current coal generation and 1,314 lbs/MWh for existing gas power plants"
And around 600 for modern cogeneration (CHP) sets which are popular for local off-grid supply.
Bloom isn't dirty by any means, but it's not especially clean either.
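Taking the figures quoted in this thread at face value, a quick back-of-envelope comparison (nothing here beyond the numbers already posted):

```python
# Back-of-envelope check on the CO2 figures quoted in this thread,
# all in lbs of CO2 per MWh (the Bloom range is from its data sheets).
bloom_mid = (735 + 849) / 2   # midpoint of the quoted Bloom range: 792
coal, gas_plant, chp = 2117, 1314, 600

print(f"Bloom vs coal:       {bloom_mid / coal:.0%}")       # roughly a third
print(f"Bloom vs gas plant:  {bloom_mid / gas_plant:.0%}")  # roughly 60%
print(f"Bloom vs modern CHP: {bloom_mid / chp:.0%}")        # higher than CHP
```

Which is the point being made: comfortably cleaner than coal or an older gas plant, but not cleaner than a modern CHP set.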
"Bloom Energy servers instead of wasting power via online UPS conversion and keeping batteries charged."
Bloom Energy units are just specialised fuel cells, effectively solid-state generators. It's just local generation with the grid as backup; nothing especially new in that, and about as environmentally unfriendly as a standard gas generator. You don't need a UPS because you assume the grid is "always on", so there's nothing to fill in during the generator start time should your primary source fail. If a backhoe should take out both gas and electricity conduits, you're screwed.
This post has been deleted by its author
Standard gas generators emit lots of CO2 and are subject to grid and transformer losses. Fuel cells emit hardly any CO2, and the primary output is water vapour (and electricity!).
That's simply not possible. You're consuming a hydrocarbon in the presence of oxygen, it doesn't matter if it's a hot or cold combustion, the chemical products will be the same. See the data sheets on the Bloom website for the CO2 figures. Only a hydrogen-fed fuel cell will produce water alone.
Also there'll be no grid or transformer losses from a local gas generator, which will not be connected to the grid.
"and as environmentally unfriendly as a standard gas generator"
No, standard gas generator plants emit about double the CO2 and various other pollutants, and are subject to grid and transformer losses. Bloom Energy fuel cells emit about half the typical CO2 output of a gas generator plant and a quarter of that of a typical coal plant.
"If a backhoe should take out both gas and electricity conduits you're screwed."
You could say the same if someone took out all the fibre connectivity to your data centre. Which is why diverse services are fed to your data centre some distance apart via concrete-lined trenches. Usually with dual feeds too. If you look at the rather impressive customer list above, clearly it works.
Microsoft - generally acknowledged as one of the world leaders in datacentres - are putting these directly into racks:
http://www.datacenterknowledge.com/design/microsoft-launches-pilot-natural-gas-powered-data-center-seattle
Having worked on gas mains, leaking and otherwise, in a previous incarnation, I can assure you that only a very ancient, rusted-out one is vulnerable to a pickaxe. But a trenching machine... or a backhoe... that's different. BTW, I once helped drive a 1 1/2" gas pipe squarely through an underground telephone cable, knocking out service to portions of two adjacent states. It was fascinating to see service trucks and company carfuls of engineers screech in from all points of the compass, form a seething knot and bicker about who was entitled to scream at whom.
No UPS can be guaranteed to function through a short-circuit or other dangerous situation (e.g. phase crossing).
However, a datacentre uses UPS only as a brief stopgap, and the slightest delay in starting up the generators will mean dead batteries and a power blip inside.
But "UPS" don't provide "uninterruptible" power. They just provide a backup, like any other. When a dangerous situation exists, even a high-end UPS will cut out for safety. Yes, I've seen them do it. In one case, a phase-crossing accident would literally hard-power-off the UPS instantly without warning or beeping or anything - just a single red light. Just bang, down, wait for power to return to normal. UPS was doing its job, before, during and after.
A pickaxe through a cable is exactly the kind of thing that can bridge the live and earth, or multiple phases for instance, and UPS can't completely isolate the inside from the outside.
When a dangerous situation exists, even a high-end UPS will cut out for safety
Which is why there's a distinction between High Availability (the UPS + generator) and Disaster Recovery (the second site with the hot standby on completely different power circuits). Each protects against a different type of fault.
I can tell you for a solid fact that APC UPS units get mighty peeved when they see 220 on the _ground line_. In that incident, the UPS went into full isolation and shut down hard, which protected the server that was on it from getting its hardware blown up (unlike two of the brand new workstations at that site, which decided to set their power supplies on fire as the electrolytic caps blew out). Thankfully, the server passed its fsck and carried on after everything was brought back to normal.
(mushroom cloud icon, because it's not every day one sees flames jetting out the back of a brand new computer's power brick.)
>I can tell you for a solid fact that APC UPS units get mighty peeved when they see 220 on the _ground line_.
What, it's 2017 and this is still a problem! :)
Back in the late 1970s I worked for a company that installed IT systems next to railway lines, i.e. signalling systems, where fluctuating voltages on the ground line were regular events (every time a train went by). So the company had developed some rather fancy switchgear that sorted the problem. The other problem (for delicate IT systems) we saw in the early 80s was the power spikes caused by the then newly introduced thyristor-controlled systems; these were particularly troublesome as they were invisible to the then-new digital scopes but not to the analogue scopes.
@Roland6: It's usually only a problem when it's intentionally done by an electrician who got the bill for the outage they caused to the businesses in that office complex, *and* our bill for the replacement of hardware, technician labor, call out, etc. :)
If I recall correctly, said electrician shorted the 220V line into the something... now that I'm thinking, it might have been the neutral line and not the ground line. (US uses hot, neutral, and earth ground for most things.) In any case, two of the workstations did not like whatever they did, and the capacitors in their power supplies blew up rather messily. The UPS did exactly what it was supposed to do: isolated the load entirely, then shut down.
It was fun walking into the shop in the mornings and smelling freshly cooked power supplies.
"when they see 220 on the _ground line_"
If you have decent input and output protection this "should" never happen.
You can get away with assuming it's all fine 99% of the time, but it's that 1% that gets you - and in a lot of cases the UPS power supply systems are overcautious about shutting down under conditions that the mains keeps going under. Lots of arguments between power people and data centre people revolve around what's acceptable.
I worked as an engineer developing the circuitry and switching components for UPS systems running the safety systems at two nuclear facilities in the US. These systems delivered 360 V at 375 A, uninterrupted.
Rule #1 : Four independent UPS systems
Rule #2 : Two UPS off grid powering the safety systems. One at 100% drain, one at 50% drain
Rule #3 : One UPS being discharged in a controlled manner to level battery life and identify cell defects
Rule #4 : Recharge the drained battery
Rule #5 : Fourth UPS drain and recharge separately
Rule #6 : Two diesel generators off grid
This system may not guarantee 100%, but it is far better than five-9s. There can be absolute catastrophic failure on the supplying grid and it does not impact the systems one bit, because the systems are never actually connected to the grid. And before you come back with issues or waste related to transference, the cost benefits far outweigh the losses, because the lifespan of 90% of the cells is extended from four years by an additional 3-5 years when they're properly managed in this fashion. And the power lost at this level is far less expensive than replacing the cells twice as often.
P.S. before you call bullshit, there was extensive (corroborated) research at University of South Florida over a period of 15 years on this one topic.
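The shape of that trade-off is easy to sketch. The lifespans (4 years unmanaged vs roughly 8 managed) come from the post; every price below is a made-up placeholder, purely for illustration:

```python
# Rough illustration of the economics claimed above. The lifespans are from
# the post; cell cost, cell count and cycling-loss cost are hypothetical.
cell_cost = 400.0               # $ per cell (hypothetical)
cells = 200                     # cells in the installation (hypothetical)
cycling_loss_per_year = 3000.0  # $ of energy spent on the drain/recharge regime (hypothetical)

horizon = 24                    # years compared over
naive_life, managed_life = 4, 8

naive_total = (horizon / naive_life) * cells * cell_cost
managed_total = (horizon / managed_life) * cells * cell_cost + horizon * cycling_loss_per_year

print(naive_total, managed_total)   # 480000.0 312000.0
```

With these placeholder numbers, halving the replacement frequency more than pays for the energy burned by the controlled discharge cycles, which is the claim being made.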
"Those things cook batteries until after 3 years they are no good. "
A lot of that has to do with the discharge cycles they see. Lead Acid batteries DO NOT like being discharged and the deeper the discharge the fewer cycles they'll endure.
AGM batteries are compact and don't gas off, but that's their only advantage. If you want cells that last for decades, then use traction batteries or a string of flooded deep discharge telco cells like these exide ones: https://www.exide.com/en/product/flooded-classic-ncn (those are the nuke type, read the PDF to see the selection choices you have)
You'll need 24 of them.
Even at end of life they're impressive. A certain exchange engineer strapped 12 old satellite exchange ones into the back of an original Fiat Bambino, replaced the engine with an electric motor and would commute the 5 miles from home to work for a week on a single charge - he did this for 20 years (until he retired) and there was no noticeable degradation in range.
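For what it's worth, the 24-cell figure follows from the chemistry: flooded lead-acid cells are nominally 2 V apiece, so 24 in series gives a 48 V DC bus (the bus voltage is my assumption; the comment only gives the cell count):

```python
# Why 24 cells: flooded lead-acid cells are nominally 2 V each, so a
# series string of 24 yields the common 48 V DC bus (assumed here).
cell_nominal_v = 2.0
cells_in_series = 24
print(cell_nominal_v * cells_in_series)  # 48.0
```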
AGMs also generally don't like being charged all the time, either; that's what usually kills the battery packs on the APC units. Generally what happens is one or more of the batteries in the pack gets tired of dealing with the overcharge and goes dead open, at which point the pack stops charging and you lose protection entirely. Given the prices APC charges for replacement battery packs, I'm more fond of buying a set of replacement batteries and rebuilding the packs; the downside is that doing so voids the connected-equipment coverage. I'm not quite certain what the big 3-phase Emerson-Liebert beasts we use at RedactedCo use, but I don't have to worry about it because we have them on a maintenance contract, and the company we are using is quite reliable.
"P.S. before you call bullshit, there was extensive (corroborated) research at University of South Florida over a period of 15 years on this one topic."
I'd be interested in a reference/pointer to the research. I suspect that because the end result is fewer battery sales, this isn't something many in the vendor community would want to be widely known.
"True, very true..."
So how does the drive suffer damage then? I assume you are talking about physical damage here. I've never seen a hard drive damaged by a power failure; data corruption yes, but actual physical damage, no. Aren't they designed to auto-park the head when the power trips? Maybe it's different in large data centres with thousands of servers. Please enlighten me :)
I'll wager the components of a recovery plan were all documented and tested, including physical test runs of the gensets. I doubt they did a full "turn off the mains power" test, but if they had, they'd have been in the same position (sitting in a dark data centre, thinking "shit!").
The other possibility is that they have turned off the mains in tests, and everything went perfectly. That's a known problem with standby power - it only works most of the time. And on that subset of times when it doesn't work, you usually need it and everybody notices.
A question for the DR professionals: What is the ACTUAL failure rate of a completely successful, fully automatic handover from interrupted mains to on-site generators? My guess is nobody does it often enough to know.
"What is the ACTUAL failure rate of a completely successful, fully automatic handover from interrupted mains to on-site generators? "
We get around 400-600 power breaks per year (rotten power feeds in the Surrey countryside). We've had about 5 unplanned outages in the last decade. That's with a flywheel kinetic system backed by diesel generators, and at least one of those was due to the generator starter motor battery being dead. Most of the time the flywheel rides out the break and the gensets only start at the 10-second mark.
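Taking the midpoint of those figures, that works out to roughly one failed handover per thousand switchover events:

```python
# Implied per-event failure rate from the figures above: ~400-600 breaks a
# year for a decade, with about 5 unplanned outages in that time.
breaks_per_year = 500          # midpoint of the quoted 400-600 range
years, failures = 10, 5

total_events = breaks_per_year * years   # ~5000 switchovers
failure_rate = failures / total_events
print(f"failure rate per event = {failure_rate:.2%}")   # 0.10%
```

Note that's a per-event failure rate for the handover itself, not an uptime figure.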
Exactly.
The article very carefully skims over that bit, doesn't it.
"The incoming mains supply was lost to the site and generators failed to take over the service."
Every datacenter I've ever dealt with does weekly on-load generator tests, and UPS failover tests.
Now we all know shit happens, no matter how much we try to prepare, but this does feel like they haven't been taking enough time on planning or testing.
Why didn't their UPS have enough capacity to keep things up, even if the generators failed to start cleanly?
"Every datacenter I've ever dealt with does weekly on-load generator tests, and UPS failover tests."
None of which tells you that you're safe against a breaker cascade as the whole A load switches to B and idling PSUs in blade chassis reactivate, etc. There is no substitute for randomly[1] flicking breakers, PSUs and HVACs on a routine basis to verify that Tier III resiliency and DR will work as required. Unfortunately, that requires a degree of testicular fortitude entirely absent in facilities staff (other than perhaps those actually angling for a P45), and so this sort of thing keeps happening.
[1] It has to be random, otherwise Ops will shift loads to other infrastructure to protect their uptime metrics and thus invalidate the results. Idling equipment draws less power.
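A minimal sketch of the random-selection idea, with entirely hypothetical component names: the point is simply that the test schedule must not be predictable, so Ops can't pre-emptively shift load off the targets.

```python
# Pick which components get "failed" this test window, unpredictably.
# All component names here are hypothetical placeholders.
import random

TARGETS = ["breaker-A1", "breaker-B3", "psu-rack07", "hvac-north", "hvac-south"]

def pick_targets(rng: random.Random, k: int = 1) -> list:
    """Choose k distinct components to exercise in this test window."""
    return rng.sample(TARGETS, k)

if __name__ == "__main__":
    rng = random.Random()  # deliberately unseeded: the schedule must be unpredictable
    print(pick_targets(rng, k=2))
```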
Such testing is what I (as a conscientious consultant) recommend to my clients. But I also tell them: "If you ever have a real disaster (fire, flood etc) and 80% of services carry on working, you'll be a hero. If you do a disaster test and 98% of services carry on working, start looking for a new job."