Fire alarm sparked data centre meltdown emergency

Fire alarm tests are a good idea; you generally want the warm feeling that when something decides to combust, you'll be able to tell people about it with a loud ringing or wailing noise. I used to run what you might consider a traditional machine room. We had a pile of ageing Sun kit – socking big CPU units and cabinets full …

  1. Sureo

    It might have been worth finding out why relay #18 melted down, so that it wouldn't happen again.

    1. kain preacher

      Yep, sounds like either too much current or too much voltage was going through it. Next time it might fry the entire panel, causing a fire with no working alarm system.

      1. Anonymous South African Coward Bronze badge

        Now that will be real fun.

        NOT.

      2. hplasm
        Headmaster

        Strictly speaking...

        only "...too much current..." could flow through it...

        Professor Kirchhoff --------------------------------->>>

    2. CustardGannet

      Determining *why* it melted? That sounds suspiciously like Root Cause Analysis to me! Do you think this company is made of money? Just make it work again!

      (Pointy-Haired Boss icon goes here -> -> -> )

    3. Anonymous Coward

      why relay #18 melted down

      Obviously melted by the BOFH's cigarette lighter in anticipation of installing an interlocking halon system and remote door locks, so next time it was 'tested' they could ensure the boss was supervising from the inside.

    4. Anonymous Coward

      It's like the stupid people who have RCD circuit breakers and whose first response, when one trips out and shuts off the power, is to force it straight back on again without even finding the root cause, even if that means using sticky tape to keep it on.

      1. Anonymous Coward

        "using sticky tape to keep it on"

        Which it won't (if working correctly), as the designers thought about that one...

    5. Robert Carnegie Silver badge

      Since the same thing didn't happen the following week (unless that is the story to tell next week), it may be a reasonable guess that what went wrong was the relay itself: an electromagnetically operated switch in which, if the wrong two parts touch, you could well have a melting-materials situation and trouble to come afterwards.

      When an olden-times light bulb ceases to operate, you replace the light bulb and then test if the light comes on. From experience, if such a light is protected by a disposable 3 amps fuse at the wall socket then it often happens that the fuse also must be replaced, and that's what you test if the new light bulb doesn't light. You may also amuse yourself by studying the bulb carefully to see if the wire inside is broken, which it may be.

      It could be in my example that you're getting higher-voltage surges in your supply that cause popping of fuse and bulb, but I think you'd notice other lights flashing, your TV set exploding, etc.

      1. Baldy50

        The relay failed because it was one in constant use. Possibly most of the others were used to switch something on, and this one would be a failsafe against a complete panel failure. The relays operating the sirens might be configured this way too, but with no current flowing through their contacts, whereas No. 18 could have had to supply power to large electromagnets 24/7.

        Sorry, but why do people insist that the BS 1362 fuse in the plug must be 3 Amp?

        According to the NICEIC regs to BS7671 it does not and a 13 Amp will do, all BS rated bulb holders are rated at 16 Amps.

        1. Commswonk

          According to the NICEIC regs to BS7671 it does not and a 13 Amp will do, all BS rated bulb holders are rated at 16 Amps.

          I think you will find that that only applies if the cable is also rated at (not less than) 13A. Do not use a 13A fuse if the cable from the plug is rated at 3A!

          1. Martin an gof Silver badge

            I think you will find that that only applies if the cable is also rated at (not less than) 13A. Do not use a 13A fuse if the cable from the plug is rated at 3A!

            Indeed. It is common for the flex from the plug to (say) a table lamp to be somewhat less than the 1.25 mm² that is really the minimum required to be safe under the protection of a 13A cartridge fuse. A flex with wires of 0.5 mm² cross-section is not uncommon and really does need a 3A fuse to be sure of not melting under fault conditions.

            As for

            all BS rated bulb holders are rated at 16 Amps

            I don't know about that. If you consider the size of the terminals and suchlike and compare them with the (admittedly over-engineered, but that's not a bad thing) BS1363 13A plug...

            Even if the bulb holder itself is capable of more, once again it's usually the flex that is the limiting factor. For a ceiling pendant, a thin flex is very common. Lighting circuits are protected by 5A fuses, 6A MCBs or 10A MCBs under certain circumstances.

            The situation is a little more nuanced than that when you consider the harmonisation of electrical standards across Europe. Nobody else uses fuses in appliance plugs, for example, but then they don't usually run 32A to a wall socket either, as we do in the UK. Their socket outlet circuits are more commonly 16A. Hmm... unless they run separate circuits for lighting sockets (possible - it happens in the UK), do their table lamps have thicker flexes?

            M.

            1. Wzrd1 Silver badge

              " Nobody else uses fuses in appliance plugs, for example, but then they don't usually run 32A to a wall socket either, as we do in the UK."

              Well, many appliances also have thermal fuses inside. Most motorized appliances have them within the motor windings. I've bridged more than a few to get them working again at home (yeah, I know, not the optimal solution, but it's a coffee grinder). Hot plates also tend to have thermal fuses.

              As for relays, I've burnished my fair share of relays before they overheated and melted. I've even separated welded contacts, burnished the contacts and set them back into service, pending a replacement relay for the next maintenance cycle.

          2. Alan Brown Silver badge

            " Do not use a 13A fuse if the cable from the plug is rated at 3A! "

            Correct. The fuse in the plug is there to protect the wiring, not to protect the bulb holder.

            Similarly the circuit breaker is there to protect the building wiring, not your equipment, on the basis that your equipment might go bang but having the building burn down too is generally regarded as a bad idea.

      2. ICPurvis47
        Holmes

        Strange Coincidence

        Exactly that happened to me just yesterday. An aged relative phoned me in great agitation to say that he'd switched on the living room light and a bright flash had occurred inside the fusebox. I hightailed it over to his place, removed the bulb and checked it: blown. I asked him where his spare bulbs were: blank look. Fuse wire? Another blank look. So, into the car and off to the nearest hardware store (not many of them around these days) to buy a pack of light bulbs and a card of fuse wire. Back at his place, I rewired the fuse holder. I didn't put in the new bulb yet, but checked that the fuse didn't blow when the switch was operated, confirming that the wiring and light socket were OK. Lastly, I inserted the new bulb, and it worked. I don't know how old the bulb was, but he now has spare bulbs and fuse wire for the next 'emergency'.

        1. jtl

          Re: Strange Coincidence

          I have to admit, as an American (electrical engineer) the idea that replaceable-link fuses are still commonplace never ceases to boggle me!

          1. Robert Carnegie Silver badge

            Re: Strange Coincidence

            I think in the UK, old people's houses have the sort of "fuse box" that sometimes plays a part in black-and-white movies, and they haven't replaced it. Modern installations do have "circuit breaker" safety. Well, mine does, and it's about 20 years old. Should I...?

            1. Wzrd1 Silver badge

              Re: Strange Coincidence

              We once had a house in Philadelphia that was built in the 1920s. It featured knob-and-tube wiring, old gas pipes in the walls from the long-gone gas lights, and a fuse box.

              No wired fuses, though: screw-in and cartridge fuses only.

              We promptly installed a breaker box, segregated the circuits and blown fuses became a thing of the hopefully to be forgotten past. And honestly, hanged if I could find a fusible wire in the US.

        2. AndrewDu

          Re: Strange Coincidence

          Wired fuses? Srsly?

          Ye Gods.

      3. Martin an gof Silver badge

        if such a light is protected by a disposable 3 amps fuse at the wall socket then it often happens that the fuse also must be replaced

        Once upon a time, when people built things to proper standards, all incandescent light bulbs actually included a very low-current internal fuse. When the filament breaks, an arc may form and cause a surge in current: the arc will often travel to the supply wires in the bulb, bypassing the filament and so creating a low-resistance path for the current.

        A simple bulb failure resulting in all the lights in a house going out (as I note happened to a later commentard's relative) simply shouldn't happen, and the bulb's internal fuse was designed to prevent this.

        Sadly, beginning probably in the 1980s, many bulbs were built without such a fuse. In my experience this was more of a problem for "fancy" bulbs (e.g. "candle" bulbs) than for normal ones, and the small-capsule halogen bulbs certainly were / are a pain in this regard.

        In these days of CFL and (particularly) LED lighting, it isn't the failure of the bulb per se that's the problem; it's the failure of the switch-mode power supply needed to run it. These power supplies should have some kind of internal fuse, but I've met far too many that don't. They do sometimes trip the main fuse, but more often than not they fail "safe" (for certain values of safe) by getting too hot, whereupon some component on the mains side usually burns out, or burns the PCB enough to break a trace.

        M.

    6. Anonymous Coward

      It's a common failure mode of relays themselves. The contact surfaces can break or wear down, there's not enough left for efficient conduction and they get hot. That's often the start of a death cycle.

      The circuit should have protection elsewhere for too much current. If the new relay doesn't immediately start melting in the on or off position then that protection is working correctly.
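      The "they get hot" step is Joule heating in the contact resistance, P = I²R. As a worked illustration only, assuming a 10 A load and contact resistances of 1 mΩ (clean) and 100 mΩ (worn), neither of which comes from the story:

```latex
\begin{aligned}
P_\text{clean} &= I^2 R_\text{clean} = (10\ \text{A})^2 \times 0.001\ \Omega = 0.1\ \text{W} \\
P_\text{worn}  &= I^2 R_\text{worn}  = (10\ \text{A})^2 \times 0.1\ \Omega = 10\ \text{W}
\end{aligned}
```

      The same current now dissipates a hundred times more power in a tiny contact area, and a hotter contact tends to oxidise and pit further, raising R again: the death cycle described above.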

      1. Anonymous Coward

        "The contact surfaces can break or wear down, there's not enough left for efficient conduction and they get hot. "

        And then they weld together, assuming both contacts are metal.

        It is also possible in some circumstances to have relays where one contact is metal and its mating half is carbon.

        These contacts do not weld together.

        This is the kind of thing which good engineers used to think about when designing safety critical systems. Amongst other things, it makes the failure mode and effects analysis rather simpler. When did you see an FMEA done properly for a computer-based system?

        These days as long as you can tick the relevant boxes on some paper-based audit of the development process, it's probably acceptable.

        1. jtl

          If they are good (as in "old days") they are cadmium silver!

  2. frank ly

    Silkscreen artwork

    "We agreed that this was probably the faulty one. We pulled it off and sure enough, the number 18 was on the circuit board underneath."

    Intended to be manufactured, not to be fault traced or understood.

    1. Wzrd1 Silver badge

      Re: Silkscreen artwork

      "Intended to be manufactured, not to be fault traced or understood."

      Indeed.

      A handful of years ago, our building UPS, where backup power was supplied by a room full of batteries until the generator kicked in, had a veritable room full of defunct batteries.

      When a utility transformer blew, taking prime power offline, the building UPS promptly went offline too. Fortunately, nothing critical was affected: only all war communications to and from both Iraq and Afghanistan. Telephone service: offline. Data services: offline.

      Suffice it to say, the staff meeting was quite lively, what with explaining to the General why his telephones and e-mail didn't work.

      The common refrain was, "no budget for that room full of expensive batteries", something I had put in my weekly status report since my arrival in theater. Those in charge, who had glossed over it in pointy-haired-boss mode, were sent scurrying into the shadows, lest they never be promoted again.

      Needless to say, our budget increased sufficiently, if only for a one time event, to replace that room full of expensive batteries.

      Six months later, the batteries finally cleared customs and were to be installed in the UPS room. The facility manager then decided that he'd switch the UPS into bypass mode - without reading the manual or consulting an electrician.

      Off went the power.

      He then decided to read the manual, set it properly into bypass mode and began re-initializing all the cryptographic devices, noting that the primary routers for the entire enterprise remained stubbornly offline.

      I happened by to get a status report, as well as to ascertain just what in the hell was going on and I was brought swiftly up to speed on events that had just transpired.

      So, my first question, "Do you have an electrical diagram of the power circuitry?". Amazingly, the answer was a resounding affirmative and off to the diagrams, rolled out over the middle of the facility floor I went.

      I swiftly traced the power to the racks with the main routers back to their breaker and asked, "So, where is breaker L18Y7?" (not the actual circuit number).

      A befuddled look later, we went off to the racks and racks of breaker panels, locating L, then 18 and finally the proper breaker, which was thoughtfully installed behind one of the larger racks of batteries and out of sight unless one knew it was there to be found.

      I asked, "Do you want to await the arrival of the electrician or do *you* want to flip the breaker? It's not my responsibility, and I won't be responsible if something goes wrong when it's flipped by an unauthorized person." He flipped the breaker and the routers came up.

      The moral to the story: Keep the damned manager away from the damned UPS!

  3. cd

    Were it me there would have been a pilot lamp for the Test position, wired to #18, from then on.

  4. michael cadoux

    Okay, this was the early 1980s: the Doomsday Button was fairly high on the wall and surrounded by a casing, but one without a front face. Somebody (no, really not me, not guilty) still managed to stick his elbow into it ("How did you even manage that without being 7 feet tall?"). Fire suppression system activated, power off, evacuate before the room fills with some retardant gas...

  5. Anonymous Coward

    In the 1970s mainframe systems had to be powered up and down in stages - often one unit at a time. So the process was automated with a mains sequencer - basically a motorised set of cams to signal remote relays in the units. The sequencer was in a box on the wall - and you set the sequence cams by opening a door on the front.

    As there was mains voltage inside the box, there was a microswitch interlock on the door for operator safety. In theory you couldn't open the door without a key, but as this was a prototype test area the key was left in the lock.

    Inevitably someone opened the sequencer door while the system was powered up - and everything crashed down. It took 3 days to bring the system back to full operation.

    Another slight flaw in the system ergonomics was that the operator's desk had a panel with a few identical buttons. One was to toggle the power up/down sequence. Next to it was the button the operator pressed to signal the OS to produce a prompt.

    For some reason it was decided to swap the two buttons' positions on the first production machine being built. Thereafter it was not uncommon to hear a system unexpectedly powering down.

  6. Anonymous Coward

    The clanging of the air conditioning shutter reminds me of one of our machine rooms. Not quite sure what the trigger was - possibly a mains brown-out. The shutter kept clanging up and down - presumably as its sensor was on a voltage cusp with insufficient hysteresis provision.

    The 600MB fixed disk weighed nearly two tons and had water-cooled bearings. Some days the mains water pressure would be a bit low and the safety mechanism would start to power the unit down. That allowed the water pressure to appear to be ok - and it would then cycle up - causing the pressure to drop again - ad nauseam. As the process produced loud groaning sounds it was a very recognisable condition.
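    The clanging shutter is classic threshold chatter. A minimal Python sketch, assuming a hypothetical supply reading that hovers around a single trip point, shows why a hysteresis band (separate close and reopen thresholds) stops it. The function name count_switches, the voltages and the thresholds are all illustrative assumptions, not details of the actual installation.

```python
def count_switches(readings, close_below, open_above):
    """Count state changes of a shutter driven by a threshold comparator.

    The shutter closes when the reading drops below `close_below` and only
    reopens once it rises above `open_above`. Passing the same value for
    both gives a comparator with no hysteresis at all.
    """
    is_open = True
    switches = 0
    for v in readings:
        if is_open and v < close_below:
            is_open = False
            switches += 1
        elif not is_open and v > open_above:
            is_open = True
            switches += 1
    return switches


# Assumed brown-out: the supply sags to about 207 V and wobbles 1 V either side.
readings = [207 + (1 if i % 2 else -1) for i in range(20)]

no_hysteresis = count_switches(readings, close_below=207, open_above=207)
with_hysteresis = count_switches(readings, close_below=207, open_above=212)
print(no_hysteresis, with_hysteresis)  # prints: 20 1
```

    With a single trip point every wobble across it moves the shutter; with a 5 V gap between the close and reopen thresholds, the shutter shuts once and stays shut until the supply genuinely recovers. Much the same applies to the water-pressure cycling: a cutout that re-arms at the same pressure that tripped it will keep hunting.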

    1. jtl

      I loathe, yet am somehow perpetually amused by, these sorts of unexpected oscillators. I recently had to deal with a 2000 ton chilled water plant that had similar shenanigans, oscillating between its "low water flow" cutouts and the motorized isolation valves on each chiller... at full load, with 900 hp compressors...

  7. patrickstar

    There have been some cases of fire alarms literally killing disks due to the siren - apparently mechanical disks do not take well to 120dB of sound. Including at least one where an entire RAID array was toast.

    Reminds me of that YouTube video of a guy showing the latency of a disk array increasing as he screams in front of it...

    1. Anonymous Coward

      In the early days of PC style hard drives there was some departmental debate about what action constituted a damaging mechanical shock. The "environment" expert indicated that even a tiny drop onto a hard surface was enough to exceed the specs.

      I go cold when I see people with non-SSD laptops bouncing them onto desks while they are running. I still treat any movement involving a disk as if handling eggs.

      1. KjetilS

        Eggs can probably take more than a spinning non-SSD drive, though.

    2. Wzrd1 Silver badge

      "There have been some cases of fire alarms literally killing disks due to the siren - apparently mechanical disks do not take well to 120dB of sound."

      Well, there was that time the LAN/WAN shop supervisor and I were working in the main server room when, after a while, we happened to work our way toward the end of the room near the door and noticed an odd noise.

      Upon opening the server room door, we noticed that the fire alarm was going off and we later learned, it had been going off for at least 20 minutes before we noticed it.

      Obviously, the server room had no siren, no annunciator or any other signaling device to notify anyone inside that an emergency was under way.

      Of course, the entire data center had been hastily installed in a converted warehouse, with a wet-pipe fire suppression system that was shut off, lest a leak turn into a multi-million dollar disaster.

      Once, a new installation fire marshal wanted to charge the system and test it. All the admins refused to permit it, as did the country manager, and when authority to overrule them all was sought from The General, well, it wasn't all that long after a major communications outage (see the UPS story above). Sir's response was memorable, brief and negative.

  8. Alan Brown Silver badge

    Temperature

    "All of the expensive stuff was really sensitive to environmental anomalies and would happily cook itself in under an hour if the air-con failed."

    Which is why our last resort is a thermostat set to 35C wired across the server room emergency power cut off circuit. Nothing like a crowbar to ensure things go off and stay off.
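    The "go off and stay off" behaviour is a latch: once the trip fires, it holds until someone deliberately resets it, even after the room cools. A minimal Python sketch of that logic, assuming a hypothetical OverTempTrip class and the 35C set point quoted above (the real installation is a thermostat wired across the emergency power-off circuit, not software):

```python
class OverTempTrip:
    """Latching over-temperature cutoff: once tripped, power stays off
    until a deliberate manual reset, regardless of later readings."""

    def __init__(self, trip_at_c=35.0):
        self.trip_at_c = trip_at_c
        self.tripped = False

    def update(self, temp_c):
        """Feed a temperature reading; return True if power may stay on."""
        if temp_c >= self.trip_at_c:
            self.tripped = True  # latch: nothing in update() ever clears this
        return not self.tripped

    def manual_reset(self, temp_c):
        """Clear the latch, but only once the room is below the set point."""
        if temp_c < self.trip_at_c:
            self.tripped = False
        return not self.tripped


trip = OverTempTrip()
print([trip.update(t) for t in [24.0, 30.0, 36.5, 33.0, 25.0]])
# prints: [True, True, False, False, False]
```

    The point of the latch is that a cooling room (33C, then 25C) does not bring the power back by itself; only manual_reset() does, which forces someone to look at why the air-con failed first.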

  9. Anonymous Coward

    Testing fire zones in a Nuclear Power Plant is quite fun!

    Permanently powered solenoids keeping louver dampers open are quite entertaining to replace, as they give up working and shut random sections of the HVAC system down with them.

    Even more fun is when they cause enough of a pressure differential in an airtight nuclear building to change the water level in every loo in that section. That's usually not a problem, except when a whole lot of them were already about to overflow on their own because that same HVAC differential pressure was being tuned at the same time, which had disturbed their level in the first place.

    You see, the best way to keep contamination from flying away from a contaminated building is to keep the air pressure inside it a bit lower than the outside.

    For the finishing touch, the differential pressure shift prevented some of the smaller members of staff from opening certain doors. It was just a few millibar, but given the large size of some doors, it was calculated to reach somewhere in the region of 50 kilograms-force, maybe more.

    You can see where this is going.

    Now, we keep a crowbar in the premises of the WC in those zones at all times.
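    The 50 kilogram figure is easy to sanity-check: force is pressure difference times door area. Assuming, for illustration only, a 2 mbar difference across a 1.2 m by 2 m door (the comment gives neither number):

```latex
F = \Delta p \, A = (200\ \text{Pa}) \times (2.4\ \text{m}^2) = 480\ \text{N}
  \approx \tfrac{480}{9.81}\ \text{kgf} \approx 49\ \text{kgf}
```

    That is the total load on the door leaf. Because it acts around the middle of the door, the pull needed at a handle on the edge opposite the hinges is roughly half of it, which is still plenty to defeat a smaller person.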

  10. TXITMAN

    Never

    I have read so many of these 'data center shut down by xxx' stories over the years, but never have I seen a fire alarm or an emergency power-off button save a person from injury, and I have been in hundreds of data centers.
