*Thunk* No worries, the UPS should spin up. Oh cool, it's in bypass mode

Whatever can go wrong will go wrong. It's a law most IT people would understand and perhaps even fear. It was my third day as the new network manager for a reasonably sized estate across several sites, most inhabited by weirdy beardies who had jobs like counting bats, frogs and other animals you may never have heard of. It …

  1. wolfetone Silver badge
    Pint

    "I left a few years later but the last I heard the company had spent several million pounds on a new site built directly on a flood plain with the IT hardware in the basement."

    I, for one, can't wait to read about the sequel.

    1. Anonymous Coward
      Anonymous Coward

      IIRC one new data centre was on a flood plain close to a major river. It had large electric pumps should a storm surge flood breach the yard's walls. Unfortunately the pumps were only powered off the mains supply - not the backup generator.

      1. Anonymous Coward
        Anonymous Coward

        Not part of

        a nuclear power station, was it? I recall an incident caused by this sort of scenario....

    2. Crisp

      Re: a new site built directly on a flood plain

      That's free water cooling. A cost saving right there!

      1. chivo243 Silver badge

        Re: a new site built directly on a flood plain

        Now, that's future readiness!

    3. handleoclast

      Re: I, for one, can't wait to read about the sequel.

      It's called Fukushima.

      1. Gordon 10

        Re: I, for one, can't wait to read about the sequel.

        I think you'll find it's pronounced Fukubeancounter

        1. Anonymous South African Coward Bronze badge

          Re: I, for one, can't wait to read about the sequel.

          fukubeancounter

          ^ that ^_^

      2. Joe User

        Re: I, for one, can't wait to read about the sequel.

        > It's called Fukushima.

        It's also called New Orleans.

    4. Paul Hovnanian Silver badge

      "flood plain"

      How about consolidating a company's distributed data centers into one location? Built right on top of the Seattle earthquake fault.

  2. hplasm
    Holmes

    Infestation-

    "Sometimes organisations just don't engage their brains – or it is someone else's problem. "

    It's Beancounters. Like woodworm, once they get into the fabric of the company, they weaken the structure and are hard to eradicate.

    1. Anonymous Coward
      Anonymous Coward

      Re: Infestation-

      Sometimes it’s beancounters. More often, it’s not (at least in my experience). In a lot of cases it’s actually easy to get what you want, but it’s up to IT to get the message across. In this case it wouldn’t have helped, of course; it’s what the previous guy should have done.

      Get your disaster scenarios straight, and the options with detailed costs to mitigate. For each scenario, give an estimate of likelihood and potential cost to business (e.g. 1 site goes down, 100 users are twiddling thumbs for 2 days, that’s 200 days’ worth of wages; if it’s a warehouse, add costs for late deliveries during peak season, and so on and so forth). Put it all in a nice Excel file with estimates from all and sundry. Sales & Marketing can be your friend here for the intangibles like reputation loss (they often tend to set their “disaster” cost estimates higher than they really are, because their bonuses depend on it). If a customer has to wait longer for an order, the costs add up: large orders in B2B environments, social media shitstorms in B2C.
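
      For illustration, a minimal sketch of that expected-loss arithmetic (all figures below are invented; substitute your own estimates):

      ```python
      # Hedged sketch: expected yearly cost of one disaster scenario.
      # Every number here is illustrative, not from the comment above.
      def expected_annual_loss(p_per_year, users, days_down, cost_per_person_day,
                               extra_costs=0.0):
          """Probability of the scenario per year times what it costs when it hits."""
          outage_cost = users * days_down * cost_per_person_day + extra_costs
          return p_per_year * outage_cost

      # e.g. one site down: 100 users twiddling thumbs for 2 days = 200 person-days,
      # plus late-delivery penalties if it's a warehouse in peak season
      loss = expected_annual_loss(p_per_year=0.1,            # assume once a decade
                                  users=100, days_down=2,
                                  cost_per_person_day=250.0, # loaded daily wage
                                  extra_costs=20_000.0)
      print(f"Expected loss: {loss:,.0f}/year")  # weigh against the mitigation cost
      ```

      If the mitigation costs less per year than that figure, the beancounters have their number.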

      If you did all that and the beancounters say no, ask why and make sure they give you the numbers on which they base their conclusion (in writing). Because basically in that case they’re telling you they don’t want insurance for when the house burns down.

      When disaster strikes, *then* you can blame them.

      I’ve had my share of conflicts with them, but we’ve reached an equilibrium nowadays. I’ve cut down on maintenance contracts by stocking more spares, which looks good on the balance sheet after year one; they are more easygoing when I want secondary connections to the main sites or more redundant power to the secondary data center. It takes time, but it’s worth it.

      1. Anonymous Coward
        Anonymous Coward

        Re: Infestation-

        "give an estimate of likelihood and potential cost to business (eg 1 site goes down, 100 users are twiddling thumbs for 2 days, that’s 200 days worth of wages, if it’s a warehouse add costs for late deliveries during peak season and so on and so forth). Put it all in a nice excel file with estimates from all and sundry"

        In my experience, even when your likelihood figures are realistic, you need a chance of failure above 33% before anyone from beancounting and/or senior manglement will take the risk remotely seriously. I've seen plenty of people who should know better assume that a 10% chance is the same as "will never happen". Even if you've got accurate ways of measuring the risk (a lot of this is frequently finger-in-the-air wizardry), too many people will ignore the risk entirely anyway, because those IT guys are always pessimistic and grumbling.

        Of course, as Pterry so rightly pointed out, million to one chances come true nine times out of ten.

        At a previous job we had several racks in DR running off a single UPS, and we said the chance of the UPS failing during an actual DR was about 20% due to the increased load (the hand-wavery here was that we didn't know what the actual max load was going to be during DR, nor exactly what maximum load the aged UPS could withstand before falling over). Beancounters and senior management said that this was an acceptable failure risk and decided that, if the UPS failed, we should just plug some or all of the servers into the UPS in the next rack to take the load off the overloaded one. This was added to the DR plan by the beancounters without the knowledge or signoff of the techies, because who needs their opinion?

        Guess what happened come the next DR (an exercise thankfully, not an actual disaster). The business learnt the hard way that "cask-aiding" doesn't mean helping out a forlorn barrel that's down on its luck.

        Of course, it turned out it was actually IT's fault for not correctly identifying the risk, because the chance of failure in hindsight was actually 100%, and the beancounters couldn't have been expected to allocate budget accordingly if IT gives them incorrect information. If half the racks hadn't died, the chance of failure would have been 0%, and thus IT would have been at fault for incorrectly saying there was a risk of failure, and the beancounters would have been entirely right in denying IT's frivolous request.

        <need a Catch 22 icon>

        1. Anonymous Coward
          Anonymous Coward

          Re: Infestation-

          True, but often missing in the first pass analysis is the impact cost. If it's a 1-in-a-million chance you might be tempted to roll the dice...but what if the effect of hitting that 1-in-1m is that your business is destroyed?

        2. rskurat

          Re: Infestation-

          failure to manage should mean being drummed out of management, with one's coat & tie slashed to ribbons in front of the whole regiment

    2. veti Silver badge

      Re: Infestation-

      Hey, beans don't count themselves, you know.

      Beancounters are not your enemy. They've got a job to do, and it's a real (mostly boring, mostly thankless) job that needs doing.

      Management, there's your enemy. Not your line manager, although they may become so if you don't cultivate them properly, but the real management. You know, the ones who take decisions about what risks are "acceptable" and what memos to ignore.

      On a related note, another enemy is the Chicken Little employees and consultants who send scaremongering memos about every conceivable risk without properly quantifying it. When you tell the boss "a power cut will CRASH THE COMPANY", make sure you include a quantitative assessment (likelihood per year of an unscheduled power outage in this location, likelihood it will occur during business hours, and a specific projection of likely losses). The beancounters can actually help you with that: get them on your side.
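
      A toy version of that quantification, with assumed figures only (the real ones come from your utility's outage history and your finance people):

      ```python
      # Illustrative numbers, not real statistics: quantifying the "power cut" memo.
      outages_per_year  = 1.5          # assumed unscheduled outages/year at this site
      business_fraction = 45 / 168     # share of the week that is business hours
      hits_per_year     = outages_per_year * business_fraction
      loss_per_hit      = 80_000.0     # assumed loss per business-hours outage
      print(f"~{hits_per_year:.2f} business-hours outages/year, "
            f"expected loss ~{hits_per_year * loss_per_hit:,.0f}/year")
      ```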

      And a proposal to mitigate the effect using a UPS, obviously, needs to include an allowance for maintenance of said UPS.

  3. imanidiot Silver badge

    I left a few years later but the last I heard the company had spent several million pounds on a new site built directly on a flood plain with the IT hardware in the basement.

    You'd be surprised how many people don't even know what a floodplain IS, let alone the consequences of putting important infrastructure on one. Even in the Netherlands (a country known for its water management) some companies and homeowners have had to learn this lesson the hard way.

    1. H in The Hague

      "You'd be surprised about how many people don't even know what a floodplain IS"

      In the UK:

      https://flood-map-for-planning.service.gov.uk/

      https://www.gov.uk/check-flood-risk

      In NL:

      http://www.overstromingsrisicoatlas.nl/

      https://www.risicokaart.nl/

      www.overstroomik.nl/

      "... some companies and homeowners have had to learn this lesson the hard way."

      The very hard way, as I think Dutch insurance doesn't cover flood risks. All the more reason to pay the Waterschapsbelasting (dyke maintenance levy) diligently :)

      1. Tabor

        re: H in the Hague

        I hope you meant dike maintenance and not dyke maintenance? I am all for LGBT rights, but a levy to maintain only the L part seems a bit much. Joke icon needed, but can’t seem to find it on my phone while in private mode.

        1. Anonymous Coward
          Anonymous Coward

          Re: re: H in the Hague

          It does make for a lovely image though. I imagine them all lined up, arms linked, facing the North Sea and muttering "bring it on".

        2. Fungus Bob

          Re: @Tabor

          But Holland is known for its dykes!

          1. Aladdin Sane

            Re: @Tabor

            I believe there's a bit in Good Morning Vietnam about that.

          2. tony2heads

            Re: @Tabor

            dyken

    2. JimboSmith Silver badge

      Floodplains

      The Official Monster Raving Loony Party actually had a policy on floodplains.

      Part of it says

      Under a Loony government any prospective home purchaser will be issued with a full description of such dictionary terms as ‘floodplain’, ‘coastal erosion’ and ‘exposed headland’. This will save time explaining why they have no house anymore after nature takes charge of the environment.

      Read more at http://www.loonyparty.com/5908/3058/floods/. Could be a useful idea for companies too.

      1. Sgt_Oddball
        Alien

        Re: Floodplains

        How the hell does the monster raving loony party have a more sensible answer to this than the actual government?

        Next you'll be telling us Lord Buckethead has a perfect solution to the housing crisis....

        1. JimboSmith Silver badge

          Re: Floodplains

          Some of their policies are actually quite sensible, often with something bizarre tacked on at the end to make it a bit more loony.

          Any MP whose constituency sells off a school playing field to developers will be required to relinquish his/her own back garden as a replacement sports facility for the school.

          The Loonies propose that a minimum requirement of Maths ‘O’ Level be made for all government ministers and their treasury advisers, thereby preventing two different rates of inflation when used to calculate raises in both state benefits and taxes.

          All third world debt will be cancelled. They’re not going to pay anyway. You know that. I know that. Don’t deny it.

          Some of the policies have since been enacted, and some have become law, such as:

          Passports for pets

          Abolition of dog licences

          Carnaby Street pedestrianisation

          All day opening of pubs

          Commercial Radio

          1. Adrian Jones

            Re: Floodplains

            Don't forget Votes for 18 year-olds.

            1. onceuponatime

              Re: Floodplains

              "And healthcare for Saxons and Normans."

        2. handleoclast

          Re: Floodplains

          How the hell does the monster raving loony party have a more sensible answer to this than the actual government?

          They frequently do have more sensible solutions. To the extent that an idea which first appears in an MRLP manifesto is later adopted by one or more of the "sensible" parties.

          There are more ways of taking the piss out of conventional political parties than by standing candidates with silly names. Coming up with better policies is even funnier.

          1. Aladdin Sane

            Re: Floodplains

            Their current position on voting age is to lower it to 5 so that it matches the behaviour of MPs in debates.

        3. Martin an gof Silver badge

          Re: Floodplains

          How the hell does the monster raving loony party have a more sensible answer to this than the actual government?

          My 14-year-old got really interested in the last general election, having seen a party-political for the MRLP, and not just for the really odd policies like re-introducing mermaids. Put it this way, he's politically aware enough that he's currently wearing this T-shirt, and while he did disappear for a couple of minutes while we were queueing at the supermarket this morning, he came back with a copy of Private Eye (link included for non-UK readers who might not be aware of this publication).

          There's hope for the future yet...

          M.

          1. MyffyW Silver badge

            Re: Floodplains

            @Martin_an_gof your offspring should be congratulated on their choice of mid-20th century statesman.

            1. Martin an gof Silver badge

              Re: Floodplains

              your offspring should be congratulated

              Well, given that we live within the Aneurin Bevan Health Board not too far away from that man's constituency, and that said 14 year-old spent quite a lot of time in hospital as a baby, he is acutely aware that he probably wouldn't have had any younger siblings if we had been paying off for all that treatment. Either that, or he wouldn't have been as fit as he is now.

              M.

          2. rskurat

            Re: Floodplains

            Private Eye has a small but rabidly loyal following here in the US (not expats), more than 94 subscribers!

        4. imanidiot Silver badge

          Re: Floodplains

          Lord Buckethead doesn't but maybe amanfrommars does?

  4. Anonymous Coward
    Anonymous Coward

    Sometimes, there are ways round it.

    The Highways Agency have an IT location on a flood plain next to the M1 which is above them on an embankment. They know where they are and have built a moat around it and some very powerful pumps. In the event of a flood, the pumps engage and drain the moat by flinging the water OVER the M1 to the other side of the embankment.

    I gather the test was spectacular.

    1. Anonymous Coward
      Anonymous Coward

      Re: Sometimes, there are ways round it.

      I hope the pumps can still function if mains electric power goes out.

    2. Martin an gof Silver badge

      Re: Sometimes, there are ways round it.

      The Highways Agency have an IT location on a flood plain

      I heard a radio programme a couple of years ago on the new National Archives building in London, right next to the Thames. If I remember correctly, they put their IT in the basement because they would much rather the IT department (which presumably has decent off-site backups) flooded, than they ended up with a lot of soggy 1,000 year-old parchment.

      M.

      1. David Hall 1

        Re: Sometimes, there are ways round it.

        The Thames is pretty much a safe bet for building a DC next to, because of the Thames Barrier etc.

        Still a waste of nice river real estate!

        IBM have had their DC on the south bank for at least half a century and, whilst it's pretty crap, I don't believe it's been wetted yet!

        1. TRT Silver badge

          Re: Sometimes, there are ways round it.

          I can think of a certain basement area next to the Thames that's been flooded in the last 20 years. But that was down to a large water main running parallel to the river cracking open and the water finding its way through the ancient, long since covered and built upon, tributaries of the Old Father.

          I don't know if it houses a DC or not; I suspect it does.

  5. Nick Kew

    Is it Friday already?

    ... Or has El Reg expanded its entertaining anecdotes beyond Friday (and the new Monday slot) to all-week?

    1. TRT Silver badge

      Re: Is it Friday already?

      Disaster can strike at any time. It just always seems like it's Friday when shit happens, just to take the shine off your weekend.

      1. Anonymous Coward
        Anonymous Coward

        Re: Is it Friday already?

        Nobody wants a disaster on a Friday, that would ruin weekend plans and the night out.

        In many shops you can only invoke the disaster process (if indeed you have one) with senior management approval.

        Some golf courses don't get very good signal on mobile....

        1. Anonymous Coward
          Anonymous Coward

          Re: Is it Friday already?

          "In many shops you can only invoke the disaster process (if indeed you have one) with senior management approval."

          The same goes for academia. One university with a leading computer science dept finally moved its below-ground and often-flooded data centre to an off-site location, has had a £175,000,000 fire due to a UPS problem (wheeling in and simply plugging in a 300 kVA UPS without checking the battery bolts is simply stupid), and has installed a total VoIP phone 'solution': in the event of power failure there are NO communications. Then, during the recent bad weather, the COO on a £250,000 salary followed Police advice and stayed home, as did the director of H&S, leaving it up to a former Household Cavalry officer and deputy VC (the VC was 'unavailable') to make the decision to close and send people home. True to type, he decided that impassable roads, no buses etc. were no reason to shut the site; they ended up bedding down stranded staff and students in the library!

          Those of us who, like the COO and H&S director, also followed Police advice not to travel lost a day's pay or holiday, as we can't tele-work like they claim to have done...

  6. Anonymous Coward
    Anonymous Coward

    All your eggs in one voip

    Reminds me of the time when a satellite office network failed.

    Raise a help desk ticket? No, network is down.

    Phone help desk? No, the phones were voice-over-IP.

    Mobile phone help desk? Anyone know the real number? It's on the intran... Oh.

    Someone had the page open, so rang it. To find external calls were barred.

    Eventually solved via a personal call to a colleague's mobile at the main site who scurried down to the depths of the IT cave and caressed the offending router.

    1. Korev Silver badge
      Coat

      Re: All your eggs in one voip

      Reminds me of the time when a satellite office network failed.

      Well, it's very hard to string fibre up into orbit.

      Mine's the spacesuit -->

    2. Anonymous Coward
      Anonymous Coward

      Re: All your eggs in one voip

      As a telecoms guy... it's the network... it's always the bloody network!

      Phone system 4 years uptime....Network?....Well hopefully we'll make it to the end of the week before they f**k it up...again.

      1. Old Used Programmer

        Re: All your eggs in one voip

        Living in Earthquake Country (SF Bay Area) is why we have a POTS line separate from the cable-system broadband. If my spaceship ever comes in and I can move to where I'd like to live, in a style to which I'd like to become accustomed, I'd get a dual-WAN router and have broadband from *both* the phone company and the cable company.

    3. Anonymous Coward
      Anonymous Coward

      Re: All your eggs in one voip

      There has got to be a joke in caressing a router within a cave, I just can't put my finger on it...

      1. onefang

        Re: All your eggs in one voip

        "There has got to be a joke in caressing a router within a cave, I just can't put my finger on it..."

        You're not caressing it right. Use all your fingers, but very gently.

        1. Mark 85

          Re: All your eggs in one voip

          "There has got to be a joke in caressing a router within a cave, I just can't put my finger on it..."

          You're not caressing it right. Use all your fingers, but very gently.

          And say "I love you"....and maybe promise it something.

  7. Stoneshop
    FAIL

    It was my third day as the new network manager

    You took this job without inquiring into niggly little things like disaster scenarios and such BEFOREHAND?

  8. Chairman of the Bored

    My one win over beancounters

    In gov't service...

    Whole rack full of Best(TM) UPS units with failed lead-acid batteries inside. Spent over a year fighting beancounters over purchasing replacements; the beancounters kept using "OMG!! They contain lead! Panic immediately! Oh, dear - the Californicators will all die of lead poisoning!!" as their excuse for inaction. Power failures and lost data? Oh, heck yeah. Did the multiple system failures help with the purchase? An emphatic "no".

    So what I did was work with the vendor to create a new part number for something called a "self-contained DC power supply". Turns out that anything flagged as a battery is on the USA's "No buy" list. But SCDCPS? Good to go! That's how I became - at least until my next screwup - hero of the team.

    1. Chloe Cresswell Silver badge

      Re: My one win over beancounters

      We had some servers on the landlord's UPSen in the cabinets. We slowly got our management to migrate to our own UPS.

      That was 2 years ago. The big landlord owned UPSen are still to this day showing failed battery, and the only work carried out has been to turn off the failed battery alarm...

    2. Anonymous Coward
      Anonymous Coward

      Re: My one win over beancounters

      "self-contained DC power supply". Turns out that anything flagged as a battery is on the USA's "No buy" list. But SCDCPS? Good to go!

      Always good to be creative. At a previous job, internal policy was that books had to be ordered via, and borrowed from, the central company library (which was on the other side of the country). Manuals, on the other hand, just required a simple PO. We bought lots of programming manuals, database manuals,...

      I also remember someone who, following a major lightning strike, facetiously filled in a damage form with "cause: EMP", and then to general surprise it was discovered that they were insured against EMP but not against lightning strikes.

      1. Chairman of the Bored

        Re: My one win over beancounters

        Insured against EMP? Nice!! I've got to go check to see if that applies. Be good to know.

        For whatever it's worth, my homeowner's policy specifically excludes damage caused by nuclear war. Somehow I think my homeowner's cover would be the least of my worries at that point.

        1. Chris G

          Re: My one win over beancounters

          Nuclear war exemptions have been standard in many British policies since the '60s. I worked for a short time in car insurance; Saturday mornings we had to man the phones as the switchboard girls were off, and because we were speaking to potential clients we had to have read the T&Cs on the policy.

    3. phuzz Silver badge

      Re: My one win over beancounters

      Our big Eaton UPS is complaining about the batteries being old, so soon enough I'm going to have to replace them (fortunately my boss has already given me the go-ahead to spend money).

      Theoretically I can put the UPS in bypass mode and just swap the batteries.

      Practically, however, I'm expecting the entire UPS to shit the bed and take down all the power in the server room. Sod's law: if I prepare by shutting everything down, then the swap will go perfectly.

      1. David Hall 1

        Re: My one win over beancounters

        I'd be very surprised if your battery string is all in series. If it is, that's your big problem.

        If it's not, which it almost certainly isn't, just isolate each shelf and replace a shelf at a time.

        All you lose is some run time but no need to put into bypass.

        Of course, if you are relying on The Register for your PPM, you and your company are almost certainly screwed!

      2. The Oncoming Scorn Silver badge
        FAIL

        Re: My one win over beancounters

        Building shutdown, & I was advised from on high that there was no need for me to be on the remote site to do anything (mistake number one), and that putting the very, very large UPS into "Passthrough" before the controlled shutdown could be done by the site contact (it wasn't). The fact that this was an annual event left me assured that it would just be a routine one (mistake number two: never assume, it makes an ASS out of U & ME).

        Sunday morning 3am messages from the sysadmin team in India, left on my desk phone regarding the servers not waking up remotely, didn't get to me, for the strange reason that I don't sleep at my desk on a Saturday night\Sunday morning.

        On arrival at the remote site Sunday morning I rang into the bridge; the sound of silence from the servers was deafening. The building power was on, but the surge had tripped the UPS breakers, & each battery had to be checked by the third-party techs before we got the go-ahead to bring up the UPS & then bring everything else back up.

      3. prodromos65

        Re: My one win over beancounters

        Putting it in bypass mode is fine. The only issue is that nowhere in the manuals does it state how to take it out of bypass mode. Once you've changed your batteries, you perform a battery test, and if it completes successfully it will put the UPS back into ONLINE mode.

        This also applies if you have an external bypass switch. After you've done maintenance on the UPS, switch from BYPASS to TEST, power up your UPS and put it into internal bypass. Switch the external switch to ONLINE and then do the battery test. The Smart UPSes need to monitor the load while in internal bypass to be able to supply the correct load when they go online. If you put the UPS online first and then switch the external switch back to online, the UPS will fill its pants and go into overload alarm.
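
        For what it's worth, a hypothetical sketch of that sequence as a table of safe transitions (the state names paraphrase the comment above; they are not any vendor's real interface):

        ```python
        # Hypothetical model of the maintenance sequence described above, for a UPS
        # behind an external bypass switch. States and steps are illustrative only.
        SAFE_SEQUENCE = {
            # (external switch, UPS internal mode) -> next step
            ("BYPASS", "off"):             "do maintenance, then move external switch to TEST",
            ("TEST",   "off"):             "power up the UPS and put it into internal bypass",
            ("TEST",   "internal bypass"): "move external switch to ONLINE",
            ("ONLINE", "internal bypass"): "run battery test; on success the UPS goes online",
            ("ONLINE", "online"):          "done",
        }

        def next_step(external, internal):
            """Return the next safe step, or flag the shortcut that trips the alarm."""
            # Going online internally before flipping the external switch back is the
            # mistake described above: the UPS never monitored the load in bypass.
            return SAFE_SEQUENCE.get((external, internal),
                                     "unsafe sequence: expect an overload alarm")

        print(next_step("TEST", "internal bypass"))  # -> move external switch to ONLINE
        print(next_step("BYPASS", "online"))         # -> unsafe sequence
        ```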

        We once sent back a perfectly good UPS because the person responsible did not know the procedure and assumed we had just installed a faulty unit (replacing the old one, which had failed).

    4. Norman Nescio Silver badge

      Creative Purchase Orders

      When New Scientist was worth reading, there was an interesting set of miscellaneous stories which included creative purchase order descriptions.

      One laboratory got away with ordering a new Digital Signal Generator, for a non-trivial amount of money.

      .

      ..

      ...

      ....

      .....

      ......

      .......

      ........

      .........

      ..........

      It was a piano.

      1. handleoclast

        Re: Creative Purchase Orders

        At a place I worked, many years ago, a salutary tale was recounted to me. It was of a time when company beancounters decided to place severe restrictions on capital expenditure. A bit of a problem, since the various departments had a lot of test equipment and the older stuff needed to be replaced once it could no longer be repaired and/or brought into calibration. Without working test gear the departments could no longer do the stuff they needed to do in order to make money.

        Non-capital purchases, however, were fine. That year Maplin sold a lot more oscilloscope kits than usual. Factor in the time of a skilled technician or engineer to build the scope and it worked out more expensive, but it kept the beancounters happy (until they balanced the books at the end of the year).

        Then there was the time (same company) when the beancounters decided cost-centre accounting was the big thing. They'd long had cost-centre accounting, in that they kept track, but now all cost centres had to run at a profit. Including the mail room.

        Prior to this wonderful idea, the mail room delivered mail (much of it trade mags which people read to look for jobs as well as to learn of technical innovations) to the desk of each recipient. Afterwards, the mail for a department got dumped on a table and you had to look through it to find your own. You don't need to be familiar with Knuth's Sorting and Searching to realize how inefficient that was. But it was egalitarian: everyone, including the Chief Engineer of the department, had to search through the pile. Hooray! The mail department had cut its costs by sacking a couple of juniors who used to distribute mail. The extra costs imposed on every other department, however...

        1. JimC

          Re: the mail for a department got dumped on a table and you had to look through it to find your own.

          At the start of my IT career, data entry to the major business systems was done by trained typists who could type quickly, easily and accurately. By the end of my career, large quantities of that data entry were done by managers who could do none of those, at a vastly greater hourly rate...

      2. Inventor of the Marmite Laser Silver badge

        Re: Creative Purchase Orders

        Made a note of that. I'm sure it'll strike a chord somewhere and stave off future problems, so long as I can be sharp about it.

      3. Stork Silver badge

        Re: Creative Purchase Orders

        I worked at a place where a manager wanted to reward his team, so he bought a soft drink dispenser. To get it past accounting it was officially a colour printer.

        They only figured it out when the costs of "ink" exploded.

  9. DontFeedTheTrolls
    Facepalm

    Priorities

    Once when recovering services without a plan we asked the business for their priority list. Top of the list was the Management Information System (MIS).

    Me: "Are you sure you want MIS back first?"

    Senior Manager: "Yes, its critical"

    Me: "Really?"

    Senior Manager: "Don't question me, its top priority!"

    Me: "So you want to be able to report that none of your staff are doing any work rather than not be able to report but know they are doing something?"

    Senior Manager: "Maybe Workflow should be first priority then"

    1. Phil O'Sophical Silver badge
      Coat

      Re: Priorities

      Senior Manager: "Maybe Workflow should be first priority then"

      I call BS, no real Senior Manager would admit he was wrong that quickly!

    2. Anonymous Coward
      Anonymous Coward

      Re: Priorities

      "Top of the list was the Management Information System (MIS)..."

      I once did a piece of work for A Large City's transport authority during A Large Sporting Event about five or six years ago. At the time both operational and analytical workloads were running on the same database instances. It was known that, on occasion, big table scans could cause weirdness for those on the operational end of the system, like, say, passengers.

      So the database administrator types put their heads together and decided to quietly pause all their MIS and reporting jobs for the two or three weeks of this particularly high profile event, lest the worst occur during this time of high load.

      "Did anyone notice?" says I, "You've got twenty people wrangling these reports. Must be important!"

      "Did they shite," says they.

  10. x 7

    more than just IT

    This essay gives a fair assessment of the situation in Lancaster after the 2015 storm.

    https://www.lancaster.ac.uk/media/lancaster-university/content-assets/documents/blogs/lancaster-power-cuts-blog.pdf

    It's not just IT departments which have to think in terms of disaster survival, but rather the whole of society.

    We are all too reliant on technology with no backups

    1. x 7

      Re: more than just IT

      more about the Lancaster outage

      https://www.raeng.org.uk/publications/reports/living-without-electricity

      1. Anonymous South African Coward Bronze badge
        Thumb Up

        Re: more than just IT

        Interesting read, thanks!

  11. Anonymous Coward
    Anonymous Coward

    Murphy rules

    Those that forget that Murphy rules will be in for a hard time.

  12. Herbert Meyer

    In the old bank, in the vault

    I was a customer of a bank in suburban Chicago, and I went into the bank, a new building with lots of glass, and found the power was off, and the tellers were using pen and paper and calculators to process transactions.

    I went home, called my mother-in-law, who had just retired from the bank, and asked her why they did not have a UPS. She said they did: a large diesel set, still in its crate in the vault of the old bank building. It fit the elevator and the UPS room, but was too large to move through the basement hallway into its planned location.

    They later excavated a hole in the parking lot, cut the foundation wall, and put it in sideways.

    1. David Hall 1

      Re: In the old bank, in the vault

      A generator is not a UPS.

      A UPS provides carry through time whilst your gennie starts up and can carry the load.

      A generator is a generator.

      1. Anonymous Coward
        Anonymous Coward

        Re: In the old bank, in the vault

        I have been informed of a permanently running generator which acts as a UPS for a large metropolitan university, used for keeping both IT systems and cryogenic experiments running in blackouts.

        I was then informed what powered this never-ending device: a small test nuclear reactor.... Fun times to find you lived not 2 miles away from a reactor.

        1. ShadowDragon8685

          Re: In the old bank, in the vault

          Now THAT is how you ensure an uninterruptible power supply.

          "Power cannot, under any circumstances, be cut to this facility. Do you understand me? Under no circumstances is a power loss acceptable. If there is a power cut, the loss to society, the organization, and science itself will be incalculable. My head will be on the chop just before my boss's head itself gets chopped, and that will be immediately postceding me swinging the axe at your neck. Do you understand?"

          "What if there's a flood?"

          "No, absolutely unacceptable."

          "No, I mean a big, BIG flood."

          "Did you watch the movie Deep Impact?"

          "Aye."

          "Remember that gigantic tsunami that took out New York?"

          "I do."

          "The rest of our facilities will stand up to that. Make sure the power does, too."

          "What if there's a war?"

          "Unless someone is dropping cruise missiles directly on our heads, the rest of the facilities will hold. The power had damn well better, too, ESPECIALLY since national power grids are likely to be targets in war!"

          "Godzilla and Cloverfield go Sumo Wrestling through the township?"

          "Short of Cloverfield getting suplexed through the roof, the facility will stand. The power supply has to, too."

          "So, let me get this straight: not fire, flood nor famine or war can cut the power?"

          "Right."

          "You don't want to hear any 'Acts of God' clause stuff here, because you want a power supply that will stand up longer than the building itself."

          "Now you're catching on."

          "In other words, you need uninterrupted power right up to the Godzilla Threshold - basically that unless the power supply failure is not the reason your facilities' work has been terminated, the power supply has to stand up."

          "That's my requirements. No interruptions are acceptable."

          "Right. Well then, fuck it mate, just fuck it; we're gonna have to install a reactor."

          "Make it so."

  13. Allonymous Coward

    it shouldn't take months to sort out a new circuit board

    Hold on, didn't you say this was public sector?

    1. Anonymous Coward
      Anonymous Coward

      "it shouldn't take months to sort out a new circuit board"

      Not only the public sector: work in any company with enough bureaucracy and things like that really do take months to procure. You need to find the approved vendors, ask for a quote, have the quote approved, then activate the purchase, wait for n approvals, wait for managers trying to offload it onto someone else's budget; meanwhile the procurement people change, the new ones find something wrong in one of the approvals and send it back, the one assigned to the task goes off to deliver a baby, nobody else takes responsibility for her tasks while she's away; you find someone who knows someone working there who owes you a favour, she calls her friend, who in some "oblique" way approves the purchase, only the quote is no longer valid because too much time has passed...

      1. The Oncoming Scorn Silver badge
        Mushroom

        Re: "it shouldn't take months to sort out a new circuit board"

        IIRC Somerset County Council, while I was there, required the generator to come on after an outage. It didn't.

        Remedial work (replacing\refilling the fuel) was carried out &, if I recall, when they next did a test it then caught fire.

        1. Allan George Dyer
          Coat

          Re: "it shouldn't take months to sort out a new circuit board"

          @The Oncoming Scorn - "when they did a test, it then caught fire"

          So it worked according to spec... they ordered an "Emergency Generator", and, when tested, it generated an emergency.

      2. Anonymous Coward
        Anonymous Coward

        Re: "it shouldn't take months to sort out a new circuit board"

        As someone who has spent most of my life working for various sized organizations of both flavours it seems to me beyond doubt that the public / private bit makes hardly any difference compared to:

        a) how big the org is

        b) how long it has been around

        c) how many times it has been reorganized (including mergers & acquisitions)

    2. Anonymous Coward
      Anonymous Coward

      It took six months to get a piece of customer hardware repaired due to a series of cockups between supplier, finance, and admin. Actual time to repair once it was all signed off: two weeks.

  14. tfewster
    Pint

    Yay, "This Damn War" has returned to life and joins the BOFH. Now we just need "DPM's Diary" to complete the unholy trio.

  15. JJKing
    Facepalm

    Resistance to have a reliable backup.

    I work ad hoc for a guy who calls me in on network & server issues. Went to a site he'd had for 10 months without knowing the Domain Administrator password. Booted off the magic USB stick, changed the password and then looked at the setup. The users were using Outlook and the previous techs were storing the PSTs on the server. They had some very nice backup software, but the backup machine was a 10-year-old PC with a single 4TB HDD and no UPS, and the power had failed and the backup had been down for ............. 10 months.

    It took me 5 months to convince the payer of my invoices that this was not a good backup system, coz every time the power went off the backup machine stopped, and there was nobody with a clue in that office who knew how to press the power button. When he finally went to the owner and suggested installing a NAS with a UPS, the owner complained that it was going to cost too much and that he never had to pay so much for the previous tech guys. First, we had a copy of the previous company's invoices from when we were trying to recover corrupted PST files on the server. Second, this business was purchased for $1,500,000 and this dick was complaining about $700 for a NAS with hardware redundancy and a UPS.

    Similar situation at another client. His DOS-based leasing software, dating from 1996, could no longer be reinstalled because the original programmer was no longer on the planet and they couldn't generate an activation key for it. Finally upgraded their 9-year-old workstation to a real server, NAS, UPS and an offsite backup. They got hit with Mr Crypto Virus, but due to me cocking up the NAS permissions (I was still configuring it all remotely, so had done a manual backup the previous night), crypto couldn't touch them and I had them back up and running in 4.5 hours with 2 hours of manual data to re-enter. This person complained all the while I was setting up this small network, but 3 months later I received an email that said, "We dodged a bullet there, didn't we." He finally realised the importance of the backup, and the minimal cost compared to what it could have cost. The NAS and HDDs cost less than $500, plus $38 for a new UPS battery for the NAS. Just a blip in the great game of life for his $1.4 million turnover.

  16. GrumpenKraut
    Devil

    ...not prepared to pay for a single week of handover time.

    Uh, oh. Had this when I left a job, and there was actually a one-month overlap with the guy supposed to take over. Every attempt from my side to do any sort of meaningful handover was blocked. It was "we have more important things to do", over and over, like a broken record.

    Then, in my new job, I received an angry email from my old boss because no-one had a clue even what was running where.

    I took extra time and care with the wording of my utterly polite reply. --->

    1. The Oncoming Scorn Silver badge
      Holmes

      Re: ...not prepared to pay for a single week of handover time.

      Never had to use this, fortunately, but "Your right to ask me anything about your systems & procedures ended once I left the company" is my prepared response.

    2. Anonymous Coward
      Anonymous Coward

      Re: ...not prepared to pay for a single week of handover time.

      The extent to which fools are prepared to shoot themselves in the foot is always depressing. I left a large employer at the beginning of a large reorg, which they were planning to follow with a platform migration. One of the reasons I left was that I was stressed to the roof as a result of being sidelined by fools. They couldn't work out who they wanted me to hand everything over to, so I spent the last month doing almost nothing but going through system and DR documentation, trying to think of everything and dot every single i and cross every single t on what I thought wasn't bad anyway. Three months after I left I got an enquiry to contract for a company who wanted someone to document all the systems at an unnamed organisation with remarkably similar technology, in order to facilitate a platform migration.

      We soon established that, yes, we were talking about the same place. OK, I said, no problem: have they given you all the system and DR docs? I reckon I can give you everything you need in a couple of days from that, working at home. There was silence. Not entirely to my surprise, the sideliners decided to veto me as a contractor, but I hope whoever got the short straw did at least get to see all the documentation. It seemed, incidentally, to take them about 5 years to complete a platform migration I reckoned could have been done in 18 months.

  17. Old Used Programmer

    Disasters I have seen (or seen dodged)

    Went through two actual data center disasters at one company in San Francisco.

    The first was when a construction project across the street drilled into a 16-inch gas main. Gas got sucked into our building's air intakes and we had an explosive concentration of natural gas in the machine room. It also transpired that residual oils from the gas compression pumps tended to get into the lines... and the oils were contaminated with PCBs.

    The second incident was when a water line separated in the intake side of the water distribution unit that supplied the water-cooled IBM mainframes. The mainframe was on the 14th floor. The feed line was 1.5" soldered copper. It was being fed from a chiller and 10,000 gallon holding tank on the roof...of a 45 story building. (That's right: 30 stories of pressure head.) The shut off valves were under the false flooring and no one had ever told the machine operators where they were. The return drain lines ran above the false ceiling on the 13th floor. Some of the water went around the drain lines and the rest ran into the stairwell where it got into the power ducting for half the building, blowing out a 2 story high bus bar (a replacement was sent by air freight from Chicago...the closest place one could be located on short notice). Most of our offices were on the 13th floor and got soaked. There was water damage for 5 floors below us. The cause turned out to be a manufacturing defect in the distribution unit, so that company's insurance got all the bills.
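
    As a rough sanity check of that pressure figure (assuming roughly 3 m per story, which is my guess, not the commenter's):

    ```python
    # Rough check of "30 stories of pressure head", assuming ~3 m per story.
    rho, g = 1000.0, 9.81          # water density (kg/m^3), gravity (m/s^2)
    head_m = 30 * 3.0              # ~90 m of water column above the break
    pressure_bar = rho * g * head_m / 1e5
    print(f"~{pressure_bar:.1f} bar")  # ~8.8 bar: a fire hose, not a drip
    ```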

    At another company, there were UPS for the systems, backed up by a diesel driven generator. The company did a quarterly test in which they picked a suitable Saturday and pulled the power to verify that (a) the UPS would pick up and carry the load, and (b) the engine would kick in and the generator would take over the load before the batteries went flat.

    1. Norman Nescio Silver badge

      Re: Disasters I have seen (or seen dodged)

      At another company, there were UPS for the systems, backed up by a diesel driven generator. The company did a quarterly test in which they picked a suitable Saturday and pulled the power to verify that (a) the UPS would pick up and carry the load, and (b) the engine would kick in and the generator would take over the load before the batteries went flat.

      Ahh yes. I may have told this story before... similar set-up: UPS backed up by diesel generators. The local substation serving said data centre (and only it) decided to blow a phase on Saturday morning. The UPS performed flawlessly. The diesel generators started up with no problem. All looking nice and dandy, until the electricity supply company's people talked to the operations manager and said how long the repair of the substation was going to take, working flat out.

      The operations manager blanched. He knew (a) the capacity of the diesel tank and (b) how many gallons per minute the generators used. It meant the tanks would be making dry sucking sounds long before the substation was back. So he gets on the phone to his diesel supplier, looking for an emergency delivery. It turned out that such deliveries were not instant, and there was a gap that needed to be bridged.

      The method arrived at through desperation was to get hold of a 44-gallon drum and put it upright on the back seat of his Alfasud*. We then hared off to the nearest petrol station, filled it with diesel, hared back, then siphoned the contents into the generator tank. The round-trip time meant we could just keep up with the generator's fuel consumption. After the first couple of trips, other members of the operations team turned up, who were deputed to keep doing this until there was enough diesel in the generator tanks to hold over to the expected delivery, plus a reasonable margin.
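
      A back-of-envelope version of that bridging sum (all figures invented; the point is comparing burn rate with delivery rate):

      ```python
      # All figures assumed; the question is whether drum runs out-pace the burn.
      burn_rate_lpm  = 3.0           # assumed generator consumption, litres/minute
      drum_litres    = 200.0         # a 44-gallon drum is roughly 200 litres
      round_trip_min = 60.0          # assumed fill-up-and-siphon round trip
      delivery_lpm   = drum_litres / round_trip_min
      verdict = "keeping up" if delivery_lpm >= burn_rate_lpm else "losing ground"
      print(f"burn {burn_rate_lpm} L/min vs deliver {delivery_lpm:.1f} L/min: {verdict}")
      ```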

      As I wasn't actually an Ops team member, and there being no crisis for the bits of IT equipment I was responsible for (the Ops team were more than capable of shutting stuff down in a controlled manner, if necessary, and restarting it in a controlled manner too), I could wander off back home. When I came back on Monday, the generators were still running, a large diesel delivery had arrived as promised, and 'all' we had to do was wait for the substation to be handed over so we could go back to mains supply. This was done outside normal working hours in case of problems with the switch back. (There weren't any.)

      Not long afterwards, the diesel tanks were substantially increased in size.

      I don't think he ever got the smell of diesel out of the Alfasud.

      *This happened a long time ago.

      1. tony trolle

        Re: Disasters I have seen (or seen dodged)

        At least he knew. I was at a site for a DR test and the fuel ran out. It seems the two big generators each had a smaller startup engine system with its own tank. Guess which tanks were checked for fuel. The only reason the main tanks had a small amount of fuel was that the delivery driver knew the difference between the tanks and the person ordering didn't.

  18. Anonymous Coward
    Anonymous Coward

    new site built directly on a flood plain with the IT hardware in the basement.

    I think they misunderstood what "planning for a disaster" meant.

    1. fredds

      Re: new site built directly on a flood plain with the IT hardware in the basement.

      Yes, a lot of discussion about the follies of building on flood plains in Australia at the moment.

  19. Matthew 3

    Reminded me of the tube's 'control room flooded with wet concrete' story from 2014

    You'd think it'd be a tale of months of disruption but, no, 24 hours later it was all fixed. I'd still like to read about how they did that.

    http://www.bbc.co.uk/news/uk-england-london-25873252

    1. BinkyTheMagicPaperclip Silver badge

      Re: Reminded me of the tube's 'control room flooded with wet concrete' story from 2014

      Read the comments. Concrete takes a while to set and you can include an inhibiting agent, in this case sugar. It doesn't actually take a lot, and with enough it'll stay liquid forever.

      After that it's a matter of a shovel, and a whole lot of cleaning.

  20. Denarius

    showing age again

    During the big NE Merkin power fail in 1965(?) there were stories that the hospitals in NY had their backup generators started by mains-powered electric motors. The more things appear to change, the less the stuffups do.

  21. JeffyPoooh
    Pint

    Correct Risk Analysis regarding UPS

    Installing a UPS provides instantaneous back-up power *most* of the time, but not all of the time. Only a fool would assume that the UPS success rate is magically 100%. The percentage depends on the frequency of its use. If you have power failures every single day at 2:00pm, then your UPS will probably work correctly 99+% of the time. But if it's not been called into service for several years, then it's perhaps 50% or 75% when you need it.
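
    As a toy illustration of that intuition (the decay model and half-life are entirely my assumption, not measured data):

    ```python
    # Toy model: assume the chance the UPS actually carries the load decays
    # exponentially with time since it was last exercised.
    def p_ups_works(years_since_exercise, half_life_years=3.0):
        return 0.5 ** (years_since_exercise / half_life_years)

    print(f"{p_ups_works(1/365):.0%}")  # exercised daily: effectively 100%
    print(f"{p_ups_works(3.0):.0%}")    # untouched for three years: ~50%
    ```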

    And that's the good news. The bad news is that your UPS itself may catch fire, or perhaps gently smolder with acid pouring out onto the floor. UPSes can CAUSE their own disasters. Our office building has been evacuated twice "for real", and both times it was caused by the UPS.

    A very wise man once posted the following about home UPS:

    You’re concerned about your family’s safety. So you get a guard dog. The dog costs a fortune. It immediately poops on the floor. Then it chews off the entire left side of your Bang & Olufsen. It bites the postman’s fingers. It then sleeps through an actual burglary. And finally it eats one of your children.

    This is the UPS experience: If they’re not preoccupied with smoldering their lead-acid batteries, then they’re busy buzzing and arcing. Then they blow an internal fuse on the output, and your Great American Novel is suddenly lost, again, for the third time. Then there’s an actual power failure (Yay!), so they turn on their patented 387-volt offset square wave, and your PC is instantly corrupted. Meanwhile battery acid squirts out onto the ceiling, again. Then, while you’re out trying to buy a replacement PC, the UPS catches fire and burns your house down.

    I’d happily pay $800 to not have one.

  22. sisk

    I left a few years later but the last I heard the company had spent several million pounds on a new site built directly on a flood plain with the IT hardware in the basement.

    Could be worse. When I started my IT career about three lifetimes ago, the shop I was in had sprinklers in the server room. Even being green to the gills, that caught my attention right away. They were there for several years before someone finally convinced the bean counters that the risk of accidentally discharging sprinklers in a room with $4 million worth of IT hardware was worse than whatever the cost of a gas-based fire suppression system was.

  23. Milton

    Beancounters and Managers

    Some interesting debate about whether beancounters or management are to blame which, with respect, is missing the key point.

    Yes, we need people to do the sums even in these spreadsheet days. Beancounters do have a role. But they should never, never, never, ever have senior management responsibilities.

    A beancounter is like an office cleaner and should be respected and paid for doing the job. But you absolutely do not let them make important decisions.

  24. Anonymous Coward
    Anonymous Coward

    Been There, Done that....

    I too have suffered a full-on power failure at our head office DC. A massive surge shortly before an outage in the local area blew a fuse on the UPS (which had recently been serviced with a clean bill of health....). This knackered the whole system and the room went very dark and very, very quiet.... Back at my desk, it's fair to say a little bit of wee came out.....

    To bring us back to life, we threw the UPS into Bypass, fired up the generator manually and ran the server room on generator power until the UPS was repaired. Despite recovering the core apps in an hour or so, getting all of our services back and a clear monitoring screen took all day and night - but we were still on the generator.

    Two days and 800 litres of diesel later, the UPS was repaired and we were ready to cut back over to mains power, which was the single most nerve-wracking experience of my professional life - would the UPS take the load, were all 3 phases in sync?

    Because the process of going from Mains > UPS > Genny is fully automated, we had effectively broken that chain and had to reverse back in a totally untested scenario. But... it was either that or shut everything down gracefully and fail over the power (my preference at the time). For a 24/7 operation, the business took the risk and we got away with it, after some buttock-clenching moments.

    Funnily enough, shortly after that episode my Capex Budget request for a new UPS was miraculously approved....

  25. I Am Spartacus
    Thumb Up

    This is fascinating

    I have had three DR situations in my DC life:

    1) A planned DR, where the main European DC was shut down by flipping the main incoming power, forcing the UPS and standby gen offline, bolting the doors, and shutting down all phones. "This building is on fire and will burn down. You are all dead. All tapes are destroyed. All documentation is destroyed. Now, let's see if your DR procedures work." I was at the standby site, where we partitioned the mainframes and cleared a load of disk space. Meanwhile, people hired vans and went to the offsite tape store, someone got a list of emergency contact numbers and started telling people to go to the airport, whilst someone else went to Schiphol and found out that if you have a big enough credit on your Amex Black card then sir, that will do nicely, and a chartered jet is available at gate 27 in 1 hour. We had Europe up and running in 32 hours - the target was 48. Back in the 80's this was a success.

    2) 2 years later, a faulty bus bar in that backup DC arc'd and took out a metre of power distribution. The UPS was fine; it took up the load. The standby generator kicked in, and we were all fine. Until we found out that the diesel had waxed, and the wax was now in the cylinder heads. The generator died. We had to hire a standby gen for a month whilst the busbar was fixed. Lesson learnt, and we drained the diesel tank once per year after that. It was an oil company, so you'd think that they might have known diesel waxes!

    3) Wind forward a few years. I get a call from the CIO saying that the computer room has flooded, and could I drive 50 miles to oversee what was happening (he, of course, was unavailable). My panic level went into the red. It was only getting into the car that I thought: "hang on, the machine room is on the 4th floor - flooded HOW?". It seems we had water-fog fire suppressant and the pressurised pipes had failed, sending jets of water into the Sequent, the AS/400, a couple of Vaxes and a Tandem. IBM had a team on standby for exactly this situation and got the AS/400 back quickly. The Sequent needed a couple of disks replacing, but no big deal. The Vaxes took a little longer (Digital were not as good as Big Blue on this occasion). Oh, the Tandem? Despite the power being out, despite there being water 3 inches deep in the machine room, the Tandem kept processing card payments, without stopping, glitching, or even noticing!

  26. BinkyTheMagicPaperclip Silver badge

    Disasters aren't as fun these days

    I'll leave aside the fact that all the disasters here have been due to internal power issues, which lead to pretty much the same problems as older disasters.

    However, a total power failure should theoretically do nothing other than the lights going out and monitors going off. All user machines are laptops, and infrastructure should be UPS-backed. All should be fine - at least until the UPSes run out...

    It was much more fun eighteen years ago, when the lights went out, there was the sound of twenty hard disks simultaneously spinning down, and then utter silence apart from a plaintive beep from the UPSes in the machine room. Outside the window was a guy in a JCB looking worried, and claiming that 'those utility cables should have been buried more deeply'.

    (plug in genny, some work possible on laptops, get sparky out quickly to hook up power as JCB has only taken out one of the three phases. Fix broken cable)

  27. Androgynous Cow Herd

    A UPS in bypass mode

    is not a UPS. It is just a PS.

    Or perhaps...a PoS.

  28. Stuart Castle Silver badge

    Potentially good for cooling..

    "I left a few years later but the last I heard the company had spent several million pounds on a new site built directly on a flood plain with the IT hardware in the basement."

    I work for a company that (before I worked for them) spent a lot of money building a server room in the leaky basement of a mid-'60s office building on the banks of the Thames. Apparently the basement flooded regularly. Thankfully they'd had the sense to move the main server room to a ground-floor office in another building. The company's various buildings were already networked, so this wasn't as much of a hassle as it could have been.

    On the plus side, given adequate waterproofing, a flooding server room would be good from a cooling point of view.. :D

  29. KatieHeath

    A nice read. We work with this Riello UPS maintenance company: https://www.tpm-ups.com/riello-ups-maintenance/ all the time to make sure that our system is in top condition. Since this is crucial for emergencies, it's important that it can serve its purpose without fail. Getting a good UPS will be useless if it's not used to its fullest.
