HPE SAN causes four-day outage at Australian Tax Office

HPE's crack repair squad has laboured for over four days to replace kit at Australia's Taxation Office, with no guarantee that the Office's online services would be back online come Monday. HPE and the Australian Taxation Office's (ATO's) troubles started in mid-December 2016, when some ATO services went offline. The ATO named …

  1. Anonymous Coward

    HPE Again?

    HPE == Huge Problems Everywhere/Everytime

    Why do people fall for this shell of a once-great company to work for and do business with?

    HPE + CSC == Customers Shafted Everytime

    Not that the alternatives are much better.

    1. Anonymous Coward

      Re: HPE Again?

      Do not blame HPE. Blame SAN as an idea.

      If you centralize all of your backend storage on an array, like it or not you introduce a single point of failure in the system.

      So when it fails, all of your data fails with it.

      1. Lord Elpuss Silver badge

        Re: HPE Again?

        You do understand what an array is, yes?

        1. David Dawson

          Re: HPE Again?

          You do understand what an array is, yes?

          -----

          A highly available set of hardware that centralises storage onto a single platform.

          It may be clustered and HA, but it means you've traded isolation for efficiency all the same.

          Each server having its own storage means that you won't get this type of systemic failure. You'll have other problems, sure, but certainly not this.
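
          To put rough numbers on the isolation-versus-efficiency trade-off, here is a minimal Python sketch. The failure probabilities are made-up illustrative values, not measurements of any real array, and the model assumes failures are fully independent, which stops being true once software ties the nodes together.

```python
def total_outage_probability(p_fail: float, n_units: int) -> float:
    """Probability that all n independent storage units are down at once.

    p_fail is a hypothetical per-unit failure probability over some
    fixed period; independence between units is an assumption.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return p_fail ** n_units


# One shared, highly available array: a single failure takes out everything.
shared = total_outage_probability(p_fail=0.001, n_units=1)

# Four servers with their own, individually less reliable, storage.
isolated = total_outage_probability(p_fail=0.01, n_units=4)

print(f"shared array, everything down:   {shared:.2e}")
print(f"isolated storage, everything down: {isolated:.2e}")
```

          Under these assumptions each isolated server fails more often than the array does, but the chance of losing everything at once is far smaller. The model says nothing about cost, capacity efficiency, or correlated failures.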

          1. Anonymous Coward

            Re: HPE Again?

            "Each server having its own storage means that you won't get this type of systemic failure. You'll have other problems, sure, but certainly not this."

            This is a pipe dream! So you put storage in each node and lash them together with software. What happens when the software fails?

            There is an old saying: good, fast, cheap. You can pick two and you won't get the third!

      2. FuzzyWuzzys
        Facepalm

        Re: HPE Again?

        "If you centralize all of your backend storage on an array, like it or not you introduce a single point of failure in the system."

        *ring* *ring* 1987 just called and they want their storage designs back!

    2. Anonymous Coward

      Optional

      I heard it was this:

      CSC == Cuts Some Corners (look at their new logo)

      Or from previous customers:

      CSC == Complete Shower of Cu**s

      Or:

      CSC == Collection of Small Companies

  2. TRT Silver badge

    sounds all strangely familiar.

    Just like the KCL outage.

  3. Anonymous Coward

    Since HPE...

    ...dumped 99% of its highly skilled ITSM people there has been nobody to help HPE sales and HPE customers configure resilient and recoverable systems using appropriate configurations and best practice processes.

  4. Mystereed

    yes - didn't SSP have an HP SAN go pop too?

    http://www.computerweekly.com/news/450303913/Insurance-brokers-count-cost-of-lost-business-as-SSP-SaaS-platform-outage-enters-second-week

    That one was originally said to have been caused by a power problem, but it went on for ages.

    Is the HP kit or support the issue, or is it just that there are so many of them about that there is more chance of a high-profile story cropping up? A bit like when a 'self-driving' electric car has a problem, it seems more often than not to be a Tesla?

    1. Anonymous Coward

      Re: yes - didn't SSP have an HP SAN go pop too?

      I can confirm that HPE support is atrocious. I unfortunately manage an enterprise with everything on 3PAR, and I think I've had one good support call since I've been here.

  5. Korev Silver badge
    Alert

    "key impacted stakeholders", "the community" (twice). Sounds like we need an icon for management buzzword bingo.

    Look on the bright side, they forgot "lessons will be learned".

  6. Doctor Syntax Silver badge

    Look on the bright side, they forgot "lessons will be learned".

    Honesty.

  7. Anonymous Coward

    That's funny. I hear that HPE is under a gag order by ATO lawyers and isn't allowed to say anything about what happened (but the ATO can say whatever it wants), and I overheard that ATO's implementation and DR plans were - let's say - lacking at best and that is ultimately what caused the extended outage.

    So perhaps you should check in with HPE and get their statement about what really happened and why there is a gag order.

    1. sanmigueelbeer
      IT Angle

      RE: DR Plan

      The different private corporations that ran ATO's IT systems never had a "DR plan".

      When I worked for one of them, I remember a paper-based "DR exercise" that consisted entirely of people sitting around a large round table passing "what if" scenarios to one another. Everything was theoretical. No one, including anyone from the ATO, had the guts to pull down the power lever on a row of racks, because there were people who were terrified their >10-year-old servers might not survive.

      1. TRT Silver badge

        Re: RE: DR Plan

        @sanmigueelbeer

        Why do I now have this clip running through my head? :)

    2. Roland6 Silver badge

      re: "I overheard that ATO's implementation and DR plans were - let's say - lacking at best and that is ultimately what caused the extended outage."

      Don't know if you posted before or after the article was updated, but the final quote directly supports what you heard:

      "Our focus will now turn to building system resilience to best ensure the stability of our services to the community," the Office says.

      Which would also explain why they are having problems getting things back to normal: they are having to work on the production system, with little or no fallback, with all that implies...

      1. Mr Wrong

        Everything I've heard about this case since it began in December suggests that it's the ATO who didn't have any real DR, who had screwed up the architecture big time, and who didn't even have backups that could be restored in less than a month. And it's HP who gets continuous bashing because of those obvious end-customer faults. They have shown extreme patience: there haven't even been leaks to the media saying how this really looks. Their only comment, that the "problem is not of a kind that has any probability to happen anywhere else", strongly suggests to me that the cause was on the customer/architecture side, not in the HW. My internal translator tells me they wanted to say that no one else is stupid enough to do such things. I'm not going to say they are saints; in the end it's their HW that failed, but (sorry for being obvious) there's no HW on Earth that never fails. If they thought they had some, it only adds to their complete lack of any competence whatsoever.

  8. Matt Bryant Silver badge
    WTF?

    Customers need to understand their kit before they implement it.

    The 3PARs are very good arrays, and - if you know what you're doing - you can just about guarantee that five-nines (99.999%) planned uptime. However, the biggest threat to uptime is designing a solution to a price point rather than a required capability. I have seen salesmen remove redundancy from SAN proposals (and that's all the major vendors, not just HPE) to win the deal, hiding some disclosure in the proposal along the lines that "there is a very small chance that, in the event of several unusual circumstances, the system will fail". A shelf dying on a properly configured 3PAR should be an issue but not lose data. The failure to have a proper backup solution implies there is something very wrong in the ATO's IT department.
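
    For a sense of scale, the downtime budget implied by an availability target is simple arithmetic. The sketch below assumes a 365-day year and counts all downtime against the budget; the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    print(f"{label}: {allowed_downtime_minutes(target):.2f} minutes/year")
```

    Five nines works out to roughly 5.26 minutes a year. A four-day outage is 5,760 minutes.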
