Disk firmware can kill a whole cluster how exactly? Cisco explains

Cisco’s issued a Field Notice warning that its UCS servers and hyperconverged HyperFlex kit could be brought low by disk drive firmware. The Notice cites “a drive firmware issue on select Self-Encrypting Drives”. The Register counts 16 different disk SKUs that could cause problems on UCS servers and one for HyperFlex. Cisco …

  1. Anonymous Coward
    Anonymous Coward

    Who else?

    They don’t build the drives themselves so others are affected too. If we know one thing, it’s Cisco being among the first to post bug reports (see clocking issue with Intel Atom) and then all others sheepishly admit they have it too but tried to keep it secret.

    So who else is affected?

    1. jake Silver badge

      Re: Who else?

They don't build the drive hardware, but the firmware is proprietary to Cisco. I suspect this particular problem isn't going to affect anybody else.

      1. aaaaaaaaaaaaaaaa

        Re: Who else?

Correct... All these big companies buy normal enterprise SSD/SAS disks BUT decide to modify the firmware so YOU cannot go out and buy them off the shelf at half the price..... Thanks, Cisco, HPE, IBM and others...

        1. Anonymous Coward
          Anonymous Coward

          Re: Who else?

          The bug is an Intel FW bug and was fixed by Cisco. This article is garbage.

        2. baspax

          Re: Who else?

          Reason for custom firmware, or rather validated and certified firmware, is interoperability. No-one really cares if your desktop PC goes on the fritz and you lose your photo collection.

          If that happens to a corporation someone will be held liable, including public egg on face.

          What I find more interesting lately is that apparently Cisco is really testing the crap out of their component suppliers and finding all these bugs. Someone mentioned below that this is an Intel bug, so not only kudos to Cisco for finding it but more importantly, shouldn't Dell and HPE with their much larger server orgs be the one who catch it?

      2. Anonymous Coward
        Anonymous Coward

        Re: Who else?

Nope, Intel FW. Same shit as with the Atom.

        1. jake Silver badge
          Pint

          Re: Who else?

          I stand corrected. Mea culpa.

          Thanks, guys/gals ... this round's on me.

    2. bsdnazz

      Re: Who else?

      I think EMC had a similar problem a few months ago.

      1. Anonymous Coward
        Anonymous Coward

        Re: Who else?

        I had Dell/EMC come to me and replace SSD drives that were working fine (as far as I knew) with new ones about a month back because they said their testing was having issues with them. I guess this is why they test things.

    3. Anonymous Coward
      Anonymous Coward

      Re: Who else?

      It's an Intel SED SSD bug that's been out for a while but under NDA.

      Everyone is affected but as usual HPE and Dell tried to keep it secret. Cisco blew their cover by going public.

      1. Anonymous Coward
        Anonymous Coward

        Re: Who else?

        Everyone...unless you do encryption in SW and not in HW (SED), like VxRail/VSAN does

        1. Anonymous Coward
          Anonymous Coward

          Re: Who else?

          Yeah, VMWare will save you ROFLMAO

  2. Anonymous Coward
    Anonymous Coward

SED HDDs are nightmares waiting to happen. Luckily they are still 50% more expensive than regular drives, so take-up hasn't been that great.

    1. Christian Berger

      Yeah, particularly in a RAID situation where you'll end up with several independent blocks with connected cleartext. Essentially you could, for example, have the same cleartext encrypted with 2 ciphertexts in a very predictable manner. This is one of the things that could eventually become exploitable.

      1. Cynic_999

        "

        have the same cleartext encrypted with 2 ciphertexts

        "

        Except it is extremely difficult for anyone to get access to the ciphertext because it is never sent outside the HDD. And even if you know that you have 6 different copies of the same data, each encrypted with a different key, I'm not sure that it will make the task of decryption any more successful.
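That intuition can be illustrated with a toy sketch (purely illustrative — real SEDs use hardware AES-XTS, not this): with any sane cipher, encrypting the same plaintext under two independent keys, as happens on two mirror legs, produces ciphertexts that look unrelated, and knowing you hold several copies doesn't obviously help an attacker.

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode generates a keystream
    # that is XORed with the data. XOR is its own inverse, so the same
    # function both encrypts and decrypts. Illustration only.
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)

# Same plaintext, two independent keys -- like one block mirrored
# across two self-encrypting drives.
plaintext = b"same cleartext on both mirror legs"
key_a, key_b = os.urandom(32), os.urandom(32)
cipher_a = keystream_xor(key_a, plaintext)
cipher_b = keystream_xor(key_b, plaintext)
```

The two ciphertexts differ byte-for-byte even though the plaintext is identical, and each still decrypts cleanly under its own key.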

      2. Anonymous Coward
        Anonymous Coward

        "Yeah, particularly in a RAID situation where you'll end up with several independent blocks with connected cleartext."

        This is true for servers with local storage behind a RAID controller.

HyperFlex and all the other major SDS/hyperconverged systems use either 2/3-way mirroring or erasure coding. There is no disk-level RAID involved at all; that's how they (well, some of them) can flip to NVMe without batting an eye.

    2. Cynic_999

      "

      Luckily they are still 50% more expensive than regular drives ...

      "

      No, they are about 10% more expensive - and in some cases the same price depending on supply & demand. You may well be using a SED yourself - unless the SED function is activated it behaves like a non-SED HDD so you wouldn't necessarily know.

      SEDs make it easier to comply with certain standards, and where you are holding sensitive information that requires that nobody can get access to your data-at-rest, it provides a faster system because there is no overhead of software encryption.

      Then when it comes to selling or disposing of old equipment, all data on a SED can be rendered unreadable in about 10 seconds while still leaving you with a serviceable HDD. Compare with many hours per TB required to securely wipe a conventional HDD.
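The "many hours per TB" claim checks out on the back of an envelope (the ~200 MB/s sustained write rate below is an assumed figure for a typical nearline spinner, not from the article):

```python
def overwrite_hours(capacity_tb: float, write_mb_s: float = 200.0,
                    passes: int = 1) -> float:
    """Hours to overwrite a drive end-to-end at a sustained write rate.

    Uses decimal units (1 TB = 1,000,000 MB), as drive vendors do.
    """
    total_mb = capacity_tb * 1_000_000
    return passes * total_mb / write_mb_s / 3600

# One pass over a 10 TB disk at 200 MB/s:
# 10,000,000 MB / 200 MB/s = 50,000 s, i.e. roughly 13.9 hours --
# versus seconds for a SED crypto-erase, which just destroys the key.
```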

  3. l8gravely

    I love how a disk firmware problem requires a UCS manager update!

    I love how a disk firmware issue requires an update of the entire UCS manager stack. And since I've recently gone through the pain of setting up a UCS HyperFlex cluster, I can tell you it will suck suck suck.

    Hyperconverged is supposed to make things easier, not harder. UCS HyperFlex is all the flexibility of UCS with all the complexity and pain tripled or even more. It sucks... don't go there.

    1. MrBoring

      Re: I love how a disk firmware problem requires a UCS manager update!

It's all bollocks, huh? People complained of too many different firmware versions to track and monitor, so they created one big pack, and now a tiny update requires patching the whole system. It's the same as the Microsoft monthly rollups; I think it's over 1TB now for Win10/Server 2016.

    2. Anonymous Coward
      Anonymous Coward

      Re: I love how a disk firmware problem requires a UCS manager update!

      I run a large UCS environment and we just deployed our second HX cluster. Not sure what you are talking about because it took less than an hour.

But here’s the thing: we know what the fuck we are doing. We had the preinstall checklist prepped. We also don’t have issues with UCS, but I know who does: when we acquired a company we also inherited their IT team. Their “server experts” were a wonderful bunch: no documentation, low skills but big talk, everything was Microsoft’s, VMware’s and Cisco’s fault, no two servers had the same config, no automation, no scripting skills, nothing.

      We have policies for everything, our service profiles are planned out and designed as part of a larger architecture, and we use Ansible to automate deployment. Those new guys did nothing but complain how “complex” everything is.

      My take on it is that people who do not know tech (but like to think they do) complain about their tools.

      1. Alistair
        Mushroom

        Re: I love how a disk firmware problem requires a UCS manager update!

        no two servers had the same config, no automation, no scripting skills, nothing.

Oh GOD do I feel your pain. Been there, done that at least four times. At least once I had to advise my manager that there was no single circumstance on earth, in heaven or in hell that would get me working civilly with a certain individual. I remained employed.

        Icon relevant to what happens to those systems when I find them.

  4. Missing Semicolon Silver badge
    FAIL

    "low-write, long-idle-time workload"

    What, like most drives that are not nearline storage? Full of really important archived stuff?

    Ouch!

    1. Anonymous Coward
      Anonymous Coward

      Re: "low-write, long-idle-time workload"

      If you're putting your archival data on flash arrays, you're blowing waaaayyy too much money on your infrastructure.
