Outsourcing - it works
You should all move your support to HPE - they work great.
Or IBM, TCS, Dell-NTT... so many to choose from.
Don't worry about the fact they are straight out of a production-line uni, handed 'Computers for Dummies', then immediately let loose on the core of your business - the data and IT that are the lifeblood of what you do.
You get what you deserve when you do.
Was it the SAN, or was it a 3PAR storage array? Please PLEASE do not mix those up anymore... A fileserver is not a LAN, and a storage array is not a SAN.
Screws turned the other way down under...
Remember... righty tighty... lefty loosey...
Re: Never knew...
Off-topic, but: the single most stupid rule for a rotational movement.
"the penalty clauses in their contract [assuming government was smart enough to put some in – Ed] come under active consideration"
You haven't dealt too much with the contract writers for government projects, have you?
"whose hand rocked the cable"
Nice one, Simon.
Thanks - always nice to know people read deep and see the little fun bits we try to leave inside. Also tried to make that the headline but just couldn't get there.
Wow, does just reading the story mean I am reading "deep"?
And "The hand that rocks the cable" sounds like a startup for a Friday roundup of mishaps anyway...
I did a big job of removing redundant cabling from a datacentre (without breaking anything), but we were doing it very slowly (i.e. not on a fixed time/cost basis).
Someone had to.
Why would one bit of displaced hardware/cable on one box render the whole SAN useless and require a massive outage and a replacement of the whole system?
Surely the whole point of a clustered storage system is that if a box fails, the rest carry on like normal - no single point of failure.
If it caused the system to shut down, then surely it would be expected to do this in a controlled manner which didn't bork everything, or else it might be better to let it run without redundancy and just have a big flashing red light and a hooter go off to let you know.
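The behaviour argued for above - drop the failed node, keep serving, and sound the alarm rather than hard-stop the whole array - can be sketched in a few lines. This is a toy illustration only, not any vendor's actual firmware; every class and method name here is invented:

```python
# Toy sketch of a clustered store that tolerates a node failure:
# a failed node is dropped from service and an alarm is raised,
# but reads keep working from the surviving replicas.
# All names are invented for illustration.

class Node:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.data = {}

class Cluster:
    def __init__(self, nodes, replicas=2):
        self.nodes = nodes
        self.replicas = replicas
        self.alarms = []

    def write(self, key, value):
        # Replicate each write to the first N healthy nodes.
        targets = [n for n in self.nodes if n.healthy][:self.replicas]
        if len(targets) < self.replicas:
            self.alarms.append(f"degraded: only {len(targets)} replica(s) for {key}")
        for n in targets:
            n.data[key] = value

    def fail(self, node):
        # A yanked cable: mark the node down and sound the hooter,
        # instead of taking the whole system offline.
        node.healthy = False
        self.alarms.append(f"ALARM: node {node.name} lost - running without redundancy")

    def read(self, key):
        # Any surviving replica can answer.
        for n in self.nodes:
            if n.healthy and key in n.data:
                return n.data[key]
        raise KeyError(key)

nodes = [Node("a"), Node("b"), Node("c")]
c = Cluster(nodes)
c.write("k", "v")    # replicated to nodes a and b
c.fail(nodes[0])     # node a loses its cable
print(c.read("k"))   # still served by node b -> "v"
print(c.alarms[-1])  # the big flashing red light
```

The point of the sketch is the `fail()` path: one lost box degrades redundancy and raises an alarm, but the `read()` path never stops working.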
I'll tell you how I did that once. I had a backend of multiple Oracle DBs running under Veritas Cluster Server on three Sun Micro T-2000 shiny boxen, all nicely racked atop one another. Getting ready for a maintenance operation on the hardware, I went ahead and extended the servers on their rails so they would be in the aisle. Well, whoops! My spotty network wonks had decided they could not fit any of the servers with properly sized Ethernet cables, so they all popped out before the servers reached the aisle, since the cables were shorter than the bendy rack cable minder! The primary, secondary, and VCS heartbeat connections - all gone! :P Good thing this was in a maintenance window and we could recover everything by stopping some services and putting in some properly long cables. Plus it did not damage the servers' connectors, and this site was not a high-traffic one, so not much visibility. As you can see, something simple, uncared for and taken for granted caught us out.
Reading the article
The outage happened while trying to move the device while still in production. Doing things like this "In flight" increases the risk a LOT. I do not know of a QA lab anywhere that tests for this ability. If they were intended to be moved in flight, arrays would come with (redundant) sets of wheels to accomplish this. While a black eye for 3PAR, I actually have some sympathy now that I didn't have previously for them.
Yes, we've all done something stupid in a datacenter at least once in our lives. That's how you learn. Like I learnt the hard way why you make sure rack stabilisers are installed, why you never ever install more than one node of a cluster in the same rack if you can avoid it, and why PDUs with circuit breakers can cause unintended consequences.
It is also why you always have someone 'experienced' work on the truly important kit, guiding the younger staff around the pitfalls and gotchas. Unfortunately, these days that is incompatible with outsourcing, where it's all about driving the lowest cost.
I can only agree. I once shut down the nearline storage of a bank, one half of an airline booking system, and a newspaper's storage array. If you wonder: I was at fault once, then there was bad firmware, and then one logistical error (wrong part in the box - so essentially also my fault).
Nevertheless, in each case the redundant systems kicked in, and the customer's staff were trained and experienced enough to anticipate and handle such incidents.
Back then, companies recruited to find the best staff and then continued to train them. Nowadays companies are struggling to find staff just cheap enough to keep the lights on. If anything out of the ordinary happens they're screwed. The cloud won't help the situation from a technical perspective.
It puts an abstraction layer between management and technology so they can distance themselves from technical faults.
<quote>It puts an abstraction layer between management and technology so they (management) can distance themselves from ~~technical faults~~ their stupid decisions.</quote>
I think this article was posted in the incorrect section. Wasn't it supposed to be On-Call?