Re: The Guardian is spineless
Why? So somebody who for all you know had literally nothing to do with this mess gets their name muddied for the rest of their career?
Piss off, your blog is shit and you're a mean-spirited vindictive cunt.
A serious error committed by an "inexperienced operative" caused the IT meltdown which crippled the RBS banks last week, a source familiar with the matter has told The Register. Job adverts show that at least some of the team responsible for the blunder were recruited earlier this year in India following IT job cuts at RBS in …
Or, perhaps, another way to look at the CV is that it shows how mainframe batch support has been largely shifted to India, and that the outsourcing firm Infosys had a central role in that process, contrary to the public assurances of the CEO of a virtually nationalised basket-case bank? Never mind the huge inconvenience and cost to millions of people unable to get at their own money, and the thousands of staff sacked in the UK. You're right, though, that this is about more than one technical person, and that wasn't the point of what I posted, so I've deleted his name. You're welcome to your own opinion. Obviously I don't agree with you.
... is not a good enough reason for this fiasco.
Consider: "a lone terrorist hacked the system / posed as an employee / whatever and caused a banking crisis affecting nearly 20m people for nearly 2 weeks"
Even if the poor bugger had done it ON PURPOSE there is NO EXCUSE for what happened next. It is a system failure on every conceivable level and therefore the ultimate blame must absolutely lie with management.
'CA Technologies - the makers of the CA-7 software at the heart of the snarl-up - are helping RBS to fix the disaster that has affected 16.9 million UK bank accounts.
"RBS is a valued CA Technologies customer, we are offering all assistance possible to help them resolve their technical issues," a spokeswoman told The Register.'
i.e. RBS are paying the overtime, oh wait, that's wrong, the taxpayers are paying the overtime.
Well, this sounds like it should be of concern to CA's other customers. After all, if the problem is merely highly unique, who knows who else it might affect... some company to whom the issue is only mildly unique, perhaps? It is very telling that they didn't say that it was uniquely unique to RBS!
If this is proven to be down to inexperienced offshore IT then I think all western governments should be very, very concerned/alarmed. I imagine it would be very easy for any terrorist organisation to infiltrate Indian IT companies with the aim of co-ordinating similar attacks from within the offshored IT systems of UK/USA/other companies, to bring down banks, insurance companies, supermarkets etc... Total fuck up with little expense. Any CTOs continuing to state that Indian offshore IT staff are bringing much-needed experience (especially on the mainframe) should be sacked, or at least shot...
Really, don't give the Sociopathic Sick Control Freaks in charge any ADDITIONAL ideas. Ten years ago, someone successfully served cheap blowback - successfully mainly because of rampant careerism and arse-covering inside the FBI, with possibly some interference run by the Ones Who Shall Not Be Named - and we have been oozing towards the Mister Moustache Situation at accelerating speed. Probably will have to back out of this using crowbars and projectile weapons, no restoring from tape here, nope.
I worked for a computer bureau that handled processing overnight of foreign exchange transactions for a number of overseas banks in the City. It was my first major job after leaving university.
There was a team of three of us, plus a manager, looking after a horribly convoluted mass of mainly COBOL programs. The original authors had long since left the company and no one now really understood quite how everything fitted together.
Each night, the system ran to update everything with all the deals done that day, to produce the reports on which the banks would base their deals the next day. We would take it in turns to be on overnight call to patch up the system when it went wrong. Hardly a week went by without one of us being woken in the middle of the night by our beeper.
Anyway one night the system completely crapped out. Normally the problems were relatively trivial but on this occasion it was a major crisis. The banks went without their reports in the morning and millions of pounds were at risk. We'd tried everything. Even recovering the system back a whole 48 hours and trying to rerun everything didn't work. The strange thing was, we hadn't done any updates to the system recently.
By lunchtime the next day I was ready to give up. My manager promised me a big pay rise if I could sort it out. Finally, over 24 hours after the original failure, I tracked down the cause. One of the live programs referred to a file in the test environment; I'd recently done some tidying up in the test area and deleted that file...
I explained that the problem had been caused by a corrupt file, without going into details... everyone was very thankful that the system was running again, I got my pay rise for sorting out the problem and nobody ever discovered my guilty secret...
Anonymous for obvious reasons!
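A crude guard against that failure mode — live jobs quietly referencing test-area files — is easy to sketch. Everything here is invented for illustration (the directory layout, the job-file format, the "/test/" path convention); the real check would obviously depend on how the bureau's environments were laid out.

```shell
#!/bin/sh
# Hypothetical pre-release guard: flag any "live" job definition that
# still points at the test environment. Layout and paths are invented.

check_live_jobs() {
    # Prints offending files; returns non-zero if any are found.
    offenders=$(grep -rl '/test/' "$1" 2>/dev/null)
    if [ -n "$offenders" ]; then
        echo "BLOCKED: live jobs reference the test area:"
        echo "$offenders"
        return 1
    fi
    echo "OK: no test-area references in $1"
    return 0
}

# Self-contained demo: one clean job, one job pointing at test data.
demo=$(mktemp -d)
printf 'INPUT=/prod/fx/deals.dat\n' > "$demo/eod_report.job"
printf 'INPUT=/test/fx/deals.dat\n' > "$demo/fx_update.job"

check_live_jobs "$demo" || echo "promotion would be refused"
rm -rf "$demo"
```

Run nightly, or as a gate before anything is copied into live, that one grep would have saved a 24-hour debugging marathon.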
That job posting is for a batch administrator. Unless somebody is very very very stupid that job would NOT entail upgrading CA7.
So both could be true: batch operations could be entirely in India, while software support remains somewhere else.
Anonymous mainframe guy accountable for 40,676 MIPs of z/OS R1.10.
The last interaction was with some BT Mahindra sysops.
We asked for patch-level output for a large Sun server. We knew something was odd when the senior sysadmin (who supposedly built, tuned and managed the cluster) asked via email how you did this. We sent him the command and he sent us the response and closed the support ticket.
The response was NOT a list of installed patches... it was a one-liner that this senior sysadmin for a business-critical billing/reporting system was quite happy with:
Segmentation fault: core dump ....
Let's just say SHF at the next project review meeting :-)
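For what it's worth, a crashing command sets a non-zero exit status, so even a minimal wrapper would have caught that before it got pasted into a ticket. A sketch, with the Solaris patch-listing command (`showrev -p`) mocked since the failing behaviour is the point and few of us have that Sun box to hand:

```shell
#!/bin/sh
# Mock of the failing command: on the real box this would be
# `showrev -p` (Solaris's installed-patch listing). Here we simulate
# the crash the sysadmin happily pasted into the ticket.
showrev_mock() {
    echo "Segmentation fault: core dump"
    return 139   # 128 + SIGSEGV, the status a segfault produces
}

# Minimal sanity wrapper: check the exit status before treating the
# output as a patch list.
out=$(showrev_mock -p)
rc=$?
if [ "$rc" -ne 0 ]; then
    echo "command failed (rc=$rc); do NOT paste this into the ticket: $out"
else
    echo "$out"
fi
```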
WOW, some of you have obviously never dealt with offshored/outsourced staff in India. As someone who regularly has and does, I can promise you that scenarios where the staff running a system don't have a clue how it works aren't unusual. The idea that any of them would have a clue where to start in recovering an application, let alone a system, is just laughable.
A junior operator might have been the root cause, but trashing the database and all backups sounds more like half the batch had run when the backout decision was taken and the queue needed to be reset to work from restore. Bitch all you like about ITIL, but a senior change manager would have taken the backout decision, not an operator.
Say what you like about Fred Goodwin, but this would not have happened on his watch..
.. and why was a software upgrade happening mid-week, if not to save on overtime costs for Sunday?
May have been discussed already, but I don't feel like crawling through the shit flood of posts. Why in the f__k, for something this critical, would you not have a staging system as completely identical to production as possible? 99% of all problems requiring a backout should have been caught in staging, which lets you avoid the $10-a-day drone ever touching production systems. Their parents were right: Baby Boomers and their shit executive leadership were going to screw over the Western standard of living.
OK, 99% is an exaggeration, but the vast majority. There should have been virtually zero surprises upgrading to new software, as it should have been done at least twice already: on a development system with any old foo data (but a significant amount of it), and on staging with somewhat real data and hardware identical to production. Thorough QA should have been done, and then of course the move to production should have been led by a very senior techie. Very basic stuff that they knew to do in the 1960s. It's what happens when you have someone who should be asking "you want fries with that?" instead saying "I know how to make glorious software".
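That promotion discipline boils down to a trivial gate: nothing reaches production until the same change has been proven on dev and on staging. A minimal sketch — the environment names and the stand-in `run_upgrade` function are invented; a real pipeline would run the vendor's install plus smoke tests at each stage:

```shell
#!/bin/sh
# Sketch of a dev -> staging -> production gate. run_upgrade is a
# stand-in for the real install; here it just records which
# environments the upgrade has been verified on.
passed=""

run_upgrade() {
    # Real work (install, smoke tests, backout rehearsal) goes here.
    passed="$passed $1"
    echo "upgrade verified on $1"
}

promote_to_production() {
    case "$passed" in *dev*) : ;; *)
        echo "REFUSED: not proven on dev"; return 1 ;;
    esac
    case "$passed" in *staging*) : ;; *)
        echo "REFUSED: not proven on staging"; return 1 ;;
    esac
    echo "promoting to production"
}

run_upgrade dev
run_upgrade staging
promote_to_production
```

The point is only that the "run it twice before live" rule is mechanical — it can be enforced by tooling rather than left to whoever is on shift.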
Spot on with the message but a bit near the mark on the delivery...
As one of the unaffected NatWest customers I was impressed with the amount of effort they have made to personally deal with the issues at a local branch level with the extended opening hours etc.
But the fact remains, in any IT support environment change control and QA should have prevented this from happening. Even then, where was the DR plan? And the backup to allow a seamless rollback process?
Stinks of cost-cutting (read: skills-cutting) outsourcing to me.
If this were the first time a large company's IT went to pot in a push for outsourcing and down-sizing of its IT functions, then I might have some sympathy for RBS, but it's not.
10 years ago Zurich (UK) under CIO Ian Marshall put through a whole raft of staff redundancies as part of the push to an outsourced and down sized environment. They ended up with a number of unsupported systems, and a number of poorly supported systems as a result.
Every year sees at least one more company bitten in the ass by following the outsource-everything magic-wand hogwash pushed by the "management consultancies".
Outsourcing has its uses, but across-the-board outsourcing of critical business functions comes under the heading of dumb idea.
Who cares? The C Suite are a bunch of sociopaths in it for short term gain only for themselves. If they screw up they get a golden parachute and their buddies move them into another opportunity. If they don't they leave anyway in a few years tops to pursue other opportunities (ie. run away from screw ups).
Yeah, McKinsey, that shower of useless cunts, managed to fool another sucker executive. Now people in the organisation go to Google and try to fix the problem themselves, or try to live with it. Idiots in India will make it worse and then say that your computer needs to be reimaged.
I heard an interesting story... one of our guys was over in Mumbai and was being shown around the call centre. It turns out that as soon as the offshoring company trains people up and they have six months' experience, they start asking for a 30% pay rise, and if they don't get it they go to some other place looking for people trained on the product. The end result was that every time someone in the UK rang up looking for help they got some inexperienced person with <6 months on the job... who ended up ballsing up their computer and getting it reimaged.
According to the FT, RBS is considering taking legal action against CA. Unless CA were actually managing the change I don't see how this would work.
Being a cynic I would say that the RBS PR machine is again at work trying to throw some mud around in the hope that some of it sticks.
Of course if they do it really will be a "drains up" moment for RBS if they press, so not a bad thing. I suspect it is PR bluster.
http://www.ft.com/cms/s/0/b03dd574-bf8e-11e1-a476-00144feabdc0.html?ftcamp=published_links%2Frss%2Fcompanies%2Ffeed%2F%2Fproduct#axzz1yvaWQwnl
This post has been deleted by its author
Best of luck with that. Unless it is a problem with the software that CA knew about but had failed to document on their Customer Care site, then RBS have not got a prayer of winning a court case, particularly if they heavily customise the software. I am pretty sure that the supplier will be able to point to a number of successful implementations of the product, and will be able to show that if RBS/NatWest staff had actually read the install instructions, together with the associated product manuals, and checked for any known issues, they would not have had the problem. Maybe I am old-fashioned, but I thought this sort of due diligence was part of an IT technician's job.
This post has been deleted by its author
Suppose, hypothetically, that the patch/update was faulty. Suppose, hypothetically, that the person responsible used the "backout patch" function which had been used previously, tested previously, but for this particular patch revision was incorrectly implemented and actually damaged/destroyed the underlying database. Would a client be expected to test the patch backout function for every patch revision issued before using it?
We can all hypothetically guess at what happened: information supplied by a bunch of disgruntled RBS employees who can tell you what probably happened, curious IT types who still work at RBS and can use the incident search function, and a handful of people who "were there, man". Somewhere, though, there is one person who can tell you what they thought should have happened. I want to see his post.
I fear for this person. Soon his CV will be flying across some recruitment agent's desk. Next they will be sat next to you. Your "job's a carrot" little support contract turns into a nightmare.
FYI, I am not involved in any way. I have just been dealing with this crap for years. Long may it continue and keep us in contract jobs, F40s and Lambos, cheap women and tax avoidance schemes.
Carry on!
Yes, of course. That's what Pre-Prod (UAT or whatever) Test is for. Test the deployment, check it works as designed, *and* test the backout. If the backout f*cks you up, you don't deploy the damned fix, purely because of the risk that you might need to do it in live. You can insure yourself against it (hot standby systems etc) - but ultimately it comes down to a balance of the chance of it going tits-up against the consequences if it does. In this case, the consequences were pretty damned big.
Now, if a backout that was successfully tested against a live-like environment failed in live, either someone didn't follow the process - or the live-like environment wasn't fit for purpose.
In either case, it's valid grounds for the El Reg lynch mob to get their pitchforks out...
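That backout rehearsal can be sketched in a few lines. Everything here is invented for illustration — the "state" is just a version file and `deploy`/`backout` are stand-ins — but the shape is the point: in pre-prod you run the deploy, then the backout, then compare the before and after state; the change is only approved if the backout provably restores what was there.

```shell
#!/bin/sh
# Rehearse both the deploy AND the backout in pre-prod, approving the
# change only if the backout restores the original state. "State" here
# is a throwaway version file; in real life it would be the scheduler
# database that the backout has to leave intact.
state=$(mktemp)
echo "baseline" > "$state"

deploy()  { echo "upgraded" > "$state"; }
backout() { echo "baseline" > "$state"; }   # the bit that must be proven

rehearse_change() {
    before=$(cat "$state")
    deploy
    backout
    after=$(cat "$state")
    if [ "$before" = "$after" ]; then
        echo "APPROVED: backout restores pre-prod state"
        return 0
    fi
    echo "REJECTED: backout left state as '$after', expected '$before'"
    return 1
}

rehearse_change
rm -f "$state"
```

A backout that only gets its first run during a live incident is not a backout plan; it is a second, untested change.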
This is a classic piece of FUD and will be quietly dropped somewhere down the line. "Let's spread the shit as wide as you can and hope that some of it sticks to someone else!"
This is like trying to sue the manufacturer of your screwdriver because you didn't do a screw up tight enough.
RBS suing CA?
They had better be very careful there. I once worked at a place that managed to upset CA.
I don't know how they managed that, maybe management was looking for an excuse, but CA's response was to cancel all licenses and refuse to supply the customer with any more.
We already had alternatives in place so it wasn't a big deal, but for RBS?
Some years ago I helped RBS work through a storage device failure which led to their having to recover a DB2 database. The outage lasted for several hours, ATMs were unavailable and they made the BBC news.
The failing device was fault-tolerant and yet a sequence of events which imho nobody on the planet could have predicted caused the unit to fail completely.
The bank's staff worked tirelessly and diligently over many days to work with the vendor to pinpoint the root cause and then replace modules in similar devices over a planned period.
These were highly skilled IT professionals who had the knowledge to collaborate with the vendor to work through the problem.
Fast forward to 2012. If this really is a case of inexperienced operators or systems programmers blundering with CA-7, a mature scheduling solution, then I agree that they have reaped what they have sown. Core banking systems are complex beasts and require more expertise than Java, MySQL and Linux.
You get what you pay for.
The FSA needs to be checking all UK banks & building societies that have been outsourcing or offshoring core IT functions, to make sure they're fit for purpose.
A number of outfits are watching this saying 'there but for the grace of god', as they watch the fuse burning down on their own unsupported legacy timebombs.
A couple more catastrofucks like this will do permanent harm to confidence in our retail banking system. FSA, get a grip on the situation now.