Re: Problematic updates are normal?
God, I'd forgotten Terry used to work in a bank as a youngster, as in actually "do bank work" lol
I think banks were respectable back then though.
A serious error committed by an "inexperienced operative" caused the IT meltdown which crippled the RBS banks last week, a source familiar with the matter has told The Register. Job adverts show that at least some of the team responsible for the blunder were recruited earlier this year in India following IT job cuts at RBS in …
>"... the relatively routine task of backing out of an upgrade to the CA-7 tool. It is normal to find that a software update has caused a problem; IT staff expect to back out in such cases."
Seems reasonable. To expect an upgrade to one system that is interlinked with other strange old systems to go absolutely perfectly every time is naive; to have a mechanism to undo it or cancel it safely seems sensible. However, in this case it seems that this procedure was either not idiot-proof enough or the operator was having a bad day, or a bit of both.
Don't know about RBS, but I've worked in other places in Banking, Government agencies and the Utility Sector.
Most large organizations will not authorize a change unless there is a fully specified back-out plan, together with evidence that the change to the live system has been tested somewhere safe first.
In some places I've been, the risk managers have wanted a "how to recover the service should the back-out plan fail" plan.
The RBS example is evidence of exactly why you have this level of paranoia, and why you spend more time writing up the change than the change itself takes, and why you sit in Change Boards convincing everybody that the change is safe.
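To make that paranoia concrete, here's a minimal sketch in Python of the sort of change record those boards insist on. It's purely illustrative - the ChangeRecord fields and the approve() check are hypothetical, not any real bank's change-management tooling - but the rule it encodes is the one above: no back-out plan, no back-out-failure recovery plan, no test evidence, no approval.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeRecord:
    """Hypothetical change record; the field names are illustrative only."""
    summary: str
    implementation_plan: str = ""
    backout_plan: str = ""                  # how to undo the change
    backout_failure_recovery: str = ""      # how to recover if the back-out itself fails
    tested_in_non_production: bool = False  # evidence it ran safely somewhere else first
    approvals: list = field(default_factory=list)

def approve(change: ChangeRecord, approver: str) -> None:
    """Refuse approval unless every plan and the test evidence is present."""
    missing = [name for name, value in [
        ("implementation plan", change.implementation_plan),
        ("back-out plan", change.backout_plan),
        ("back-out failure recovery plan", change.backout_failure_recovery),
    ] if not value.strip()]
    if not change.tested_in_non_production:
        missing.append("evidence of a non-production test")
    if missing:
        raise ValueError("Change cannot be approved; missing: " + ", ".join(missing))
    change.approvals.append(approver)
```

The Change Board is essentially the human version of that approve() check, applied by people who are allowed to say no.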
Unfortunately, I'm sure that many of us here have complained about how much the process costs, how much time is wasted, and how quickly you could work if you didn't have this level of change control. I learned my lesson the hard way many years ago, and now follow whatever the processes are without complaining.
Maybe the higher management will learn some lessons from this as well. But I somehow doubt it.
I take it you're a change manager justifying your job? Here's what I got on the end of my fortnightly change management e-mail.
Quote of the Month
“No Change is without risk. Changes are managed to minimise the potential negative or unpredicted Impact and Risks of Changes on existing Services and to benefit both ???? and the Customer – ensuring the alignment of ???? IS to Business requirements and a standard approach is used to maintain the required balance between the need for Change and the Impact of Change” - Extract from the “exciting” new ???? Change Management Procedure V2 – published next month.
The words 'an@l' and 'change manager' seem to fit nicely.
"The word an@l and change manager seem to fit nicely."
So you've taken unjustified offence at some standard management bollocks appended to an email about change management. That doesn't alter the fact that well run projects often feature damn good project managers and change managers. I'm good at what I do, but that's not every detail of managing complex organisational or systems change, and I'm not arrogant enough to presume that on big complex projects I know it all (even as project manager on some of these). Luckily I have other people around me who reduce the risks of my carelessness, oversight or lack of time through their diligence, involvement in detail, and application of procedure.
But obviously you know it all, so why hide behind AC?
Having worked in an RBS company, I can confirm RBS have that level of paranoia, and it was a complete PITA to do any software release to a live system.
So this raises the questions: did RBS GT not follow their own procedures, or, given the amount of hassle involved, did they try to short-cut the process? Or perhaps it all became just a form-filling/box-ticking exercise? I have experience of the latter...
RBS were (or still are, I believe, from my colleagues, since I was off-shored) very much into change management, as well as any other red tape that can be put in the way of a techy doing their job. As for implementation plans and back-out plans - yup, they like those as well, and while wordy/complex they are actually well laid out, complete with back-out stages and back-out plans.
Of course, with most of the senior techies gone, along with most of the other UK techies, the quality of those who write them, and proofread them, may have gone down considerably.
How often are those back-out plans, and the "how to recover the service should the back-out plan fail" plans, actually tested before the change takes place?
Pretty much never, because they're impossible to test for.
Generally they're just finger-in-the-air guesses. Yet they're still good enough to give change controllers a warm fuzzy feeling...
@Evan Essence
"Really?"
Yes. We are talking about CA here (and I have the scars), but the same principle should apply to all third party software. Even top quality products can break in your environment.
What nobody seems to have done yet is ask whether that CA software update was tested first in a non-production environment.
A proper test environment does not mean a machine with the bare essentials on it. You need to have a test environment which reflects the other products installed, naming conventions, data volumes, and in this case the number of jobs, that the production environment has.
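As a rough illustration, that "test must mirror production" rule can be expressed as a simple check over environment manifests. This is a Python sketch with a made-up manifest layout - not anything RBS or CA actually use - just to show the kinds of gaps worth looking for:

```python
# Illustrative only: the manifest keys and tolerance below are hypothetical,
# not a description of any real bank's or vendor's environments.

def mirror_gaps(production: dict, test: dict, volume_tolerance: float = 0.5) -> list[str]:
    """List the ways a test environment fails to reflect production."""
    gaps = []

    # Same third-party products, at the same versions.
    for product, version in production["installed_products"].items():
        if test["installed_products"].get(product) != version:
            gaps.append(f"{product}: production has {version}, "
                        f"test has {test['installed_products'].get(product, 'nothing')}")

    # Same naming conventions (e.g. dataset and job-name prefixes).
    if test["naming_convention"] != production["naming_convention"]:
        gaps.append("naming conventions differ")

    # Data volumes and job counts within a sensible fraction of production.
    for metric in ("data_volume_gb", "scheduled_job_count"):
        if test[metric] < production[metric] * volume_tolerance:
            gaps.append(f"{metric} in test is under {volume_tolerance:.0%} of production")

    return gaps
```

If a check like that comes back with a long list of gaps, a clean run in "test" tells you very little about what the upgrade will do to production.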
Most replies to my comment seem to be missing the point. Yes, updates can fail, and yes, you have change control procedures in place. But look at the article again. Are problems *normal*? I would say that means, by definition, that *at least* 50% of updates fail. Really?
I have seen the incident record from when this started (17/6) and it isn't an Indian name on the ticket for the backout procedure (not until the job got handed over at any rate).
An upgrade from v11.1 to v11.3 of the CA-7 software went wrong though, that much is clear.
Not an Indian name in sight on the ticket?
So you have never had those phone calls from John, Mary, Peter and the like who by the accent of their voice have to be from somewhere like Mumbai, Bangalore, Kolkata or even Delhi.
Surely some of them will have migrated to RBS Support by now.
(Strictly tongue in cheek naturally)
Considering the high-profile nature of this incident I don't think I should reveal anything more, and I didn't say that there wasn't an Indian name in sight on the ticket, just not straight away when the backout went awry. Doesn't necessarily mean they weren't involved of course, but the only people who know 100% for sure are the people directly involved... i.e. the ticket doesn't indicate that anyone in India was part of the initial incident response - so beware of people saying they 'know'.
I got a call once from "Will Smith" who was clearly in an Indian call centre trying to flog me something.
I sympathised with him that Independence Day was a shit awful film but hadn't realised his career had tanked so badly.
I seem to recall in Young Frankenstein they got the brain from Abby Normal, was she the name on the ticket?
It's irrelevant who started what and when, who initiated a rollback and how, or where the servers were. Inexperienced outsourced staff were supervising the batch jobs. It would, or should, have been their job to raise the flag; instead, it seems they happily watched over a disaster unfolding until it was too late. Experienced staff would have been more on the ball and realised something was wrong earlier. Yes, things do go wrong, it's inevitable; however, it's how they are handled that matters.
The questions to be answered should be: when did the cock-up start, how long was it before someone raised an alarm, and who was supervising the process when it went tits up? The latter seems to be pretty clear.
So to those who've seen the incident reports maybe you could answer those questions instead of trying to create a smoke screen to protect your beloved leader.
It's hard to work out how they have messed this up so badly. If I remember correctly, when installing CA-7 you set an option on whether it keeps everything or initialises from new. When backing out the software update they wouldn't restore from a backup but reinstall the previous version, and maybe they messed this bit up? This part was most likely done in the UK, and in India they manage the batch schedules. If this was left unnoticed for a few days then they have a major issue. It will be a spider web of feeds and dependencies.
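Purely as an illustration of where that could bite - this is generic Python pseudocode, not CA-7's actual installer or options - a back-out that reinstalls the previous release has to get the keep-versus-initialise choice right, or it wipes the live schedule it was supposed to preserve:

```python
import subprocess

def back_out_upgrade(previous_version: str, preserve_schedule_db: bool = True) -> None:
    """Back out by reinstalling the prior release rather than restoring a backup.

    'scheduler-install' and its flags are hypothetical stand-ins for whatever a
    real product's installer provides; the point is the destructive branch.
    """
    cmd = ["scheduler-install", f"--version={previous_version}"]
    if preserve_schedule_db:
        cmd.append("--keep-database")  # keep the existing job schedule intact
    else:
        # Initialising from new throws away the live schedule - during a
        # back-out that is almost never what you want, so make it deliberate.
        confirm = input("Re-initialise the schedule database and lose all jobs? (yes/no) ")
        if confirm != "yes":
            raise SystemExit("Back-out aborted: refusing to wipe the schedule.")
        cmd.append("--init-database")
    subprocess.run(cmd, check=True)
```

If the wrong branch gets taken, or the choice is left to whatever the installer defaults to, you end up with the prior software version running against an empty or inconsistent schedule - which would look a lot like what the article describes.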
"A complicated legacy mainframe system .. " is seldom the problem. The applications are usually well implemented and optimised to the hilt. Administered by people who know the solution inside out and have developed their own tools to administer the systems. Expensive, though, and prone to loss of expertise through the aging out of the experts. None of the bright young kids want to work in that area.
Treat the mainframe as just another distributed server ("until we can get rid of the expensive legacy stuff a bite at a time...") and you are looking for trouble. We switch from our last mainframes next month (also a banking service company) and I'm glad, but also a little sad after 40 years of dealing with their arcana at various employers.
Disagree.
Each installation of this software will be unique, but if left in the fresh-from-the-box state they may only differ from each other trivially.
If the uniqueness can be trivial it can also admit of other distinctions, and if this instance differs greatly from the possible trivially unique norm then there's an understandable sense to the phrase 'highly unique'.
It might have been preferable to use the word 'particular', so it may be a stretching of the language, but I do not think it breaks any useful rule.