'Inexperienced' RBS tech operative's blunder led to banking meltdown

A serious error committed by an "inexperienced operative" caused the IT meltdown which crippled the RBS banks last week, a source familiar with the matter has told The Register. Job adverts show that at least some of the team responsible for the blunder were recruited earlier this year in India following IT job cuts at RBS in …

COMMENTS

This topic is closed for new posts.


      1. Peter2 Silver badge

        Re: That's it, blame the help

        I prefer the good old CLI, and especially the ones without help entries.

        I don't think I have ever seen someone who didn't know what they were doing messing one of those systems up, simply because it's virtually impossible to use without having been properly trained in the first place.

  1. Anonymous Coward 101
    WTF?

    Astonishing

    - Are junior employees regularly being put in the position of being able to wreak such havoc in the first place?

    - Were no fail safes being used?

    - Was there no supervision?

    1. Rufus McDufus

      Re: Astonishing

      That a single inexperienced person could delete a queue-worth of batch jobs with such importance to the company is just frightening. Just listening to the technology being used here makes me feel like I've gone back to the 80's.

    2. Anonymous Coward

      Re: Astonishing

      Why are you assuming it was a junior employee? It might have been the head of IT RBS India. I think the word you are looking for is inexperienced.

    3. Anonymous Coward

      Re: Astonishing

      "Was there no supervision?"

      If they were following process then there would have been ample supervision. RBS is very ITIL focused, and the instant the decision was taken to back out the upgrade there would have been an incident team assembled. That involves Incident Managers authorising every action being taken, which usually follows much debate on a conference call about all possible courses of action.

      It is possible the inexperienced employee made a mistake attempting the task, but you can't directly prevent that type of error. Or it could be that the Incident Manager was similarly inexperienced.

      1. Anonymous Coward

        Re: Astonishing @AC 13:14

        >RBS is very ITIL focused

        As are all companies right up until they get the certification then it all goes out the window. Oh, and ITIL and all that best practices voodoo is a pile of crap anyway.

        1. slideruler

          Re: Astonishing @AC 13:14

          Not quite. ITIL done straight out of the book is a pile of crap. A recipe for red tape disaster. But applied selectively, in a common sense way, it can work well.

          Certainly if they'd done Change and Release properly - as the book suggests, (i.e. tested the change in a pre-production environment, tested and documented the backout, specified appropriate post-implementation success/fail criteria) they'd probably have been in a better state than they ultimately got into.

          I'd *love* to read their major incident report though... :-)

        2. FreeTard
          FAIL

          Re: Astonishing @AC 13:14

          If they had followed very basic project management processes, they would have had an immediate fallback. This is very basic stuff. Normally I wouldn't care, but all my banking is with NatWest, as is the wife's, which means she didn't get paid, and neither did I. Let's see if they take my mortgage out tomorrow, as it's with them as well, as is my credit card.

      2. Anonymous Coward

        Re: Astonishing @AC 13:14

        Been there with the joke they call RBS Incident Management. More like hours of procrastinating until someone senior enough can be found to say yes to the solution offered up at the very start by the techy.

        Been there, suffered it, dozed through most of it as it was the usual way to survive.

        Still it must be remembered they did off-shore roughly 60-70% of all technical staff, but had to delay redundancies on a number of occasions as they could not get enough 'bums on seats' in India for a long time, and even then only after preventing managers from rejecting the crud that was being offered.

      3. Anonymous Coward

        Re: Astonishing

        ITIL and Incident Managers just smacks of middle-management bullshit. Sack of crap ITIL didn't exactly help them here did it? Seriously, that is one giant bag of bollocks.

      4. Bilby

        Re: Astonishing

        >RBS is very ITIL focused

        Ahh, you have hit the nail on the head. From Wikipedia:

        "ITIL describes procedures, tasks and checklists that are not organization-specific, used by an organization for establishing a minimum level of competency."

        Managers who think that achieving a "minimum level of competency" is sufficient should not be allowed to play with systems that require an above-average level of competency; and managers who think that their specific complex organization is best operated by using "procedures, tasks and checklists that are not organization-specific" should not be allowed to run a lemonade stand.

        The fundamental problem is that there are people in positions of power who think that 'management' is a basic skill in and of itself, which can be applied successfully to running anything, from lemonade stands to multinational banking houses.

        The "Lack of experience" problem started at the top, and the feeling in boardrooms around the world today is "We don't need detailed knowledge of our corporation's systems, so why should we pay through the nose for staff who do? Let's write down the instructions on a checklist and give them to someone in a poverty-stricken hell-hole like Hyderabad, Mumbai or Edinburgh, who will step through them for $17,000 pa and no benefits."

        What could possibly go wrong?

    4. Anonymous Coward

      Re: Astonishing

      "- Are junior employees regularly being put in the position of being able to wreak such havoc in the first place?

      - Were no fail safes being used?

      - Was there no supervision?"

      Do you not work in IT, then?

      Sysadmins with only a few years of experience are routinely at the helm of command lines which can trash FTSE100 critical systems. Companies simply do not hire elite teams of 60kpa+ admins, and then hire more 60k+ admins to watch over their shoulder as they type every command, every day.

    5. Nigel 11
      Mushroom

      Re: Astonishing

      I suspect it's a multi-level FUBAR. Someone made a small not very serious error. Someone else got the patch-up for that wrong, and made the hole bigger until a chunk of masonry fell into it. And then someone carried on digging even though he *really* should have stopped, and brought the entire building down. "When you're in a hole, stop digging" is good advice, but these guys seem not to have known a hole when they saw one.

      The person who really needs to be shot isn't any of the sods on the ground. It's the person who decided it was OK to get rid of all the experienced staff in the first place. Preferably also everyone upwards from him to the CEO, since it was mission-critical, and to encourage the others. Before we get to find out how much worse it might have been, by experiencing it.

    6. slideruler
      FAIL

      Re: Astonishing

      You have to understand something about mainframes, and the history of RBS/Natwest et al.

      The golden rule of mainframes, from the day they first appeared, is that they are *not* idiot proof. They're unfriendly, ruthless, highly reliable, highly efficient data processing engines. Issue a shutdown or force command with master console authority, and it will do it, no questions asked. It's assumed that an idiot wouldn't be given authority to do something stupid. The same assumption tends to apply to people who have high levels of authority within its various subsystems, like Netview, CA7, RACF etc. You can't blame the technology.

      Historically (ten years or so ago) RBS were the most conservative of banks. They were risk averse within IT, and had a high body count of experienced people in both Edinburgh and London who generally knew their jobs well, and the merged clearing systems worked reliably and efficiently.

      Fred the Shred then trashed the bank with his Dutch gamble, and Hester arrives. He says to his general "cut costs at all costs". They start swinging the axe on all UK based techies. Nearly all techie level jobs are to go overseas, or to UK based Indian staff on ICT visas. I had a colleague working there. He was told to hand over to Indian staff - and they started interviewing for his 'replacement'. Many who turned up for the phone interview had difficulty in stringing two words together. Others were plainly clueless; pure CV creativity worthy of the Booker Prize. He eventually found three (yes, three Indians to replace one UK based staff member) and spent weeks explaining his job, and processes to them. He documented, powerpointed, and PDF'd to excess - so that any reasonable techie with the skills as advertised could have picked the job up. He then left. Within three days, he got an email at home. They'd managed to stuff things up - and would he help.... The three replacements soon became two, as one left to go elsewhere for more money...

      I'm rather enjoying seeing the fruits of Hester's slash and burn decision. Please, please, please let's see him in front of the House of Commons select committee, being grilled on what he's done - and why it went wrong. If they want evidence, I'm sure there are plenty of ex RBS guys who can attest to what's happened.

      I really don't trust any of the banks today - with my personal financial data, or my money. I've got another colleague - now made redundant from Barclays, due to 'cost effective global sourcing'. The problem is, I see few other UK owned and operated banks left.

      Mattress stuffing really does seem to be the only option left.

  2. Anonymous Coward
    Happy

    Where's Julian Assange when you need him?

    1. Androgynous Cupboard Silver badge

      Writing a press release of course

      About Julian Assange. Of course.

    2. Anonymous Coward

      Evading justice, like the scum-bucket he is?

    3. Anonymous Coward

      "Where's Julian Assange when you need him?"

      Ecuador.

  3. Scott 57
    WTF?

    Please excuse my ignorance....

    But the queue wasn't backed up before a change?

    And why the hell was an inexperienced op working on this?!

    Cheerio RBS - been nice banking with you....

    1. Nigel 11
      Unhappy

      Cheerio RBS

      My first thought.

      My second thought was whether things are any better elsewhere?

      My third thought is whether after a few months, they won't have learned something from the experience and done at least some of what's necessary to make sure that the lightning strikes somewhere else next time.

  4. Mr_Bungle
    Mushroom

    Work Blunders

    I'm sure most of us have made errors on rare occasions.

    I had once missed a night's sleep and clicked 'shut down' rather than 'log off' when remoting into a server. A file server, with people working on open documents.

    I replied to an email rather than forwarded it. The e-mail was slagging off a customer who was a giant bellend and this and some other thoughts were sent back to him. Got away with this somehow.

    And so on.

    Imagine the bowel emptying horror when this chap realised what he had done.

    1. Anonymous Coward

      Re: Work Blunders

      "Imagine the bowel emptying horror when this chap realised what he had done."

      You owe me a keyboard!

      I did a remote shut down by accident when I was on holiday in France at 0300 on a Sunday, and the boss had misplaced the spare key to the server room. They had to break the door down to hit the power button.

      Yes, 0.5 seconds after pressing that button the bottom fell from my world!

      Good job my boss is a grand fellow and has a large sense of humour.

      1. Rufus McDufus

        Re: Work Blunders

        I remember remote-logging into a telco's main DB server 20 years ago via modem (when I worked at CA, funnily enough). I wasn't getting any response on the console so tried a few of the typical key presses to try and get a response. Lo and behold a bunch of reboot-style messages appear in front of me. They'd given me remote access straight to the console. I never did know if my attempts to get some terminal activity caused it or not.

      2. Anonymous Coward
        Stop

        Did It Ever Occur To You

        ..that the real idiot here was your boss ?? His Key was a kind of Single-Point Of Failure.

        A faulty action is not an issue at all. An Issue is that a single faulty action can screw up a complete operation.

        Many places are run like this and the leadership (or lack thereof) is to blame for it.

    2. Anonymous Coward

      Re: Work Blunders

      I was putting together a script for an evening outage while on a conference call to the customer. Copying the commands to fail a cluster to the B-node, I accidentally pasted them into a putty session logged in as root rather than my text editor.

      Following the rule "it's always better for the customer to find out from you first", I had to sheepishly admit not only that I'd taken their site down for 5 minutes but also that they didn't have my full attention on the call. Fortunately I'd dug them out of the brown stuff often enough that my balance at the Bank of Goodwill easily covered it.

    3. NogginTheNog
      Coat

      Re: Work Blunders

      Ooops! I think there may be a position open for you with RBS from next week... :-D

    4. KjetilS

      Re: Work Blunders

      ... and I once hit the breakers marked "UPS" thinking it was the breakers *to* the UPS. In reality, it was the ones going *from* the UPS. I realised my mistake when the whole room suddenly went dark and silent.

      That resulted in a two hour downtime for the whole company, and my boss laughing his ass off.

      His comment? "At least you won't do that again soon"

      The breakers have better markings now.

      1. Tom 38

        Re: Work Blunders

        One of my colleagues (now my manager \o/) wanted to kill a recently backgrounded job on the only production server hosting our website.

        He meant to type:

        kill -9 %1

        He typed

        kill -9 1

        Thus killing init, putting the box into a dead state, and the website offline until we could get a techie into the DC to press the reset button.

        After this, all servers get DRAC consoles, he got his root access taken away, and we got backup servers.
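For anyone who hasn't been bitten by this yet: in a job-control shell, `%1` names job number 1 of the current shell, whereas a bare `1` is process ID 1 (init). Below is a minimal Python sketch of a pre-flight check that tells the two apart and refuses the dangerous one. The function names and the `PROTECTED_PIDS` set are hypothetical, made up for illustration, and it only handles numeric job specs (not `%+`, `%-` or `%name`).

```python
# Hypothetical guard: decide what a kill target means before any signal is sent.
PROTECTED_PIDS = frozenset({1})  # assumption: only init is protected in this sketch


def parse_kill_target(arg):
    """Return ("job", n) for a job spec such as %1, or ("pid", n) for a bare number."""
    if arg.startswith("%"):
        return ("job", int(arg[1:]))
    return ("pid", int(arg))


def check_kill_target(arg):
    """Describe the target, raising ValueError if it is a protected PID."""
    kind, number = parse_kill_target(arg)
    if kind == "pid" and number in PROTECTED_PIDS:
        raise ValueError(
            f"refusing to signal protected PID {number}; did you mean %{number}?"
        )
    return f"{kind} {number}"
```

Calling `check_kill_target("%1")` returns a description of the job, while `check_kill_target("1")` raises instead of letting the signal anywhere near init.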

        1. Anonymous Coward

          Re: Work Blunders

          As somebody who was called in to several "managed incidents" over the years, one of the first things to do was to fire off an email to the operators' mailbox asking who was the shift manager on duty.

          Unfortunately the spell checker in outlook does not pick up on the missing "f" in "shift".

          Never ceased to amaze me the number of replies I would get.

        2. Anonymous Coward

          Re: Work Blunders

          "kill -9 %1

          He typed

          kill -9 1"

          Ah... the glorious kill -9.

          I've sat there and played 'rock paper scissors' with other admins to see who *didn't* get to be the one to send a questionable kill -9 on HA servers before.

        3. tfewster
          Facepalm

          Re: Work Blunders

          > he got his root access taken away...

          But he'd just gained valuable experience! That gut feeling that makes you pause before hitting return because something isn't quite right..

          OTOH, someone who makes the same mistake twice deserves no mercy

    5. keithpeter Silver badge
      Windows

      Re: Work Blunders

      "Imagine the bowel emptying horror when this chap realised what he had done."

      I suspect this must have been a slowly dawning realisation given the description of an 'incident team' above and the procedures that should be in use. Perhaps the full scope was not apparent for hours/days. Lovely. Gives me chest pains just thinking about it.

    6. Dr. Mouse

      Re: Work Blunders

      "Imagine the bowel emptying horror when this chap realised what he had done."

      I can't think of a better way to describe it. Well done :)

      Yeah, I've committed serious errors in the past, although none on even the scale of yours Mr_Bungle. My most recent was when I was clearing up some log files, and then tried to restart syslog without checking the command line was clear. "rm /etc/init.d/syslog restart", followed by confusion over the error of "no such file: restart", followed by a panicked search for a backup.

      Anyone can make a cockup when under pressure or not concentrating properly on a menial task. I hate to think of the panic the guy who did this went through. Bowel emptying indeed!
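To spell out why a stray `rm` left on the command line produces that "no such file: restart" error, here is a small Python sketch that splits the two command lines the way a shell would. `split_command` is a hypothetical helper written for this illustration; nothing is executed or deleted.

```python
import shlex


def split_command(command_line):
    """Split a command line into the program and its arguments, shell-style."""
    argv = shlex.split(command_line)
    return argv[0], argv[1:]


# What was meant: run the init script with the single argument "restart".
intended = split_command("/etc/init.d/syslog restart")

# What was run: the leftover "rm" becomes the program, so the init script
# and the word "restart" are both treated as files to delete.
typed = split_command("rm /etc/init.d/syslog restart")
```

In the second case the init script is the first thing `rm` is asked to remove, and `restart` is just a second file name that happens not to exist.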

      1. Wensleydale Cheese
        Happy

        Re: Work Blunders

        @Dr. Mouse

        "Anyone can make a cockup when under pressure or not concentrating properly on a menial task. I hate to think of the panic the guy who did this went through. Bowel emptying indeed"

        I was once called out by a panicking operator when disk space was running dangerously low, with only 45 minutes to go before the night shift started.

        By the time I arrived, word had got out and anxious line managers were arriving in droves.

        "Can we have a meeting about this?"

        My response was to barricade myself in the computer centre and ask the operators to keep all the managers out.

        Fortunately they had the power to do that and I could concentrate on the problem in peace and quiet.

    7. Anonymous Coward

      Re: Work Blunders

      You chaps so funny!!!

      I be having a work blunder of my own just last Tuesday! Get phone call from big boss in Scotlandland saying to please be re-running program that makes money for bank. Silly thing go crash and say bad things so I switch PC off and on again and try to run program again. Same thing!!! So I try again and silly thing just say more bad things. So I switch PC off and go home.

      Anyone for to be needing expert in Microsoft Word and CA7?

      1. MrZoolook
        Thumb Up

        Re: Work Blunders

        Comments like this are why I sometimes want to create extra accounts, so I can upvote again!

    8. Anonymous Coward

      Re: Work Blunders

      Oh yes indeed, like the time I reached down to reboot the four servers I was building (after configuring the RAID enclosure), pressed the power buttons with two fingers of each hand and realised that my servers were in the cabinet next door to the one with the console in and that the moment I released the pressure I was going to perform a decidedly inelegant shutdown on two SQL clusters running the backend of a customer facing webapp....

      It was a lonely few moments followed by a fatalistic shrug, a quick powerup and the rest of the day keeping a very close eye on the incident queues in case I might have to put my hand up and admit my error.

      Or the day (at another bank) that I got a snotty email asking why I had blasted a VM running the print spooler for mortgage applications, resulting in no letters having been printed for 11 days (the answer being that it was a java component, called on demand, that was undocumented on that machine - indeed documented as on a different machine - that had been moved there in response to an incident months before and that no-one had bothered to remediate).

      Or indeed any of the other stupid but easily committed errors that happen all the time - even to experienced and competent people.

      This stuff happens all the time, just not usually so badly to such critical and highly visible systems.

      1. Rufus McDufus

        Re: Work Blunders

        "Re: Work Blunders

        Oh yes indeed, like the time I reached down to reboot the four servers I was building (after configuring the RAID enclosure), pressed the power buttons with two fingers of each hand and realised that my servers were in the cabinet next door to the one with the console in and that the moment I released the pressure I was going to perform a decidedly inelegant shutdown on two SQL clusters running the backend of a customer facing webapp...."

        Ha ha! I did that on a mail server. Pressed the button on the wrong box. I was looking around for a broom or pole to stick in the power button to replace my finger. Then my boss walked in and laughed at me!

        1. Anonymous Coward

          Re: Work Blunders

          Thankfully not me, but I was warned about it when I started as an admin by the person who did. We have a high availability system that underlies pretty much everything. 2 parallel systems running in different datacentres, with work load balanced between the two. To do any updates or changes, you route traffic from A to B and stop A to update it. This guy routed all traffic from A to B and stopped B.
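The sort of guard that would have caught that mix-up can be sketched in a few lines of Python: refuse to stop any node that is still taking traffic. Everything here (`stop_node`, `NodeInUseError`, the `traffic_share` mapping) is hypothetical and only illustrates the check, not how any real load balancer or cluster tool exposes it.

```python
class NodeInUseError(RuntimeError):
    """Raised when asked to stop a node that is still receiving traffic."""


def stop_node(node, traffic_share):
    """Stop `node` only if its share of traffic is zero.

    `traffic_share` maps node name to the fraction of traffic it receives;
    an unknown node name raises KeyError rather than being stopped blindly.
    """
    share = traffic_share[node]
    if share > 0:
        raise NodeInUseError(f"{node} is still serving {share:.0%} of traffic")
    return f"stopped {node}"


# After routing everything from A to B, A is idle and B carries the full load.
traffic = {"A": 0.0, "B": 1.0}
```

With that state, `stop_node("A", traffic)` goes ahead, while `stop_node("B", traffic)` raises before anything is taken down.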

          I have accidentally made a mistake with the mqsistop command though, what sort of idiot names a message broker and a config manager the same with just one letter different :)

  5. Bob Terwilliger
    FAIL

    Smell the management bull..

    Typical management/politician type answer that tells you more by its omissions than what it contains.

    "Well I have no evidence of that." <translation> I have been told by underlings but haven't seen the documentation myself

    "The IT centre - our main centre, we’re standing outside here in Edinburgh, [is] nothing to do with overseas. " <translation>The mainframe is here, I'm not saying where the sys admins are

    "Our UK backbone has seen substantial investment." <translation> We have upgraded hardware and software, I'm not saying where the sys admins are.

    1. Anonymous Coward

      Re: Smell the management bull..

      It looked like he was standing outside Fettes Row, the Group Tech HQ just north of Edinburgh city centre. Whilst there is a small dev-only data centre in there, the main data centres are to the east and south of Edinburgh and are pretty big, i.e. the big rooms are about the size of a football field. The LPARs that run all the batch jobs are actually quite unimpressive when you stand next to them.

      When I was last in them (some years ago) - they were really nicely done (compared to some horror machine rooms I've seen) and very well organised.

      I think it's fair to say that people who worked for RBS pre-problems would say it is an amazing place to work and highly professional.

      1. Anonymous Coward

        Re: Smell the management bull..

        WAS a great place..

        There, fixed it for you.

        Quality has degraded significantly over the last ten years.

  6. Evan Essence
    Stop

    Problematic updates are normal?

    "... the relatively routine task of backing out of an upgrade to the CA-7 tool. It is normal to find that a software update has caused a problem; IT staff expect to back out in such cases."

    Really?

    1. Anonymous Coward

      Re: Problematic updates are normal?

      '"... the relatively routine task of backing out of an upgrade to the CA-7 tool. It is normal to find that a software update has caused a problem; IT staff expect to back out in such cases."

      'Really?'

      Yes, really - in the dysfunctional screwed-up world of British banking, where immensely over-rewarded people completely devoid of banking qualifications continually seek to double their money by putting together mergers, takeovers, and ambitious new ventures. Without for a moment considering the impact on the resilience and maintainability of the computers down in the engine room without which the whole bank would instantly fall apart.

      1. Anonymous Coward

        Re: Problematic updates are normal?

        Question: Who is the odd one out from the following list?

        Lord Stevenson, former chairman, HBOS

        Andy Hornby, former chief executive, HBOS

        Sir Fred Goodwin, former chief executive, RBS

        Sir Tom McKillop, former chairman, RBS

        John McFall MP, chairman of Treasury select committee

        Alistair Darling, Chancellor of the Exchequer

        Sir Terry Wogan, presenter of Radio 2 breakfast show

        Answer: Sir Terry Wogan. He is the only one with a banking qualification.

        Acknowledgement: Private Eye

        1. Anonymous Coward

          @Tom

          I am so stealing that!

          1. Robert Carnegie Silver badge

            Correction?

            http://en.wikipedia.org/wiki/George_Osborne is a Recommended Update. Not as Chancellor, but in the Catalogue of Non-Competence.

            "Osborne's first job was entering the names of people who had died in London into a National Health Service computer. He also briefly worked for Selfridges, re-folding towels. He originally intended to pursue a career in journalism, but instead got a job at Conservative Central Office."

            Does anyone remember some badly folded towels in Selfridges in the early 1990s? The famous Bathgate scandal?

          2. Anonymous Coward

            Re: @Tom

            "I am so stealing that!"

            As was I... If you are going to steal, why not steal from the best (Private Eye)?

        2. JimmyPage Silver badge
          Stop

          Re: Problematic updates are normal?

          I was channel hopping a few weeks ago, and hit UK Gold and an episode of Yes Minister, where a bank CEO wanted a few more storeys on his HQ building and had to go to the Department of Administrative Affairs to discuss it.

          As he meets Sir Humphrey, the "joke" is that the CEO of the bank hasn't the first clue about banking, and that Sir Humphrey is lined up for a nice non-executive directorship with the bank when he retires.

          Can you guess when that episode was written?

          a) 2011

          b) 2001

          c) 1991

          d) 1981

          (Answer: d)

        3. Anonymous Coward

          Re: Problematic updates are normal?

          It is very stealable (and is already on my FB page) - but also slightly out of date; John McFall became Baron McFall of Alcluith in 2010, and is therefore the former Chairman of the TSC (2001 to 2010). I have no idea whether Andrew Tyrie, the current Chairman, has a banking qualification.
