RBS Group hopes £750m IT shakeup splurge will prevent next bank mainframe meltdown

RBS Group has spent £750m overhauling its IT – which includes replacing the systems at the heart of its mainframe mega-meltdown in 2012. The banking giant has established three separate batch schedulers to process transactions at NatWest, Ulster Bank Northern Ireland, and Ulster Bank Republic of Ireland. It's hoped this will …

COMMENTS

This topic is closed for new posts.
  1. Matt 21

    Would have been a lot cheaper

    to keep hold of experienced staff instead of trying to save tuppence on staff costs.

    1. AMBxx Silver badge

      Re: Would have been a lot cheaper

      This is the same company that has just slashed contractor rates by 10%

      1. Fatman
        Unhappy

        Re: Would have been a lot cheaper

        This is the same company that has just slashed contractor rates by 10%

        So that's HOW they expect to pay for it, and here, I thought it was coming out of executive bonuses!

        Silly me!

    2. Spoonsinger

      Re: Would have been a lot cheaper

      Cheaper for whom? They seem to have done quite well out of dumping staff, fiddling in dubious markets and general mismanagement. It might have got them a bit of bad publicity, but that didn't really have any consequences - obviously apart from a big government bail out(*)

      (*) but that really didn't cost them anything.

  2. Anonymous Coward
    Anonymous Coward

    No single point of failure...

    So, how does splitting into four unconnected systems remove the single point of failure? Four parallel systems __may__ have helped, but all this does is decide which "brand" gets hit by any given single point of failure...

    1. Anonymous Coward
      Anonymous Coward

      Re: No single point of failure...

      The issue was that the Ulster brand batch was (for some historical reason) dependent on the NatWest batch. That meant Ulster Bank customers had to wait until the NatWest batch caught up before their own batches could run. Given that NatWest is the largest brand by number of customers, and that it took a long time to recover all the data and run the batch backlog, this seriously delayed the recovery of the Ulster brand.

      Removing the interdependencies simplifies the batch and makes any future recovery easier, should anything happen.
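
      A toy illustration of why that coupling mattered; the brand names and durations below are purely made up and bear no relation to the real CA-7 schedule:

      # Toy model of batch coupling: a batch cannot start until everything it
      # depends on has finished. Names and hours are illustrative only.
      from dataclasses import dataclass, field

      @dataclass
      class Batch:
          name: str
          duration_hours: float
          depends_on: list = field(default_factory=list)

          def finish_time(self) -> float:
              """Hours until this batch completes, including upstream waits."""
              wait = max((dep.finish_time() for dep in self.depends_on), default=0.0)
              return wait + self.duration_hours

      # 2012-style coupling: the Ulster batch cannot start until NatWest has caught up.
      natwest = Batch("NatWest", duration_hours=30)   # large backlog to replay
      ulster_coupled = Batch("Ulster", duration_hours=6, depends_on=[natwest])

      # Post-split: Ulster's own scheduler runs its batch independently.
      ulster_split = Batch("Ulster", duration_hours=6)

      print(ulster_coupled.finish_time())  # 36.0 -> recovery gated by NatWest
      print(ulster_split.finish_time())    # 6.0  -> recovery depends only on itself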

      1. Anonymous Coward
        Anonymous Coward

        Re: No single point of failure...

        [quote] The issue was that the Ulster brand batch was (for some historical reason) dependent on the Natwest batch [/quote]

        That's because it's the same batch suite run multiple times but with a different High Level Qualifier (HLQ). Plus there are some common components.

        HLQ format:

        1st char: P = Production, T = Testing

        2nd char: G = Group, R = RBS, N = NatWest, J and K = UBN and UBR (can't remember which is which now), T = Test

        3rd and 4th chars: Application ID
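
        For illustration only, a rough sketch of parsing that convention; the example HLQ and the mapping of J and K to the two Ulster Bank entities are assumptions, since nobody seems to remember which is which:

        # Hypothetical parser for the 4-character HLQ convention described above.
        # The J/K brand names are placeholders - the original mapping is unconfirmed.
        ENVIRONMENTS = {"P": "Production", "T": "Testing"}
        BRANDS = {
            "G": "Group",
            "R": "RBS",
            "N": "NatWest",
            "J": "Ulster Bank (NI or RoI - unconfirmed)",
            "K": "Ulster Bank (NI or RoI - unconfirmed)",
            "T": "Test",
        }

        def parse_hlq(hlq: str) -> dict:
            """Split a 4-character HLQ into environment, brand and application ID."""
            if len(hlq) != 4:
                raise ValueError("expected a 4-character HLQ, e.g. 'PN01'")
            return {
                "environment": ENVIRONMENTS.get(hlq[0], "unknown"),
                "brand": BRANDS.get(hlq[1], "unknown"),
                "application_id": hlq[2:4],
            }

        print(parse_hlq("PN01"))
        # {'environment': 'Production', 'brand': 'NatWest', 'application_id': '01'}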

  3. John Smith 19 Gold badge
    Facepalm

    Another great product from Computer Associates.

    Oh dear.

    On the up side.

    A scheduler failure next time takes out just 1 of their 4 brands.

    Which keeps the cash rolling in from the others.

    1. AOD

      Re: Another great product from Computer Associates.

      As has been stated here, the scheduler itself didn't fail.

      This was a prize example of a PICNIC (Problem In Chair, Not In Computer), so the focus should be on the inexperience of the folks who trashed the job queue, along with why they were even put in that situation in the first place.

      As for a distributed high availability setup, well that's fine but generally you find that certain things get replicated from one instance to another (y' know, so they stay in step).

      This sort of setup helps when you have a hardware failure, but not when someone (or something) explicitly trashes key information in the live system. Then it's a whole other ball game.

      Of course, if your scheduler supports some form of point-in-time recovery, that would be handy too, but scheduling systems can be horrendously complex beasts, especially when dealing with multiple feeds, dependencies, in-flight jobs etc. Not for the faint-hearted.

      The separation of schedulers actually isn't a bad thing. A mistake made against the job queue of one brand should leave the others running without any issues. Of course, if you really want to do it properly, you also segregate staff access, to make it less likely that a mistake made against one system gets applied to the others.
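
      A minimal sketch of that failure mode, using a hypothetical in-memory job queue rather than any real scheduler: synchronous replication faithfully copies the destructive write, so the DR copy dies with the primary, while a separately scheduled brand is untouched.

      # Hypothetical job queues; job names are made up for illustration.
      class ReplicatedStore:
          """A primary job queue whose every write is mirrored to a DR replica."""
          def __init__(self, jobs):
              self.primary = list(jobs)
              self.replica = list(jobs)   # zero-data-loss DR copy

          def write(self, jobs):
              self.primary = list(jobs)
              self.replica = list(jobs)   # a bad write is replicated too

      natwest_queue = ReplicatedStore(["EOD_POSTINGS", "INTEREST_ACCRUAL"])
      ulster_queue = ReplicatedStore(["EOD_POSTINGS", "INTEREST_ACCRUAL"])  # separate scheduler

      natwest_queue.write([])   # operator mistake wipes the NatWest queue...

      print(natwest_queue.primary, natwest_queue.replica)   # [] [] -> both copies gone
      print(ulster_queue.primary)                           # untouched, that batch still runs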

  4. Tom 38

    Remind me again

    How much money did they save off-shoring permies and slashing contractor rates (and hence contractor headcount)?

    1. ecofeco Silver badge

      Re: Remind me again

      Well £750 MILLION, obviously.

      *snerk*

  5. Aristotles slow and dimwitted horse

    @ No single point of failure...

    I would assume that these schedulers, whilst independent on a BAU basis, have by design been implemented in some form of distributed HA configuration and can assume the workload of another in the event of an outage; or at the very least that each independent scheduler solution has its own HA config.

    At least I hope that's what they have done.

    1. Anonymous Coward
      Anonymous Coward

      Re: @ No single point of failure...

      Well, yes, but that's not how it reads to me.

    2. Anonymous Coward
      Anonymous Coward

      Re: @ No single point of failure...

      Everyone goes on about HA & single points of failure, completely ignoring the core issue in the 2012 meltdown.

      The batch schedule was lost. That schedule would be on some kind of RAID storage, replicated to the remote disaster recovery site instantly (you need zero data loss in DR). Corrupt that data and you have corrupted your resilient onsite copy and your remote copy at the same time. You can have multiple levels of resilience and it doesn't help you one iota.

      You /can/ run a set of data at a historic point in time, but then you also need the capability to reliably roll forward the transactions which occurred between that historic point and just before you corrupted the data (and potentially any transactions which still got through after the corruption).
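
      A rough sketch of that recovery shape, with entirely made-up names and journal format (no real scheduler exposes exactly this interface): restore a historic point-in-time copy, then replay the journal of changes up to, but not including, the one that corrupted the schedule.

      def recover(snapshot: list, journal: list, corruption_index: int) -> list:
          """Roll the journal forward onto a point-in-time copy, stopping just
          before the entry that corrupted the schedule."""
          schedule = list(snapshot)                        # historic point-in-time copy
          for action, job in journal[:corruption_index]:   # replay only the good changes
              if action == "add":
                  schedule.append(job)
              elif action == "remove" and job in schedule:
                  schedule.remove(job)
          return schedule

      snapshot = ["EOD_POSTINGS", "INTEREST_ACCRUAL"]
      journal = [("add", "STANDING_ORDERS"),        # legitimate change
                 ("add", "STATEMENT_PRINT"),        # legitimate change
                 ("remove", "EOD_POSTINGS")]        # the corrupting change

      print(recover(snapshot, journal, corruption_index=2))
      # ['EOD_POSTINGS', 'INTEREST_ACCRUAL', 'STANDING_ORDERS', 'STATEMENT_PRINT']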

    3. jonathanb Silver badge

      Re: @ No single point of failure...

      That deals with hardware failure, power cuts, fires, floods and so on. It doesn't deal with an incompetent Indian (or person of any other nationality) messing up the software.

  6. Tristram Shandy

    The original meltdown was caused, purely and simply, by human error, and the problem of inexperienced staff will not be solved by having three separate schedulers. Backing out a failed mainframe software upgrade, which I believe was happening at the time of the meltdown, is something that happens from time to time, and shouldn't cause problems for an experienced systems programmer.

    Whether the scheduling queue was deleted by accident or intent, we don't know, but in either case lack of experience has to be a factor.

    If there are now three schedulers, then software updates to them are presumably occurring at three times the original rate, so are there going to be three times as many cock-ups?

    Not having everything go titsup at the same time is useful, but not much has been gained.

  7. aftermath99

    Where did the Reg get its info?

    Was the £750 million spent solely on upgrading the systems running the commercial bank? Was it hardware? Was it software? Also, if modernising, why run batches at all? Shouldn't they be real-time? (Like HSBC, et al.)

    The IT budgets at RBS have shrunk a lot in recent years, but are still significant and most likely exceed £750 million by a lot. Without a breakdown, this could just be PR spin.

  8. Oh Bother

    Brits?

    Those of us in the Republic of Ireland don't generally refer to ourselves as Brits ;)

    Also, the major Ulster Bank outage here lasted close to 5 weeks, not a few days.

  9. Long John Brass

    What's the bet

    That the lucky sod(s) who ran the unified batch system will now get to run all 4 or 5?

    Hope they updated the tick sheets these guys work from :)

    1. Securitymoose

      Re: What's the bet

      No, the same people won't run it. The staff turnover in India means that as fast as guys get trained up, they go off to new customers, so that established customers, including RBS, always get the rookies. The only answer is to return control to the UK and start treating staff as valuable resources instead of disposable costs.

      1. Fatman
        FAIL

        Re: What's the bet

        The only answer is to return control to the UK and start treating staff as valuable resources instead of disposable costs.

        "But then I can't get my bonus!" cried the bank's CEO.

  10. TopOnePercent
    Stop

    FFS

    It's not the software. It's not the hardware. It's the offshoring.

    All RBS have achieved is having the same inexperienced, inexpensive people look after more versions of the same system that they already don't understand.

    Taking people that have no history of computer use, who grew up 100s of miles from the nearest PC, and whose entire family line were small holding farmers, and putting them in charge of your global IT infrastructure cannot end well.

    3-5 years experience does not qualify someone as a senior staffer, not in the real world. All it means is they're leaving the junior ranks with just enough knowledge to be dangerous.

    IT is an expensive business. It's supposed to be expensive, because it takes years to learn, and many more years of experience to get good at it.

    1. Anonymous Coward
      Anonymous Coward

      Re: FFS

      Sorry, by software I meant coding rather than off-the-shelf purchases, and hence the ability of staff. My point (and I agree with you here) was that £750 million doesn't sound like a lot to me and is probably more for maintenance, as an overhaul would be much more expensive (IMHO).

    2. Platelet

      Re: FFS

      "Taking people that have no history of computer use, who grew up 100s of miles from the nearest PC, and whose entire family line were small holding farmers, and putting them in charge of your global IT infrastructure cannot end well."

      That's a very harsh description of the Scottish population

  11. plrndl
    FAIL

    Deja Vu

    The single point of failure remains where it always was, the board of directors.

    IT is as fundamental to modern banking as is risk management. There should be a person with in-depth understanding of IT at the same level as the Finance Director. Only then will sensible decisions be made on IT.

    1. Fatman

      Re: Deja Vu

      The single point of failure remains where it always was, the board of directors.

      100% DEAD ON TARGET!!!!!!

