Too late
some people have already thought this
http://forums.theregister.co.uk/forum/3/2015/06/17/natwest_hit_by_another_rbs_it_cockup/#c_2546376
The NatWest and RBS IT cock-up that caused 600,000 transactions to go missing this week was entirely unrelated to the 2012 mega IT cock-up, the bank has said in a not-too-reassuring update. In a webcast about the Royal Bank of Scotland's IT strategy today, Simon McNamara, chief administrative officer, said: "It is different …
I respectfully disagree. The management gobbledegook filter clearly states "ingest", so they had the file. What you've described is a failure to "transfer" a file.
No, the word "ingest" means, quite clearly, that the file had some kind of unexpected content.
And we're now ALL thinking the same thing.
"The CSV file had a comma in the wrong place."
Because decades of experience, billions of pounds and ever-improving technology STILL can't defend itself against a comma in the wrong bloody place.
Such as it ever was, is, and no doubt will be.
This post has been deleted by its author
Perhaps there was a data error, then.
But nobody checks anything these days until something goes very visibly wrong. The doctrine in schools is that we must not query the creativity of the little darlings by anything so vulgar as re-reading and checking their work.
And so they become the programmers who let the customer find the faults and complain.
Would have thought they used something like a dry run process, some kind of environment that would let them look at what the file is and does before it goes live.
I don't know, let's call it test: you are testing the system, so perhaps call it a test environment. Somewhere you can test the 13 million transactions and find missing commas before they hit your live environment.
Or have I just worked in places with logical thinking?
No, no. They have the filter that checks for the misplaced comma. It runs a vm that logs the results and that's all that vm does. However the guy who normally archives the logs was fired downsized rightsized retired last week. As a result the vm crashed when it could no longer write the log file. This in turn caused the ingest failure.
Seems plausible to me. Alternative explanation: Cleaner unplugged the server for her Hoover / tripped over the relevant cable and no one noticed. Well, it is RBS.
BUT
600,000 transactions. FROM 1 THIRD PARTY. Should be possible to verify that by checking what from whom has gone BOING. The anguished cries of the twitterati et al should give a good cross section. That would check the veracity of the only clear part of the gobble.
Anyone got a few minutes to waste checking?
I do love the word Ingest. Do they still use paper tape / tie readers? Enquiring BOFHs need to know etc.
FROM 1 THIRD PARTY
Which is HMG. Most of the screams were about tax credits and such. I did not notice a twitterati scream about salary (or other form of money earned by hard work).
When HMG is involved it can always be presumed to be the guilty party. So my guess is - incorrectly formatted transaction file coming in.
It's reasonable to assume that the RBS systems are able to handle exceptions at a transaction level, i.e. reject records that contain bad data, rather than the entire file. If they didn't, problems like the recent one would happen at least once a week.
If a system that can handle bad records rejects a whole file, the likelihood is that the third party that supplies the file has modified the format, either deliberately or accidentally.
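To make the distinction above concrete, here is a minimal sketch of record-level versus file-level rejection. Everything in it is an assumption for illustration: the three-column layout, the header names and the `ingest` function are hypothetical and have nothing to do with whatever RBS or its third party actually run.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical file layout, assumed purely for illustration.
EXPECTED_HEADER = ["account", "sort_code", "amount"]

def ingest(text):
    """Return (accepted, rejected) records.

    Raises ValueError when the file as a whole does not match the
    expected format, e.g. because the supplier changed the header.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        # File-level failure: nothing in the file can be trusted.
        raise ValueError(f"unexpected header: {header!r}")

    accepted, rejected = [], []
    for row in reader:
        line_no = reader.line_num  # line of the source file just read
        if len(row) != len(EXPECTED_HEADER):
            # Record-level failure: e.g. a comma in the wrong place.
            rejected.append((line_no, row, "wrong field count"))
            continue
        try:
            amount = Decimal(row[2])
        except InvalidOperation:
            rejected.append((line_no, row, "bad amount"))
            continue
        accepted.append((row[0], row[1], amount))
    return accepted, rejected
```

With this shape, a stray comma in one record costs you that record and a line in the reject report, while a changed header stops the whole file before anything is posted.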
You're all wrong! At least if it has been established that this is a government batch-file.
They've said they can't restore it until the weekend, so it's obvious. The government sent the CD, and it's got lost in the post. So they've re-burned it and posted it again. Natwest now have a techy permanently stationed in their post-room, Segway at the ready, to zoom him off at top speed to run it up to their server room. 15 minutes after it arrives, all will be sorted.
Presumably if this one goes wrong, then they'll put it on a memory stick, and lose it in a taxi instead. It's important to plan for a variety of failure scenarios...
Standard processing for Direct Debits and Standing Orders can still be three days depending on the source and target banks and the size of transaction, so three days plus the missed day means people *could* be affected for four days. The vast majority will have completed by now, but you don't want to further upset the tiny minority still affected.
Failure to ingest
Is it just me that has a mental image of a server in an old fashioned metal cabinet, tapes whirring of course, and green vomit spewing out of it (accompanied by smoke and plaintive beeping), as it fails to ingest this file?
Perhaps a computer like this one (Youtube link).
Scalability is not the same as availability.
No-one with a clue about what they are doing says a system is 100% available. That would be saying that there is no remote possibility that could ever endanger your system, including Godzilla strike, changes to the laws of physics and err... Moth-ra strike. Or Mechagodzilla. You would only offer an SLA of 100% if you are happy that the payout you are contracted for (or likely to be fined for in this case) is affordable.
If it's perfectly sane (i.e. can't be crashed by bad data), perfectly stable (i.e. no changes are applied) and perfectly secure (i.e. no patching needed), you can design the infrastructure for 5-nines.
However, basic probability theory says that an event in the past does not affect the probability of future random events.
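For anyone wondering what a figure like 5-nines actually allows, the arithmetic is easy to sketch. This assumes a plain 365-day year and the usual reading of "N nines" as an availability of 1 - 10^-N; the function name is made up for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def downtime_minutes_per_year(nines):
    """Permitted downtime per year for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability

for n in (2, 3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Five nines works out at a little over five minutes of downtime a year, which is why nobody sane promises it for a system that still gets patched and still eats other people's data.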
I imagine the import routine puts the direct debit/standing order transactions into a single monolithic database that is latency/performance dependent on the server it runs on and/or the storage it uses. No doubt RBS have upped the server spec over the years, put the database on faster storage, but baulked at actually rewriting or modifying the app to use a distributed database. It's a scale-up problem.
Investment from RBS probably means "we'll buy more hardware" not invest in staff who know how to write the application for the modern world.
You need to understand the way the updates are processed, the parallel update requirements etc. before you can make a statement like "modifying the app to use a distributed database".
I never worked with RBS, but certainly while we were working with NatWest they were willing to rewrite, the question was into what? While there have been many advances since RBS bought NatWest, 2002?, what would you write it in today?
Can I just make a point here about the number, 600,000.
Firstly, it's a bit exact. Anyone who works with numbers knows that when someone gives an exact round number, it's bullshit. "Around 600,000" or "just over 600,000" are both ok.
Secondly, if it was, say, 6 million but you say 600k, no one will ever find out unless either a. you know 600k other RBS customers, or b. over 600k people complain on twitter/facebook (which, strangely enough, would be 10% of that number).
Just my tuppence worth.
What good does it do fining an organisation - where is there justice in that?
That 56 millions of squids would be better spent on relief for those directly affected and on infrastructure improvements that limit the chance of things happening again.
There ain't no justice in a fine. And a fine has the potential to harm those already harmed by the earlier incidents, by diverting dosh away from where it is needed. (If a fine is required, is it not better to fine the individuals directly rather than the organisation itself?)
Organisations do not create money and any money in an organisation is generally and principally provided by its customers. High earners in the company are likely to remain unaffected however ...
Far better for the authorities to instruct the organisation to implement an improvement plan and a compensation plan to the 56 millions squids level. (Just like Health & Safety inspections in the UK: there are no justifications for slapping on a fee because additional improvements are required?)
Point well made and taken.
But the banks use customers' money to do risky stuff that creates profits or losses for the bank. Increasingly these days, particularly in days of high inflation, the attitude is: get your money working for you. Meaning any organisation handling money speculates with the money in its possession, with a view to creating profits or at least covering staff costs and bonuses for those involved. In the UK, the Icelandic bank "crash" (term used loosely) cost local and central guvmint administrators quite a bit, as they had dosh (that is, customers' dosh and the profits made from handling that dosh) tied up in Icelandic funds, no?
Fining the organisation is a waste of time, they just charge their customers more to recover the money. What should happen is that those tossers directly responsible and accountable for the fcuk-up should be personally fined. Taking money directly out of their pockets would seriously focus their minds.
True, true, ...
The trouble in the UK is that the strong tradition of Tort, vicarious liabilities, redress, ... is sort of overlooked and ignored by the recent practice of not holding individuals to account, with a preference for fining the organisation instead.
It seems a bit of a strange set-up considering England's attraction to Common Law?
I suppose we may draw our own conclusions as to why this set of circumstances comes about.