Capita has crapped out, now to be known officially as Crapita.
'Major incident' at Capita data centre: Multiple services still knackered
A major outage at a Capita data centre has knocked out multiple services for customers – including a number of councils' online services – for the last 36 hours. Some of the sites affected include the NHS Business Services Authority, which apologised on its website for the continuing disruption and said it hoped its systems …
COMMENTS
-
-
Friday 26th May 2017 13:00 GMT Anonymous Coward
"The remainder of services are now being robustly tested"
Translation "Sorry guys, but we pay such low wages that we get the lowest grade staff and they couldn't be bothered to test the generators. Rest assured that those responsible are now busy playing a datacentre sized game of switch it on and pray it comes up..."
-
Friday 26th May 2017 10:38 GMT Lee D
Stop relying on one datacenter to be up.
This is WHY Windows Server and lots of other OSes have HA functionality.
Hell, it's not even that hard to enable. Or just provide a secondary system somewhere else that does the same even if you don't have fancy connections between them.
If your platform is not virtualised, why not?
If your platform is virtualised, turn on the HA options so that the VM replica in another data centre just starts up and becomes the primary, and your domain names etc. resolve to all IPs that can offer the services.
I still don't get why ANY ONE FAILURE (one datacentre, one computer, etc.) is still a news item nowadays. It shouldn't be happening.
Even if you deploy on Amazon Cloud or something, PUT THINGS ELSEWHERE TOO. It's not hard.
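The "resolve to whichever site is up" idea can be sketched in a few lines. The site names, IPs and health-probe results below are invented, and a real deployment would use health-checked DNS (e.g. managed DNS failover or a load balancer) rather than a hand-rolled script; this is just the decision logic:

```python
# Sketch: publish only the IPs of datacentres whose health probe passes.
# SITE_IPS and the probe results are made-up examples using
# documentation-range addresses.

SITE_IPS = {
    "dc-primary": "192.0.2.10",
    "dc-secondary": "198.51.100.10",
}

def healthy_ips(health_results):
    """Given {site: bool} from per-site health probes, return the IPs
    DNS should answer with. If every site looks down, keep all IPs
    published rather than returning an empty answer (the probes may
    be the thing that's broken)."""
    ips = [SITE_IPS[site] for site, ok in health_results.items() if ok]
    return ips or list(SITE_IPS.values())

# Primary DC down: DNS now points only at the secondary.
print(healthy_ips({"dc-primary": False, "dc-secondary": True}))
```

The "never answer with nothing" fallback matters: an empty DNS answer takes *everything* down, which is worse than pointing at a possibly-sick site.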
-
Friday 26th May 2017 11:04 GMT FrankAlphaXII
It seems that Crapita don't believe in Business Continuity, otherwise an outage at one datacenter wouldn't take down part of the NHS and a number of local governments. As you stated, there should be no such thing as a single point of failure in 2017. That doesn't bode well for UK emergency preparedness at the most important level. If something as simple as internet communications gets taken down that easily, what happens when more than one of their datacenters fails and can't/won't be restored for weeks or months?
I work in Emergency Management for a government agency at a local level, plus I develop BC/DR plans for SMBs on the side, so I see this kind of shit out of government outsourcing contractors all the time. Beancounters that run businesses like Crapita (looking at you, Serco, Egis, and Leidos) don't get really simple preparedness and mitigation concepts, and if they do understand them, they'll be the first to balk at the price tag associated with them. Until they've had their "efficiencies" blow up in their faces. Thing is, in this day and age, fault tolerance and providing an emergency level of service for data when something does happen isn't hard or expensive, and it's really unforgivable that a supposedly first-in-class outsourcing contractor can't provide its expected level of service because their infrastructure's shit and their planning's worse.
-
Friday 26th May 2017 12:15 GMT GruntyMcPugh
"Stop relying on one datacenter to be up."
Indeed, a couple of years ago I did an audit at a well-known bank; its datacentres were almost identical. For some reason the door on the gents in one had a glass panel and the other didn't, and the vending machines in the break area were further apart in one... but the IT equipment was mirrored exactly.
-
Friday 26th May 2017 13:13 GMT Anonymous Coward
"Stop relying on one datacenter to be up."
Having 2 DCs and designing for no single point of failure costs ~ 3 times the money. This is government IT we are talking about. The DR plan is probably to build a new DC!
"If your platform is not virtualised, why not?"
Usually the answer with these types of systems is that the system is so large it uses the resources of complete physical servers.
-
Friday 26th May 2017 13:42 GMT Halfmad
Thing is with these companies: the contract may include agreeing to have failover sites etc., but when sh!t happens and those don't work, they just say "hey, sorry, it won't happen again until the next time it happens", and as the NHS is f*cking awful at contract law they have no monetary clause to hammer them with.
Seen this so often in the past 10 years.
-
-
Saturday 27th May 2017 12:42 GMT handleoclast
Re:consultancy fee
My county council pissed away 7 figures to Price Watercloset Coopers to come up with ways of saving money.
My suggestions:
1) Don't piss away 7 figures to PwC
2) Hire staff capable of coming up with suggestions themselves (suggestions other than asking PwC what to do).
Ooooh, where's the IT angle? My county council uses Crapita for their payment systems. Who'd have expected that?
-
-
-
Friday 26th May 2017 16:28 GMT Rob D.
It's not hard but ...
Actually it is hard, because it isn't that simple. In the real world, most of the problems around business continuity come up because someone has tried to turn a tricky problem requiring attention to detail into something that has a simple answer, which is easier to understand and by definition is cheaper.
Commonly this sounds something like, "We paid to virtualise everything so we can just move it if we have a disaster to the other data centre. Easy - please explain why we have to pay for anything more?"
Reality bites early in the requirement for budget up front for the significant additional planning, design, implementation, testing, training and infrastructure costs. The details house many devils here. Throw in the time required for testing, operational training, and operational proving in production, and by now the System Integrator is wishing you'd never shown up to explain what is missing while they work out how they can get past User Acceptance without anyone realising the business continuity isn't really there.
-
-
-
Friday 26th May 2017 11:02 GMT m0rt
Re: Probably got their own staff to install the back up generators
Bets on the diesel in the generators being a couple of years old? The fact they are now having an issue with parts suggests that the sudden loss of power caused some great failures.
Today we shall mostly be Capitalising on the Capitulations of the PITA that is Crapita.
-
Friday 26th May 2017 11:42 GMT Anonymous South African Coward
Re: Probably got their own staff to install the back up generators
Not taking any bets, but regular testing of diesel generators needs to be done.
Heck, just kick out the mains CB and let the genny take over (for 30 minutes each week); this way you can weed out any old and dodgy UPSes as well.
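A trivial reminder check for that weekly on-load test, sketched below. The generator names and dates are made up, and this obviously doesn't replace actually pulling the breaker; it just flags who's overdue:

```python
from datetime import date, timedelta

MAX_TEST_INTERVAL = timedelta(days=7)   # weekly on-load test, as suggested above

def overdue_gensets(last_tested, today):
    """Return the generators whose last on-load test is more than a week old.
    last_tested maps a (hypothetical) genset name to the date of its last
    mains-off run."""
    return sorted(name for name, tested in last_tested.items()
                  if today - tested > MAX_TEST_INTERVAL)

# genny-A ran 7 days ago (still in the window); genny-B hasn't run all month.
print(overdue_gensets(
    {"genny-A": date(2017, 5, 19), "genny-B": date(2017, 5, 1)},
    today=date(2017, 5, 26),
))
```

The point of testing *on load* rather than idling the genny is precisely the failure mode in this outage: batteries, transfer switches and fuel supply only show their faults when they actually have to carry the floor.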
-
-
Saturday 27th May 2017 17:22 GMT PNGuinn
Re: Probably got their own staff to install the back up generators
The gennys were tested weekly but no one thought to buy any fuel, and they ran dry after 3 mins?
They did buy fuel but it was petrol / bunker oil because that was cheaper?
They went green and bought a load of cooking oil cheap?
No - Crapita aren't even **that** capable.
>> This might have helped.
-
Friday 26th May 2017 13:20 GMT Anonymous Coward
Re: Probably got their own staff to install the back up generators
Ours are tested by the power failures hitting some weeks apart, lately... just last time, our small lab datacenter was kept alive by the UPS and its generator while the main one failed. Later they discovered scheduled maintenance was no longer active. Still, after asking several times, I don't know who's in charge of re-filling the diesel tank (I'm not authorized to perform it myself; you know, the dangers of handling dangerous chemicals and operating machines I was not trained for...)
-
-
Sunday 28th May 2017 12:18 GMT Stoneshop
Re: Probably got their own staff to install the back up generators
Isn't the refilling done by the tanker driver who delivers the stuff?
As the Germans say 'Jein' (contraction of yes and no): first someone[0], having been notified by Facilities that the tank is running low, has to call the supplier for delivery, then with the tanker arriving someone[1] has to unlock[2] the gate/hatch/trap door to the tank neck.
[0] from Finance, or Contract Manglement[3]
[1] from Security[3]
[2] you don't really want someone peeing down the filler neck, or dropping sand or sugar in.
[3] in extremely enlightened cases these responsibilities will have been delegated to Facilities as well.
-
Monday 29th May 2017 02:25 GMT Alan Brown
Re: Probably got their own staff to install the back up generators
Which is fine until someone shuts off the feed to one of the tanks (vandalism) and said driver pumps N amount of fuel because that's what he's expecting to pump, instead of looking at the fill gauges and stopping when they say "stop".
Cue multiple thousand litres of diesel not being in the tanks, but instead in the stormwater system and lots of people asking "what's that smell?"
-
-
-
Friday 26th May 2017 17:16 GMT Stoneshop
Re: Probably got their own staff to install the back up generators
Heck, just kick out the mains CB and let the genny take over (for 30 minutes each week)
Ingredients: one power grid with regular shortish (30 minutes or less) outages, one computer room floor with various systems, one UPS powering the entire floor running at ~15% capacity, one diesel genny. Due to the regular power dips, we were quite sure the UPS and diesel were functioning as intended; fuel was replenished as needed. Then came the day that the power consumption of the computer room doubled due to an invasion of about 45 racks full of gear. And then came the next power dip. Which made the UPS (powering the computer room; the generator was hooked up so that it basically kept the batteries charged) suddenly work quite a bit harder. And longer, for a number of reasons. Which caused the temperature in the UPS room to rise quite a bit more than previously. Environmental monitoring went yellow, several pagers went off, and Facilities managed to keep the UPS from shutting down through the judicious use of fans scrounged from a number of offices.
Moral of this story: cooling is important too, not just for the computer room, but also for the UPS room.
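A back-of-envelope sketch of why the rack invasion hurt, with invented figures and an optimistic linear runtime model (real batteries do worse than linear, so treat this as a red-flag check only): doubling the load halves the runtime while dumping more heat into the UPS room.

```python
# Hypothetical UPS rating and datasheet runtime -- both invented for the sketch.
UPS_CAPACITY_KW = 200.0
RATED_RUNTIME_MIN = 30.0   # runtime at 100% load

def ups_headroom(load_kw):
    """Return (load fraction, optimistic runtime in minutes), assuming
    runtime scales inversely with load. Real battery curves are worse,
    so a failing result here is definitely a problem; a passing one
    still needs checking against the actual discharge curve."""
    frac = load_kw / UPS_CAPACITY_KW
    runtime = RATED_RUNTIME_MIN / frac if frac > 0 else float("inf")
    return frac, runtime

before = ups_headroom(30.0)   # ~15% load, as in the story above
after = ups_headroom(60.0)    # load doubles after the 45-rack invasion
print(before, after)
```

And nothing in this arithmetic accounts for the UPS room's cooling, which is the bit that actually went yellow in the story.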
-
-
-
Tuesday 30th May 2017 13:36 GMT CrazyOldCatMan
Re: Probably got their own staff to install the back up generators
Bets on diesel in the generators being a couple of years old?
Or, in one very old situation, the diesel in the under-carpark tank had seeped away into the subsoil because of a flaw in the tank..
Which was fun when the generator did kick in for real but only ran for ~20 minutes before exhausting its local tank..
No-one was checking the levels of diesel in the bigger tank. Oops.
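A cheap mitigation for the seeping-tank (and the overfilled-tank) failure is a reconciliation check: compare the tank's measured level drop against what the generator's fuel meter says was burned. The figures and tolerance below are invented:

```python
def fuel_discrepancy(start_litres, end_litres, metered_burn_litres,
                     tolerance=0.05):
    """Flag a tank whose measured level drop doesn't match what the
    generator's fuel meter says was burned. A mismatch can mean a leak,
    theft, or a delivery that never reached the tank. The 5% tolerance
    is a guessed allowance for gauge error."""
    drop = start_litres - end_litres
    return abs(drop - metered_burn_litres) > tolerance * max(metered_burn_litres, 1.0)

# Tank level dropped 900 L but the genny only metered 400 L burned:
# something is leaking (or walking off).
print(fuel_discrepancy(5000.0, 4100.0, 400.0))
```

Run against every test cycle, this catches the "bigger tank quietly emptying itself" problem above before the generator finds it for you.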
-
-
-
-
Friday 26th May 2017 11:24 GMT batfastad
Well!
Well, you don't think that the money their customers (NHS Trusts, Councils etc) pay actually gets spent properly and proportionally on the infrastructure backing their services, do you?!
Look, it's contract renewal time... let's take the money and sweat the assets of our existing platform for a few more years. After all, we've got executive pay reviews coming up soon.
The fact that a DC has gone down and that has taken out production service is unforgivable in this day and age.
-