So....restore from backup
The issue is with the docs.
The CTO is a knob. And why is this even an issue? They obviously have backups, don't they....
"How screwed am I?" a new starter asked Reddit after claiming they'd been marched out of their job by their employer's CTO after "destroying" the production DB – and had been told "legal" would soon get stuck in. The luckless junior software developer told the computer science career questions forum: I was basically given a …
Only this shouldn't have been a backup issue...
A common-sense strategy would have been to restore the production DB into test from live backups, using read-only accounts (given that's what they were doing, I'm assuming there's nothing in prod that would raise regulatory/privacy/other issues dictating NOT using prod data). Script the whole thing, run it daily with a few simple checks (restore size, timestamps, simple query results), email the results to the DB team, and you save the company the hassle of discovering that their restore process stopped working correctly months ago.
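The daily check step could be very small. A minimal sketch of the verification side, with made-up thresholds and stand-in inputs (a real run would pull the restored size, newest row timestamp, and a canary-query row count from the freshly restored test instance):

```python
from datetime import datetime, timedelta, timezone

# Illustrative post-restore sanity checks. All names and thresholds here are
# invented for the sketch; the real values would come from the restored DB.

def verify_restore(restored_bytes, expected_min_bytes,
                   newest_row_ts, max_age, canary_query_rows):
    """Return (ok, problems) for a just-completed test restore."""
    problems = []
    if restored_bytes < expected_min_bytes:
        problems.append(f"restore too small: {restored_bytes} < {expected_min_bytes}")
    if datetime.now(timezone.utc) - newest_row_ts > max_age:
        problems.append(f"data stale: newest row {newest_row_ts.isoformat()}")
    if canary_query_rows == 0:
        problems.append("canary query returned no rows")
    return (not problems), problems

# Example: a healthy restore of last night's backup.
ok, problems = verify_restore(
    restored_bytes=52_000_000_000,
    expected_min_bytes=50_000_000_000,
    newest_row_ts=datetime.now(timezone.utc) - timedelta(hours=6),
    max_age=timedelta(hours=26),
    canary_query_rows=14_312,
)
print("RESTORE OK" if ok else "RESTORE FAILED: " + "; ".join(problems))
```

Wrap that in cron plus an email on failure and the DB team finds out the restore broke the day it breaks, not the day they need it.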
Imagine - testing your backups AND creating sensible procedures AND showing new starters sensible ways to do things from day one. And if the backups failed to restore as expected, fix the issue...
Compare this to "break it, discover backups aren't working, and fire people" and, I would wager, go on to make the same mistakes again in the future...
This seems to be an all too common problem. Too many small companies say they can't afford backups and backup validation, I guess their data isn't all that important then.
No, nothing to do with how valuable data is. It's to do with how valuable time is.
Take a small IT firm with 2 staff, both working 16 (and sometimes more) hours per day to get the business built up. A large amount of work is starting to come in as well, but not enough work/income-over-costs to hire someone else.
If you get jobs out, you eat. If you don't get jobs out, you don't eat. Or don't pay the rent, or power, or..
Verifying that backups can be restored is a pretty long process that ties up resources. Small firms have a hard time buying extra kit, let alone keeping a spare machine around for such "unimportant" stuff just to verify they can re-image from a recent copy. And if your backup is data only (plus licences etc.), then re-installing a machine from scratch can take a couple of hours (*nix) or several days (you-know-which-OS), especially where updates are involved. In either case, there can be a lot of time involved in making sure the task completed properly and everything is back. Comparing file counts and total bytes is a good start and only takes a few minutes, but they don't always match even in a perfect restore (say the work machine has an extra temp file the backup image doesn't).
Backups take a lot of time to check properly. When I did them, I built a system that worked on the first test (after pulling some all-nighters getting it set up and tested), worked on a week-later test, and then we went to monthly checks. Not proper testing procedure, I know, but I didn't have the time to be sure everything worked perfectly. I could do that, or I could earn money. Small and new businesses have to put every moment possible into earning, because the moment you're forced to close the doors for the last time, any backups or other data instantly become worthless. Unless you did some sellable dev stuff.
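The quick "file counts and bytes" check mentioned above is cheap to script. A rough sketch (paths and slack values are made up; it won't catch corrupt file contents, but it flags the worst failures in minutes):

```python
import os
import pathlib
import tempfile

def tree_stats(root):
    """Count files and total bytes under a directory tree."""
    files = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                files += 1
                total += os.path.getsize(path)
    return files, total

def roughly_matches(live, restored, slack_files=5, slack_bytes=1_000_000):
    # Allow a little slack: the live tree may have temp files the backup
    # image legitimately lacks, so exact matches are too strict.
    lf, lb = tree_stats(live)
    rf, rb = tree_stats(restored)
    return abs(lf - rf) <= slack_files and abs(lb - rb) <= slack_bytes

# Tiny demonstration with throwaway directories.
with tempfile.TemporaryDirectory() as live, tempfile.TemporaryDirectory() as restore:
    pathlib.Path(live, "report.txt").write_text("hello")
    pathlib.Path(restore, "report.txt").write_text("hello")
    pathlib.Path(live, "scratch.tmp").write_text("x")  # extra temp file on live
    print(roughly_matches(live, restore))
```

Not proper verification, but as a first line of defence it costs almost nothing.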
PS El Reg - love that voting no longer requires a page reload! Thanks for changing this!
".....This was a cockup waiting to happen." First rule of process design: always assume at least one user will do something wrong if you give him/her the means to do so; do not assume they will be able to override the impulse to follow an incorrect direction. The management were responsible for the process documentation; therefore, IMHO, they were responsible for the failure.
".... sounds like the CTO needs to visit example.com"
That might not help as much as you think if documentation elsewhere for adding new users also uses "example.com". I.e. it might already exist...
Rule #1 about documentation is that someone somewhere will blindly follow any code or command snippets that are included.
"The issue is with the docs."
Absolutely. Firstly, who wrote the documentation and thought it was ok to include the production password?
If it was a technical author, they could be excused for assuming it was not a real production password. In which case, who gave it to them? This password should be handed out on a need-to-know basis.
If it was one of the developers, what kind of moron are they? Even if they have received no training at all on basic security, you'd have to be an idiot to think it was ok to record a real password in clear text in documentation.
Secondly, what are the production details doing on a document used to set up a dev environment anyway? A dev, particularly a novice one, shouldn't be anywhere near production, simply because they have no need to be.
In fact, fire everyone. The whole affair suggests rampant sloppy practice everywhere. The fact their backup restores failed just confirms that.
But I'm pretty sure which ones weren't.
> you'd have to be an idiot to think it was ok to record a real password in clear text on documentation.
It wasn't the large multinational where I cut my teeth, where no passwords were *ever* written down or even spoken. And it wasn't the one where I am part of manglement now, where our code of conduct says loud and clear that no one will be disciplined for making honest mistakes¹, and we do abide by that.
¹ Our mistakes have a very real potential to kill people. Going around pointing fingers is not going to help anyone. Making sure that mistakes are open and candidly reported, and fixed promptly, is a far better use of everyone's time, and a humbling experience.
Nobody but the DBAs in charge of the production database should have had those credentials. And each of them should have had separate accounts (because of auditing, etc.). Database owner credentials should be used only for database/schema maintenance, after the scripts have been fully tested (and at least a backup is available); they should never be used for other tasks. Any application accessing the database must have at least one separate account with only the required privileges on the required objects (which usually don't include DDL statements).
All powerful accounts (i.e. Oracle's SYS) should be used only when their privileges are absolutely necessary, and only by DBAs knowing what they're doing. Very few developers, if any, should have privileged access to the production system, and they must coordinate with DBAs if they need to operate on a production system for maintenance or diagnostic needs. DBAs should review and supervise those tasks.
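The least-privilege app account described above amounts to a handful of GRANT statements. A sketch in Python that builds them (the account and table names are hypothetical, and exact GRANT syntax varies a little by DBMS; this is the common SQL-standard form):

```python
# Illustrative only: generate DML-only grants for an application account.
# Deliberately no DDL (CREATE/DROP/ALTER) and no blanket "ALL PRIVILEGES" -
# the app gets data access on its own tables and nothing else.

def app_account_grants(user, tables,
                       privileges=("SELECT", "INSERT", "UPDATE", "DELETE")):
    """Return one GRANT statement per table for a least-privilege account."""
    return [f"GRANT {', '.join(privileges)} ON {t} TO {user};" for t in tables]

for stmt in app_account_grants("app_rw", ["orders", "customers"]):
    print(stmt)
```

An account set up this way can fat-finger its own data, but it physically cannot drop a schema, which is rather the point.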
But I guess they followed the moronic script I've seen over and over (and stopping the bad habits usually meets a lot of resistance):
1) Developers are given a privileged account when the application is first created, so they can create stuff without bothering the DBA.
2) Developers write an application that runs only with a privileged account and doesn't allow separate database users (granting permissions to other users is boring stuff, as is writing stored procedures to control access to things). DBAs are happy because they have little to do.
3) The privileged account credentials are stored everywhere, including the application configuration files, unencrypted, so everybody knows them.
4) The development system becomes the production system. DBAs don't change anything for fear something will break.
5) Developers still retain access to the production system "to fix things quickly", and may even run tests in production "because replicating it is unfeasible".
6) DBAs are happy developers can work on their own so they aren't bothered.
7) Now that you have offloaded most tasks to developers, why care about backups? The developers will have a copy, for sure!
8) Then comes a fat-fingered, careless new developer....
Let me bring my IT expertise into play here. You're all wrong of course!
Obviously database passwords shouldn't be written in public documents. They should be kept on a post-it note on the screen of one of the shared PCs.
What's wrong with you people for not knowing this basic piece of security best-practice!
Apart from using 'careless' instead of 'only human' for the new developer ...
I can report similar (some years back now) - new member of the ops staff was instructed to setup a test database, mistyped the DB URL by one character, and because of the risible beyond belief setup of i) dev can see prod network, ii) dev hostnames differ by one character from prod hostnames, and iii) dev, test and prod DBs all shared the same passwords and user names, they wiped the entire production database by running the schema creation scripts.
The impact - this was a bank (no names but North American and generally an IT train wreck at the time), the DB data loss took out the SWIFT gateway, the bank ceased to be a bank for about six hours, and they had to notify the regulator of a serious IT incident. The length of time was due to the backups being inaccessible and no-one having a clue how to restore.
And FWIW, we'd already advised the bank that the one character difference in hostnames on an open, flat network was a suicide pact six months before.
On the plus side, the irascible, voluble Operations Manager took one look at the root cause, said it wasn't the operator's fault and went off to shout at someone on the program management team. Much respect for that move.
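A cheap belt-and-braces guard against exactly this failure mode (my sketch, not anything the bank had) is to refuse to run schema (re)creation scripts unless the target host is on an explicit allowlist for the intended environment. Hostnames here are made up:

```python
# Hypothetical allowlist of dev database hosts. One mistyped character in a
# prod hostname then fails closed instead of wiping production.
DEV_HOSTS = {"db-dev-01.internal", "db-dev-02.internal"}

def assert_dev_target(host):
    """Abort before any destructive work if the host isn't a known dev DB."""
    if host not in DEV_HOSTS:
        raise SystemExit(f"refusing to run destructive scripts against {host}: "
                         "not a known dev database host")

assert_dev_target("db-dev-01.internal")      # fine, carries on
# assert_dev_target("db-prd-01.internal")    # would abort before any damage
```

It wouldn't have fixed the flat network or the shared credentials, but it turns a one-character typo from a regulator-notifiable incident into an error message.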
Where's the segregated VLAN? Anywhere with such important data and of such a size should be capable of setting up an environment where dev network logon credentials only work on the dev VLAN and so do not permit the crossing over into the production VLAN whether you know the prod db connection string or not. One account for troubleshooting prod environments (which they wouldn't have had in this case), and one for performing dev tasks. Not that difficult.
This happened on my watch in Prod Support for a certain oil company.
The dev/prod Oracle databases had the same internal password. Luckless dev connected to what they believed was their dev instance in Toad and then dropped the schema.
They then called us to advise what had happened once they realised their mistake. We restored from backup (not before they bitched about how long it took to recover) but they never should have been able to do so in the first place.
Rule Number 1: Restrict access properly to your prod environment. That means no recording random username/password combos in docs, scripts etc.
Rule Number 2: Don't share credentials across environments.
Anon for obvious reasons.
> Rule Number 1: Restrict access properly to your prod environment. That means no recording random username/password combos in docs, scripts etc.
> Rule Number 2: Don't share credentials across environments.
Rule 3 - never lets Devs have access to stuff. Like small children, they *will* break stuff..
> Rule 3 - never lets Devs have access to stuff. Like small children, they *will* break stuff..
In a controlled environment this is good. If the Dev breaks something before the users do, it can be patched to prevent anyone else from doing the same. My unofficial CV has 'breaking things' as a skill, which dates back to using BASIC at school and being unable to resist entering '6' when asked to enter a number from 1 to 5, just to see what happened.
My CV also includes breaking things, but I was one to always stick to the rules.
> being unable to resist entering '6' when asked to enter a number from 1 to 5, just to see what happened.
So when asked to enter a number from 1 to 5 I broke into the school in the dead of the night.
"Rule 3 - never lets Devs have access to stuff. Like small children, they *will* break stuff.."
I cannot upvote this enough. The stupid t**ts I had the misfortune of dealing with would have had a breakdown keeping the systems running. "I want you to deploy xxxx." Ok, who to? "Oh, everyone in the organization." Ok, how? "Oh, this installation program that I have written. You just need to install a, uninstall a, then install b and c and remove b, and that will leave a stable version running on their machines."
So... I wasn't going to implement it until I got a firm written procedure for how to deploy it, and we also needed to test it in a small department. I could have fixed it in a couple of hours, but this wasn't the first time I'd had to put up with them having no idea about dependencies, and I had already wasted four or five days fixing similar messes of theirs. They were on better money, too. I never got any recognition, because little ever went wrong.
I was so happy when they chose the planning department to trial it on (this department had 'skilled' staff who liked to set up their own 'uncontrolled' infrastructure). I had a good 3.5 months of peace, until guilt at all the wasted man-hours finally got to me and I sorted it out that afternoon.
Then there is the always fun: "Yes, we do back up database transactions. Here is your database backup, and here are your transaction logs. Now, you buggered up the database and rolled back and forth until you had a confused mess, so which transaction sets do you want to roll forward with?" (Gives me a warm fuzzy feeling of joy remembering those looks of dawning horror.)
Devs really are like small kids, they have NO idea of consequences, and will run away given a challenge.
You are spot on...
You need to separate prod from dev and not allow dev access to prod.
So while the Reg asks who should be fired ... there are several people involved.
1) CTO / CIO for not managing the infrastructure properly because there was no wall between dev and prod.
2) The author and owner of the doc. You should never have actual passwords, system names, etc. in a written doc that gets distributed around the company. The manager should also get a talking-to...
3) The developer himself.
Sure, he's new to the job; however, he should have been experienced enough not to blindly cookbook the instructions, and should have made sure they were correct. He was the one who actually did the damage.
As to getting legal involved... if this happened in the US, it wouldn't go anywhere. As an employee, he's covered, and the worst they could do is fire him. If he were a contractor, he could get sued.
We know this wasn't Scotland or the UK. (Across country? Edinburgh to Glasgow ... 40 min drive. )
I do have some sympathy for the guy... however, he should have known to ask questions if things weren't clear in the instructions.
He should chalk this up to a life lesson and be glad that his mistake didn't cost someone their life.
I maintain a small file server for a small company (about 45 computers), using Ubuntu with Samba. I have another Ubuntu/Samba desktop at the ready and use rsync to make a nightly copy of files from the server to it, so it holds a live copy of all the files served by the primary server. If the main server goes down or is hit with a virus (or ransomware), all I have to do is take the primary file server offline, run a script on the system holding the previous day's live data, and change the IP address to make the backup system a temporary server.
Another Ubuntu system runs Back In Time to make nightly backups for off-site storage on portable USB hard drives.
My philosophy is you can never have too many backups, so the primary server also makes hourly backups during business hours, retained for 2 days. The backups are then pared down to one copy per day for 14 days, then one a week for a couple of months, then one copy a month until drive space runs low. The drive is then removed from service to retain historical data and a new drive put in service. This set is for on-site use.
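That thinning schedule can be sketched as a small keep/prune decision. The cutoffs below are the ones from my description above, but illustrative; Back In Time's own "smart remove" settings implement a similar policy if you'd rather not script it yourself:

```python
from datetime import datetime, timedelta

def backups_to_keep(backup_times, now):
    """Return the subset of backup timestamps to keep; prune the rest.

    Keep all backups from the last 2 days, then one per day for 14 days,
    one per week out to ~2 months, and one per month beyond that.
    """
    keep = set()
    claimed = set()  # (granularity, bucket) pairs already holding a backup
    for ts in sorted(backup_times, reverse=True):  # newest in a bucket wins
        age = now - ts
        if age <= timedelta(days=2):
            keep.add(ts)  # keep every hourly from the last 2 days
            continue
        if age <= timedelta(days=14):
            bucket = ("day", ts.date())
        elif age <= timedelta(days=60):
            bucket = ("week", tuple(ts.isocalendar())[:2])  # (year, ISO week)
        else:
            bucket = ("month", (ts.year, ts.month))
        if bucket not in claimed:
            claimed.add(bucket)
            keep.add(ts)
    return keep
```

Walking newest-first means each day/week/month bucket retains its most recent backup, which is usually what you want when disk space forces a choice.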
The system makes it easy to restore anywhere from one to all files in a short period of time. There is currently about 90GB of user data on the server. To restore one file takes only a couple minutes to find the file from the desired backup and restore it. A full restore of user data takes about an hour.
The system is regularly tested, and a full restore had to be performed once, when Cryptolocker ransomware encrypted the files the victim had access to on her computer and the server. More time was spent ensuring the ransomware had been isolated and eliminated on all computers on the network than on getting the server ready.
While some may consider triple redundancy overkill, I like to be prepared in case one of the backup systems happened to fail the night before a restore was needed on the server. There is always at least one backup drive stored off-site. In the case of catastrophic loss of the server (flood, fire, explosion, etc.), server configuration files in the nightly backups make it easy to set up and configure a new server in about 2 hours, ready to have files restored.
Test and test often...
"Yet it's your fault that guide is wrong?"
Actually, the story quite clearly says he *didn't* follow the document. He should have used the credentials generated by the script; instead he copied the credentials from the document.
The fact that the document contained the production credentials is an error but the implication is that he would have been ok with those.
Of course, he should have been supervised, so that's also an error. Should he have been sacked? Probably not, and certainly not in the way he was.