Developer mistakenly deleted data - so thoroughly nobody could pin it on him!

Welcome to the eighth edition of "Who, me?", the column in which Reg readers confess to moments at which they messed things up but good. This week meet "Ben" who told us that "On a Friday afternoon about five years ago I was asked to complete some backup scripts before the weekend." Ben told us his employer "was too tight to …

  1. Anonymous Coward

    Not me, but someone else previously in my team

    There is a folklore story at my company, which is a managed service provider and so deals with many clients and generally has lots of things on the go at the same time.

    A ticket came through requesting a sysprep of a machine at a client, but due to multiple windows being open and potentially some distraction, my ex-colleague sysprep'd a domain controller. No biggy in most circumstances, but this customer only had one domain controller and so were royally shafted. For some reason the backups had been failing on this domain controller and it couldn't be recovered from backup.

    Through some luck of the gods, this domain controller had been cloned accidentally a couple of weeks before and that was the only reason why they didn't lose all their domain configuration.

    So in this instance, two wrongs made a right, but he was never allowed to live it down.

    1. Adam 1

      Re: Not me, but someone else previously in my team

      Ok. So not you. Nope. Definitely someone else. Got it.

    2. Amos1

      Re: Not me, but someone else previously in my team

      Did you know that when you're in Active Directory Users and Computers and are going to delete something from the right-hand pane but accidentally are clicked in the left pane and say you're sure, it can delete the entire production OU? I know one sysadmin who now knows that and also is now far slower at his admin tasks. Until the next time...

      1. Phil W

        Re: Not me, but someone else previously in my team

        Maybe he should also learn about the "Protect from accidental deletion" tick box.

        1. J. Cook Silver badge

          Re: Not me, but someone else previously in my team

          @Phil W:

          "Maybe he should also learn about the "Protect from accidental deletion" tick box."

          I'm not quite sure when that little ticky box made its first appearance; might have been server 2008R2. I honestly can't remember if it was in 2k3 or not; it's been quite some time since we had a 2k3 DC at [redactedCo] and frankly, I've slept since then. :D

          That little ticky box sure is a life saver, though.

      2. Youngone Silver badge

        Re: Not me, but someone else previously in my team

        I did know that, as I had to restore our domain controller one Saturday after my colleague deleted some vital top level OU.

        He would have done it himself, except for the whole getting sacked bit.

        1. Alan Brown Silver badge

          Re: Not me, but someone else previously in my team

          "He would have done it himself, except for the whole getting sacked bit."

          Which is pretty indicative of a company you don't want to be working for, unless this guy has a long history of that kind of thing or it was malicious.

  2. tip pc Silver badge

    I hope lessons were learnt and a proper backup suction put in place, but I suspect not.

    1. Anonymous Coward

      To hoover up the data?

      1. Kane
        Joke

        "To hoover up the data?"

        To "vacuum" up the data, I think you'll find.

        1. Herby
          Joke

          Vacuum??

          "To hoover up the data?"

          To "vacuum" up the data, I think you'll find.

          This being a UK site and all shouldn't it be "Dyson" up the data??

          1. Random Handle

            Re: Vacuum??

            >This being a UK site and all shouldn't it be "Dyson" up the data??

            ....that would mean rsync to a cheap NAS in Malaysia

    2. Evil Auditor Silver badge

      The proper backup suction that I came across about a decade ago: they were setting up a backup procedure just about as manually as Ben did. To check that the procedure worked as expected, they didn't want to wait until the tape was fully written, so instead of the tape they directed the data stream to /dev/null - the backup ran flawlessly, answered with a success message and was ready to be deployed.

      Problem was, no one changed the device to write to the tape. Until their auditor asked if they ever did backup restore tests.
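
A restore test of the kind the auditor asked about can be sketched in a few lines of shell. This is only a minimal sketch, assuming a tar archive on disk stands in for the tape; the function name verify_backup and its arguments are hypothetical and do not come from the comment above.

```shell
#!/bin/sh
# verify_backup SRC_DIR ARCHIVE
# Hypothetical helper: archives SRC_DIR, restores the archive into a scratch
# directory, and compares the restored tree with the source.
# Returns 0 only if the restored copy matches the source.
verify_backup() {
    src=$1
    archive=$2
    scratch=$(mktemp -d) || return 1

    # Write the backup. Pointing this at /dev/null can also "succeed".
    tar -cf "$archive" -C "$src" . || { rm -rf "$scratch"; return 1; }

    # Restore test: extract the archive and compare it against the source.
    if tar -xf "$archive" -C "$scratch" 2>/dev/null &&
       diff -r "$src" "$scratch" >/dev/null; then
        status=0
    else
        status=1
    fi
    rm -rf "$scratch"
    return "$status"
}
```

The point of the design is that success is judged by what comes back out of the archive, not by the exit status of the write step, so a backup aimed at /dev/null fails the check for any non-empty source directory.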

      1. 2Nick3

        In a meeting on how we could save costs on our backups I suggested we could write the backups to /dev/null, and since we only had SLAs on the backups and nothing on restores (yeah, really - no idea how that contract got signed) we would not be violating the contract. Not only would backups run faster, we would also increase the success rate, as we would no longer need to manage a scratch tape pool. It was only slightly (in my opinion, at least) more ridiculous than some of the other suggestions being made.

        It took a few hours for a group of us (who could detect sarcasm) to convince one of the managers that we really couldn't do it.

        1. Alan Brown Silver badge

          backups to /dev/null

          Has someone been reading Simon's early "Red Bucket" efforts?

  3. Timmy B

    Sat at my desk one day happily carrying out my programming duties... All of a sudden an expletive, or more exactly a stream of profound and harsh expletives, comes from behind my manager's monitor. He stands up ashen faced, with beads of sweat already forming on his brow. In updating the software for the main customer database for the company he had reversed the source and destination databases and totally blatted the whole thing. All of the information about sales and support and leads was gone. The most recent backup was a month old. Thankfully we had a development backup from earlier that morning.

    Our main office is 50 miles away and this is in the days before decent internet connections and the backup is far too big for us to copy.

    The two of us make an excuse that it seems to be some kind of hardware fault on the server, and take the system offline by deleting a random dll. We jump in the car like Batman and Robin, driving in a not dissimilar way, and head to the office with a freshly burned CD with the data.

    Later that afternoon we're fixed, losing only a tiny amount of work that we put down to what could be a failing hard drive that we would "keep an eye on". We looked like geniuses because in one day we had not only fixed the hardware and updated the software, but also shown how willing we were to drop everything to support head office.

    1. Sir Loin Of Beef

      I will admit that I did the same thing once but with the helpdesk ticketing system. I wanted to show I was more than a phone jockey so I grabbed the manual and attempted to "update" the software. Instead I ran the delete all script. Luckily we caught it before it deleted the whole database (we lost about 40%). This gave my manager an excuse to rewrite all of the categories and knowledge base so it was a backhanded win for me.

      1. werdsmith Silver badge

        I changed an ODBC DSN, checked it, double checked it, triple checked it. All good

        So I ran the process, which connected via the 32 bit DSN and promptly shafted everything.

        I experienced that special sinking feeling that is only known to IT people with enough permissions.

    2. PPK
      Facepalm

      Reversi

      A loooong time ago (probably around '93) my old MD was just setting up the small company that I later joined. One of his more technical colleagues came to his house with his PC, in order to upgrade my boss's machine to the (then) latest Win 3.1 build.

      As I recall he was using a transfer utility via serial cable to mirror his newer system onto the MD's one. Unfortunately an incorrect directional button was clicked, and several hours later he discovered he had just downgraded his own working system...

  4. CAPS LOCK

    Never edit the fstab table on a production system...

    ...don't ask me how I know...

    1. Korev Silver badge
      Coat

      Re: Never edit the fstab table on a production system...

      Maybe someone would make a fstab at guessing why...

      1. Anonymous South African Coward Bronze badge

        Re: Never edit the fstab table on a production system...

        Gots a spare fstab in the nasty pocketses?

        1. Korev Silver badge
          Coat

          Re: Never edit the fstab table on a production system...

          Yep, fstab, a /bin, /etc

          1. Androgynous Cupboard Silver badge

            Re: Never edit the fstab table on a production system...

            I would also highly recommend against (in the days before package management) upgrading system libraries by hand - specifically, upgrading the dynamic loader library. The number of dynamically linked executables on a Linux system is quite high and, unfortunately, includes the "mv" command. Best laid plans and all that.

            1. onefang

              Re: Never edit the fstab table on a production system...

              "The number of dynamically linked executables on a Linux system is quite high and, unfortunately, includes the "mv" command. Best laid plans and all that."

              This is why you keep a statically linked busybox or toybox around.

          2. This post has been deleted by its author

          3. Paper

            Re: Never edit the fstab table on a production system...

            Anyone going to ./share what happened?

    2. trollied

      Re: Never edit the fstab table on a production system...

      Oh, crikey. Reminds me of a balls-up from yesteryear, when I was a young scamp in my first job.

      Many 100s of LUNs presented to a large Sun box (E10k or E15k, I think it was). Spent a good few days carving the devices up and configuring them in Sybase.

      Then the machine was rebooted.

      I hadn't updated the vfstab.

      Took me AGES to dd bits off each disk to work out which device was supposed to be mounted where... You live and learn!

    3. Flocke Kroes Silver badge

      Re: Never edit the fstab table on a production system...

      Unless you have physical access and know how to use your boot loader. The magic phrase you need to add to the kernel command line is: init=/bin/bash

      You can then fix /etc/fstab, change root's password and then realise none of your changes happened because you forgot to: mount -o remount,rw /

  5. Julian 8 Silver badge

    Saw similar at a previous job. A contractor had come in and written a few scripts to do some temp folder tidying on a large cluster. Unbeknown to me over the last few weekends said cluster had had some serious problems and a node had been completely vaped each weekend. - Someone else was checking these issues and it was not mentioned in our handovers.

    I was asked to look at the scripts (just windows cmd) and they all looked OK - looked.

    That weekend, down a node went again, so I took a closer look at the scripts.

    In a sandbox I copied the suspect script and ran it line by line. All ran well until it came to a delete and there was an extra space after a wildcard. So instead of deleting the intended folder, it deleted the root of the drive it was running on (and this was the system drive)
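
The original scripts were Windows cmd and the exact line is not given, so the following is only an analogous sketch in POSIX shell of how one stray space turns a single wildcard argument into two. The path /data/temp and the helper names are hypothetical; the arguments are printed, not deleted.

```shell
#!/bin/sh
# show_args prints each argument it receives on its own line, so the effect
# of a stray space can be inspected without deleting anything.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# demo_stray_space runs in a subshell with globbing disabled (set -f) so the
# wildcard patterns are printed unexpanded.
demo_stray_space() (
    set -f
    echo "intended:"
    show_args /data/temp/*
    echo "with stray space:"
    show_args /data/temp/ *
)
```

With the stray space, the delete command would receive the folder itself plus a bare wildcard that is resolved against whatever the current directory happens to be.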

  6. Anonymous South African Coward Bronze badge

    I'll leave this here for all of you to ruminate on...

    (the original can be found at https://web.archive.org/web/20090208023917/http://justpasha.org/folk/rm.html thanks to the Wayback Machine).

    Have you ever left your terminal logged in, only to find when you came back to it that a (supposed) friend had typed rm -rf ~/* and was hovering over the keyboard with threats along the lines of "lend me a fiver 'til Thursday, or I hit return"? Undoubtedly the person in question would not have had the nerve to inflict such a trauma upon you, and was doing it in jest. So you've probably never experienced the worst of such disasters...

    It was a quiet Wednesday afternoon. Wednesday, 1st October, 15:15 BST, to be precise, when Peter, an office-mate of mine, leaned away from his terminal and said to me, "Mario, I'm having a little trouble sending mail." Knowing that msg was capable of confusing even the most capable of people, I sauntered over to his terminal to see what was wrong. A strange error message of the form (I forget the exact details) "cannot access /foo/bar for userid 147" had been issued by msg. My first thought was "Who's userid 147?; the sender of the message, the destination, or what?" So I leant over to another terminal, already logged in, and typed grep 147 /etc/passwd only to receive the response /etc/passwd: No such file or directory. Instantly, I guessed that something was amiss. This was confirmed when in response to ls /etc I got ls: not found.

    I suggested to Peter that it would be a good idea not to try anything for a while, and went off to find our system manager.

    When I arrived at his office, his door was ajar, and within ten seconds I realised what the problem was. James, our manager, was sat down, head in hands, hands between knees, as one whose world has just come to an end. Our newly-appointed system programmer, Neil, was beside him, gazing listlessly at the screen of his terminal. And at the top of the screen I spied the following lines:

    # cd

    # rm -rf *

    Oh, shit, I thought. That would just about explain it.

    I can't remember what happened in the succeeding minutes; my memory is just a blur. I do remember trying ls (again), ps, who and maybe a few other commands beside, all to no avail. The next thing I remember was being at my terminal again (a multi-window graphics terminal), and typing

    cd /

    echo *

    I owe a debt of thanks to David Korn for making echo a built-in of his shell; needless to say, /bin, together with /bin/echo, had been deleted. What transpired in the next few minutes was that /dev, /etc and /lib had also gone in their entirety; fortunately Neil had interrupted rm while it was somewhere down below /news, and /tmp, /usr and /users were all untouched.
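
The echo * trick works because the shell expands the wildcard itself and echo is built in, so nothing under /bin is needed. A minimal sketch of the same idea as a reusable function, assuming a shell in which printf and the [ test are builtins (true of bash, dash and ksh93, but not guaranteed everywhere); the name list_dir is hypothetical.

```shell
#!/bin/sh
# list_dir DIR: print the names in DIR one per line using only the shell's
# own globbing and builtins, so no external ls is required.
# Dotfiles are skipped, just as with a plain "echo *".
list_dir() {
    for entry in "$1"/*; do
        # An unmatched glob is left as the literal pattern; skip it.
        [ -e "$entry" ] || continue
        printf '%s\n' "${entry##*/}"
    done
}
```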

    Meanwhile James had made for our tape cupboard and had retrieved what claimed to be a dump tape of the root filesystem, taken four weeks earlier. The pressing question was, "How do we recover the contents of the tape?". Not only had we lost /etc/restore, but all of the device entries for the tape deck had vanished. And where does mknod live? You guessed it, /etc. How about recovery across Ethernet of any of this from another VAX? Well, /bin/tar had gone, and thoughtfully the Berkeley people had put rcp in /bin in the 4.3 distribution. What's more, none of the Ether stuff wanted to know without /etc/hosts at least. We found a version of cpio in /usr/local, but that was unlikely to do us any good without a tape deck.

    Alternatively, we could get the boot tape out and rebuild the root filesystem, but neither James nor Neil had done that before, and we weren't sure that the first thing to happen would be that the whole disk would be re-formatted, losing all our user files. (We take dumps of the user files every Thursday; by Murphy's Law this had to happen on a Wednesday). Another solution might be to borrow a disk from another VAX, boot off that, and tidy up later, but that would have entailed calling the DEC engineer out, at the very least. We had a number of users in the final throes of writing up PhD theses and the loss of maybe a week's work (not to mention the machine down time) was unthinkable.

    So, what to do? The next idea was to write a program to make a device descriptor for the tape deck, but we all know where cc, as and ld live. Or maybe make skeletal entries for /etc/passwd, /etc/hosts and so on, so that /usr/bin/ftp would work. By sheer luck, I had a gnu emacs still running in one of my windows, which we could use to create passwd, etc., but the first step was to create a directory to put them in. Of course /bin/mkdir had gone, and so had /bin/mv, so we couldn't rename /tmp to /etc. However, this looked like a reasonable line of attack.

    By now we had been joined by Alasdair, our resident UNIX guru, and as luck would have it, someone who knows VAX assembler. So our plan became this: write a program in assembler which would either rename /tmp to /etc, or make /etc, assemble it on another VAX, uuencode it, type in the uuencoded file using my gnu, uudecode it (some bright spark had thought to put uudecode in /usr/bin), run it, and hey presto, it would all be plain sailing from there. By yet another miracle of good fortune, the terminal from which the damage had been done was still su'd to root (su is in /bin, remember?), so at least we stood a chance of all this working.

    Off we set on our merry way, and within only an hour we had managed to concoct the dozen or so lines of assembler to create /etc. The stripped binary was only 76 bytes long, so we converted it to hex (slightly more readable than the output of uuencode), and typed it in using my editor. If any of you ever have the same problem, here's the hex for future reference:

    070100002c000000000000000000000000000000000000000000000000000000 0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f 8800040000bc012f65746300

    I had a handy program around (doesn't everybody?) for converting ASCII hex to binary, and the output of /usr/bin/sum tallied with our original binary. But hang on - how do you set execute permission without /bin/chmod? A few seconds thought (which as usual, lasted a couple of minutes) suggested that we write the binary on top of an already existing binary, owned by me... problem solved.
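
The permission trick above relies on the fact that rewriting an existing file in place keeps its owner and mode. A small sketch of that behaviour follows; the helper names are hypothetical, and cat is used only for brevity (in the story the write was done from an editor, since /bin/cat was gone).

```shell
#!/bin/sh
# mode_string FILE: print the permission column (e.g. "-rwxr-xr-x") for FILE.
mode_string() {
    ls -ld "$1" | cut -c1-10
}

# overwrite_in_place FILE: copy stdin over an existing FILE. The redirection
# truncates and rewrites the same file, so its owner and mode are unchanged.
overwrite_in_place() {
    cat > "$1"
}

# Example (paths are placeholders):
#   overwrite_in_place ./some_existing_executable < ./new_binary
```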

    So along we trotted to the terminal with the root login, carefully remembered to set the umask to 0 (so that I could create files in it using my gnu), and ran the binary. So now we had a /etc, writable by all. From there it was but a few easy steps to creating passwd, hosts, services, protocols, (etc), and then ftp was willing to play ball. Then we recovered the contents of /bin across the ether (it's amazing how much you come to miss ls after just a few, short hours), and selected files from /etc. The key file was /etc/rrestore, with which we recovered /dev from the dump tape, and the rest is history.

    Now, you're asking yourself (as I am), what's the moral of this story? Well, for one thing, you must always remember the immortal words, DON'T PANIC. Our initial reaction was to reboot the machine and try everything as single user, but it's unlikely it would have come up without /etc/init and /bin/sh. Rational thought saved us from this one.

    The next thing to remember is that UNIX tools really can be put to unusual purposes. Even without my gnuemacs, we could have survived by using, say, /usr/bin/grep as a substitute for /bin/cat.
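
As a sketch of that substitution: grep with an empty pattern matches every line, so it prints a text file much as cat would. The second function does the same with shell builtins only. Both helper names are hypothetical, and both are meant for text files, not binaries.

```shell
#!/bin/sh
# cat_with_grep FILE: print FILE using grep. The empty pattern matches
# every line, including blank ones.
cat_with_grep() {
    grep '' "$1"
}

# cat_with_builtins FILE: the same using only the shell's read and printf.
# The "|| [ -n ... ]" part handles a last line with no trailing newline.
cat_with_builtins() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}
```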

    And the final thing is, it's amazing how much of the system you can delete without it falling apart completely. Apart from the fact that nobody could login (/bin/login?), and most of the useful commands had gone, everything else seemed normal. Of course, some things can't stand life without say /etc/termcap, or /dev/kmem, or /etc/utmp, but by and large it all hangs together.

    I shall leave you with this question: if you were placed in the same situation, and had the presence of mind that always comes with hindsight, could you have got out of it in a simpler or easier way?

    1. Nick Kew

      I shall leave you with this question: if you were placed in the same situation, and had the presence of mind that always comes with hindsight, could you have got out of it in a simpler or easier way?

      Yes. Take up a new career writing IT suspense stories. That one looks massively TL;DR, but had me gripped!

    2. MJB7

      DON'T PANIC

      "if you were placed in the same situation, and had the presence of mind that always comes with hindsight, could you have got out of it in a simpler or easier way?"

      /usr/bin/python

      But of course, that wouldn't have worked when people were running Unix on VAX.

    3. Alan J. Wylie

      That was heroic and ingenious.

      My rescue mission to get the system back to normal after someone had typed

      chmod 444 /bin/*
      involved an 8" floppy and driving from Keighley to Peterborough (about 140 miles) and back.

      One moral of the story, which I am still trying to instil into my cow-orkers 30 years later, is use the symbolic modes to add or subtract explicit permissions.
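
A short sketch of the difference, using hypothetical helper names: the absolute mode 444 replaces every permission bit, so execute permission is lost, while the symbolic mode a-w removes only the write bits and leaves everything else alone.

```shell
#!/bin/sh
# mode_string FILE: print the permission column (e.g. "-rwxr-xr-x") for FILE.
mode_string() {
    ls -ld "$1" | cut -c1-10
}

# Absolute mode: sets the bits to exactly r--r--r--, clearing execute too.
remove_write_absolute() {
    chmod 444 "$1"
}

# Symbolic mode: subtracts write permission for all, nothing else changes.
remove_write_symbolic() {
    chmod a-w "$1"
}
```

Starting from a file with mode 755, the symbolic form leaves it executable (r-xr-xr-x), whereas the absolute form leaves r--r--r--.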

      And for today:

      X-Clacks-Overhead: GNU Terry Pratchett

      1. Aladdin Sane

        AT LAST, SIR TERRY, WE MUST WALK TOGETHER.

        Terry took Death’s arm and followed him through the doors and on to the black desert under the endless night.

        1. Sir Runcible Spoon

          re:chmod 444

          Is there any justification for chmod to be made read-only..ever?

          If not, then the code should automatically exclude itself from working on its own executable, imho :)

  7. Anonymous Coward

    Code repository wiped by Linux "guru"

    Circa 2004 - starting to work for a small startup, which was still "versioning" using folders. I ask to start to use at least something free like CVS (SVN was just born back then, and no Git, etc.). Linux "guru" offers to take care of it.

    After a few months the CVS server has issues - Linux "guru" discovers the root partition is full, and cleans it - wiping the whole CVS repository, which had been created there instead of somewhere more suitable. Also, the Linux "guru" didn't set up a backup. Luckily developers have a copy of code on their systems, repository can be rebuilt, it's version 1.0 of the software, no complex branches and tags, only history is lost (plus any trust I had in the Linux "guru"). Management finds business is still afloat, no reprimand.

    Anyway, for a while, when something didn't work, it was common to say "X deleted it!" - i.e. "Can't reach Google, X deleted it!"

  8. Alan J. Wylie

    Two years ago

    Valve Steam CLEANS Linux PCs (if you're not careful)

    Dodgy shell script triggers classic rm -rf /

    rm -rf "$STEAMROOT/"*

    But STEAMROOT had not been set

    1. Anonymous Coward

      Re: Two years ago

      Yes, we had a QA system with such a bug that was left running tests over a weekend. It had all the user directories NFS-mounted as well. "Fortunately" the script ran as root, so was "nobody" on the NFS shares, but it still managed to delete many world-writable files. It was strange, the first thing we noticed on Monday morning were complaints from people about strange things happening, files vanishing, etc. The penny dropped when we realized the complaints were spreading alphabetically by username...

      Hurrah for an admin with (a) the wit to immediately bring down the home directory server and (b) working and valid backups. ZFS snapshots helped to restore the last few changes, so little was actually lost. The QA person responsible survived too.

      1. Sir Runcible Spoon
        Facepalm

        Re: Two years ago

        Why the hell does rm not return an error when an argument is empty? Surely it's an obvious shortcut to wiping the entire drive if the entry is blank?

        1. Phil O'Sophical Silver badge

          Re: Two years ago

          The argument wasn't empty. "${XXX}/" becomes "/" when XXX is empty. It's poor coding, rm is doing just what it was asked to do.

        2. Adrian 4

          Re: Two years ago

          It does, but that argument isn't empty due to lack of programmer thought. If $STEAMROOT is empty, the argument expands to just "/*".

          1. Alan J. Wylie

            Re: Two years ago

            $ echo "rm -rf $xxx/*"

            rm -rf /*

            $ set -o nounset

            $ echo "rm -rf $xxx/*"

            bash: xxx: unbound variable

            $

            1. really_adf

              Re: Two years ago

              $ set -o nounset

              $ echo "rm -rf $xxx/*"

              bash: xxx: unbound variable

              Or, in case you think nounset is set, but you're wrong:

              echo "rm -rf ${xxx:?}/*"

        3. David Nash Silver badge

          Re: Two years ago

          It's not blank/empty, it's "/".

        4. onefang

          Re: Two years ago

          "Why the hell does rm not return an error when an argument is empty?"

          I'm still wondering why rm returns an error when the thing you are trying to delete doesn't exist. If it had existed, it would no longer exist afterwards, so the result is the same.

          1. Doctor Syntax Silver badge

            Re: Two years ago

            "I'm still wondering why rm returns an error when the thing you are trying to delete doesn't exist."

            Your intention: to delete a file called 0nefile.

            You type in: rm Onefile

            On completion of rm, 0nefile still exists because you didn't tell rm to remove it. If rm doesn't return an error you're no wiser to this unless you then run ls. Wouldn't it be handy if rm gave you some feedback to tell you you'd typed in an incorrect filename?

        5. Long John Brass

          Re: Two years ago

          Why the hell does rm not return an error when an argument is empty? Surely it's an obvious shortcut to wiping the entire drive if the entry is blank?

          #!/bin/bash

          set -euo pipefail
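
The suggestions in this sub-thread can be combined into a guarded cleanup function. This is a hedged sketch, not the fix Valve shipped; clean_dir is a hypothetical name and the checks are not exhaustive. One detail worth noting: nounset (set -u) only catches a variable that is unset, while ${var:?} also catches one that is set but empty.

```shell
#!/bin/sh
# clean_dir DIR: delete the contents of DIR, refusing to run when DIR is
# unset, empty, "/", or not an existing directory. Dotfiles are left alone.
clean_dir() {
    dir=${1-}
    case $dir in
        '' | /)
            echo "clean_dir: refusing to clean '$dir'" >&2
            return 1
            ;;
    esac
    [ -d "$dir" ] || { echo "clean_dir: not a directory: $dir" >&2; return 1; }

    # ${dir:?} is a second line of defence: the shell aborts the expansion
    # if dir is somehow unset or empty at this point.
    rm -rf -- "${dir:?}"/*
}

# Example (STEAMROOT as in the quoted script line):
#   clean_dir "$STEAMROOT" || exit 1
```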

  9. Dave K

    Penny pinching...

    It never ceases to amaze me how companies have documents, systems and other intellectual property worth millions, yet don't want to spend a comparatively paltry amount of money to ensure that information is properly and securely backed up and protected. Many don't learn until something like this happens...

    1. Anonymous Coward

      Re: Penny pinching...

      I work for a company that took the opposite approach - a huge, robust, multi-terabyte daily/monthly/yearly rolling backup system that significantly increased the (already impressive) storage costs.

      Back in the day I was working tech support for one of the Actuarial teams, and our support manager wandered over to me with a grin on his face. "You know how you raise backup restore requests occasionally?"

      "Yes" say I, not going into the breakdown of how often it's me who deleted things rather than the users

      "Five times in the last two years, out of 8 total."

      "Huh. I would have thought we do more than that."

      "Me too, given the company paid nearly a quarter million for those two years of backups. Per ticket, each of those backup restores cost around thirty grand."

      I just about managed a nervous laugh before he assured me he wasn't going to put it like that for senior management...
