Rookie almost wipes customer's entire inventory – unbeknownst to sysadmin

Welcome once again to El Reg’s weekly instalment of Who, Me?, where readers get monumental cock-ups and heart-stopping near-misses off their chests. This week, Reginald tells us a personal horror story from the ‘80s, when he worked for what was then a top five minicomputer biz. At the time, he was on site at a customer – a …

  1. Ima Ballsy

    Arggggg ....

    Been there, done that, got the T-Shirt .... Ain't no fun.

    Any *nix admin worth their salt has done this ONCE at least in their lifetime ...

  2. Anonymous Coward

    DEC PDP11 upgrade

    I had a customer upgrade their PDP 11/23 to an 11/23+ ... the DEC engineer came in to do it and in the process re-imaged the boot disk, deleting all of our applications ... so I got a phone call about 4pm.

    The customer (a hospital lab) set up a 2400 baud modem, I dialed in, logged in, and spent the next six hours online while talking to the customer on another line, getting him to load the various tapes and RL02 disks to rebuild everything and restore their data.

    Of course I didn't get paid overtime because that was my job, but the DEC engineer stayed at the lab to see the installation completed and got his overtime.

  3. Anonymous Coward

    ...then there's backup stories...

    ....like the AS/400 upgrade where we were upgrading the memory and all the disk storage and the OS version...all in a weekend. Near the end of the process, we needed to get some stuff off the last tape backup.

    *

    So then we found out that NONE of the tape backups was readable!!! No one had ever tested a restore!!!

    *

    It all worked out all right in the end....but a lesson there....somewhere!

    1. PickledAardvark

      Re: ...then there's backup stories...

      "So then we found out that NONE of the tape backups was readable!"

      I've been there too on NetWare/ArcServe.

      Write-only backups are secure. You can toss the tape in a bin and nobody can steal your data...

      1. Anonymous Coward

        Re: ...then there's backup stories...

        ARCserve did have the verify options.. if you had the time :) We used to get lots of calls into support on that topic though, that, and tape rotation scheduling :)

    2. Anonymous Coward

      Re: ...then there's backup stories...

      Many years ago (pre-internet times) a client phoned at 5:30 on a Friday afternoon. It was the IT guy wanting to run through the steps involved in recovering from a backup. Their US headquarters had a hard disk fail on their accounting system. He was talking the Financial Controller through a recovery and while he knew his stuff he just wanted to double check everything.

      8pm the same night the phone rang again - how soon could I fly to the States? Only one of the backup tapes was good. The financial controller had put the sole remaining good backup tape in the drive, then popped out to get a bite to eat at 7pm because it was going to be a late night. At 7:30pm the scheduled backup process copied the corrupted database over the only remaining backup.

      Saturday was spent on the phone trying to talk them through everything I could think of.

      Sunday afternoon I was sitting in a private jet winging its way to their US HQ. Three days of very hard work later we'd managed to recreate the accounting database from pieces of corrupted databases and log files. Another private jet ride home - this time the pilot was kind enough to tell me there was a cooler full of beer behind my seat.

    3. Anonymous Coward

      Re: ...then there's backup stories...

      Once was supporting a MAGIC based system, a Pick-like OS that ran on X86 PCs to give a database environment with network terminals. The server had a "massive" 40 MB disk and (from memory) a 20 MB tape system for backup. Nice clear instructions were provided for each site on how to run the backups with a proper 1 week 2 Friday tape cycle (6 tapes in total with the Fridays being stored off-site and returned on schedule).

      So one of the machines had a disk crash and died. No problem, pulled out the spare server, set it up for that site, loaded the programs and went to the tapes to reload last night's data. Put in the relevant tape, pressed the restore, and up popped the message, "Load Tape 1". When queried, the operator said that this message had started to pop up a short while back - "Load Tape 2", and as there was no allocated "Tape 2" he just pushed the original tape back in.

      So we quickly tested each of the available tapes: all 4 week tapes were tape 2, and so were the Friday tapes. Disaster, no backup at all. After some panic calls to a data recovery specialist who opined that there was a small chance that we could possibly recover the data from the tapes, emphasizing that the chance was small but the fees weren't, we agreed to send a couple of tapes up to try this option via urgent courier. While hunting around for a courier envelope, in a drawer of the operator's desk we found a solitary tape marked Monday.

      We asked the operator about it and he expressed surprise, he wondered where it had got to as he'd mislaid it and had used a new tape to replace it. He wasn't sure if it was before the system started asking for a second tape. So we shoved it in the tape unit, hit restore, and it restored. The data was a week old but at least we had somewhere to start!

      Other story is going way, way back. The first PCs that HP made with a hard disk inside had a little oddity: they would only boot from drive A:, so the usual setup was that the hard disk was set to be drive A:. While I was using the machine and had set up a significant amount of data on it for a test system, I wanted to take a break, and someone else came along and decided to use the temporarily unattended machine to format a floppy disk. So they placed a floppy in the drive and typed format A: {enter}. When I arrived back at the machine they were still there wondering why it was taking so long to format a floppy disk...

    4. Anonymous Coward

      Re: ...then there's backup stories...

      ah.. write only back-ups. Classic

  4. IJD

    PDP 11/44 (running RSX/11 IIRC) at uni shared between a dozen postgrads (including me), who needed superuser privilege to use the video hardware. One who had several accounts wanted to remove all his old files so lazily typed rm [*.*]*.*;* instead of removing one UID at a time. Everything including everyone else's files and the OS (and all system utilities) disappeared, and of course the last backup was months old. Of course the files were still physically there but no longer accessible.

    Luckily I'd been running the system debugger and it was still resident in memory (all 64k of it...) as was the printer driver, so I managed to print out the block allocation of the (20MB) hard drive on the line printer. Said user then had to use the debugger to manually reallocate every block on the disk by hand one at a time to the correct UID (including the OS) and filename. Took him all weekend...

    [no it wasn't me, but I was the de facto sysadmin when people suddenly found they hadn't got any files any more]

  5. Mike Moyle

    Welcome to the club!

    Lesson learned: NEVER decide to "clean up some old files" at 4:30 on a Friday afternoon. You WILL look for shortcuts and it WILL bite you on the ass.

    This was back in the mid-'80s, while I was using/adminning Unix boxen with a "Beginner's Guide to Unix", or some such, in hand because -- while I was the Art Department's tech-geekiest member -- I was not a programmer. Fortunately we were a beta site for the machines so;

    A -- Having called the vendor Friday to announce my stupidity, our rep was in on Monday morning before I even got there, re-installing the system and recovering the work files, and;

    B -- A short time later, the company updated the software to include a GUI for admin tasks, taking the command line out of the hands of fumble-fingered amateurs. (Gee, I wonder what inspired THAT decision...?)

    1. Olivier2553

      Re: Welcome to the club!

      "Lesson learned: NEVER decide to "clean up some old files" at 4:30 on a Friday afternoon. You WILL look for shortcuts and it WILL bite you on the ass."

      Do not do anything of any significance on a Friday. At all. Any major change, big operation, etc. must be made by Thursday at the latest, so that in case of a cock-up you have the Friday (plus the weekend) to repair it.

  6. John R. Macdonald

    @big_D

    IIRC a mainframe manufacturer, one of the seven dwarves at that time, did something similar. You could 'upgrade' to a faster, and more expensive, machine simply by connecting a wire.

  7. JQW

    I once wiped a large portion of a hard drive after using find with -exec rm -rf {} - due to not taking into account the fact that some directories on the system had spaces in their names.
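    JQW does not give the exact command, so the following is only a sketch of the usual failure mode, under the assumption that the paths went through an unquoted expansion (or a plain pipe into xargs) somewhere along the way. find's own -exec hands each path over as a single argument, which is why the second form below is the safer one. All directory names here are made up, and everything happens inside a throwaway temp directory.

```shell
#!/bin/sh
# Build a scratch tree so nothing real is at risk (names are hypothetical).
work=$(mktemp -d)
mkdir -p "$work/keep" "$work/old stuff" "$work/old"
touch "$work/keep/data" "$work/old stuff/junk.tmp" "$work/old/precious"

# Unsafe pattern (left commented out on purpose): the shell splits
# ".../old stuff" into ".../old" and "stuff", so the WRONG directory goes.
#   rm -rf $(find "$work" -type d -name 'old stuff')

# Safer: find passes each match to rm as one intact argument.
# -depth visits a directory's contents before the directory itself,
# and "--" stops rm from treating an odd name as an option.
find "$work" -depth -type d -name 'old stuff' -exec rm -rf -- {} +

# Show what survived.
ls "$work"

# Clean up afterwards with: rm -rf -- "$work"
```

    Swapping rm for echo in the -exec part gives a dry run that prints what would be deleted.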

  8. FuzzyWuzzys

    Got you beat, my story begins with "My Wife's...."

    My Wife's system needed rebuilding, so I took a backup of my wife's photo library, something that has 120 years of family history and 110,000 images....

    Long story short, I had to pay £10 for partition recovery software, which recovered 98% of the drive, and my wife will not let me touch anything on her machine unless she has taken all her own backups first!

    My heart didn't stop, but my wife threatened to make it stop, permanently, if those photos didn't come back. As any married man will tell you, "Hell hath no fury like a wife who's let her husband touch and break something she treasures."

    1. Chairman of the Bored

      Re: Got you beat, my story begins with "My Wife's...."

      @FuzzyWuzzys, dang.... 10 quid only. Plus flowers, honey-do's and probably a lot of subservient behavior. You got off easy!! Question becomes ... how long was the stay on the couch :(

  9. Will Godfrey

    Defensive typing

    I've long been in the habit of entering dangerous commands partially in reverse, so in the case of the O/P's one I'd have done:

    '-rf /*.old*'

    then gone back to the start of the line and entered the 'rm' bit.

    1. Criggie

      Re: Defensive typing

      I tend to put "echo " in front of any dodgy command.

      rm -rf $dir/

      becomes

      echo rm -rf $dir/

      and you hopefully notice that $dir is empty. Thanks to NVidia for that one.
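      A belt-and-braces variant of the same idea, as a rough sketch only: keep the echo for the dry run, and let the shell itself refuse an empty variable. The ${dir:?message} expansion is standard POSIX shell; the safe_clean name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the rm command, and refuse to
# build it at all when the directory variable is unset or empty.
safe_clean() {
    dir=$1
    # ${dir:?...} makes the shell stop with an error if dir is empty,
    # so "rm -rf /" can never be assembled from a blank value.
    echo rm -rf -- "${dir:?dir is empty, refusing}/"
}

# Dry run with a real value: only prints the command.
safe_clean /tmp/build
```

      Because a failed ${dir:?} ends a non-interactive shell, calling it with an empty value is best tried inside a subshell, e.g. ( safe_clean "" ). Once the printed command looks right, drop the echo.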

  10. sisk

    A couple months ago on my home computer (which has several Linux distros installed and which all share a common /home because I apparently like to make life difficult for myself - and yes, that's as close to a logical reason I have for having multiple distros installed on one machine) I was going to get rid of one of the extraneous Linux installs and use the space to expand the root partition for one of the other distros. I realized I'd typed /dev/sdc2 instead of /dev/sdc3 at the same time that I verified that, yes, I wanted to delete the partition. And sdc2 is where the above mentioned shared /home lives. Doh.

    Fortunately I have a good file server and a cron job running rsync every night, so I didn't actually lose any data, but I think my heart stopped for a few seconds before I realized that.

    1. onefang

      "several Linux distros installed and which all share a common /home because I apparently like to make life difficult for myself"

      It's actually a recommended way of doing things by some distros, to the point they'll do that automatically if you don't tell it otherwise. Separate /home and /boot. The reason for /home is so you can share it between distros. The reason for /boot is that in the past you had to have that near the beginning of the drive, something that hasn't been true for a long time.

      "that's as close to a logical reason I have for having multiple distros installed on one machine"

      I have one machine with over two dozen different operating systems on it, mostly Linux. I have a micro SD card that I call "Magic Pixie Dust", with all those distros on it, so I have a variety of choice if I have to boot into something to fix someone else's computer, demonstrate something, or just offer them a choice of distro to install. I keep the master for that on the second hard drive of my test box, with the box's usual Windows / Linux on the first drive.

      I don't share my /home, coz different distros, and different versions of distros, have different versions of applications, some of which change the way their config files work. In particular tmux which seems to change the names of options I use with each release.

      I usually prefer the "one big partition" scheme. Otherwise you invariably run out of space on one, while having plenty of space on the others, then you need to juggle files and symlinks, or resize partitions to get things to fit once more.

  11. Kevin Fairhurst

    Came in to work one Monday to find that the Unix system was borked... on investigation it appeared that a large number of files & folders had gone missing, probably by someone doing an incorrect rm.

    Our systems were shared with our US office who supported the UK outside of our core hours (we were in from 7am to ensure trading was ready for 8am, they were available to field staff until 10pm UK time) so we suspected it was one of our US counterparts who had done it, but had no way to prove it.

    Rather than try and fix anything, they'd gone through and deleted all logs and history entries so we could never find the evidence we needed!

    Restoring the system from a recent backup brought everything back online again, as one would expect!

  12. Anonymous Coward

    ummm, hasn't anyone ever thought to put in a subroutine that, when the "rm" command has been typed in and enter hit, pops up with a "are you sure you want to delete everything on the system?"

    1. DavidRa

      Sure they did, but the universe invented better idiots

      Of course. However, the incompletely-experienced often choose to force bypass that configuration. For example, a lot of systems aliased rm to "rm -i" by default, which would force interactive confirmations. People would then say "UGH, I hate having to do this" and add their own customisations to their shells/profiles etc:

      unalias rm

      alias rm='rm -f'

      Lo and behold, now no silly confirmations, regardless of stupidity/typos/etc.
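      For what it's worth, GNU rm also has -I, which prompts once before removing more than three files or before a recursive delete, rather than once per file. A home-grown version of the "are you sure?" subroutine might look roughly like the sketch below; confirm_rm is a made-up name, not a standard command, and it is no substitute for backups.

```shell
#!/bin/sh
# Hypothetical wrapper: list the targets, then ask once before deleting.
confirm_rm() {
    printf 'About to remove %d item(s):\n' "$#"
    printf '  %s\n' "$@"
    printf 'Proceed? [y/N] '
    # Treat end-of-input (no answer) the same as "no".
    read -r answer || answer=n
    case $answer in
        y|Y) rm -rf -- "$@" ;;
        *)   echo 'Aborted, nothing removed.'; return 1 ;;
    esac
}
```

      Usage would be something like confirm_rm ./old-logs "./dir with spaces". Anything other than a plain y leaves the files alone.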

      1. Stuart Castle

        Re: Sure they did, but the universe invented better idiots

        Personally, I don't do stuff like that, and won't use any options on the command line to bypass checks unless I'm writing a script that needs to run unattended.

  13. J.G.Harston

    My ISP was going out of business and turned off my service. Argh! I screamed - all my (recent) email is on there! Ok, we'll turn it back on for a couple of hours. Logged in and fetched all my email locally. Phew.

    Set up new account on another ISP. Configured my email client. Connected to the new ISP and the email client promptly sync'd my local files with the empty account on the new ISP. ARGGGGGGHHHH!!!!!

    On my PC I still have a big directory on the current drive containing files made up from every allocation cluster on the old drive after I imaged it and extracted all the contents, which I go through from time to time. I still have a gap of about a week in my email archive that I haven't found yet.

  14. Stuart Castle

    I have my own story where I've done something I've regretted, but one from a friend first.

    Back in 2011, we needed an inventory system. Because it needed to integrate with various existing systems (some of which were custom designed), a couple of colleagues and I built it from scratch, using SQL Server, Java, PHP and other web technologies. We had V1 up and running, and I was adding equipment to it, using the site we'd set up for this purpose. All of a sudden, I started getting errors saying it couldn't find the database. I checked the server was up. It was. So, I logged on to SQL Server Management Studio, and couldn't find the database. I got our DBA to look at it, as he has access to the backups, which I don't. As far as we could determine, one of my colleagues had renamed the database to a full stop, which was apparently preventing the GUI showing it. The DBA got it back, and immediately locked down the database so no one apart from him and another colleague of mine (not the one who renamed the database) could make structural changes.

    My one was when I first started. We were using Windows NT 4, and Microsoft had just released SP 2. I had been testing it for weeks on my machine, and, in my defence, it had passed all the tests with flying colours. So, I confidently started installing it on staff machines. Roughly half of them failed, and had to be re-installed.

  15. Anonymous Coward

    When we had a Xerox Desktop Publishing system installed at the engineering company where I worked, the installer came with a huge box of 5¼ floppies to install and configure the software. We were given a list of what functions were available, and the cost of having each one installed, and we picked what we thought would be needed and were billed accordingly.

    Some time later, we decided that we needed a couple of extra functions, so a maintenance engineer came along and logged in, ticked the relevant boxes on the installation list, and logged out. I asked how this was possible, to be told that all the necessary software was actually already installed, it just needed a tick in the right box to activate it.

    About a year later, I needed another function to be switched on, so I tried to log in using the original password, but it had expired. After quite a bit of cogitation, I realised that, if I were to disconnect my workstation from the server and reset the date to the previous year, the old password would let me in, I could switch on whatever I liked, and then reset the date to the correct year and reconnect to the server.

    Over the next couple of years I switched on nearly all of the functions on my particular workstation, and no-one was any the wiser.

    Anon because what I did was a bit naughty, and I wouldn't like to be charged retrospectively for the functions I "misappropriated".

  16. steviebuk

    My story, again...

    ....being the security minded person that I am but also no expert I was working at a place with Follow Me printing. On Ricoh devices. Looking through the server one day spotted the option "Purge print jobs on logout" Ooo, that would be a good idea to switch on. Purge them from the MFD, good security and all.

    So I set it and forgot about it. Then calls started to appear in our main 2nd line queue. "My print job has only half come out then disappeared". "I sent my print job a few times but it only partly prints". Oh shit. I realised what had happened, grabbed all the calls and quickly closed them, then fixed the issue (turned the option to purge back off).

    I hadn't put in a change request, although don't think they were enforced at the time. It turned out people would go to the MFD, swipe their card, start printing then swipe to logout before the print job had finished. Or their job was so long that the MFD would timeout their session and automatically log them off. And of course, their print jobs would then be purged.

    Oops.

    No one noticed the calls come in so I kept quiet & quickly closed them. Made up some excuse for the users :) although all those extra closed calls helped my stats for the week.

  17. OzBob

    One I heard from an "old sweat" at my govt employer

    They got a copy of DOS something (2.0 I think) from the vendor (not allowed to build it yourself in those days) and typed the "format" command after putting a floppy in the drive. However, with no drive specified, "format" wiped C:. So, send PC back to vendor for re-installation.

  18. Anonymous Coward

    HP 4 and 8GB DAT drives could be upgraded to 8 and 16GB if you happened to have the right tape to put in them.

    The weirdest one I ever saw was when a vendor had taken components off a motherboard for a customer project. The national rail company had a bunch of servers that should have had two Adaptec SCSI chips on them but had been modified to only have one chip (essentially de-soldered the IC). Every time they hit a certain backup speed NetWare fell over with a kernel panic. After going on site and ending up physically opening one of the servers up I clocked the missing chip. Next day with a SCSI bus analyzer I could see what was happening: echo caused by the now modified bus. It only kicked off when we had our software running flat out. Fortunately our debug menu allowed you to limit the top speed the system would back up at... problem solved.

    Another good one was re-conditioned tape drives that vendors sent back with lower CRC checks. Looked identical. Reported the same firmware and revisions. Liable to make write-only backups.... grrr
