Microsoft cloud evaporated by one busted file

A corrupted file in Microsoft's DNS services brought down its cloud across the world, the software giant has revealed. In a dramatic failure, Office 365 and Windows Live services including Hotmail and SkyDrive fell over for more than three hours earlier this month, causing further embarrassment for Redmond. No customer data …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    Blaming it on their F5s?

    I call bullshit. I've seen F5s used for years without problems - the only place that seems to have consistent trouble with them is MSO.

    1. Anonymous Coward

      Err?

      It doesn't say F5 anywhere in the article or the linked blog post, I call knee jerk.

      1. Anonymous Coward

        It doesn't say it

        ...but that's what they use ; )

        As often as they have trouble with them (these BPOS-S outages aren't the half of it - for a while we were averaging a load balancer issue of one sort or another every 2-3 months for our dedicated environment) the only reasonable explanation is operator error.

  2. Anonymous Coward

    "further hardening the DNS service to improve its overall redundancy and fall-over capability"

    I don't think they need to improve the fall-over capability - that part certainly seems to be working well.

  3. Wize

    Keep it all on the cloud

    Access to your data anywhere

    Then suddenly - No access from anywhere.

    1. BorkedAgain
      Windows

      Most cloud professionals* would add the proviso: "...but probably best not *that* cloud..."

      There are a few companies out there that seem to know their stuff; enviable uptime and reliability. That's never really been Microsoft's strong point, has it?

      *Cue luddite hordes with their "cloud professional? Tautology, that!" cleverness...

      1. BorkedAgain
        Facepalm

        D'oh.

        Logic 101.

        I meant contradiction, not tautology. I feel such a fool...

        1. Wize

          I would say that most professionals would not keep their data on any cloud...

          ...other than for backup.

          Anyone's server can go down. Even your own. But you can do something about your own server. You don't want to be at the mercy of someone not entirely interested in your company's profits, only their own liability clause.

  4. jake Silver badge

    Numpties.

    Remind me again why I should trust a company with centralized control of my data ... Especially when that company spent decades trying to move control of the personal desktop from mainframe data centers to the personal computer?

    No, thank you. I'll keep it in-house. For values of "in-house" that include a couple continents. Honestly, it's not all that hard to roll your own.

  5. Anonymous Coward

    Let the 'Professionals' handle it

    I can't tell you how many times 'load-balancing devices in the DNS service respond to a malformed input string' gives me problems with my desktop computer. What a relief I can depend on others to fix it now. Then again, when 'load-balancing devices in the DNS service respond to a malformed input string' on my network, it has never brought the entire MS online services product suite down across the world for everyone else. Go Cloud!

  6. Field Marshal Von Krakenfart
    FAIL

    Fixed it for you!

    So what he is saying is that some idiot added crap to the configuration file and it got propagated across the network

    by two "rare conditions" = Muppet didn't have a second person to eyeball his/her handiwork before committing the change to the configuration file; in addition, they altered the configuration file directly rather than copying the one from the test machine.

    Really, this is IT 101....

    1. rciafardone
      Boffin

      Well, not exactly.

      Not to defend MS, God forbid, but as I read it, it seems to say that the configuration file itself was OK; the problem was that somehow it got corrupted in transit, and the thingy responsible for taking care of such situations failed miserably too.

  7. Anonymous Coward

    Hmmm

    MS things becoming unresponsive, crashing or failing to work. Where have I heard that (several dozen times) before?

  8. NoneSuch Silver badge
    FAIL

    Having seen a 23K MS Word 2003 DLL take down an Exchange server when it failed, I can believe this.

    This would be the downside of making everything inter-operable.

  9. scifidale
    Mushroom

    BOOM and there's the reason I won't host any company data in "the cloud". If there's downtime to be had I want to be the one instigating or fixing it!

  10. steward
    Alert

    I'm glad to know...

    that Windows 7 didn't get any less attention than the cloud did in keeping vital files accurate.

  11. Uncle Siggy
    Megaphone

    Exterminate! EXTERMINATE!

    "the software was unable to parse an incorrectly constructed line in the configuration file"

    The above translates to "one of our engineers fat fingered it"

    Ahh, the vagaries of human error.

    1. ps2os2
      Thumb Down

      re Exterminate! EXTERMINATE!

      Well, it's partially human.... The so-called file should have a parser to catch errors before they are sent out to various other servers, no?

      That should eliminate any human issue. Now, whether there is a check at each server for validation issues is another question. It's called check and recheck and then do a checksum.
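
A minimal sketch of that "check and recheck and then do a checksum" idea, assuming a made-up `key = value` config format. Nothing here reflects what Microsoft's load balancers actually run, and the function names (`parse_config`, `publish`, `apply_update`) are hypothetical:

```python
import hashlib


def parse_config(text):
    """Parse 'key = value' lines; raise ValueError on any malformed line."""
    entries = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"line {lineno}: malformed entry {raw!r}")
        entries[key.strip()] = value.strip()
    return entries


def publish(text):
    """Sender side: validate first, then ship the text with its checksum."""
    parse_config(text)  # refuse to send anything that does not parse
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return text, digest


def apply_update(text, digest, current):
    """Receiver side: keep the current config unless the update checks out."""
    if hashlib.sha256(text.encode("utf-8")).hexdigest() != digest:
        return current  # corrupted in transit, keep the last good config
    try:
        return parse_config(text)  # recheck locally before applying
    except ValueError:
        return current
```

The point of the receiver-side check is that a bad or mangled file leaves the last known-good configuration in place rather than taking the service down.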

  12. Gareth Gouldstone
    Unhappy

    Why do they call it a cloud?

    'cos it's light and fluffy and insubstantial, I guess

    1. Darryl

      ...and occasionally can cause major catastrophes that no-one can control or predict.

      1. rciafardone
        Pint

        BINGO!!!

        nuff said.

        The beer is on me.

  13. Anonymous Coward

    Now Apple begins to shake in their boots. Guys, hurry up with the data centre, Billy's network crashed!

  14. Anonymous Coward

    Such an important function...

    ... such overlooked little services in a basket.

    They did use to run their four DNS servers in the same subnet, didn't they? Oh, and they got their all-important everything-depends-on-this SSO domain suspended for non-payment, too. Why companies feel they need to sprawl across dozens of domains, all interdependent, is a little beyond me. But maybe reasons why or why not are just a little beyond them. They're certainly not the only tech giant to bugger this one up regularly. As self-proclaimed world improvers employing supposedly the world's finest tech heads and with plenty of resources to fix it all up neat and tidy, their antics do seem a bit pathetic, however.

  15. Tomato42
    Windows

    "Rare conditions"

    I seem to be hitting "rare conditions" daily as far as Microsoft software is concerned...

    No wonder MS themselves are hitting them once a fortnight or so.

  16. Anonymous Coward

    How can you do this in MS DNS services?

    Or are they running bind9 on Red Hat Linux?

  17. Anonymous Coward

    I worked as an...

    ...IT Service Management consultant on contract at MS a year or so ago. I quickly realized that their infrastructure management skill levels and practices were abysmal. I told them what they needed to do and got out as soon as decently possible - I didn't want to be associated with such a crowd of no-hopers. It seems nothing has changed.

  18. N2

    Updates

    "A tool that helps balance network traffic was being updated and the update did not work correctly..."

    Taste your own medicine, MS. So now you know just how bloody frustrating it is when your updates don't work correctly

    & did the "helpline" assistant go "ooh, I think you'll have to buy another license for that"?

  19. TheRegistrar
    Go

    They couldn't pay us to use their cloud

    But funnily enough they do.

    Higher ROI with service credits than shares.

    At this rate of failures, we'll be using them for free.

  20. Anonymous Coward

    Huh....epic fail, to be sure, but I have a Hotmail account (foisted upon me against my will by a higher educational institute which shall soon give me a fancy piece of paper that I'll put in a frame and reference on a resume but otherwise never think of again) and never noticed the outage. Then again, I only reluctantly use that account.

  21. Anonymous Coward

    to err is human...

    ...but computers are excellent amplifiers. They wouldn't be the first outfit to fall victim to a self-inflicted DDoS. I think there must be an axiom about resilient systems in here somewhere.

    While the number of single points of failure (SPF) is inversely proportional to the number of redundant features, SPF can only approach (but never reach) a lower limit of 1.

  22. Herby

    Wasn't the idea of "personal computers"...

    To be personal, and NOT connected to a central point (Mainframe) of failure.

  23. PeterM42
    FAIL

    Every CLOUD has a silver lining............

    ..................for those wonderful companies who promise everything and deliver everything, INCLUDING outages over which you have little or no control.

  24. Anonymous Coward

    So is this the Blue Sky of Death?

    Microsoft can now make millions of people scream - ALL AT THE SAME TIME!
