Twitter titsup: Our failover was actually just FAIL ALL OVER

Twitter fell offline last night for several hours because - the company has now confirmed - redundancy in the micro-blogging site's data centres failed to kick in. The result was a catastrophic system collapse, Twitter's engineering veep Mazen Rawashdeh explained: The cause of today’s outage came from within our data centers …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    I can imagine

    a few years ago, had a database go offline for 2 hours. Despite being clustered over 2 servers. When I chased, the hosting company were forced to admit that they had just found out that they had a common power bus in the data centre - meaning both live and standby servers had been taken out when they had a power failure.

    They never did explain what my £2K/month was paying for ....

    1. BigFire

      Re: I can imagine

      Booze and hookers? That's the usual explanation for an embezzler.

      1. Anonymous Coward

        Re: I can imagine

        Or a Lamborghini*.

        * - If you live in Atlanta, you know which data center this is.

    2. Anonymous Coward

      Only two hours?

      Only two hours of relief from the inane crap that gets put on to twitter.

      Please make it a lifetime of relief from the self-important, uninteresting rubbish.

      1. AdamWill

        Re: Only two hours?

        You could just not read it.

  2. IT Hack
    Pint

    What utter tosh! If you are serious about BCS/DR you plan for worst-case scenarios...not that 'one system might fail'.

    Example - if the entire DC went you have a secondary to take the load. That secondary would be some distance from the primary in case there was an incident covering a good size chunk of real estate. And if you have multiple DCs you have multiple recovery sites.

    Unless of course the bean counters looked at the cost, decided that savings could be made (because of course bean counters are tech savvy) via a half arsed implementation requiring lots of hands on work to flip the bit. Actually it would not surprise me if their redundant systems are also located in the same DC. More cost savings innit!

    Cost savings. huh. Your entire freakn business relies on data centres and you can't be arsed to have a decent BCS/DR solution.

    I knew there was another reason I don't use twatter.

    Pint raised to those twatter support staff that were under the cosh to sort things out because management are a bunch of numpty wankers.

    1. JDX Gold badge

      Uptime always has to be balanced against cost. Twitter (you seem to have a spelling problem) might have decided the remote possibility of a double failure was better than the cost of a more redundant system since it's not a website which actually matters.

      Of course an arrogant techy arsehole wouldn't know about things like business costs, only about insulting people they think they are superior to.

      1. IT Hack
        Pint

        Indeed uptime is balanced against cost. It is also dependent upon the need of the business to ensure that any outage does not, for example, cost the company dear in terms of breaches of contract, reputation and of course compliance with regulatory mechanisms.

        Your point (oh by the way I am more than capable of spelling. I use the term twatter because it is derisory. Anyone with half a braincell would be able to discern this) is actually moot as it does seem that people have an expectation that twatter is available. So actually it does rather seem your point of it not actually mattering is quite facile and incorrect.

        This 'arrogant techy arsehole' actually delivers quality value propositions from processes to tin in the datacentre. I have worked with people who are top notch and I've worked with more people who are utterly useless who are usually management (the buck has to stop somewhere) and indeed techies (I use the term advisedly for these jokers) who I would not even trust with a disconnected keyboard.

        I am glad to see that you are happy with mediocrity. I am not. At a wild guess you work in government or some other organisation that does not require you to ensure that the needs of the business are met but are rather more happy to sound off on subjects you have no clue of.

        Now let's have a pint :)

        1. Ben 50
          WTF?

          letmein

          @IT Hack

          I suspect he understood you deliberately misspelled it. It was sarcasm. Anybody with half a braincell ought to be able to discern this.

          Twitter went down for a couple of hours. The world didn't end. Nobody died. Nobody really cares. This means that the business decision (if it was one) to set the slider somewhere towards the "chancy" end of the scale between "Utterly Impossible to Take Down" (law of diminishing returns, anybody?) and "It'll all be fine, don't worry about it" wasn't actually SO awful. They carry on making money, running a business, and can continue to grow.

          While we're on sweeping generalisations, I'm guessing you've never had your own startup which you've steered to success - by finding compromises between what you should do, and what you can actually afford and have resources for.

  3. Anonymous Coward

    Saw the frontpage yesterday.

    "Twitter is currently down for <%= reason %>.

    We expect to be back in <%= deadline %>"

    I had to stop and wonder if I'd started moonlighting for twitter and forgot.

  4. Big_Ted
    Trollface

    This is what you get plugging 2 data centers into the same 13 amp socket.

  5. Gary F
    WTF?

    Have they not heard of load balancing?

    Sounds like they're doing it all wrong if they don't have any "hot" redundancy. Our infrastructure is spread over multiple servers and if any server (web, database, file, mail, etc) goes down it doesn't matter because the other servers pick up the load. There is no single point of failure. Why the heck aren't Twitter at that level yet? Lack of investment? Or lack of knowledge?

    1. Dave Perry
      FAIL

      Re: Have they not heard of load balancing?

      Not lack of investment overall I suspect. Lack of the CORRECT investment maybe/usually.

    2. walatam
      Headmaster

      Re: Have they not heard of load balancing?

      <pedantryAlert>

      There are very rarely no single points of failure. If the power or communications to your data centre(s) has a single provider or a common provider there is a single point. To get true redundancy you would need to isolate every single component in the infrastructure and, perhaps, have a separate data centre on a different continent (because you might need to consider if your main and backup centres are on a single or connected fault line - the building and ground are points of potential failure). Of course, you would then need to understand how the communications links are provisioned to ensure that a shark could not remove your redundancy if it fancied a bit of cabling ...

      I know this is extreme pedantry but I get a bit peeved when I am asked if we can remove all single points of failure. As ever, there is an acceptable level of risk and an acceptable level of cost to mitigate risk.

      </pedantryAlert>
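
The point about shared providers can be illustrated with a minimal sketch. It assumes a hypothetical inventory in which each supposedly redundant replica lists what it depends on (power feed, carrier, site); the replica and dependency names below are invented for illustration and describe no real deployment. Anything every replica depends on is a single point of failure for the group.

```python
def single_points_of_failure(replicas):
    """Return the dependencies shared by every replica, sorted by name.

    `replicas` maps a replica name to the set of things it depends on.
    A dependency that all replicas share takes the whole group down
    when it fails, however many replicas there are.
    """
    if not replicas:
        return []
    shared = set.intersection(*(set(deps) for deps in replicas.values()))
    return sorted(shared)


# Hypothetical inventory: live and standby share a power bus and a site.
cluster = {
    "db-live": {"power-bus-A", "carrier-1", "site-north"},
    "db-standby": {"power-bus-A", "carrier-2", "site-north"},
}

print(single_points_of_failure(cluster))
```

Here the check flags the shared power bus and the shared site. It only finds what the inventory records, so an unlisted common dependency (a cable route, a fault line) stays invisible, which is why the residual risk never reaches zero.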

      1. IT Hack

        Re: Have they not heard of load balancing?

        This. Acceptable level of risk and cost to mitigate the risk.

  6. Smelly Socks
    Mushroom

    whodunnit

    It was the cleaner wot did it. She pulled out the power from the twitter web server to vacuum behind it.

  7. Craig 12

    I'm on twitter, but didn't even notice it was ever down. I guess I must still have a teeny bit of life left after all!

  8. TeeCee Gold badge
    Meh

    Twitter down?

    <Battery Sergeant-Major Williams>

    Oh dear, how sad, never mind.

    </BSM Williams>

    1. Phil O'Sophical Silver badge
      Headmaster

      Re: Twitter down?

      Oh dear, what a pity, never mind.

  9. nematoad
    Unhappy

    "Now - back to making the service even better and more stable than ever,"

    Won't have to do much then, going by past experience.

  10. The Man Who Fell To Earth Silver badge
    FAIL

    And this matters, why?

    So Twitter went down for a couple of hours, and the Twits who use it had no Tweets. It's not as if Twitter provides a service that actually matters.

  11. VED

    Great News!

    It really is fabulous news! Quite expected, by the way!!

  12. nuked
    Black Helicopters

    foil hat

    They would have to be dangerously inept for this explanation to even approach plausibility.

  13. Anonymous Coward

    Of course it matters.....

    ......that's 2 hours when the BBC didn't know what the news was!
