How a tiny leap-day miscalculation trashed Microsoft Azure

As soon as Microsoft's cloudy platform Azure crashed to Earth, and stayed there for eight hours, on 29 February, every developer who has ever had to handle dates immediately figured it was a leap-day bug. Now the software biz behemoth has put its hands up and admitted in a detailed dissection of the blunder how a calendar …

COMMENTS

  1. Bob Vistakin
    FAIL

    Even a £5 watch gets the leap year right!

    Azure. Clippy. Metro. Bob. Vista. Kin.

    1. Anonymous Coward

      Re: Even a £5 watch gets the leap year right!

      Wow, Bob, you've changed your tune; you usually love MS. Oh, hang on...

      This is a massive balls up, but it's one that deals with cryptographically signed certificates which are used to secure an information channel between distributed virtual and physical systems. This is not a £5 digital watch, of which BTW many got the 2000 leap year wrong.

      1. ElReg!comments!Pierre
        WTF?

        @AC

        I don't care who made the mistake or who is commenting on it; it is a pretty huge blunder, and a stupid one, too. One of the things that could have been, and should have been, predicted and avoided. From one of the biggest software vendors in the world, it does look pretty amateurish.

        1. Anonymous Coward

          Re: @AC

          Pierre, did you miss the bit where I said "This is a massive balls up"?

          The point I was making is that while this is a massive balls up, it is not as simple as was suggested by OP who is a perennial MS basher.

          Time is complex, especially on globally distributed systems. It should have been caught, but a lot of people think of time as simple; it's not.

          1. Bob Vistakin
            Facepalm

            Re: @AC

            Watching shills squirm and blame everyone but themselves after being caught out red-handed is a marvel to behold. Why, it's almost as entertaining as seeing them try to deny that Microsoft's Bing uses Google search results: http://goo.gl/Bi0JH

            1. Anonymous Coward

              Re: @AC

              Again with the reading comprehension skills:

              This was a massive balls up, but is significantly more complex than you initially represented it. This hardly makes me a shill, you however seem to be a big straw man.

              1. Richard 12 Silver badge
                Mushroom

                Re: @AC

                No it wasn't.

                It was a total and utter **** up that is only possible if you genuinely have no idea what you are doing.

                The reason is simple: This failure is only possible if you're processing the date as three independent numbers.

                Listen very carefully Microsoft, I will scream this into your ear only once:

                DATES ARE NOT THREE NUMBERS.

                DATES ARE NOT TEXT.

                A datetime is a number of intervals after an epoch. Never anything else.

                Feel free to pick your interval (either days or seconds would be sensible in this case) and your epoch, but doing anything else is sheer insanity that should result in instant termination because no programmer working with dates in any capacity should be that ****ing stupid.

                I've known this since I was 12. Yes, this is quite literally a childish blunder.

                The worst part is that you have to deliberately make this mistake these days, because every single modern framework comes with a Date or DateTime object that handles it for you. (Though 1900 and 2100 might be a problem in some.)

                Heck, even Excel handles it!
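
A minimal sketch of the "intervals after an epoch" idea, using Python's standard library. The epoch (1 January 1970) and the helper names `to_days` and `from_days` are assumptions chosen for illustration; nothing here is claimed to be what Azure does.

```python
from datetime import date, timedelta

# Hypothetical epoch for this sketch; any fixed day would do.
EPOCH = date(1970, 1, 1)

def to_days(d: date) -> int:
    """Number of whole days elapsed since EPOCH."""
    return (d - EPOCH).days

def from_days(n: int) -> date:
    """Calendar date that lies n days after EPOCH."""
    return EPOCH + timedelta(days=n)

leap_day = to_days(date(2012, 2, 29))

# Adding an interval to the count always lands on a day that exists.
one_year_on = from_days(leap_day + 365)

print(leap_day)      # the date as a plain integer
print(one_year_on)   # converted back to a calendar date only for display
```

Because the stored value is a single count, "29 February 2013" cannot even be expressed; the calendar only appears when converting for display.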

                1. Elmer Phud

                  Re: Excel

                  Excel used to have problems handling negative time - unless you changed to the date and time format used for Macs.

              2. Bob Vistakin
                Facepalm

                Re: @AC

                You're making my point for me - please continue, it's really entertaining. What you're saying is *Microsoft* alone finds date computation "significantly more complex". Linux doesn't. £5 watches don't. All the other posters in this thread show exactly why, too.

                1. Anonymous Coward

                  Re: @AC

                  Yeah Bob, all the other posters think something is simple, so it must be... I however, have designed and implemented a Mainframe to desktop global time synchronisation service for a FTSE100 corporation, which synced z/OS, Tandem, AS/400, various UNIXes, Linux, physical access systems, Windows server and desktops. Let me assure you time is not simple. It is not unheard of for major corporations to have change freezes around daylight saving changes, for example, because the risk of a screw-up is so high.

                  And no, crappy £5 digital watches don't handle leap years properly, no matter how many times you say they do..

                  1. Anonymous Coward

                    Re: @AC

                    > I however, have designed and implemented a Mainframe to desktop global time synchronisation service for a FTSE100 corporation,

                    Err... so have I, on 400+ servers (mixed OSes) located in over 30 different sites. It was trivial and it is called NTP. The most difficult part was ensuring the relevant UDP port was allowed by the firewalls and that was a network problem not a time problem.

                    Oh yeah, and a couple of the servers were running a version of Solaris that had to be patched because a dodgy NTP service let the time drift out of sync.

                    1. Vic

                      Re: @AC

                      > The most difficult part was ensuring the relevant UDP port was allowed

                      To be fair, I have encountered a situation where it would be a good idea not to permit code changes across a DST change.

                      Many years ago, I inherited a project that used Visual SourceSafe as its revision control system. I found an interesting feature. If two users committed the same file, the order of commits on the server would not be the order in which they were received - it would be according to the timestamp placed on the file by the *client* machine doing the commit. I had one PC with a bit of a clock drift that kept rolling back other people's changes...

                      I have no idea if this has been fixed - I haven't used that product since then, and I have no intention to do so in the future.

                      Vic.

                  2. Bob Vistakin
                    Pint

                    Re: @AC

                    Time to pull up an armchair, crack open a sixpack and enjoy the entertainment this fool is giving everyone by digging his hole ever deeper.

                    1. Anonymous Coward

                      Re: @AC

                      Yeah NTP is simple, I'm a fool, that's all there is to time sync.

                      In the words of Ben Goldacre: I think you'll find it's a bit more complicated than that.

                      1. Anonymous Coward

                        Re: @AC

                        > Yeah NTP is simple,

                        Yes it is. The concept is simple, the configuration is simple, the implementation is simple, securing it with key exchanges is simple, starting and stopping it is simple, ensuring it slews time instead of stepping it is simple, monitoring it is simple, setting up the stratum zero clocks is simple (okay that can be complicated).

                        The bureaucracy involved in deploying it to 400+ servers is not simple, but that is bureaucracy and not the technical aspects.

                        1. Vic

                          Re: @AC

                          > The bureaucracy involved in deploying it to 400+ servers is not simple

                          It is if you use puppet or similar.

                          Vic.

                          1. Anonymous Coward

                            Re: Vic

                            > It is if you use puppet or similar.

                            You are joking? How would puppet help with the bureaucracy? Do you even know what bureaucracy means?

                            The bureaucracy means getting the owners of the various platforms, who are usually PHBs without a frigging clue, to approve the change request to either give you access to their systems or get one of their own people to follow the instructions on the idiot sheet you will provide them with. Of course, the PHB will often get the department idiot, whose shoes have Velcro straps because he cannot tie his shoelaces, to implement the change (who would have thought that you would have to explicitly state in the idiot sheet that "copy the file" does not mean print it out and photocopy it). Then six weeks later you have to attend a critical incident phone conference because their server crashed for the eighth time this year and this one "must be because of your change".

                            So no, puppet won't help because bureaucracy means dealing with living breathing people and not some multi-platform configuration system.

          2. Richard 12 Silver badge
            FAIL

            Re: @AC

            Time is actually very easy:

            Store and process in UTC.

            Displaying time to the user and parsing user input is harder, but once you're always storing and processing in UTC it is no longer critical to the operation of the machine.

            I've long since lost count of the number of failures caused by storing and processing in local time.

            Local time changes.
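
A small Python sketch of the "store and process in UTC, convert only for display" approach described above. The stored instant, the helper name `for_display` and the fixed UTC+1 offset are all made up for illustration; a real system would look the offset up in a time zone database rather than hard-coding it.

```python
from datetime import datetime, timedelta, timezone

# Stored value: an aware datetime in UTC (an invented instant for this sketch).
stored = datetime(2012, 2, 29, 23, 30, tzinfo=timezone.utc)

def for_display(instant: datetime, utc_offset_hours: float) -> str:
    """Convert a stored UTC instant to local wall-clock text at the edge of the system."""
    local = instant.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%Y-%m-%d %H:%M")

# Processing stays in UTC, so interval arithmetic ignores anyone's clock changes.
expires = stored + timedelta(days=365)

print(for_display(stored, 1))    # shown to a user at UTC+1
print(for_display(expires, 1))
```

Only `for_display` knows about local time, so a wrong or outdated offset can garble what the user sees but cannot corrupt the stored data.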

          3. Anonymous Coward

            Re: @AC

            Time isn't complex, you just need to decide on a sensible way of counting it. Microsoft made a huge mistake by using "local" time instead of UTC. Every sensible system uses UTC, and hence this works without problems:

            $ date --date="29 feb 2012 +1 year"

            Fri Mar 1 00:00:00 CET 2013

  2. Tom 38

    So, what MS are telling us is that their programmers use their APIs like this (pseudocode):

    mydate = date.today()

    mydate.year += 1

    instead of this:

    mydate = date.today()

    mydate += delta(years=1)

    Awesome. Makes you think what other shitnuggets Azure has yet to shake free.
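
For anyone who wants to try the difference, here is a runnable Python take on the two pseudocode variants. Python's standard `timedelta` has no `years` argument, so `add_years` below is a hypothetical helper, and clamping 29 February to 28 February is just one possible policy (rolling over to 1 March is another).

```python
from datetime import date

def naive_add_year(d: date) -> date:
    """Bump the year field directly; raises ValueError for 29 February."""
    return d.replace(year=d.year + 1)

def add_years(d: date, years: int) -> date:
    """Add calendar years, clamping 29 Feb to 28 Feb if the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

leap_day = date(2012, 2, 29)

try:
    naive_add_year(leap_day)
except ValueError as err:
    print("naive version failed:", err)

print(add_years(leap_day, 1))
```

At least here the naive version fails loudly instead of quietly producing a date that does not exist.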

    1. Steve Knox

      Your pseudocode seems to imply that they used a date object, which I doubt. Since a date object is usually represented internally as a count (usually in milliseconds) from an epoch, and adding to the year property simply increments the core value by the correct number of milliseconds or whatever unit, it would not be dependent upon calendar date, and there would not likely be a problem.

      More likely, the date was stored as a calendar date in integer or text format, and they manipulated the year portion of that less intelligent data type directly.

      That fail for an integer format would look something like this:

      intValidDate = getCertificateDate()

      /* Certificate Date is stored as an integer in YYYYMMDD format, so all we have to do is... */

      intExpireDate = intValidDate + 10000

      If text, they probably had a delimiter like "/" and parsed the pieces into integers, added 1 to the year, and concatenated them back to text.

      Either way, this is exactly why you should use a well written date object rather than try a shortcut.
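
To make the integer shortcut concrete, here is a short Python sketch. The YYYYMMDD storage format is the commenter's guess, not a confirmed detail of the Azure code, and `is_real_date` is a hypothetical helper.

```python
from datetime import datetime

def is_real_date(yyyymmdd: int) -> bool:
    """True if the integer, read as YYYYMMDD, names a date that actually exists."""
    try:
        datetime.strptime(f"{yyyymmdd:08d}", "%Y%m%d")
        return True
    except ValueError:
        return False

valid_from = 20120229
expires = valid_from + 10000   # the shortcut: "add one to the year"

print(expires, is_real_date(expires))
```

The arithmetic happily yields 20130229, and only a round trip through a real date parser shows that no such day exists.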

      1. ElReg!comments!Pierre

        date object, yes.

        Or, failing that, at the very least a check for leap years.

      2. Kanhef
        FAIL

        Even if dates are stored in a discrete year/month/day format, a competent programmer would never have let this happen. Any function that creates or modifies such a date should normalize it into a valid form. (For example, a user should be able to add 60 days to a date and get the correct result.) This is not difficult:

        While day is greater than numDaysInMonth: subtract numDaysInMonth from day, increment month.

        Proper handling of invalid months is left as an exercise for the reader, should take about 5 minutes. Add another 5 if you want to make it bulletproof and handle negative values as well. First-year CS students can do this; for a company such as Microsoft to screw it up requires sheer incompetence.
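
One way the normalisation described above might look in Python, including the month and negative-value handling left as an exercise. The function name `normalize` and the tuple-based interface are assumptions made for this sketch.

```python
import calendar

def normalize(year: int, month: int, day: int) -> tuple:
    """Fold an out-of-range month or day back into a valid (year, month, day)."""
    # Months first: divmod copes with both overflow and zero/negative values.
    year_shift, month_index = divmod(month - 1, 12)
    year, month = year + year_shift, month_index + 1
    # Day too small: borrow days from the previous month.
    while day < 1:
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
        day += calendar.monthrange(year, month)[1]
    # Day too large: carry the excess into the next month.
    while day > calendar.monthrange(year, month)[1]:
        day -= calendar.monthrange(year, month)[1]
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return year, month, day

print(normalize(2013, 2, 29))        # the non-existent expiry date
print(normalize(2012, 1, 31 + 60))   # "add 60 days" to 31 January 2012
```

Under this policy 29 February 2013 rolls forward to 1 March 2013; whether to roll forward or clamp backward is a design decision the comment leaves open.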

      3. Peter Fox

        Sorry to go on about this but...

        'Well written date object' Eh? If it is based on a timeline then it isn't.

        What date objects can represent 12 Mar 2012, Mar 2012, 12 Mar, 2012, Not-known, and End-of-time/unknown-in-the-future?

        See http://vulpeculox.net/day/index.htm for the answer.

      4. Jonathan Richards 1
        Boffin

        Code leak

        People are posting pseudocode, whereas a hack into the Azure version control system (notepad text objects with local timestamps in the 8.3 filename) reveals the ACTUAL code at fault:

        10 Y$=RIGHT$(D$,4)

        20 Y=VAL(D$)

        30 Y=Y+1

        40 RETURN

      5. Michael Wojcik Silver badge

        @Steve Knox

        > Your pseudocode seems to imply that they used a date object, which I doubt.

        Why? This was in the generation of Azure transfer certificates. That code is very likely written in C++, if it's native, or C#, if it's managed. Microsoft already have certificate-generation APIs for both native and managed code, so it seems likely they used them.

        > More likely, the date was stored as a calendar date in integer or text format

        Why? Even if the transfer-cert generation code is written in C, the date-manipulation code should be using either the standard C library time-manipulation structures and functions, or the Windows FILETIME ones. They'd use one or the other to get the current date in the first place, so they'd already have a struct tm or similar. And canonical date editing with mktime and friends is standard and well-documented.

        There is ABSOLUTELY NO REASON for the certificate-generation code to have manipulated the year portion of the date directly. At some point the original ("today's") date was almost certainly in some form suitable for canonical manipulation: a FILETIME, a .NET DateTime object, a struct tm, a time_t, etc.

        > they manipulated the year portion of that less intelligent data type directly

        Well, yes, that's exactly what happened. But it's vanishingly unlikely that the programmer who did so, did it because the current date wasn't already available in a format suitable for canonical manipulation. This isn't a case of reading a textual date from some source and then adding a year to it; they had to get the current date in the first place.

        I've generated certificates using the .NET Framework (and using OpenSSL, etc, but I doubt the Azure infrastructure uses OpenSSL). It does nearly all the work for you. This screwup is perverse; it's harder than doing it the right way.

    2. Anonymous Coward

      Well...

      This is Microsoft we're talking about here. Well known for their crappy software.

    3. bazza Silver badge

      @Tom 38

      Yep, it'll be something like that, possibly they've done it as direct manipulation of some time string. I've not read their report.

      Yet again some programmers somewhere have been shown to be a bunch of lazy ******s. Symantec had a similar problem with their antivirus software updater thinking that the year 2010 came before the year 2009... And are Apple devices capable yet of setting an alarm off properly at the appointed time? I suspect not.

      I honestly don't know what goes on in such programmers' heads. If they cared to take even a casual glance at the reference manuals for things like the ANSI C library, Java class libraries, etc. they would find a wealth of functions that a bunch of careful people spent time and effort on so as to make it easy for other programmers to avoid this sort of mistake. Why don't they just ******g use those well thought out routines instead of thinking "I know, I'll do it all over again myself in my own code, how hard can it be, I'm sure a string will do?". It's unbelievable madness. Who supervises these idiots and reviews their code, designs their systems? Sure, the purpose of the routines available in the libraries may be a bit tricky to fully understand, but then time measurement systems (e.g. UTC plus the various local timezones) are not a trivial topic. But that's no excuse to ignore the complexity.

      1. bazza Silver badge

        Aha, a downposter!

        Clearly someone in favour of poor programming and buggy software.

  3. ratfox
    Angel

    Not the first, not the last...

    It is a rare software company that never had an embarrassing leap year bug... This one is still going to follow Microsoft for a while, though.

    1. Bob Vistakin
      FAIL

      Re: Not the first, not the last...

      Well, yeah, sure there are plenty of horror stories around, but is this the first time the British Government chose a partner so clueless they literally didn't know what fucking day it was?

      1. dogged

        Re: Not the first, not the last...

        > implying the British Government is less stupid

        Azure also runs about 50% of iCloud, so presumably Apple are that stupid too. Enjoy the moment, Bob. It'll make you feel better when MS cough out another record-breaking set of sales figures for Win7 later in the year.

        1. Bob Vistakin
          FAIL

          Re: Not the first, not the last...

          "Later in the year"? Hopefully you're not using one of these comedy calendars - that could mean anything from next week to sometime in the next decade.

        2. Richard Plinston

          Re: Not the first, not the last...

          > record-breaking set of sales figures for Win7 later in the year.

          I am in two minds about what will happen.

          Either the Osborne effect will kick in and people will delay buying new computers, or upgrading their XP machine, until they have evaluated Windows 8 released versions.

          Or they will rush out and buy Windows 7 so that they don't have to move to Windows 8 and can wait for W8 SP2 (to be called Windows 9).

    2. Anonymous Coward

      Re: Not the first, not the last...

      > It is a rare software company that never had an embarrassing leap year bug

      Name them

  4. Anonymous Coward
    Thumb Up

    Whoever designs system that have changable clocks will always have problems

    Time is relative, but as it's historically far from perfect we ended up with a system that has bits and days added on here and there, and on top of that we change the time twice a year because of some fetish for having cockerels cocking away in the early hours of the morning.

    We then take all these sun-following fetishes and impose them upon computers, which logically couldn't care less whether the sun is up and only care about things being in order. This is where we have the issue: when we take the time used to control that order and jump it backwards or forwards in a large chunk, the ordering can get a little out of step, and this, as we all know, upsets programs. Now, you can check for TZ changes in your code and cater for these types of exceptions, but that's a lot of checks for what is only going to happen in a few small windows during the year.

    Personally I wish computers had two clocks, one that's set and just goes and is used by data processing in code and another one that does all the human quirks and a log is used to map that onto the computer one so it ends up with a new entry of the computer time and the old/new human time every TZ/leap year etc and is only needed to be converted for any reports/display/input from the users. You can then do any processing without a care about TZ changes and handle all the mapping in the input/output. But there is always an exception to the rule.

    This is what makes computers fun and people employed. Maybe not today but one day there will be somebody out there who has the job title - Digital Timezone consultant. It will be a sad day, especially for those sysadmins who fill out BCS forms and realise they're actually doing 20 different job roles :).

    1. Yet Another Anonymous coward Silver badge

      Re: Whoever designs system that have changable clocks will always have problems

      "Digital Timezone consultant" - I spent a week doing that once.

      We had to coordinate some environmental observations done by schools/volunteer groups all over the world

      Someone in East Timor says they measured it at 8:00am on 21 Mar: what timezone do they use, are they in daylight saving time, when did they change clocks, do they change clocks?

      Multiply by 1000 observations!

      1. Vic

        Re: Whoever designs system that have changable clocks will always have problems

        > I spent a week doing that once.

        A week?

        > Someone in East Timor say they measured it at 8:00am on 21Mar

        [vic@OldEmpire ~]$ TZ=Asia/Dili date +%s -d "21 Mar 2012 08:00"

        1332284400

        [vic@OldEmpire ~]$ date -u -d @1332284400

        Tue Mar 20 23:00:00 UTC 2012

        There's probably a simpler way...

        Vic.
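
For readers without GNU date, roughly the same conversion can be sketched in Python. `zoneinfo` needs Python 3.9+ and a time zone database on the host, so the fallback to a fixed UTC+9 offset is an assumption added for portability, not part of the original commands.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo   # Python 3.9+, reads the host's tz database
    DILI = ZoneInfo("Asia/Dili")
except Exception:
    # Fallback assumption: East Timor at a fixed UTC+9 with no daylight saving.
    DILI = timezone(timedelta(hours=9))

# 8:00am local time on 21 March 2012, as reported by the observer.
local = datetime(2012, 3, 21, 8, 0, tzinfo=DILI)

epoch_seconds = int(local.timestamp())
as_utc = local.astimezone(timezone.utc)

print(epoch_seconds)
print(as_utc.strftime("%a %b %d %H:%M:%S UTC %Y"))
```

The epoch value should agree with the 1332284400 shown in the shell session above.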

        1. Anonymous Coward

          Re: Whoever designs system that have changable clocks will always have problems

          Linux + GNU is not UNIX/Solaris/AIX/BSD*....

          Always quote the format string. Use the time libraries in C or Perl.

          Don't be smart online unless you know the facts and what you are doing. Know your time libraries and use man(1). Assume no more than POSIX and XPG4.

          Perhaps he was working with Windows or a real UNIX and not using GNU date(1):

          Sun Microsystems Inc., SunOS 5.10 (sun4u), Generic_138888-08 CSS 2.1-IB

          $ TZ=Asia/Dili date +%s -d "21 Mar 2012 08:00"

          %s

          $ date -u -d @1332284400

          date: illegal option -- d

          usage: date [-u] mmddHHMM[[cc]yy][.SS]

          date [-u] [+format]

          date -a [-]sss[.fff]

          $ which date

          /usr/bin/date

          $

          1. Anonymous Coward

            Re: Whoever designs system that have changable clocks will always have problems

            Would that be the same Linux + GNU is not UNIX/Solaris/AIX/BSD that has to use tables produced by a bunch of astrologers to get their time correct?

    2. Anonymous Coward

      Re: Whoever designs system that have changable clocks will always have problems

      "Personally I wish computers had two clocks, one that's set and just goes and is used by data processing in code and another one that does all the human quirks and a log is used to map that onto the computer one so it ends up with a new entry of the computer time and the old/new human time every TZ/leap year etc and is only needed to be converted for any reports/display/input from the users."

      That's exactly what most real computers do. Maintain the system clock in UTC, and only convert for display. Except Microsoft OSes, of course, although it seems that Windows 7 has finally learned to get it right.

      Wouldn't have prevented this snafu, though. This one was just down to lazy programming. I remember learning how to programme a computer to convert dates allowing for leap years in 4th form, and that was 40 years ago.

      1. the spectacularly refined chap

        Re: Whoever designs system that have changable clocks will always have problems

        I think he's talking about something a little more substantial than simply tracking UTC and converting it whenever necessary to the local timezone. Simply tracking UTC doesn't really buy you anything in terms of simplicity - you still have leap years and leap seconds. The only real motivation for tracking UTC as opposed to local time is for genuinely multi-user systems where users may reside in differing timezones: it adds complexity, not removes it. When you look in detail at the complexities of the way "real" systems do it, it looks more and more like an awkward fudge: for example some parts of POSIX allow for there to be 59, 60 or 61 seconds in a minute. Others _require_ that there only ever be 60 seconds in a minute. Squaring the circle between those two contradictory measures requires double-think of the worst kind.

        There are simple, monotonically rising time scales for when they are required: TAI for example. Nobody really uses it outside the scientific community since it doesn't really bear any relationship with the real world.

        1. Richard 12 Silver badge

          Re: Whoever designs system that have changable clocks will always have problems

          Storing and processing in UTC removes >99% of the complexity.

          You're left with the two (and only two) issues of leap year and leap second which happen roughly every four years and 1-7 years respectively.

          As opposed to using local time, which for most people changes twice every year as well as the above leap years and leap seconds, and doesn't stay the same year-to-year either.

          On top of that, most people who can afford computers also travel, so that's additional local time changes.

          So, what to do? Store UTC and handle leap years and leap seconds, or local time and handle leap years, leap seconds, DST, political timezone changes, travel etc?

          TAI would be better, but no common OS uses it internally and neither does the general Internet, making it more likely to be wrong.

          1. mad physicist Fiona
            Facepalm

            Re: Whoever designs system that have changable clocks will always have problems

            "Storing and processing in UTC removes >99% of the complexity."

            You've obviously never done any of this work. How does defining everything to UTC solve the problem of six different units in customary use - seven if you include the week (which you need to in order to accommodate daylight savings)? Determining a time offset from UTC or any other time zone is a matter of one extra term in an expression referring to a lookup table. Add automatic determination of daylight savings and you are still only looking at a dozen or so lines of code.

            Compare that dozen lines to the amount needed simply to deal with all those units. Leap years and leap seconds notwithstanding, months are still not the same length, and yes handling leap years alone (let alone leap seconds) takes vastly more than that 0.1 line of code that is allowed for in your 99% assertion. An assertion now shown as the ignorant crap it really is, to the extent that you have demonstrated that not only do you not understand the problem, but fail to grasp even its dimensions.

            As a final case in point: how many systems actually deal in UTC internally? I'll give you a clue - it isn't ALL "real" systems, in fact I can't think of any more recent than MS-DOS. Windows doesn't and Unix certainly doesn't, both use simple synthetic escalating timers of the nature suggested by Refined Chap. They convert to natural units as and when needed. What do you know that the designers of those systems don't? Or more realistically, what did they realise based on factors that you haven't even considered?

            1. Richard 12 Silver badge
              Facepalm

              Re: Whoever designs system that have changable clocks will always have problems

              mad physicist Fiona, you appear to have spectacularly missed my point.

              You aren't describing processing, that is all formatting. You'll need to do that no matter how you store the time internally, but you don't need to do it very often.

              It's not as complex as the existential questions you get from storing and processing in local time - that way you don't know what time it was by the time it's stored to disk, because the local time definitions may have changed. Thus any stored local time also needs the definition of local time at the time to be stored alongside it to use in all future processing.

              UTC changes much less often than local time, so that processing lookup table is much smaller - it will have 35 entries in total as of the end of 2012, all of which are +1 second and published in advance.

              As opposed to the local time tables which are complicated enough to be worth defending a copyright claim over and change regularly on the whim of world politicians!

              Storing local times means that your data set is dependent on those local time tables, and every single data point must state the timezone it was recorded in, for the data to be useful for any purpose at all.

              Storing UTC means you can do almost all processing with no lookup tables at all and be fairly accurate about intervals - only 35 seconds out in 50 years - or have one adjustment lookup table that is valid for all data points.

              Yes, you still need those complex lookups to display to the user but you don't need them for your data set to be useful.

              Incidentally, Windows has been UTC internally since Vista, although its monotonic clock remains irritatingly 32-bit. (49.7 days is a magic number)
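
For anyone wondering where the 49.7 comes from, it is what a 32-bit unsigned counter gives you if it ticks once per millisecond. The millisecond tick rate is an assumption here, since the comment does not name the unit.

```python
from datetime import timedelta

# A 32-bit unsigned counter wraps after 2**32 ticks; assume one tick per millisecond.
wrap = timedelta(milliseconds=2**32)

print(wrap)                                  # time until the counter rolls over
print(round(wrap / timedelta(days=1), 1))    # the same span expressed in days
```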

              1. mad physicist Fiona

                Re: Whoever designs system that have changable clocks will always have problems

                No, it is not me that has missed the point. If you define an internal representation in terms of a monotonic counter, that is no longer UTC, even if the epoch is defined in terms of UTC; the epoch simply calibrates the meaning of the counter against the real world. The counter is _not_ itself UTC, and it is immune to this kind of snafu. Windows uses such a monotonic timer internally, not UTC. Nothing uses UTC, at least you haven't given an example of a system that does yet.

          2. the spectacularly refined chap

            Re: Whoever designs system that have changable clocks will always have problems

            Simply using UTC would not have solved this problem. Nor as you would admit would it solve the leap second/year problems. Nor would it solve the simple problem of months being different lengths. Nor would it solve the simple problem of non-normalised units.

            Why do you think the POSIX time_t type exists for example? It and similar artificial measures exist simply to get away from these real world complexities that inevitably introduce these kinds of corner cases and flakiness with it. Simply adopting UTC internally does nothing to solve that and indeed can introduce subtleties of its own, such as the same moment being on a different date in two nearby pieces of code.
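
A short Python illustration of how a time_t-style count sidesteps leap seconds. It uses `calendar.timegm`, which applies the POSIX rule that every day has exactly 86400 counted seconds; the choice of the leap second at the end of 2008 as the example is mine.

```python
import calendar

# 31 December 2008 ended with a leap second: 23:59:59, 23:59:60, then 00:00:00 UTC.
before = calendar.timegm((2008, 12, 31, 23, 59, 59))
after = calendar.timegm((2009, 1, 1, 0, 0, 0))

# Two SI seconds elapsed between these instants, but the count only moves by one.
print(after - before)

# Midnight always falls on a multiple of 86400, so date maths is plain integer arithmetic.
print(after % 86400)
```

That is the trade-off being argued over in this thread: the counter is simple to do arithmetic on precisely because it is not a faithful count of elapsed UTC seconds.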

  5. Adrian Challinor
    Boffin

    Oracle has some interesting date calcs

    In Oracle DB, the DateAdd function has an interesting little tweak. If you add a month to 29-Feb-2012, you get 30-Mar-2012. It does special processing on the last day of the month to give you the last day of the next month. That can catch people out too.

    1. Phil O'Sophical Silver badge
      Coat

      Catch people out

      Yes, especially the ones who expect March to end on the 31st..

    2. Anonymous Coward

      Re: Oracle has some interesting date calcs

      SELECT ADD_MONTHS( DATE '2012-02-29', 1 ) last_day_of_march

      FROM dual;

      LAST_DAY_OF_MARCH

      2012-03-31

      Can you be clearer?
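
      For anyone without an Oracle instance to hand, here is a rough Python sketch of the end-of-month rule as Oracle documents it for ADD_MONTHS. The `add_months` helper is hypothetical and written only for this illustration:

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Hypothetical helper mimicking the end-of-month rule of Oracle's ADD_MONTHS."""
    # Work in "months since year 0" so year rollover is handled by divmod.
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    was_last_day = d.day == calendar.monthrange(d.year, d.month)[1]
    # Last day of the source month maps to the last day of the target month;
    # a day that does not exist in the target month is clamped to its last day.
    day = last_day if was_last_day or d.day > last_day else d.day
    return date(year, month, day)

print(add_months(date(2012, 2, 29), 1))  # last day of February in, last day of March out
print(add_months(date(2012, 2, 28), 1))  # an ordinary day keeps its day number
```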

  6. Paul Crawford Silver badge

    $diety, not again...

    "A certificate created by an agent in a VM on 29 February 2012 will expire on 29 February 2013, a date that simply doesn't exist"

    Err, how about treating that as 1 March 2013, perhaps?

    1. Peter Jones 2

      Re: $diety, not again...

      My first thought was that a policy of certificate expiry on the first of the following month would have avoided all this.

    2. Michael Wojcik Silver badge

      Re: $diety, not again...

      > Err, how about treating that as 1 March 2013, perhaps?

      X.509 certificates are used to provide security functions. Security measures are usually designed to fail secure: when an error is detected, and the system can't verify secure operation, it denies the request / terminates the action / etc. (That's not always what "secure" means, of course. In some cases it might be more secure to reset to a set of hard-coded defaults, for example.)

      So generally speaking, the recipient of an X.509 certificate that has an invalid date should reject that certificate. It's hard to see what attack modes would produce a certificate that has an invalid expiration date but is otherwise valid; but that doesn't mean there aren't any.

      More generally, there's always a tension between best-effort design principles (like the Postel Interoperability Principle), where the recipient tries its best to determine what the sender wanted, on the one hand; and strict-adherence design principles, where the recipient insists on well-formed data, on the other. The former allow for sloppy implementations and occasional misinterpretation in exchange for making it easier to get things working. The latter make it harder for legitimate use, but they also make the system harder to exploit.
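
      The two design attitudes can be sketched in a few lines of Python. Both function names are made up for this example, and neither is meant to represent how any real X.509 implementation parses validity dates:

```python
from datetime import date, timedelta

def parse_strict(year: int, month: int, day: int) -> date:
    """Strict adherence: an impossible calendar date raises ValueError."""
    return date(year, month, day)

def parse_lenient(year: int, month: int, day: int) -> date:
    """Best effort: days past the end of the month roll over into the next month."""
    return date(year, month, 1) + timedelta(days=day - 1)

try:
    parse_strict(2013, 2, 29)
except ValueError as err:
    print("rejected:", err)

print("accepted as:", parse_lenient(2013, 2, 29))
```

      The lenient version silently decides what the sender "meant", which is exactly the kind of guess a fail-secure design avoids.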

  7. JeffyPooh
    Pint

    Humans are stupid

    See title.

    Yes, I include myself in the set of stupid humans. But it's still disappointing.

  8. Anonymous Coward
    Anonymous Coward

    STOOOPID!

    "After a series of failures, the host OS declares the hardware to be at fault"

    Excuse me? Somebody actually programmed this insane assumption into an OS?

    Like software never has repeatable failures so it must be hardware?

    Somebody needs to be looking for another job.

    1. Anonymous Coward
      Anonymous Coward

      Re: STOOOPID!

      No! They need to look for a new career. I don't want Microsoft to fire them only for someone else (like my bank) to hire them.

  9. Pirate Zebedee

    Again and again

    I remember having an issue with Exchange 2007 and a leap year. I spent a whole day looking at that issue and ended up calling Microsoft.

    They are not the only ones with time issues: Apple iOS has had issues when changing to summer time, and Symantec Backup Exec does funny things as well.

    1. multipharious

      Re: Again and again

      Man, trips down memory lane. I remember some crazy ass date time bugs...

      Just about everyone gets dates wrong, but Calendar appointments, while appearing simple, wind up being complex, with recurring appointments being particularly vulnerable. The fix cannot tell if items were created before or after the fix, and there is nowhere in the metadata to keep things straight. Updated or not? Of course these are the items that bite folks in the butt, since they are one hour late or early if something goes wrong. Especially if a recurring appointment such as a Birthday gets flung into the next day. Or scheduled tasks like backup, archiving runs, replication schedules. When is the 13th, 14th, 15th month of the year? Hardcoded MM/DD/YYYY makes me want to send entire teams of developers to internationalization jail.

  10. asdf
    FAIL

    leap year fails

    After having many Zunes go tits up due to leap year fails, you would think they would learn. Guess it's Sony's turn to have another leap year fail.

  11. Anonymous Coward
    Facepalm

    HAHAHAHAHAHAHAHAHAHAHAHAHA!

    And more: AHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA!!!!!! Oh man.....

  12. steeplejack
    Facepalm

    Microsoft always stressed recruiting smart people.

    But there's that little saying - "You just can't get the staff these days..."

    I always found that when doing arithmetic on dates, it's a good idea to convert to Julian day number first, then back to a date afterwards.

    1. mccp
      Thumb Up

      Re: Microsoft always stressed recruiting smart people.

      All well and good, but when you include time, you're better off with Modified Julian Day so that your days start at 00:00, not 12:00.
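
      A small Python sketch of the day-number approach, using whole Modified Julian Days. The helper names and the `MJD_OFFSET` constant are introduced here for illustration; the offset relies on Python's `date.toordinal()`, which counts 0001-01-01 as day 1:

```python
from datetime import date

# Difference between Python's proleptic Gregorian ordinal and the
# Modified Julian Day number (MJD 0 is 1858-11-17).
MJD_OFFSET = 678_576

def to_mjd(d: date) -> int:
    """Calendar date to whole Modified Julian Day number."""
    return d.toordinal() - MJD_OFFSET

def from_mjd(mjd: int) -> date:
    """Whole Modified Julian Day number back to a calendar date."""
    return date.fromordinal(mjd + MJD_OFFSET)

# Do the arithmetic on the plain integer, then convert back.
issued = to_mjd(date(2012, 2, 29))
print(from_mjd(issued + 366))
```

      Adding days to an integer can never produce a date that does not exist; what "one year" should mean in days is still a policy decision.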

  13. Anonymous Coward
    Anonymous Coward

    I think

    the Microsoft apologists and excusers are overlooking the fact that the situation which caused the crash was entirely foreseeable: not a weird combination of circumstances but an inherent part of what the routine was supposed to do. It was apparently not tested to see what would happen on a leap year, and if that is the case, it is inexcusable.

  14. P. Lee
    Paris Hilton

    The problem isn't the date bug

    As Twain (Mark, not MS) said, "It ain't what you don't know that gets you into trouble, it's what you know for sure that ain't so." In this case, the software decided that there's a hardware fault without actually having any hardware monitoring flag a problem. I've seen banks with whole mainframes dedicated to testing. They roll the clocks forward and backwards to test what happens over time with their applications before deploying to a live environment. It appears that MS doesn't run such tests. That's a little scary. They don't even check their hosted, must-be-up-at-all-costs cloud software for leap-year date problems.

    Anyone can make date handling mistakes, the question is whether the testing is done and architecture is right and fault isolation (or even diagnosis) is baked into the design. I guess that's why people buy mid-range unix systems and mainframes. Better hardware design and diagnostics and a real reluctance to imagine that hitting the reset button is a valid solution. This might be ok in SMEs but it is just not fit for the enterprise.
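
    The clock-rolling idea can be approximated without a spare mainframe by passing "today" into the code under test. A hedged Python sketch follows; `naive_expiry` is a deliberately fragile stand-in written for this example, not Microsoft's actual code:

```python
from datetime import date, timedelta

def naive_expiry(issued: date) -> date:
    """Code under test: bumps the year and nothing else (deliberately fragile)."""
    return date(issued.year + 1, issued.month, issued.day)

def failing_days(first: date, last: date) -> list:
    """Pretend the clock reads each day in [first, last]; collect the days that break."""
    failures = []
    today = first
    while today <= last:
        try:
            naive_expiry(today)
        except ValueError:
            failures.append(today)
        today += timedelta(days=1)
    return failures

# Four consecutive years always include one leap day.
print(failing_days(date(2011, 1, 1), date(2014, 12, 31)))
```

    Sweeping a full four-year window is cheap and guarantees the leap day is exercised at least once.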

    Thank-you for participating in the MS "Train the Software Release Manager," "Train the Designer" and "Train the Coder" Programs. Your data is appreciated. Please hold.

  15. Rich Harding

    Not as simple as some think...

    Try coding the Monthly Expiry Dates on a Pay As You Go insurance policy ;)

    Neither the "End of next month" nor the "1st March following year" approaches, mentioned in comments above, are correct. This applies to most systems where you may need to increment more than once - if you don't always add n periods to the start date, you end up with the end date continually creeping forwards, which is seldom an appropriate solution.
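
    A short Python sketch of the difference between anchoring on the start date and chaining from the previous expiry. The `plus_months` helper and its clamp-to-month-end policy are assumptions made for this illustration; with this policy the drift is towards earlier day numbers, while a roll-forward policy would creep the other way:

```python
import calendar
from datetime import date

def plus_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

start = date(2012, 1, 31)

# Anchored: the n-th expiry is always computed from the original start date.
anchored = [plus_months(start, n) for n in range(1, 5)]

# Chained: each expiry is computed from the previous one, so a clamped day sticks.
chained, current = [], start
for _ in range(4):
    current = plus_months(current, 1)
    chained.append(current)

print([d.isoformat() for d in anchored])
print([d.isoformat() for d in chained])
```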

  16. Gordon 11

    What we learn from history...

    ...is that we don't learn from history.

    Prime MAGSAV had this same problem in 1992 (see, e.g., ftp://www.risks.org/pub/users/neumann/cal.pdf, p. 5).

    It was daft then - it's even more daft now.

  17. eulampios

    MS must be using a wrong OS

    Sorry for Microsoft. Learn from unix utilities and calc emacs:

    :~$ date -d 'feb 29 + 1 year'

    Fri Mar 1 00:00:00 CST 2013

    ~$ date --version

    date (GNU coreutils) 8.5

    #Or calc GNU Emacs:

    alg' <Wed Feb 29, 2012> + 366

    <Fri Mar 1, 2013>

  18. sabroni Silver badge
    FAIL

    appalling

    Just read the summary linked to in the article. They did indeed just take the date and add 1 to the year, and not with a date object and the dateadd function.
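
    For comparison, a date-aware "add one year" might look like the Python sketch below. The function name and the `leap_day_policy` parameter are invented for this example; which fallback is right depends on the application:

```python
from datetime import date

def add_one_year(d: date, leap_day_policy: str = "feb28") -> date:
    """Add one calendar year, with an explicit rule for 29 February."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        if (d.month, d.day) != (2, 29):
            raise  # some other problem, e.g. the year went out of range
        if leap_day_policy == "feb28":
            return date(d.year + 1, 2, 28)
        if leap_day_policy == "mar1":
            return date(d.year + 1, 3, 1)
        raise

print(add_one_year(date(2012, 2, 29)))
print(add_one_year(date(2012, 2, 29), leap_day_policy="mar1"))
```

    The point is less the particular fallback than the fact that the leap-day case is handled on purpose.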

    Unbelievable that this happened on their production service. If I was a customer I'd be migrating to another platform right now...

    1. Audrey S. Thackeray

      Re: appalling

      If I was a customer I'd be migrating to another platform right now...

      That was my first thought.

      But is it a bit like flying after a plane crash - this will be a shake up that makes sure everything gets checked properly the way it should have been in the first place?

      I mean I don't know whether their competitors have equally stupid assumptions programmed in that just haven't come to light yet.

      These occasional IT bush fires can be good for clearing away some of the useless clutter.

      1. Kanhef

        Re: appalling

        This may or may not cause other software vendors to change their coding practices.

        But I sure as hell don't have any confidence that Microsoft will change.

  19. clean_state
    Holmes

    self-healing technology

    I like the Microsoft "self-healing" technology described as "restarting the VMs on other boxes". I once had an IT tech who would similarly believe that REFORMAT-RE-IMAGE-REBOOT was an efficient self-healing voodoo magic.

  20. Bodestone

    They never learn

    A few years ago I spent many hours trying to get Exchange 2003 installed and it kept falling over. Several hours in my search on the error message started producing hundreds of additional results from countries several hours ahead of me.

    That's right. It couldn't be installed on Feb 29th. I had to wait until March 1st.

  21. Annihilator
    Coat

    Correction

    "Now the software biz behemoth has put its hands up and admitted in a detailed dissection of the blunder how a calendar glitch trashed its server farm. It's also a handy guide to setting up your own wholesale-sized cloud platform."

    Surely that should be "a handy guide on how not to set up your own wholesale-sized cloud platform?"

  22. MacGyver
    IT Angle

    No cloud.

    Don't use a cloud. The end. The reason this time was Leap Day, what will it be next time?

    If someone says "cloud" in your organization, squash it. Explain to them that there is no such thing as a cloud, only mirrored data centers, otherwise known as off site storage and thin client services that are pay as you go. It is good for no one (well maybe "cloud" providers), least of all for on site techs (soon to be outsourced). There were reasons we went away from thin clients years ago, those reasons are still there.

  23. Peter Mc Aulay
    Mushroom

    Bahaha

    Then again, experience with SharePoint (not very recent but proving very resistant to alcohol) long since convinced me that Microsoft sucks at anything to do with time zones, date handling, etc. so this comes as absolutely no surprise at all.

  24. Anonymous Coward
    Anonymous Coward

    In defense of kludge

    Although everyone blames the guy who thought he could just do year++, it was the fancy timestamp style date handling that made it actually crash. If the whole system used the naive method, it would have worked fine. The certificate would have been valid on 2013-02-28 and invalid on 2013-03-01.

    1. Michael Wojcik Silver badge

      Re: In defense of kludge

      Perhaps you should learn how X.509 certificates work.
