How the NYE leap second clocked Cloudflare – and how a single character fixed it

When the leap second was added just before the arrival of 2017, Cloudflare stumbled. The content delivery network's DNS service suffered a limited service interruption during the first few hours of the new year. John Graham-Cumming, head of engineering for the company, said in a phone interview with The Register that the …

  1. Anonymous Coward

    "but we have external input making them unpredictable"

    A friend of mine always repeats "never trust input". It's good advice...

    1. Oengus

      Re: "but we have external input making them unpredictable"

      In my days as an application programmer about 80% of the code was involved in checking inputs (even when those inputs were output from other programs that did their own input checking...).

      1. Steve Knox
        Facepalm

        Re: "but we have external input making them unpredictable"

        In my days as an application programmer about 80% of the code was involved in checking inputs (even when those inputs were output from other programs that did their own input checking...).

        These days input checking for me is limited to handling when the data in files to import do not match the specification created by the same exact people who create the data files. It's still well over half of what I have to do.

        1. James 51
          Flame

          Re: "but we have external input making them unpredictable"

          A similar pet peeve of mine is when the same people create ambiguous data file formats. So much work to correct, when it would have been so simple to fix while it was still being written.

          1. Anonymous Coward
            Facepalm

            Re: "but we have external input making them unpredictable"

            My own 2p...

            A client whose in-house software created XML files that didn't validate against the XSD they themselves created. That's quite an achievement when it's mostly done by tools rather than hand-cranked.

            1. Doctor Syntax Silver badge

              Re: "but we have external input making them unpredictable"

              "That's quite an achievement when it's mostly done by tools rather than hand-cranked."

              True, but people go for the easy solution - in this case, obviously, the hand-cranked version. Why can't they learn that a surname field can legitimately contain things like "O'Neill"?

              1. James 51
                Boffin

                Re: "but we have external input making them unpredictable"

                SQL escape character. I've seen that trip up a few systems.
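                A minimal Go sketch of why a surname like "O'Neill" trips things up: in standard SQL a single quote inside a string literal is escaped by doubling it, so naive string concatenation produces broken (or injectable) SQL. `quoteSQLLiteral` and the `people` table are hypothetical, for illustration only; in real code, parameterized queries through `database/sql` (placeholder syntax depends on the driver) are a safer choice than hand-escaping.

                ```go
                package main

                import (
                	"fmt"
                	"strings"
                )

                // quoteSQLLiteral is a hypothetical helper that wraps s in single quotes,
                // doubling any embedded single quote as standard SQL requires.
                // Illustration only: prefer parameterized queries over hand-escaping.
                func quoteSQLLiteral(s string) string {
                	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
                }

                func main() {
                	name := "O'Neill"

                	// Naive concatenation: the apostrophe ends the literal early,
                	// leaving Neill' as stray SQL.
                	naive := "SELECT * FROM people WHERE surname = '" + name + "'"
                	fmt.Println(naive)

                	// Escaped literal: the embedded quote is doubled.
                	safe := "SELECT * FROM people WHERE surname = " + quoteSQLLiteral(name)
                	fmt.Println(safe) // SELECT * FROM people WHERE surname = 'O''Neill'
                }
                ```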

            2. LionelB Silver badge
              Joke

              Re: "but we have external input making them unpredictable"

              "... it's mostly done by tools ..." - there's your problem, right there.

    2. DJ Smiley

      Re: "but we have external input making them unpredictable"

      Trust, but verify. Applies to programs and users.

  2. druck Silver badge
    Facepalm

    To my fellow programmers

    FFS, get a clue!

    1. Tom 7

      Re: To my fellow programmers

      Try and understand what you are coding for - this exact leap second addition had already happened 26 times before.

      Perhaps even look at all the open-source code that is available in other languages too.

  3. J. R. Hartley

    There is the theory of the moebius...

    A twist in the fabric of space, where time becomes a loop.

    1. Michael Thibault

      Re: There is the theory of the moebius...

      >time becomes a loop

      How do you know it's just one?

    2. bigphil9009

      Re: There is the theory of the moebius...

      Where time becomes a loop...

      Where time where time becomes a loop...

      1. J. R. Hartley

        Re: There is the theory of the moebius...

        When we reach that point, whatever happens will happen again.

        1. Midnight

          Re: There is the theory of the moebius...

          Programs assume that time is a strict progression of cause to effect but actually, from a non-linear, non-subjective viewpoint, it's more like a big ball of wibbly wobbly... time-y wimey... stuff.

  4. Anonymous Coward

    Use NTP

    Then time would have been drifted to absorb the second over several hours.

    1. Andrew Moore

      Re: Use NTP

      Google do this; it's called a Leap Smear

      1. John Sager

        Re: Use NTP

        Google do this; it's called a Leap Smear

        I set my internal NTP servers to sync from Google. Google drifted their time linearly from 10 hours before to 10 hours after the leap. My servers responded with a rapid 50ms offset initially, which then decayed, and then a -50ms offset at the end. The drift changed by 14ppm. Personally, I would have preferred them to do a raised cosine profile over ±12 hours. That would reduce the initial offset errors considerably, for a peak drift change of ~18ppm.
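        The ppm figures above can be checked with a few lines of Go. This is a sketch, assuming the linear smear spreads exactly one second over a 20-hour window and that "raised cosine" means the absorbed offset follows (1 - cos(πt/W))/2 over a 24-hour window W; the helper names are illustrative. The peak rate change is then 1/W for the linear smear and π/(2W) for the raised cosine.

        ```go
        package main

        import (
        	"fmt"
        	"math"
        )

        // linearSmear returns how much of the one-second leap has been absorbed
        // t seconds into a linear smear window of length w seconds.
        func linearSmear(t, w float64) float64 {
        	return math.Min(math.Max(t/w, 0), 1)
        }

        // cosineSmear is the raised-cosine alternative: the same one-second total,
        // but the slew rate ramps up and down smoothly instead of switching on/off.
        func cosineSmear(t, w float64) float64 {
        	if t <= 0 {
        		return 0
        	}
        	if t >= w {
        		return 1
        	}
        	return (1 - math.Cos(math.Pi*t/w)) / 2
        }

        // Peak frequency offsets, in parts per million, for a window of w seconds.
        func linearPeakPPM(w float64) float64 { return 1 / w * 1e6 }
        func cosinePeakPPM(w float64) float64 { return math.Pi / (2 * w) * 1e6 }

        func main() {
        	const hour = 3600.0
        	linW := 20 * hour // 10 h either side of the leap
        	cosW := 24 * hour // 12 h either side of the leap

        	fmt.Printf("linear: %.1f ppm\n", linearPeakPPM(linW))        // 13.9 ppm
        	fmt.Printf("raised cosine: %.1f ppm\n", cosinePeakPPM(cosW)) // 18.2 ppm
        	fmt.Printf("absorbed at midpoint: %.2f s, %.2f s\n",
        		linearSmear(linW/2, linW), cosineSmear(cosW/2, cosW)) // 0.50 s, 0.50 s
        }
        ```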

    2. DavidRa

      Re: Use NTP

      Not necessarily. http://www.ntp.org/ntpfaq/NTP-s-time.htm#Q-TIME-LEAP-SECOND is prescriptive, but basically you end up with 23:59:59 --> 23:59:60 --> 00:00:00 (23:59:60 being the leap second) instead of the normal 23:59:59 --> 00:00:00 when the extra second is inserted. After that, either the kernel handles it (if it knows how to handle a leap second) or NTP adjusts slowly back into sync.

      It's simple: time never goes backwards. I mean seriously, this is a solved problem, and someone didn't understand enough about time.
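      Most software never sees 23:59:60 at all. Go's standard `time` package, for example, is built on Unix-style time and has no representation for a 61st second: `time.Date` normalizes out-of-range fields, so second 60 simply rolls into the next minute. A short sketch:

      ```go
      package main

      import (
      	"fmt"
      	"time"
      )

      func main() {
      	// The leap second at the end of 2016, labelled the way UTC labels it.
      	leap := time.Date(2016, time.December, 31, 23, 59, 60, 0, time.UTC)
      	midnight := time.Date(2017, time.January, 1, 0, 0, 0, 0, time.UTC)
      	before := time.Date(2016, time.December, 31, 23, 59, 59, 0, time.UTC)

      	// time.Date normalizes out-of-range fields, so second 60 rolls over
      	// into the next minute and the two labels become the same instant.
      	fmt.Println(leap)                 // 2017-01-01 00:00:00 +0000 UTC
      	fmt.Println(leap.Equal(midnight)) // true

      	// Unix time counts 23:59:59 -> 00:00:00 as one second, even though two
      	// SI seconds actually elapsed across the leap second.
      	fmt.Println(midnight.Unix() - before.Unix()) // 1
      }
      ```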

  5. Anonymous Coward

    2038 is already a problem, today.

    Root CAs with expiry periods longer than 20 years are already being issued, and these post-2038 expiry dates can't be verified on 32-bit Linux systems using LibreSSL v2.3.1 (and later) after the BSD-fanbois removed post-2038 support on systems with 32-bit time_t, cutting 32-bit Linux off at the knees.

    OpenSSL, for all its faults, still has the original workaround for post-2038 expiry dates on 32-bit systems. LibreSSL intentionally removing this workaround is just asinine, and renders LibreSSL wholly unsuitable for use on 32-bit Linux.

    From what I've read so far, the 32-bit Linux ABI won't be changing any time soon (although there are plans in progress), presumably because they believe they've got a few more years to sort it out. However 20 year expiry periods for Root CAs are pretty much the norm, and it's 2017 now.... "do the math".

    The clock, as they say, is ticking.
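    A quick Go sketch of where the 32-bit limit falls. It assumes a signed 32-bit `time_t` counting seconds since 1970 (the usual Unix convention) and uses a hypothetical root CA issued in mid-2017 with a 21-year lifetime to show why such an expiry date cannot be stored; `fitsInt32` is an illustrative helper.

    ```go
    package main

    import (
    	"fmt"
    	"math"
    	"time"
    )

    // fitsInt32 reports whether t can be stored in a signed 32-bit time_t.
    func fitsInt32(t time.Time) bool {
    	return t.Unix() >= math.MinInt32 && t.Unix() <= math.MaxInt32
    }

    func main() {
    	// The last second a signed 32-bit time_t can represent.
    	fmt.Println(time.Unix(math.MaxInt32, 0).UTC()) // 2038-01-19 03:14:07 +0000 UTC

    	// One second later the counter wraps to its minimum value (Go defines
    	// signed overflow as two's-complement wraparound): December 1901.
    	var t32 int32 = math.MaxInt32
    	t32++
    	fmt.Println(time.Unix(int64(t32), 0).UTC()) // 1901-12-13 20:45:52 +0000 UTC

    	// A hypothetical root CA issued mid-2017 with a 21-year lifetime.
    	notAfter := time.Date(2017, time.June, 1, 0, 0, 0, 0, time.UTC).AddDate(21, 0, 0)
    	fmt.Println(notAfter.Format("2006-01-02"), fitsInt32(notAfter)) // 2038-06-01 false
    }
    ```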

    1. Lee D Silver badge

      Re: 2038 is already a problem, today.

      32-bit systems are already in the minority, and rather limited to embedded and small solutions.

      Almost all modern ARM and Intel chips are 64-bit and in the case of Intel have been for decades.

      Rather than put a-patch-on-a-bandaid-on-a-bodge on a 32-bit structure that's inherent in everything, when changing the structure is inevitable, it's easier to just demand 64-bit. There's no reason a 32-bit compiler can't cope with 64-bit numbers in structures even if they have to do it manually.

      The problem also evaporates for MORE than a sensible amount of time - 64-bit time is unbelievably huge and viable far into the future (about 292 years even if you use a single signed 64-bit number to nanosecond accuracy, and about 292 billion years at one-second accuracy!).

      As we did with bodges like LBA and its numerous iterations, rather than faff about, just demand 64-bit for anything that is going to be critical for the next 20 years. Because 17 years ago you could pick up 64-bit chips, and 17 years from now you shouldn't be using anything else.

      In fact, it's quite possible that we won't see 128-bit computing in common use for decades, because 64-bit is so unfathomably huge that it's likely going to be the epitome of modern computing for a long time to come. Until, of course, you need to access more than 16 x 1024 x 1024 Terabytes!
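      How far a 64-bit time counter reaches depends heavily on its resolution, which is easy to check in Go. This sketch assumes a signed 64-bit counter (which is what Go's own `time.Duration`, an int64 count of nanoseconds, uses) and a mean Gregorian year of 365.2425 days; `yearsRepresentable` is an illustrative helper.

      ```go
      package main

      import (
      	"fmt"
      	"math"
      	"time"
      )

      // Mean Gregorian year in seconds.
      const secondsPerYear = 365.2425 * 24 * 60 * 60

      // yearsRepresentable returns how many years a signed 64-bit counter covers
      // (in each direction from its epoch) when it ticks unitsPerSecond times a second.
      func yearsRepresentable(unitsPerSecond float64) float64 {
      	return float64(math.MaxInt64) / unitsPerSecond / secondsPerYear
      }

      func main() {
      	fmt.Printf("nanoseconds: about %.0f years\n", yearsRepresentable(1e9)) // about 292 years
      	fmt.Printf("seconds: about %.3g years\n", yearsRepresentable(1))      // about 2.92e+11 years

      	// Go's time.Duration is exactly such a nanosecond counter; this prints
      	// its maximum value in hours, minutes and seconds.
      	fmt.Println(time.Duration(math.MaxInt64))
      }
      ```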

      1. Anonymous Coward

        Re: 2038 is already a problem, today.

        > 32-bit systems are already in the minority, and rather limited to embedded and small solutions.

        Embedded systems are about to explode exponentially, because of something called the IoT, and I fully expect a lot of the IoT to be 8-bit, never mind 64-bit.

      2. Anonymous Coward

        Re: 2038 is already a problem, today.

        > 32-bit systems are already in the minority, and rather limited to embedded and small solutions.

        Are you freaking kidding me? Think about embedded ARM solutions. Single board computers (e.g. Raspberry Pi) and other compute solutions. Mobile devices (including Android-based systems). Internet of Things devices (which may not even stretch to 32-bit in all cases). The sum total of all these non-64-bit devices will easily run into the billions.

        So I'd hardly call 32-bit ARM systems running Linux/Android "in the minority" as they outweigh the installed base of 64-bit systems by a very significant margin, and will continue to do so for many years to come.

        1. Tom 7

          Re: 2038 is already a problem, today. Are you freaking kidding me

          Fear not for two reasons:

          1) the new Pis are already 64-bit, and in 10 years' time I'd imagine Moore's last guess will mean that we will have 8-core 64-bit machines running on Brownian motion power supplies, and people will not be using the old 32-bit jobbies other than for fun.

          2) Gel capacitors in all the power supplies are pretty much going to ensure any still running aren't.

          1. John Brown (no body) Silver badge

            Re: 2038 is already a problem, today. Are you freaking kidding me

            "2) Gel capacitors in all the power supplies are pretty much going to ensure any still running aren't."

            3) the lead-free solder will pretty much ensure that even if the caps are working, the rest of the PCB will all short out anyway.

        2. Doctor Syntax Silver badge

          Re: 2038 is already a problem, today.

          "The sum total of all these non-64-bit devices will easily run into the billions."

          32-bit registers do not preclude handling >32 bit numbers. Arbitrary precision has been with us for a long time. A longer time than electronic computers, in fact.
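          To make the point concrete, here is a Go sketch that adds two 64-bit values using only 32-bit additions, carrying from the low word into the high word: the same idea as an 8-bit CPU chaining add-with-carry instructions. `add64via32` is a hypothetical name; `bits.Add32` is from Go's standard `math/bits` package (Go 1.12 and later).

          ```go
          package main

          import (
          	"fmt"
          	"math/bits"
          )

          // add64via32 is a hypothetical helper that adds two 64-bit values, each
          // given as high and low 32-bit words, using only 32-bit additions.
          func add64via32(aHi, aLo, bHi, bLo uint32) (hi, lo uint32) {
          	var carry uint32
          	lo, carry = bits.Add32(aLo, bLo, 0) // low words first; carry is 0 or 1
          	hi, _ = bits.Add32(aHi, bHi, carry) // then high words plus the carry
          	return hi, lo
          }

          func main() {
          	// 0x1_FFFFFFFF + 1: the low word overflows and carries into the high word.
          	hi, lo := add64via32(0x1, 0xFFFFFFFF, 0x0, 0x1)
          	fmt.Printf("0x%08x_%08x\n", hi, lo) // 0x00000002_00000000

          	// Cross-check against native 64-bit arithmetic.
          	fmt.Println(uint64(hi)<<32|uint64(lo) == 0x1FFFFFFFF+1) // true
          }
          ```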

          1. John Brown (no body) Silver badge

            Re: 2038 is already a problem, today.

            "32-bit registers do not preclude handling >32 bit numbers. Arbitrary precision has been with us for a long time. A longer time than electronic computers, in fact."

            It does make me wonder sometimes what some people think we did back in the days of 8-bit computing. It's not as if we were limited to 0 to 65535 integer-only arithmetic. We had clever stuff like two's complement so we could have -32768 to +32767!! Then some clever bugger invented floating point libraries. Even my ancient 8-bit Z80-based TRS-80 Model 1 had "double precision" floating point operations in the BASIC interpreter.

            And as you say, much further back. Some people today can be easily dumbfounded by a demonstration of decimal arithmetic on an abacus. Or by playing with Napier's bones. Or a slide rule. I'm 55 now and was probably almost the last generation to actually be taught how to use log tables at school.
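            For reference, the 16-bit ranges in question, checked in Go. An unsigned 16-bit value runs from 0 to 65535 (65536 distinct values), and the same 16 bits read as two's complement cover -32768 to +32767:

            ```go
            package main

            import (
            	"fmt"
            	"math"
            )

            func main() {
            	// Unsigned 16-bit: 0 to 65535, i.e. 65536 distinct values.
            	fmt.Println(0, math.MaxUint16) // 0 65535

            	// Two's complement 16-bit: -32768 to +32767.
            	fmt.Println(math.MinInt16, math.MaxInt16) // -32768 32767

            	// In two's complement, -1 is all ones: the same bits as unsigned 65535.
            	var minusOne int16 = -1
            	fmt.Println(uint16(minusOne)) // 65535
            }
            ```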

        3. Anonymous Coward

          Re: 2038 is already a problem, today.

          >So I'd hardly call 32-bit ARM systems running Linux/Android "in the minority"

          Somewhat out of scope (not running Linux), but I bet my company alone sells more 8-bit microcontrollers in a month than all the 32-bit ARM processors sold worldwide in a quarter. My company probably also sells more 8/16/32-bit microcontrollers in a quarter than all the 64-bit CPUs ever made, by unit count.

    2. asdf

      Re: 2038 is already a problem, today.

      > LibreSSL wholly unsuitable for use on 32-bit Linux.

      Honestly, I wouldn't even bother with LibreSSL on Linux anyway. As much as I like the OpenBSD guys and despise what a bird's nest OpenSSL had become, the internet really got their shit together and got the OpenSSL code base in order, much as LibreSSL did. Crypto in general tends to be somewhat tightly coupled with the OS (some use cases simply can't be trusted in userland), so with the LibreSSL folks having fewer resources and less incentive to support Red Hat butt raping POSIX (i.e. Linux these days), it's probably not the best choice. Also, both suffer from one of the bigger problems not easily fixed, and that is what a giant clusterfsck the *SSL APIs were allowed to become over time (such as making internal data and types available publicly in the API; so much for encapsulation, yikes).

      Also, to the guy saying Intel has supported 64-bit for decades: try again. AFAIK the first true 64-bit processor was the Alpha, and even that is only just past the two-decade mark (an amazing chip, especially compared to x86 at the time, sadly put out by a dinosaur of a company in every other way).

      1. asdf

        Re: 2038 is already a problem, today.

        >some usecases simply can't be trusted in userland

        Yuck, said that wrong. Basically, a lot of crypto has to be done in the kernel, and a lot of that will be platform specific. What I am saying is that the OpenBSD folks are obviously going to target their own platform first, and by far the most.

  6. Ken Moorhouse Silver badge

    the code was updated to check if rttMAX was equal to or less than zero

    Less than zero? How big do you want to allow that "less than zero" value to be? If the biggest time adjustment ever to be made is one second then allowing that negative value to exceed that could open up other problems.

    But what puzzles me is how they deal with Daylight Saving. Maybe they don't adjust internally for it at all, but if you have two computer systems that adjust at slightly different times, surely you would see this anomaly then? Here we would be talking about vastly larger negative values of time.
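    A minimal Go sketch of the kind of guard the article describes. It assumes, as the quoted description suggests, that a maximum RTT value ends up as the argument to `rand.Int63n`, which panics when its argument is zero or negative; the `jitter` helper is hypothetical and not Cloudflare's actual code. A guard of only `== 0` lets a negative value through, while `<= 0` (presumably the one-character change) catches it.

    ```go
    package main

    import (
    	"fmt"
    	"math/rand"
    	"time"
    )

    // jitter is a hypothetical helper returning a random delay in [0, rttMax).
    // rand.Int63n panics if its argument is <= 0, so guarding only against
    // rttMax == 0 would still let a negative RTT crash the program.
    func jitter(rttMax time.Duration) time.Duration {
    	if rttMax <= 0 {
    		return 0
    	}
    	return time.Duration(rand.Int63n(int64(rttMax)))
    }

    func main() {
    	// A wall-clock step back during the leap second can make a measured
    	// RTT negative, e.g. -800ms for a query that really took 200ms.
    	fmt.Println(jitter(-800 * time.Millisecond))                   // 0s
    	fmt.Println(jitter(50*time.Millisecond) < 50*time.Millisecond) // true
    }
    ```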

    1. The Vociferous Time Waster

      Re: the code was updated to check if rttMAX was equal to or less than zero

      DST (and indeed timezones) are not a change to the time, just a change to the way that the time is displayed; for the reason you mentioned and many more. The US and many other places change their DST on differing dates and many places don't do it at all.

      Time is expressed in UTC, which is essentially very similar to Greenwich Mean Time. Daylight saving in the UK (BST; British Summer Time) is simply a display of UTC+1 so the systems comparing time clocks would still store and compare UTC to UTC. The difference with a leap second is that it is actually a change to UTC.

    2. The Vociferous Time Waster

      Re: the code was updated to check if rttMAX was equal to or less than zero

      DST (and indeed timezones) are not a change to the time, just a change to the way that the time is displayed; for the reason you mentioned and many more. The US and many other places change their DST on differing dates and many places don't do it at all.

      Time is expressed in UTC, which is essentially very similar to Greenwich Mean Time. Daylight saving in the UK (BST; British Summer Time) is simply a display of UTC+1 so the systems comparing time clocks would still store and compare UTC to UTC. The difference with a leap second is that it is actually a change to UTC.

      When "the clocks change" normally it is just the function of displaying the clock rather than the internal clock that changes except in these very rare and specific cases.

      1. A K Stiles
        Coat

        Re: the code was updated to check if rttMAX was equal to or less than zero

        Well that really was some Vociferous Time Wasting to post essentially the same thing twice, no?

        (Sorry, just noticed the coincidence of handle and minor error - Mine's the hi-vis one with the reflective strips for the dark winter days).

      2. Phil O'Sophical Silver badge

        Re: the code was updated to check if rttMAX was equal to or less than zero

        The difference with a leap second is that it is actually a change to UTC.

        That's true, but AFAIK UTC still never goes backwards; a leap second just means that a minute contains 61 seconds instead of 60. I still don't follow how a time difference between successive "now" instants on the same system could ever be negative if it's measuring UTC.

        1. AdamT

          Re: the code was updated to check if rttMAX was equal to or less than zero

          I think the point is that the "implementation" of their UTC functions just doesn't know about leap seconds so all you can do is externally set it back by a second at some point. i.e. the now() function can't ever report "23:59:58", "23:59:59", "23:59:60", "00:00:00", which is what it should do in a leap second. So the only option that they have is to "manually" knock back the time counter so you repeat a second as reported by the now() function. Having done that there is a risk that if you make two requests less than an actual second apart and the knock back occurs between the two, then you will get a negative value.

          Hence Google's approach of just smearing the second out over a day or so by making multiple tiny adjustments such that you can guarantee that the smallest interval between now() requests will always still result in a positive number.
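          The step-back scenario can be reproduced with plain arithmetic. This sketch assumes a clock that absorbs the leap second by jumping back one second when it first reaches midnight; the millisecond timestamps are illustrative, and `steppedWallClock` is a simplified hypothetical model rather than any particular system's code. It also notes that since Go 1.9 (released after this incident), `time.Now()` carries a monotonic clock reading that `Sub` and `time.Since` use, which is not affected by such wall-clock steps.

          ```go
          package main

          import (
          	"fmt"
          	"time"
          )

          // leapStart is 2017-01-01T00:00:00Z in Unix milliseconds, the moment the
          // modelled clock first reads midnight.
          const leapStart int64 = 1483228800000

          // steppedWallClock models a wall clock that absorbs the leap second by
          // jumping back one second on first reaching 00:00:00, so the next second
          // repeats 23:59:59. realMs is a smooth, never-stepped millisecond count.
          func steppedWallClock(realMs int64) int64 {
          	if realMs >= leapStart {
          		return realMs - 1000
          	}
          	return realMs
          }

          func main() {
          	start := steppedWallClock(leapStart - 100) // reads 23:59:59.900
          	end := steppedWallClock(leapStart + 100)   // 200 ms later, reads 23:59:59.100
          	fmt.Println(end-start, "ms")               // -800 ms

          	// Go 1.9+ time.Now() also carries a monotonic reading, which Sub and
          	// time.Since use when both values have one, so a wall-clock step between
          	// the two calls cannot make this elapsed time negative.
          	t0 := time.Now()
          	fmt.Println(time.Since(t0) >= 0) // true
          }
          ```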

          1. Doctor Syntax Silver badge

            Re: the code was updated to check if rttMAX was equal to or less than zero

            So the only correct option that they have is to "manually" knock back the time counter so you repeat a second as reported by the now() function fix it.

            FTFY

        2. This post has been deleted by its author

          1. DropBear

            Re: the code was updated to check if rttMAX was equal to or less than zero

            That's actually going _forward_, and is by far the less problematic case. It just makes something that actually took 1 second look like it took 2...

          2. Anonymous Coward

            Negative leap seconds

            "It is entirely possible that a second may need to be removed from UTC to align with solar time"

            Perhaps "theoretically possible" rather than "entirely possible"?

            It is claimed that "it is unlikely that a negative leap second will ever occur". I'm not sure if anyone has estimated exactly how unlikely, or what kind of meteorite strike could necessitate a negative leap second without at the same time making UTC redundant by wiping out our whole miserable species.

            1. phuzz Silver badge

              Re: Negative leap seconds

              A particularly big object, whizzing past in the right direction, could speed up the rotation of the Earth enough to require a negative leap second.

              I may now spend the rest of the afternoon working out exactly how big of an object, how close, and if there would be any other effects...

          3. Vic

            Re: the code was updated to check if rttMAX was equal to or less than zero

            UTC can indeed go backward

            No it can't. It's defined as being monotonic.

            It is entirely possible that a second may need to be removed from UTC to align with solar time rather than one being added.

            That would make the time go from 23:59:58 directly to 00:00:00 on the following day. This is not going backwards...

            Vic.

            1. Brangdon

              Re: No it can't. It's defined as being monotonic.

              Was it just me that read that as "moronic"?

          4. Doctor Syntax Silver badge

            Re: the code was updated to check if rttMAX was equal to or less than zero

            " It is entirely possible that a second may need to be removed from UTC to align with solar time rather than one being added."

            That can be achieved by having a 59 second minute. It still doesn't need to go backwards.

        3. Vic

          Re: the code was updated to check if rttMAX was equal to or less than zero

          I still don't follow how a time difference between successive "now" instants on the same system could ever be negative if it's measuring UTC.

          Go's Now() function is defined as returning "the current local time", rather than UTC. This would appear to be a blunder.

          Vic.

          1. Cynic_999

            Re: the code was updated to check if rttMAX was equal to or less than zero

            "

            I still don't follow how a time difference between successive "now" instants on the same system could ever be negative if it's measuring UTC.

            "

            Due to how it is implemented.

            Imagine you have an analogue clock that shows the exact UTC time. Before, during and after the leap second, the times will be 23:59:59 - 23:59:60 - 00:00:00

            But an analogue clock cannot display a time of 23:59:60 There is no number 60 on the dial. The second-hand will always move from second 59 to second 00. So in practice you will have to either stop the clock for a second as it reaches 00:00:00, or knock the second-hand back a notch. If you use the latter method, then it is possible to get the illusion that time is running backwards - look at the clock just before the hand was knocked back, and again just after it was knocked back, and it will seem that time has gone backwards. But both methods will cause strange results in a system that is looking at the current time frequently and using the result to calculate elapsed time.

            1. Vic

              Re: the code was updated to check if rttMAX was equal to or less than zero

              But an analogue clock cannot display a time of 23:59:60

              Then it is not a clock showing UTC...

              Vic.

              1. Cynic_999

                Re: the code was updated to check if rttMAX was equal to or less than zero

                "

                Then it is not a clock showing UTC...

                "

                Huh? What position would the second-hand be when displaying a second count of 60, and how would that position be different to displaying a second count of 00?

                It's not the 24 hour system that's the issue, it's the fact that an analogue dial has 60 divisions (for seconds & minutes) rather than 61 ...

                1. Vic

                  Re: the code was updated to check if rttMAX was equal to or less than zero

                  Huh? What position would the second-hand be when displaying a second count of 60, and how would that position be different to displaying a second count of 00?

                  I don't care - it's not my clock. But if it can't show 23:59:60 as distinct from 00:00:00, then it's not showing UTC, because those are different times.

                  It's not the 24 hour system that's the issue, it's the fact that an analogue dial has 60 divisions (for seconds & minutes) rather than 61 .

                  Nobody said anything about the 24-hour system being at fault. And if you can't put together an analogue dial that can show at least 61 divisions, then you can't make an analogue clock that shows UTC, because UTC requires that many divisions in order to be UTC. It's the only way you can make a UTC clock.

                  So - as I said - if you make an analogue clock that can't show 23:59:60, you haven't made a clock that is showing UTC.

                  Vic.

        4. Doctor Syntax Silver badge

          Re: the code was updated to check if rttMAX was equal to or less than zero

          'I still don't follow how a time difference between successive "now" instants on the same system could ever be negative if it's measuring UTC.'

          And that's only a part of it. It's not impossible to have a system's idea of UTC being reset backwards if the machine was started with the wrong time; this should not bring a system down. Surely a random function should be able to return a negative random number if called with a negative argument and even if it can't it should fail gracefully. And if you want the function to provide a positive number surely you can either check the number you call it with or call ABS() on the result.

          A solution that requires the system to have the wrong UTC value for several hours must surely be the wrong one.

          BTW did we ever get an explanation of why the London Ambulance Service systems crashed in the early hours of New Year's Day?

    3. gnasher729 Silver badge

      Re: the code was updated to check if rttMAX was equal to or less than zero

      "But what puzzles me is how do they deal with Daylight Saving? "

      You don't deal with it at all. You have one clock that is running in UTC. That clock increases exactly one second every second, and occasionally a leap second is added or removed. That's different from GMT, which increases approximately one second every second, depending on how fast or slow the earth rotates on a particular day, and never uses leap seconds.

      Daylight savings time comes into play when you take UTC and convert it to a time in the local calendar that is displayed to the user, or from a time that is entered by the user. It's purely for display.
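      This maps directly onto Go's `time` package, and it also answers the earlier worry about Go's `Now()` returning local time: a `time.Time` is an instant plus a location that only affects presentation. In this sketch a fixed UTC+1 zone named "BST" stands in for British Summer Time, an assumption made so the example needs no time-zone database; real code would normally use `time.LoadLocation("Europe/London")`.

      ```go
      package main

      import (
      	"fmt"
      	"time"
      )

      func main() {
      	// One instant, stored in UTC.
      	t := time.Date(2016, time.July, 1, 12, 0, 0, 0, time.UTC)

      	// Fixed UTC+1 zone standing in for BST (illustrative assumption).
      	bst := time.FixedZone("BST", 60*60)
      	local := t.In(bst)

      	fmt.Println(t.Format("15:04 MST"))     // 12:00 UTC
      	fmt.Println(local.Format("15:04 MST")) // 13:00 BST

      	// Changing the display zone does not change the instant, so comparisons
      	// and differences are unaffected by time zones or DST.
      	fmt.Println(local.Equal(t), local.Sub(t)) // true 0s
      }
      ```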

  7. John Smith 19 Gold badge
    Thumb Up

    OMG "reviewing its code for time calculation problems"

    They have found bugs in their code.

    And are reviewing the code base for others with the same pattern.

    Isn't that how all developers should do it?

  8. Anonymous Coward

    "One of the problems with computers is whenever you deal with the real world, you deal with the mess of the real world."

    The solution is obvious in that case... don't have computers deal with the real world.

    1. DropBear

      I'd very much like to rephrase that as "One of the problems with computers is whenever you sit down to write software you build all your implicit assumptions about the real world into it instead of doing it properly and the real world gleefully proceeds to call out all your bullshit immediately".

      1. Bob Camp

        Exactly. Ask a programmer how many seconds are in a minute, and they will say "60", which is incorrect.

      2. Doctor Syntax Silver badge

        "the real world gleefully proceeds to call out all your bullshit immediately"

        The real world is craftier than that. It leaves out the immediate bit.

  9. J.G.Harston Silver badge

    I thought it was a fundamental part of programming that you always do if(now>=start+delay) and not if(now=start+delay). How would it ever even occur to somebody to test for an exact equality of time?
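    In Go the same rule reads as "not before the deadline" rather than an equality test. A minimal sketch; `expired` is a hypothetical helper and the timestamps are arbitrary:

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // expired reports whether the deadline has been reached or passed.
    // Testing "not before" (i.e. now >= deadline) instead of equality means a
    // check that runs late, or a clock that jumps past the exact instant,
    // still fires.
    func expired(now, deadline time.Time) bool {
    	return !now.Before(deadline)
    }

    func main() {
    	start := time.Date(2017, time.January, 1, 0, 0, 0, 0, time.UTC)
    	deadline := start.Add(5 * time.Second)

    	// Polling at an instant that never exactly equals the deadline.
    	now := start.Add(5*time.Second + 3*time.Millisecond)
    	fmt.Println(now.Equal(deadline))    // false: an equality test would never fire
    	fmt.Println(expired(now, deadline)) // true
    }
    ```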

  10. Alan Sharkey

    It's not just time that can have this issue

    When I was a student, we were asked to write a program to count down from 10 and stop when it got to zero (it was actually to write the song "10 green bottles"). I forgot to set the initial value to ten, and I checked whether the value equalled zero AFTER I decremented the counter. So, of course, by the time I checked, the counter was set to -1. And my check was not less than or equal, but just equal.

    2 full boxes of printout later (this was in 1972 on a big ICL machine) the operators manually stopped my job......

    I had scrap paper for years :)

    Alan
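    The safer pattern is a loop condition that cannot be skipped over. A minimal Go sketch (the `countdown` helper is hypothetical): testing `i > 0` instead of `i != 0` means a wrong starting value ends the loop immediately rather than running until the operators intervene.

    ```go
    package main

    import "fmt"

    // countdown returns the numbers sung in "10 green bottles", from n down to 1.
    // The condition is "> 0" rather than "!= 0", so a zero or negative starting
    // value produces no iterations instead of counting down forever.
    func countdown(n int) []int {
    	var out []int
    	for i := n; i > 0; i-- {
    		out = append(out, i)
    	}
    	return out
    }

    func main() {
    	fmt.Println(countdown(10)) // [10 9 8 7 6 5 4 3 2 1]
    	fmt.Println(countdown(0))  // [] (no runaway loop)
    }
    ```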

    1. Doctor Syntax Silver badge

      Re: It's not just time that can have this issue

      "2 full boxes of printout later (this was in 1972 on a big ICL machine) the operators manually stopped my job"

      Getting your printer control characters wrong was another fruitful way of doing that.

      No, it wasn't me, I didn't have a motor-bike at the time so I couldn't try to carry the stack home on the pillion, not properly secured...

  11. Anonymous Coward

    So...

    I read TFA, and I still can't grok how the leap second made "time go backwards".

    One thumbs up: they did use Go's time libraries instead of rolling their own. I'm not much of a programmer, but I know that you should never write your own custom time or crypto libraries.
