'Data saturation' helped to crash the Schiaparelli Mars probe

The European Space Agency (ESA) has released results of its early investigations into the crash of the Schiaparelli Mars probe and it sounds like software may have been a part of the problem. "A large volume of data recovered from the Mars lander shows that the atmospheric entry and associated braking occurred exactly as …

  1. Ashley_Pomeroy

    "Oh no, not again"

    1. Anonymous Coward
      Anonymous Coward

      Yeah, didn't we see something similar with Ariane 5?

      (Corr: not really, there sensors (correctly) reported an acceleration so strong an overflow occurred when converting from float-to-integer in useless Ariane 4 code left in the package, an overflow trap was raised (in violation of IEEE 754 default policy), a trap handler did not exist, software then trapped to an exception handler which proceeded to dump debugging information into a memory area used by motor guidance. Repeat with the backup computer. See also page 22 of How Java's Floating-Point Hurts Everyone Everywhere from 2004 by Numerical Computation Wrestler Prof. William Kahan et al.)

      1. Martin Gregorie

        Yeah, didn't we see something similar with Ariane 5?

        The problem with the Apollo 11 LM's onboard computer looks like a better match.

        There, leaving a docking radar on overloaded the computer's interrupt handler when they got near the lunar surface, but fortunately there was an astronaut on board who was able to manually fly the landing.

        Here, violent gyrations as the parachute opened seem to have overloaded the IMU and caused it to output garbage which upset the computer that managed the landing.

        A faster IMU and improved garbage detection and rejection would both seem like a good idea.

        1. placeforhandle

          Re: Yeah, didn't we see something similar with Ariane 5?

          "The problem with the Apollo 11 LM's onboard computer looks like a better match." - no, no, no and again no. You have done a terrible thing with your post - you should delete it!

          The story of the 1201 and 1202 alarms on the lunar module is a marvellous and wonderful story of how to do it *right*.

          https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html

          I commend it to all readers, it's an amazing thing to read.

          Then go listen to / watch the original footage and hear Buzz(?) call out the alarms and feel the tension.

          1. swm

            Re: Yeah, didn't we see something similar with Ariane 5?

            As I remember (and I listened to this live) the error was not flipping a switch that would synchronize the radar measuring height from the lunar surface and the radar measuring the distance to the lunar orbiter. This switch was not thrown because of a documentation/checklist error. So instead of having about 30% free time the computer had only 2% free time. Whenever an astronaut attempted to query the system it ran out of real time and caused these alerts.

            But the software didn't crash - it just dumped some real-time processes so it could keep computing more important stuff. The fix was for the astronauts not to interrogate the computer. The final landing was manual after the computer had perfectly guided the lander to a particular place/velocity over the lunar surface. The tenseness during the landing (when ground control realized something was happening but didn't want to interrupt the pilot) was that everything had gone according to plan except that there were so many craters and the pilot was looking for a place to land. The media commentators did not have a clue during the whole landing.
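
For anyone wondering what "dumping some real-time processes" can look like, here is a loose sketch of priority-based load shedding. It is not the Apollo guidance computer's actual scheduler; the `Job` type, the priority convention and every number below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # lower number = more important (assumed convention)
    cost_ms: int   # estimated CPU time needed this cycle (hypothetical)


def schedule_cycle(jobs, budget_ms):
    """Run jobs in priority order and shed whatever no longer fits the budget."""
    run, shed = [], []
    remaining = budget_ms
    for job in sorted(jobs, key=lambda j: j.priority):
        if job.cost_ms <= remaining:
            run.append(job.name)
            remaining -= job.cost_ms
        else:
            shed.append(job.name)  # dropped this cycle, not a crash
    return run, shed


jobs = [
    Job("guidance", 0, 12),
    Job("display_query", 5, 6),
    Job("radar_counter", 3, 5),
]
run, shed = schedule_cycle(jobs, budget_ms=20)
```

With a 20 ms budget the guidance and radar jobs run and the display query is the one that gets dropped, which matches the "don't interrogate the computer" fix in spirit.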

      2. Arthur the cat Silver badge
        Facepalm

        an acceleration so strong an overflow occurred when converting from float-to-integer in useless Ariane 4 code left in the package, an overflow trap was raised (in violation of IEEE 754 default policy)

        If my very rusty memory isn't lying to me, Ada requires a trap on overflow, so IEEE 754 policy has nothing to do with it. The problem was caused by the Ariane 4 code saving two precious bytes of RAM and then nobody checking whether the variable was large enough for Ariane 5 conditions. There was no trap handler defined because "of course it can never happen, Ariane 4 can't accelerate that much". If they'd actually put a comment in to that effect, maybe someone would have noticed.
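
A minimal sketch of the kind of range check that was missing, written in Python rather than Ada and assuming a signed 16-bit target. The function name is hypothetical and this is not the Ariane code; whether clamping is the right response (rather than flagging the value as invalid) is a design decision the sketch does not settle.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_saturating(value: float) -> int:
    """Convert a float to a signed 16-bit range, clamping instead of overflowing."""
    if value != value:  # NaN is the only value not equal to itself
        raise ValueError("cannot convert NaN to an integer")
    if value >= INT16_MAX:
        return INT16_MAX
    if value <= INT16_MIN:
        return INT16_MIN
    return int(value)  # truncates toward zero
```

The point is only that the out-of-range case gets an explicit, documented answer instead of an unhandled trap.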

    2. Kane
      Coat

      "Oh no, not again"

      Actually, that was the bowl of petunias. But have an upvote anyway.

      Mine's the one with the Sub-Etha Sens-O-Matic in the back pocket, thank you.

    3. Julifriend

      @Ashley_Pomeroy It was the bowl of petunias that said 'Oh no, not again'.

    4. energystar
      Boffin

      Seems this thing wasn't even AI tested...

      How Can a Real Time System Ignore Priorities? Hopefully More Detailed Info Gets Public Later.

  2. Anonymous Coward
    Anonymous Coward

    Welcome to embedded system engineering

    I have seen it so many times, I no longer even laugh.

    Trying to explain the Postel principle to an embedded engineer is like trying to convince an evangelical fundie that the world is more than 6000 years old.

    YOU HAVE TO CHECK YOUR INPUTS. ALWAYS. IF THE INPUTS DO NOT MAKE SENSE DISCARD, RESET, REPENT.

    And do it again.

    There is nothing easier than validating location and altitude readings. Just compute f*** first derivatives. If it looks like you have accelerated at 10000G and have broken the light barrier you definitely got a duff reading.

    Sigh...
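
The first-derivative check suggested above can be sketched in a few lines. This is an illustration only: the function name, the 1000 m/s limit and the sample values are assumptions, and it naively trusts the first sample.

```python
def filter_readings(samples, max_rate_m_s=1000.0):
    """Drop altitude samples that imply an unbelievable vertical speed.

    samples: list of (time_s, altitude_m) tuples in time order.
    The first sample is assumed good (a real system would not get that for free).
    """
    if not samples:
        return []
    accepted = [samples[0]]
    for t, alt in samples[1:]:
        t_prev, alt_prev = accepted[-1]
        dt = t - t_prev
        # Compare against the last accepted sample, so dt grows after a rejection.
        if dt > 0 and abs(alt - alt_prev) / dt <= max_rate_m_s:
            accepted.append((t, alt))
    return accepted


samples = [(0.0, 4000.0), (0.1, 3990.0), (0.2, -2000.0), (0.3, 3970.0)]
good = filter_readings(samples)
```

Here the jump to -2000 m implies roughly 60 km/s of vertical speed, so that sample is discarded and the next one is judged against the last believable reading.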

    1. John H Woods Silver badge

      Re: Welcome to embedded system engineering

      indeed. I'm not sure that there's any reason deploying parachutes at a negative altitude.

      1. Mark 85

        Re: Welcome to embedded system engineering

        One would think a negative altitude would cause a "reset" "input fresh data" sequence. I'm just surprised that it discarded the chutes and fired the retros as I don't think the retros would have helped after impact.

        1. Sleep deprived
          Happy

          Re: Welcome to embedded system engineering

          Maybe it tried to fire the retros backwards to climb back to surface.

      2. Destroy All Monsters Silver badge

        Re: Welcome to embedded system engineering

        I'm not sure that there's any reason deploying parachutes at a negative altitude.

        NEVER SURRENDER!

      3. Blofeld's Cat
        FAIL

        Re: Welcome to embedded system engineering

        " I'm not sure that there's any reason deploying parachutes at a negative altitude."

        I believe Wile E Coyote has done this on numerous occasions.

        It appears to be the standard failure mode for parachutes according to the rules of comedy. The parachute then floats down and completely covers the lander-shaped hole in the Martian surface.

        1. Anonymous Coward
          Anonymous Coward

          Re: Welcome to embedded system engineering

          You mean ESA's main supplier is ACME?

        2. You aint sin me, roit

          Don't look down!

          As Wile E Coyote regularly finds out to his peril, you're OK walking off the cliff... until you look down.

          At which point your Inertial Measurement Unit gets saturated...

      4. julianh72

        Re: Welcome to embedded system engineering

        Re: Deploying a parachute at negative altitude:

        This happens just about every day to Wile E. Coyote: pursues Road Runner, falls off cliff, tugs desperately at rip-cord, hits the ground, making a coyote-shaped hole - and then the parachute pops out of the ground, and settles gracefully down over the coyote.

        I'd like to think that is how Schiaparelli's last moments transpired!

      5. This post has been deleted by its author

      6. This post has been deleted by its author

        1. Destroy All Monsters Silver badge
          Windows

          Re: Welcome to embedded system engineering

          From: http://www.nytimes.com/1985/06/20/us/laser-test-fails-to-strike-mirror-in-space-shuttle.html

          "Several critics quickly cited the failure as evidence of bigger problems to come. They said this mistake, a simple human error capable of upsetting a complex technological effort, was the type that could be the ultimate undoing of the proposed antimissile shield."

          Yeah, I remember when the news was that one could get a functional SDI infrastructure for, like, 1 trillion dollars. Which is, what, 1/42nd of the national debt now (or 1/200th by some reckonings)? And the only thing that does even work 30+ years later is anti-Iranian missile bases in Romania (more like first-strike caps on Russia, amIrite?)

          The retardation and belief in technological marvels was just amazing.

      7. Anonymous Coward
        Anonymous Coward

        Re: Welcome to embedded system engineering

        > I'm not sure that there's any reason deploying parachutes at a negative altitude.

        Oversimplified code?

        if (altitude < 3000)

        deploy_chute()

    2. Steve K

      Re: Welcome to embedded system engineering

      Unfortunately it is definitely an embedded system now - in the Martian surface...

      1. gregthecanuck
        Pint

        Re: Welcome to embedded system engineering

        Hey Steve - thanks for the laugh. You win LOL of the day.

        Cheers!

  3. Anonymous Coward
    Anonymous Coward

    I would never fly anything ESA designed (if they ever get to that stage) if they let this sort of thing completely screw up their system.

    How friggin fragile do they design their software?

    Don't they test with various sensor fail (or temporary fail) issues, as well as take bad readings into account even during the initial design process?

    Just embarrassing for ESA.

    1. SkippyBing

      I wonder if they borrowed their engineers from Thales, makers of a drone that can think it's on the ground if it's a bit cloudy. Certainly displays the same lack of creative thinking when it comes to failure modes.

    2. Lars Silver badge
      Joke

      "Just embarrassing for ESA". Yes, and the solution is, no doubt, for the UK to pull out of the ESA to show how to do it proper. What the hell are you waiting for?

      1. Phil O'Sophical Silver badge

        In this case the British had been there, done that, 12 years ago. Possibly even less dramatically, see: https://www.newscientist.com/article/2112484-beagle-mars-probe-probably-didnt-crash-new-analysis-shows/

        1. Anonymous Coward
          Boffin

          On the other hand it didn't actually return any data. I realise it must have been a success because it was british, but you need a definition of 'success' which includes 'failure'.

    3. Anonymous Coward
      Anonymous Coward

      It's not outside the realms of possibility that both the IMU and the navigation system performed to within the written system specs.

      I don't know about software, but from a mechanical engineering point of view, specifications for spaaaaace engineering are very tightly controlled and a lot of written and test evidence has to be supplied to show that the specification is met.

      Difficult to believe, but a lot of engineers can actually say their part of the lander design was a success.

  4. Anonymous Coward
    Mushroom

    Seems like the system could have been a little more robust

    So a disagreement in data between sensors for 1-2 seconds causes the parachute to be discarded and the retros to stop firing?

    Even if the computer made a momentary calculation that the probe had landed, wouldn't it be safer to reduce thrusters and retain the chute until landing had been confirmed?
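
One way to picture "confirm before acting" is to require two independent altitude sources to agree, and to keep agreeing for several cycles, before declaring a landing. A rough sketch under stated assumptions: the class name, thresholds and cycle count are invented, and nothing here reflects Schiaparelli's real flight logic.

```python
class LandingConfirmer:
    """Declare 'landed' only after sustained agreement between two sources."""

    def __init__(self, required_cycles=10, ground_m=2.0, max_disagreement_m=50.0):
        self.required_cycles = required_cycles        # hypothetical persistence window
        self.ground_m = ground_m                      # hypothetical "near ground" limit
        self.max_disagreement_m = max_disagreement_m  # hypothetical tolerance
        self.count = 0

    def update(self, nav_alt_m, radar_alt_m):
        """Feed one cycle of estimates; return True once landing is confirmed."""
        agree = abs(nav_alt_m - radar_alt_m) <= self.max_disagreement_m
        near_ground = nav_alt_m <= self.ground_m and radar_alt_m <= self.ground_m
        if agree and near_ground:
            self.count += 1
        else:
            self.count = 0  # any disagreement restarts the confirmation
        return self.count >= self.required_cycles
```

A navigation estimate of -2000 m against a radar reading of a few kilometres never gets past the agreement test, however long the bad estimate persists.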

    1. Anonymous Coward
      Anonymous Coward

      Re: Seems like the system could have been a little more robust

      No, the parachute must be discarded earlier so it doesn't risk covering the probe, and thrusters must also be turned off before landing if you don't want to heat, blow and contaminate the landing site.

      That said, as others pointed out, the software should have been able to cope with unexpected readings - they had the NASA example when the vibrations caused by the deployment of the landing legs triggered a reading as if the probe had landed - and the subsequent engine cut-off.

      But they wouldn't be the first developers who don't understand the need to cope with unexpected situations... data never lie, right?

      1. SkippyBing

        Re: Seems like the system could have been a little more robust

        I would have thought a simple timer would have solved a lot of problems, the time to fall to the Mars surface must be trivially easy to calculate for a space agency. There's no need to even start thinking about the ground until you're within a tolerable margin of that time and then you can start looking for it with a radar altimeter.
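
The timer idea amounts to a time window around the predicted touchdown. A tiny sketch, with hypothetical phase names and numbers, and with the big assumption that the expected descent time is known well enough to be useful:

```python
def descent_phase(elapsed_s, expected_s, margin_s):
    """Classify where we are relative to the predicted touchdown time."""
    if elapsed_s < expected_s - margin_s:
        return "coast"    # too early: ignore anything claiming to be the ground
    if elapsed_s <= expected_s + margin_s:
        return "search"   # plausible window: start trusting the radar altimeter
    return "overdue"      # later than predicted: the prediction itself is suspect
```

For example, with a predicted 360 s descent and a 30 s margin, a "landed" indication at 100 s would fall in the coast phase and be ignored.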

        1. Anonymous Coward
          Anonymous Coward

          Re: Seems like the system could have been a little more robust

          I'm slightly surprised it's done using sensors at all. Considering how good boffins are at calculating things like you say - time to fall, it seems odd to me that it's not all done on a timer.

          1. Anonymous Coward
            Anonymous Coward

            Re: Seems like the system could have been a little more robust

            "Considering how good boffins are at calculating things like you say - time to fall, it seems odd to me that it's not all done on a timer."

            As I recall, the Russians did actually use drum timers for a lot of things on their spacecraft.

            Many years ago I designed a system which carried out a number of interlocks and tests before engaging an array of instruments and then hitting something with a most enormous zap and waiting to decide if it was necessary to pull the main breakers before uncontrolled rapid ignition of the test system happened. The whole thing was software controlled but, in parallel, I had an old fashioned drum timer with microswitches which was able to abort the whole sequence at each phase if critical conditions were not met. Because sooner or later after I left somebody was going to try to alter the program. The drum timer lived in a transparent plastic box so it could be inspected before each run. It is difficult to do this with software.

            1. Destroy All Monsters Silver badge

              Re: Seems like the system could have been a little more robust

              Drum timers: Will survive axially-placed gamma-ray bursts in your galaxy. 100% guaranteed!

              Galactic Equip-R-Us. Call now by Ansible!

        2. a_yank_lurker

          Re: Seems like the system could have been a little more robust

          @SkippyBing - There is a tendency to be overly complex when something simple will work much more reliably. The physics for the descent are well known and the calculations should be doable by an undergraduate without a computer. An example when KISS should be remembered.

        3. Anonymous Coward
          Boffin

          Re: Seems like the system could have been a little more robust

          Yes, solving a great mass of fluid-dynamics equations in advance, when you don't know what the atmosphere will be doing that day or the exact details of the velocity and position of the spacecraft as it enters the atmosphere or the topography of the ground where it will end up is easy. That's why the Apollo LEM, which didn't have the problem of atmosphere to deal with didn't need all that landing radar stuff. Oh, wait, it did need landing radar.

  5. Anonymous Coward
    Anonymous Coward

    Dynamic analysis...

    Appears to have been forgotten or been inadequate. There are loads of software tools out there that could have helped spot this sort of issue.

    You would have thought that everything would have been thrown at a system that's going to fly so far before it gets run "live"...

  6. Dwarf

    Bravo El Reg for the HHG reference - Perfect, unlike the code on the lander.

    I'm left wondering how time sensitive the "getting your kit out" stage is on landing. I would expect that it would at least court the ground it's just landed on for a bit before deciding to whip it all out.

    After all, if anything unlikely were to happen on landing, then all the sensitive bits are still packed away nice and safe and out of harms way.

    1. Simon Sharwood, Reg APAC Editor (Written by Reg staff)

      Thanks Dwarf. You can stay and comment again :-)

  7. Anonymous Coward
    Anonymous Coward

    It's called Failure Modes and Effects Analysis....

    try it sometime. You'll like the results.

    1. Destroy All Monsters Silver badge
      Windows

      Re: It's called Failure Modes and Effects Analysis....

      As I recall, FMEA and/or FMECA or similar analysis are required on any spacegoing systems at ESA, you can be sure this wasn't dropped on the floor.

      Now, according to Jimbo's Patent Entry On Fault Tree Analysis, FMEA is a bit problematic?

      Early in the Apollo project the question was asked about the probability of successfully sending astronauts to the moon and returning them safely to Earth. A risk, or reliability, calculation of some sort was performed and the result was a mission success probability that was unacceptably low. This result discouraged NASA from further quantitative risk or reliability analysis until after the Challenger accident in 1986. Instead, NASA decided to rely on the use of failure modes and effects analysis (FMEA) and other qualitative methods for system safety assessments. After the Challenger accident, the importance of PRA (Probabilistic Risk Assessment) and FTA (Fault Tree Analysis) in systems risk and reliability analysis was realized and its use at NASA has begun to grow and now FTA is considered as one of the most important system reliability and safety analysis techniques.

      Within the nuclear power industry, the U.S. Nuclear Regulatory Commission began using PRA methods including FTA in 1975, and significantly expanded PRA research following the 1979 incident at Three Mile Island. This eventually led to the 1981 publication of the NRC Fault Tree Handbook NUREG–0492, and mandatory use of PRA under the NRC's regulatory authority.

      NUREG-0492 (from 1981, 209 pages) can be obtained from yonder page or apparently for ~12 Brexit Pounds from amazon.co.uk (someone at amazon.com sells it for USD $183.38 wut?). It's very readable.

  8. Anonymous Coward
    Anonymous Coward

    Life somewhat imitating Art

    Stanislaw Lem writes about a failed landing of a 10⁵-ton rocket on Mars in "Ananke" (More Tales of Pirx The Pilot)

    Such was the brain, so overburdened with spurious tasks as to be rendered incapable of dealing with real ones, that stood at the helm of a hundred-thousand-tonner. Each of Cornelius’s computers was afflicted with the “anankastic syndrome”: a compulsion to repeat, to complicate simple tasks; a formality of gestures, a pattern of ritualized behavior. They simulated not the anxiety, of course, but its systemic reactions. Paradoxically, the fact that they were new, advanced models, equipped with a greater memory, facilitated their undoing: they could continue to function, even with their circuits overloaded.

    Still, something in the Agathodaemon’s zenith must have precipitated the end—the approach of a strong head wind, perhaps, calling for instantaneous reactions, with the computer mired in its own avalanche, lacking any overriding function. It had ceased to be a real-time computer; it could no longer model real events; it could only founder in a sea of illusions… When it found itself confronted by a huge mass, a planetary shield, its program refused to let it abort the procedure, which, at the same time, it could no longer continue. So it interpreted the planet as a meteorite on a collision course, this being the last gate, the only possibility acceptable to the program. Since it couldn’t communicate that to the cockpit—it wasn’t a reasoning human being, after all—it went on computing, calculating to the bitter end: a collision meant a 100 percent chance of annihilation, an escape maneuver, a 90-95 percent chance, so it chose the latter: emergency thrust!

    It all made sense. Logical—but without the slightest shred of evidence. It was something unprecedented. How could he confirm his suspicions? The psychiatrist who had treated Cornelius, helped him, given him job clearance? The Hippocratic oath would seal his lips, and the seal of secrecy could be broken only by a court order. Meanwhile, six days from now, the Ares…

    1. arctic_haze

      Re: Life somewhat imitating Art

      You beat me to it. Yes, Ananke was the story which came out of my deep memory the moment I read the article.

      Mars - tick; crash landing - tick; overloaded computer - tick; bad programming - tick.

      The only difference was that the automatic ship Ananke was landing on Mars while we already had bases over there.

    2. energystar
      Gimp

      Re: Life somewhat imitating Art

      The computed massively overloading, contradicting the sensed. Paranoia Mode.

      Beautiful Reference.

  9. Sampler

    Sci-Fi Story

    Wouldn't that make a great Sci-Fi story: in the opening sequence, half of a major city (New York say, because it's always America) gets destroyed by an alien weapon that impacts and reduces the place to a crater, ~1.6 million wiped out in a second.

    When the next wave turns up in orbit we nuke 'em to hell and the war begins in earnest, man vs invader.

    Only, in the M.Night Shyamalan twist at the end, it wasn't a weapon at all, just a probe to see what we humans were all about, but guidance fucked up and it planted itself in Central Park...

    1. Destroy All Monsters Silver badge
      Alien

      Re: Sci-Fi Story

      Nah, I would redo the ending with humanity having been taken over and euthanised by an evil collusion of pseudo-intelligent talkbots descended from Siri, Tay, Ms. Baidu et al. and badly programmed, insecure IoT devices.

      These proceed to kick the Alien's arse fiercely because, well, they are soulless killer zombie bots from hell.

      In the final scene, Clippy appears on the beleaguered Alien's Mothership Main Screen and goes: "Ding Ding! You seem to be trying to develop automatics. Do you want a little help?". Then everything is blown away.

      THE END!

      1. Mystic Megabyte
        Happy

        Re: Sci-Fi Story

        Not "Ding Ding!" but "DingDong!" :)

        http://www.theinquirer.net/inquirer/news/2478323/amazon-echo-and-google-home-get-take-on-from-bejing-ling-long-ding-dong

    2. ToXik-yogHurt

      Re: Sci-Fi Story

      More or less the plot of 'Battleship'. Aliens arrive by accident, want to go home, borrow one of our telescopes, shoot back when we shoot at them.

      At least, it's more entertaining to watch 'Battleship' if you pretend this is what the plot is supposed to be...
