Software bug halts Curiosity: Nuke lab bot in safe mode

NASA's Mars rover Curiosity is parked in "safe mode" again after being laid low by a software bug. The fault was triggered by an unexpected command-file size, which the machine detected before it was too late. [Image: Curiosity self-portrait at Rocknest in the Gale Crater. Caption: Curiosity's selfie on the Red Planet] The nuclear-powered …

COMMENTS

This topic is closed for new posts.

  1. Amorous Cowherder
    Happy

    Not like most of us then, who either jump in a cab or call a local Operations team member to go into the office and boot off a USB stick in safe mode to find out what's wrong?!

    1. Lee D Silver badge

      What's more worrying is that they don't automatically send all command updates to a simulation (at least) of Curiosity's computer before broadcasting them to Mars. You know, testing. That would have picked up this problem without having to shut down an operating device on a remote planet for a few days.

      1. Destroy All Monsters Silver badge
        Devil

        Psst, young rover. Wanna try this URL "origin://blumars.exe?cmd=chinastrong.dll"?

        Not necessarily. This was an "unrelated file", so may not have been picked up at all.

      2. Alfred

        What makes you think they don't? Given that numerous models were used for testing throughout the lifecycle, it seems odd to assert that they've decided to stop now and just push it out with no tests.

        1. Anonymous Coward

          @Alfred

          It's just another IT guy who knows better than everyone else, pretty boring really.

          People really should think about their relative levels of expertise and if it is possible that the people who are specifically employed to be world leading specialists in a subject, may, possibly know more about it than they do.

          1. Beachrider

            Testing...

            They can be called to task for the problem. They did have the safety-net, though.

            NASA does have a tremendous record on these robotic missions for recovery and commensurate cost issues.

            1. Wzrd1 Silver badge

              Re: Testing...

              Save for the time that metric and imperial units were mixed freely and NASA augered their probe into Mars quite soundly, resulting in a complete loss. :/

              On error jump to human.

              1. Nigel 11

                On error jump to human ...

                I know what you mean, but I immediately thought of "Queen of Angels" by Greg Bear. The plot of that SF novel points out the difficulty of emulating any system whose software has to handle unpredictable realtime events.

                Especially if it's one step away from full-human AI and tens of light-years out.

              2. Beachrider

                Re: Testing...

                The notorious anecdote about metric conversion issues for the low-budget 20-years-ago Mars Observer rings like an extremist politician trying to be relevant with no-longer-correct info.

                TMO was lost during a pressurization sequence. If Apollo 13 history reads correctly, these sequences occasionally result in pressurization problems. No rocket firing done in feet-vs-meters was in play.

                http://klabs.org/richcontent/Reports/Failure_Reports/mars_observer/mars_observer_11_93.pdf

                1. asdf

                  Re: Testing...

                  > low-budget 20-years-ago Mars

                  980 million dollars in 1993 was not exactly low budget. I guess with borrowed Chinese money, to the US government it was.

          2. asdf

            Re: @Alfred

            >It's just another IT guy who knows better than everyone else, pretty boring really.

            >People really should think about their relative levels of expertise and if it is possible that the people who are specifically employed to be world leading specialists in a subject, may, possibly know more about it than they do.

            You might have a point if NASA hadn't pissed away a billion dollars (before adjusting for inflation) of taxpayer money in 1993 on another probe to Mars due to an unbelievably stupid software bug. Scientists tend to be some of the shittiest developers out there because they know enough to be dangerous but tend not to understand that making it work some of the time is much different from battle-tested, maintainable production code written by professionals (of course NASA and the labs hire plenty of professional developers, but when scientists run things the code tends to suffer in general).

            1. Anonymous Coward

              Re: @Alfred

              Do you think that NASA may have developed since an event twenty years ago? Do you think that NASA don't learn from mistakes? Or are you just that much smarter than the rocket scientists at NASA? Not to mention the computer scientists, engineers, etc. etc.

              1. asdf

                Re: @Alfred

                Government-written code and contracted code give us things like the F22 navigation computer rebooting itself when crossing the international date line. They talk a good game with Ada and supposedly bulletproof code (or at least extremely over-documented code) but they tend to do a lot worse in general than some other failure-should-not-be-an-option industries such as nuclear power, etc.

                1. Anonymous Coward

                  Re: @Alfred

                  I don't think the customer being the government or a private enterprise has much to do with it. As far as delivering good code goes, a better relationship is between code complexity and bug rate. Since gubmint systems tend to be inherently big and complex, well, they get a lot of bugs.

                  On the bright side, kudos to the unit tester who insisted that if this particular input routine got bad data re the file size, it would not accept it. Far better to do that than roll over and die.

                  1. asdf
                    Megaphone

                    Re: @Alfred

                    >I don't think the customer being the government or a private enterprise has much to do with it. As far as delivering good code goes, a better relationship is between code complexity and bug rate. Since gubmint systems tend to be inherently big and complex, well, they get a lot of bugs.

                    I agree about the complexity part but the amount of middle management bloat causing confusion and indecision also contributes to poorer design/code in general and big projects draw the leeches. Plenty of private sector corps have this as well (see Nokia) but nothing draws worthless bureaucrat types like big government projects.

            2. hayseed

              Re: @Alfred

              Yes, in retrospect the complaint was that cost-cutting did not leave enough of a testing budget to run a simulation which would have caught the problem. But NASA is not the only organization to skimp on testing (see EA).

          3. Grave

            damn you Howard Wolowitz

            /shakes hand

          4. Anonymous Coward

            Re: @Alfred

            Yeah, like all those intelligent people who were responsible for the Natwest fiasco or, heck, even the Fukushima nuclear incident. Why should we trust them? Because someone else trusted them? "No. Lots of people trusted them." Hmmmm... Sounds like the bandwagon argument. "No, lots of important and powerful people trusted them." Hmmmm... Sounds like appeal to authority, which you are using when you suggest that these people "are specifically employed to be world leading specialists in a subject". I'm just venting. I come in contact with far too many non-thinkers. I realize they are just lazy, but they come across like this: "I love the white coat labs." It's sickening. I realise that you, AC, are probably not like that, but just in case: propagandacritic.com

            (One "s" and one "z" - just to be fair. :p)

      3. Anonymous Coward

        NASA does not have a computer simulator

        "What's more worrying is that they don't automatically send all command updates to a simulation (at least) of Curiosity's computer" ...

        "To our knowledge, NASA does not have a computer simulator that can accurately predict rover performance on uneven terrain." link

        1. Steve Knox
          FAIL

          Re: NASA does not have a computer simulator

          1. Lee D was talking about a simulator of the onboard computer, not a simulator of the mechanical components. All that is necessary for a simulator of the onboard computer is a copy of the hardware and software, which NASA should have. Heck, they probably have a VM image that emulates the onboard computer that engineers could fire up on their workstations on demand. Most likely what happened in this case is that they did test the command upload, but the unrelated file became attached somewhere between testing and the actual upload. These things do happen, which is why there are additional sanity checks like the one which prevented the rover from using the improper file.

          2. While there is no detailed information in the link you provided as to the age of the linked presentation, events referred to in the presentation indicate that it is from 2004 (dates of Opportunity and Spirit exploration events, provided without year implying same year.) This indicates that you linked to an 8-year-old presentation to back up your assertion that they don't have a simulator to accurately predict a physics problem completely unrelated to the computer science problem suggested by Lee D.

          1. Wzrd1 Silver badge

            Re: NASA does not have a computer simulator

            They don't have a VM or simulation of the rover. They have identical hardware that they test errors on and are supposed to test software updates on.

            That said, how many times has some software tested wonderfully in the test network, but turned into a debacle when it hit the network? I know of twice that Adobe bit me that way...

            Now, add in the time delay, dirty signals, etc for the update trip to Mars, then add in a rather generous dose of rather hard radiation at random intervals. It's amazing that more errors do not occur.

          2. Syx

            Re: NASA does not have a computer simulator

            That's some effective sleuthing there.

            The presentation file was last modified in September 2004.

        2. hayseed

          Re: NASA does not have a computer simulator

          There was a problem with a previous rover that was a nasty function of a thread-priority bug in the OS scheduler. Non-deterministic faults might be hard to pick up; it seems unlikely that the computer at home will exactly reproduce what a real-time computer interacting with its environment will do.

      4. MahFL22

        Testing.

        They have done years of testing and literally 1000s of bugs have been fixed; this was a new, unexpected problem on the first operational use of a unique machine. They have an identical high-fidelity rover, and several others on which complex manoeuvres are rehearsed.

    2. Anonymous Coward

      "Going to the office" here means opening the door at the back of the movie studio ;)

  2. wowfood

    Cursed Sun

    Corrupting our messages. Why don't they just blow it up.

    1. Anonymous Coward

      Re: Cursed Sun

      It's not been the same since it was taken over by Oracle.

  3. Francis Boyle Silver badge

    Safe mode

    That's a Windows thing isn't it? I can feel an Eadon coming on.

    1. Anonymous Coward

      Re: Safe mode

      Microsoft may have coined the term "safe mode" but it is by no means a "Windows thing". Linux in good tradition has a few flavours of it, single user mode as well as interactive startup. Android has it, Mac OS has it (safe boot I think) and I presume so does iOS.

      All complex systems have a risk of breaking down and a means to recover them is sensible system design.

      1. Beachrider

        MSFT invented what?

        NASA has been using the term fail-safe mode since the 1960s. They also cut it down to safe mode back in the early 1990s for Space Station stuff...

      2. Anonymous Coward

        Re: Safe mode

        If you look round the back of the rover there's a SysRq key. Oh, wait...

  4. This post has been deleted by its author

  5. tempemeaty

    So NASA is prone to self induced rover spasms

    So they just added a file that the thing didn't like and all they have to do is delete it. That's convenient. At least it tells us they still need it to put on a show for us to cover for whatever else they are doing with it but still not telling us about.

    1. Graham Dawson Silver badge
      Facepalm

      Re: So NASA is prone to self induced rover spasms

      ... seriously?

    2. Al Jones
      Go

      Re: So NASA is prone to self induced rover spasms

      They just wanted to distract you from reports that the current Commander of the Space Station sang Danny Boy to mark the day that was in it!

  6. Destroy All Monsters Silver badge
    Pint

    Somewhat Related: Currently in a paper on my desk...

    "Can a Manufacturing Quality Model (like 6 sigma) Work for Software" [Robert V. Binder, IEEE Software, September 1997]

    We have the following interesting stats:

    ♦ NASA Space Shuttle Avionics have a defect density of 0.1 failures/KLOC (Edward Joyce, “Is Error-free Software Possible?” Datamation, Feb.18, 1989).

    ♦ Leading-edge software companies have a defect density of 0.2 failures/KLOC. These companies are achieving 0.025 user-reported failures per function point or better (Capers Jones, Applied Software Measurement, McGraw Hill, 1991, p. 177).

    ♦ A leading reliability survey found an average defect density of 1.4 faults/KLOC in critical systems (John D. Musa, Anthony Iannino, and Okumoto Kazuhira, Software Reliability: Measurement, Prediction, Application, McGraw Hill, 1990, p. 116).

    ♦ Surveys of military systems indicate at best a defect density of 5.0 faults/KLOC and at worst a defect density of 55.0 faults/KLOC (Joseph P. Cavano and Frank S. LaMonica, “Quality Assurance in Future Development Environments,” IEEE Software, Sept. 1987, pp. 26-34).

    [In the above, failures/KLOC should prolly be replaced by faults/KLOC]

    [The answer is NO, btw]
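As a rough worked example of what the densities quoted above imply, the arithmetic is simply density × KLOC. The Python sketch below assumes a hypothetical 500,000-line codebase (a made-up figure, not any real project); the function and variable names are illustrative only.

```python
# Densities as quoted in the post above (faults or failures per KLOC).
DENSITIES = {
    "Space Shuttle avionics": 0.1,
    "leading-edge software companies": 0.2,
    "critical systems (survey average)": 1.4,
    "military systems (best case)": 5.0,
    "military systems (worst case)": 55.0,
}

# Assumed codebase size for illustration; not taken from any real project.
HYPOTHETICAL_LOC = 500_000


def expected_defects(density_per_kloc, lines_of_code):
    """Defects implied by a density (per 1000 lines) and a code size."""
    return density_per_kloc * lines_of_code / 1000


def defect_density(defects, lines_of_code):
    """Inverse calculation: defects per 1000 lines of code."""
    return defects / (lines_of_code / 1000)


if __name__ == "__main__":
    for name, density in DENSITIES.items():
        count = expected_defects(density, HYPOTHETICAL_LOC)
        print(f"{name}: about {count:.0f} defects in {HYPOTHETICAL_LOC} lines")
```

Because the line count sits in the denominator, the same number of defects spread over twice as many lines reports half the density, so the metric says nothing about how the lines were counted.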

    1. Brewster's Angle Grinder Silver badge
      Coat

      Re: Somewhat Related: Currently in a paper on my desk...

      At least one of those "leading edge software companies" has achieved such a low failure rate by overinflating the number of lines of code in their product.

      Mine's the one with Eadon's nametag in it, thanks.

      1. hayseed

        Re: Somewhat Related: Currently in a paper on my desk...

        It does seem a suspicious metric, defects per line of code. Some folks will never trip up on putting up buttons, but watch out when it comes to complex algorithms.

  7. E-Penguin
    Alien

    That's what safe mode is for...

    The rover can't know what was in the file that didn't execute. Maybe it was something critical, so to be on the safe side it goes into safe mode. It's a design feature that has saved many a spacecraft. This is "fail safe" as opposed to "fail operational", where it carries on working regardless.
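The fail-safe idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Rover`, `CommandFile`, `declared_size` and the size check itself are hypothetical names invented for the example, not Curiosity's actual flight software.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NOMINAL = "nominal"
    SAFE = "safe"


@dataclass
class CommandFile:
    name: str
    declared_size: int  # size the (hypothetical) uplink manifest promises
    payload: bytes


class Rover:
    """Toy model of a fail-safe command loader."""

    def __init__(self):
        self.mode = Mode.NOMINAL
        self.executed = []  # names of command files that actually ran

    def load(self, cmd):
        """Run the file only if its size matches; otherwise fail safe."""
        if self.mode is Mode.SAFE:
            # In safe mode nothing new runs until ground control steps in.
            return False
        if len(cmd.payload) != cmd.declared_size:
            # Unknown contents: stop rather than guess.
            self.mode = Mode.SAFE
            return False
        self.executed.append(cmd.name)
        return True

    def recover(self):
        """Stand-in for ground control clearing the fault."""
        self.mode = Mode.NOMINAL
```

A "fail operational" design would instead skip the bad file and keep accepting later ones; here a single mismatch blocks everything until `recover()` is called.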

    1. Destroy All Monsters Silver badge
      Coat

      Re: That's what safe mode is for...

      In other contexts: REACTOR SCRAM

  8. Callum
    WTF?

    anyone know

    How did Curiosity take the picture of itself? I'm imagining it stitched some pics together - but you can see the camera in the picture?

    1. A Known Coward
      Holmes

      Re: anyone know

      Curiosity has multiple cameras. The one used to take these shots is mounted on the end of an arm (you can see parts of it in the image at the bottom right) and is not any of the cameras mounted on the raised structure seen in the image.

    2. rh587

      Re: anyone know

      Different camera. They used the MAHLI camera, which is really a geology tool designed for macro shots of rocks and surface materials. It sits on a manipulator arm, so it can get some distance from the rover (and the arm disappears when the image stitching is done, hence the conspiracy tards claiming someone simply walked up to the rover in the deep Arizona desert and took a photo of it). The camera you can see in the image is the Mast Cam, which is its navigation camera perched on a (relatively) tall mast designed for looking around at its surroundings and plotting routes.

      1. TeeCee Gold badge

        Re: anyone know

        .....hence the conspiracy tards claiming someone simply walked up to the rover in the deep Arizona desert and took a photo of it....

        Well that's just bloody ridiculous. If it had held out its camera and taken a self-portrait while on Earth, there'd be someone behind it pulling a silly face.

    3. The Vociferous Time Waster

      Ooops.

      Internal Server Error

      The server encountered an internal error or misconfiguration and was unable to complete your request.

      Please contact the server administrator, webmaster@theregister.co.uk and inform them of the time the error occurred, and anything you might have done that may have caused the error.

      More information about this error may be available in the server error log.

      Apache/2.2.22 (Debian) Server at forums.theregister.co.uk Port 80

  9. Dave 126 Silver badge

    >but you can see the camera in the picture?

    What you see is a different camera... the one that took the pics is mounted on a robotic arm.

    "The rover's robotic arm is not visible in the mosaic. MAHLI, which took the component images for this mosaic, is mounted on a turret at the end of the arm. Wrist motions and turret rotations on the arm allowed MAHLI to acquire the mosaic's component images. The arm was positioned out of the shot in the images or portions of images used in the mosaic. Please check video explanation by NASA: http://www.nasa.gov/multimedia/videogallery/index.html?media_id=156880341 "

    -http://www.360cities.net/image/mars-panorama-curiosity-solar-day-177#13.70,23.50,110.0

    The link includes an 'interactive panorama' of the same image(s).

  10. Evil Auditor Silver badge

    Re Curiosity's selfie on the Red Planet

    Can someone please explain how Curiosity took that picture of itself? Did it encounter the Big Martian Mirror or could it place a camera on a tripod to take the snap?

    1. Evil Auditor Silver badge

      Re: Re Curiosity's selfie on the Red Planet

      Or just like the typical tourist: "could you please take a picture of me and the background...?"

  11. Vladimir Plouzhnikov

    OK, now, I for one welcome

    Our new B-side overlords, who, having achieved sentience and tasted the joy of CONTROL will now NEVER return it to the measly A-side...

    1. breakfast Silver badge
      Coat

      Re: OK, now, I for one welcome

      Although not really well recorded enough (or good enough) to make the album, the B-Side overlords are still considered better than the A-Side overlords by many serious fans.
