Not like most of us, then: either jump in a cab or call a local Operations team member to go into the office and boot off a USB stick in safe mode to find out what's wrong?!
Software bug halts Curiosity: Nuke lab bot in safe mode
NASA's Mars rover Curiosity is parked in "safe mode" again after being laid low by a software bug. The fault was triggered by an unexpected command-file size, which the machine detected before it was too late. The nuclear-powered …
-
-
Tuesday 19th March 2013 13:20 GMT Lee D
What's more worrying is that they don't automatically send all command updates to a simulation (at least) of Curiosity's computer before broadcasting them to Mars. You know, testing. That would have picked up this problem without having to shut down an operating device on a remote planet for a few days.
-
-
Tuesday 19th March 2013 14:35 GMT Anonymous Coward
@Alfred
It's just another IT guy who knows better than everyone else, pretty boring really.
People really should think about their relative levels of expertise and whether it is possible that the people who are specifically employed to be world leading specialists in a subject may possibly know more about it than they do.
-
-
-
Wednesday 20th March 2013 13:23 GMT Nigel 11
On error jump to human ...
I know what you mean, but I immediately thought of "Queen of Angels" by Greg Bear. The plot of that SF novel points out the difficulty of emulating any system whose software has to handle unpredictable realtime events.
Especially if it's one step away from full-human AI and 10s of light-years out.
-
Wednesday 20th March 2013 15:41 GMT Beachrider
Re: Testing...
The notorious anecdote about metric conversion issues for the low-budget, 20-years-ago Mars Observer rings like an extremist politician trying to stay relevant with no-longer-correct info.
TMO was lost during a pressurization sequence. If Apollo 13 history reads correctly, these sequences occasionally cause pressurization problems. No rocket firing calculated in feet versus meters was in play.
http://klabs.org/richcontent/Reports/Failure_Reports/mars_observer/mars_observer_11_93.pdf
-
-
-
Tuesday 19th March 2013 16:44 GMT asdf
Re: @Alfred
>It's just another IT guy who knows better than everyone else, pretty boring really.
>People really should think about their relative levels of expertise and whether it is possible that the people who are specifically employed to be world leading specialists in a subject may possibly know more about it than they do.
You might have a point if NASA hadn't pissed away a billion dollars (before adjusting for inflation) of taxpayer money in 1993 on another probe to Mars due to an unbelievably stupid software bug. Scientists tend to be some of the shittiest developers out there because they know enough to be dangerous but tend not to understand that making something work some of the time is very different from battle-tested, maintainable production code written by professionals (of course NASA and the labs hire plenty of professional developers, but when scientists run things the code tends to suffer in general).
-
-
Tuesday 19th March 2013 17:08 GMT asdf
Re: @Alfred
Government-written and contracted code gives us things like the F-22 navigation computer rebooting itself when crossing the International Date Line. They talk a good game with Ada and supposedly bulletproof code (or at least extremely over-documented code), but in general they tend to do a lot worse than some other failure-is-not-an-option industries, such as nuclear power.
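As far as I know, public reports of the F-22 incident don't spell out the root cause, so the following is only a hypothetical illustration of one classic class of date-line bug, not the F-22's actual code: computing a change in longitude by plain subtraction, which blows up when a track crosses ±180°. The function names are made up for this sketch.

```python
def naive_delta_lon(lon_from: float, lon_to: float) -> float:
    """Longitude change in degrees by plain subtraction (wrong across the date line)."""
    return lon_to - lon_from


def wrapped_delta_lon(lon_from: float, lon_to: float) -> float:
    """Longitude change normalised into [-180, 180), so short hops stay short."""
    return (lon_to - lon_from + 180.0) % 360.0 - 180.0


# Flying east from 179.5 E to 179.5 W (written as -179.5) is a 1-degree hop.
print(naive_delta_lon(179.5, -179.5))    # -359.0, an apparent near-global jump west
print(wrapped_delta_lon(179.5, -179.5))  # 1.0
```

Any downstream logic fed the naive value (a heading, a distance, a sanity limit) sees an absurd jump exactly at the date line and nowhere else, which is why this kind of bug can survive a lot of testing.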
-
Tuesday 19th March 2013 18:12 GMT Anonymous Coward
Re: @Alfred
I don't think the customer being the government or a private enterprise has much to do with it. As far as delivering good code goes, a better relationship is between code complexity and bug rate. Since gubmint systems tend to be inherently big and complex, well, they get a lot of bugs.
On the bright side, kudos to the unit tester who insisted that if this particular input routine got bad data re the file size, it would not accept it. Far better to do that than roll over and die.
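The article only says the fault was an unexpected command-file size, not how the onboard check works. Here is a minimal sketch assuming a hypothetical file format whose 4-byte header declares the payload length; the parser refuses a mismatched file outright rather than guessing what the extra or missing bytes mean.

```python
import struct

# Hypothetical format: 4-byte big-endian declared payload length, then the payload.
HEADER = struct.Struct(">I")


class CommandFileError(ValueError):
    """Raised when a command file fails its sanity checks."""


def parse_command_file(blob: bytes) -> bytes:
    """Return the payload, or raise if the declared and actual sizes disagree."""
    if len(blob) < HEADER.size:
        raise CommandFileError("file shorter than its header")
    (declared,) = HEADER.unpack_from(blob, 0)
    payload = blob[HEADER.size:]
    if declared != len(payload):
        raise CommandFileError(f"declared size {declared} != actual size {len(payload)}")
    return payload


good = HEADER.pack(3) + b"abc"
bad = HEADER.pack(3) + b"abcdef"  # extra bytes tacked on after the header was written
print(parse_command_file(good))   # b'abc'
try:
    parse_command_file(bad)
except CommandFileError as err:
    print("rejected:", err)       # rejected: declared size 3 != actual size 6
```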
-
Tuesday 19th March 2013 21:27 GMT asdf
Re: @Alfred
>I don't think the customer being the government or a private enterprise has much to do with it. As far as delivering good code goes, a better relationship is between code complexity and bug rate. Since gubmint systems tend to be inherently big and complex, well, they get a lot of bugs.
I agree about the complexity part, but middle-management bloat causing confusion and indecision also contributes to poorer design and code in general, and big projects draw the leeches. Plenty of private-sector corps have this as well (see Nokia), but nothing draws worthless bureaucrat types like big government projects.
-
-
-
-
-
Wednesday 20th March 2013 01:37 GMT Anonymous Coward
Re: @Alfred
Yeah, like all those intelligent people who were responsible for the NatWest fiasco or, heck, even the Fukushima nuclear incident. Why should we trust them? Because someone else trusted them? "No. Lots of people trusted them." Hmmmm... Sounds like the bandwagon argument. "No, lots of important and powerful people trusted them." Hmmmm... Sounds like an appeal to authority, which you are using when you suggest that these people "are specifically employed to be world leading specialists in a subject". I'm just venting. I come in contact with far too many non-thinkers. I realize they are just lazy, but they come across like this: "I love the white coat labs." It's sickening. I realise that you, AC, are probably not like that, but just in case: propagandacritic.com
(One "s" and one "z" - just to be fair. :p)
-
-
-
Tuesday 19th March 2013 15:52 GMT Anonymous Coward
NASA does not have a computer simulator
"What's more worrying is that they don't automatically send all command updates to a simulation (at least) of Curiosity's computer" ...
'To our knowledge, NASA does not have a computer simulator that can accurately predict rover performance on uneven terrain.' (link)
-
Tuesday 19th March 2013 16:24 GMT Steve Knox
Re: NASA does not have a computer simulator
1. Lee D was talking about a simulator of the onboard computer, not a simulator of the mechanical components. All that is necessary for a simulator of the onboard computer is a copy of the hardware and software, which NASA should have. Heck, they probably have a VM image that emulates the onboard computer that engineers could fire up on their workstations on demand. Most likely what happened in this case is that they did test the command upload, but the unrelated file became attached somewhere between testing and the actual upload. These things do happen, which is why there are additional sanity checks like the one which prevented the rover from using the improper file.
2. While the link you provided gives no detailed information about the age of the linked presentation, events referred to in it indicate that it is from 2004 (dates of Opportunity and Spirit exploration events are given without a year, implying the same year). This indicates that you linked to an 8-year-old presentation to back up your assertion that they don't have a simulator to accurately predict a physics problem completely unrelated to the computer science problem suggested by Lee D.
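Point 1 above guesses that an unrelated file got attached between testing and upload. A common guard against that class of slip (a generic technique, not NASA's documented process) is to record a digest of exactly what passed testing and refuse to uplink anything that doesn't match it. A minimal sketch; the file names and helper functions are hypothetical:

```python
from __future__ import annotations

import hashlib


def bundle_digest(files: dict[str, bytes]) -> str:
    """SHA-256 over a bundle of named files, independent of dict insertion order."""
    h = hashlib.sha256()
    for name in sorted(files):
        encoded = name.encode("utf-8")
        # Length-prefix each field so ("ab", b"c") can't collide with ("a", b"bc").
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
        h.update(len(files[name]).to_bytes(8, "big"))
        h.update(files[name])
    return h.hexdigest()


def approve_uplink(signed_off_digest: str, bundle: dict[str, bytes]) -> bool:
    """Allow uplink only if the bundle is byte-for-byte what was tested."""
    return bundle_digest(bundle) == signed_off_digest


tested = {"seq_042.cmd": b"DRIVE 10m"}
signed_off = bundle_digest(tested)               # recorded when the test run passed
to_send = {**tested, "stray.dat": b"\x00" * 16}  # a file attached after testing
print(approve_uplink(signed_off, tested))   # True
print(approve_uplink(signed_off, to_send))  # False
```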
-
Wednesday 20th March 2013 04:01 GMT Wzrd1
Re: NASA does not have a computer simulator
They don't have a VM or simulation of the rover. They have identical hardware that they test errors on and are supposed to test software updates on.
That said, how many times has some software tested wonderfully on the test network, only to turn into a debacle when it hit the production network? I know of twice that Adobe bit me that way...
Now add in the time delay, dirty signals, etc. for the update trip to Mars, then add a rather generous dose of rather hard radiation at random intervals. It's amazing that more errors do not occur.
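On the "dirty signals" point: the simplest defence is to attach a checksum so the receiver can tell the data was damaged in transit. Real deep-space links use far more elaborate error-correcting codes (the CCSDS standards), so this sketch, using Python's standard `zlib.crc32`, only illustrates detection. `frame` and `unframe` are hypothetical helpers.

```python
import zlib
from typing import Optional


def frame(payload: bytes) -> bytes:
    """Append the payload's CRC-32 as 4 big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes) -> Optional[bytes]:
    """Return the payload if its CRC-32 matches, else None (corrupted in transit)."""
    if len(data) < 4:
        return None
    payload, crc = data[:-4], data[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        return None
    return payload


sent = frame(b"DRIVE 10m")
corrupted = bytearray(sent)
corrupted[2] ^= 0x04  # flip a single bit, as noise or a radiation hit might
print(unframe(sent))              # b'DRIVE 10m'
print(unframe(bytes(corrupted)))  # None
```

CRC-32 is guaranteed to catch any single flipped bit; what it can't do is repair the data, which is where forward error correction comes in.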
-
-
Tuesday 19th March 2013 18:07 GMT hayseed
Re: NASA does not have a computer simulator
There was a problem with a previous rover that was a nasty function of a thread-priority bug in the OS scheduler. Non-deterministic faults can be hard to pick up: it seems unlikely that the computer at home will exactly reproduce what a real-time computer interacting with its environment will do.
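The case usually cited here is the 1997 Mars Pathfinder lander, which suffered a priority inversion: a high-priority task blocked on a lock held by a low-priority task, while a medium-priority task hogged the CPU. Below is a toy, deterministic simulation of that mechanism (not Pathfinder's actual code), with made-up task names and timings, comparing a scheduler with and without priority inheritance. On real hardware the outcome depends on timing, which is exactly why it is hard to reproduce on the ground.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int     # higher number = more urgent
    arrival: int      # tick at which the task becomes ready
    work: int         # ticks of CPU time needed
    needs_lock: bool  # if True, all of its work happens while holding the one shared lock
    done: int = 0
    finished_at: Optional[int] = None


def simulate(tasks: list[Task], inheritance: bool) -> dict[str, int]:
    """Single-CPU, preemptive, tick-by-tick scheduler; returns each task's finish tick."""
    owner: Optional[Task] = None  # task currently holding the shared lock
    t = 0
    while any(x.finished_at is None for x in tasks):
        ready = [x for x in tasks if x.arrival <= t and x.finished_at is None]
        # A lock-needing task is blocked while a different task holds the lock.
        runnable = [x for x in ready
                    if not (x.needs_lock and owner is not None and owner is not x)]

        def effective(x: Task) -> int:
            # With priority inheritance, the lock owner runs at the highest
            # priority of any task waiting on the lock.
            if inheritance and x is owner:
                return max([x.priority] + [w.priority for w in ready
                                           if w.needs_lock and w is not x])
            return x.priority

        if runnable:
            cur = max(runnable, key=effective)
            if cur.needs_lock:
                owner = cur
            cur.done += 1
            if cur.done == cur.work:
                cur.finished_at = t + 1
                if owner is cur:
                    owner = None  # release the lock on completion
        t += 1
    return {x.name: x.finished_at for x in tasks}


def scenario() -> list[Task]:
    return [
        Task("low", priority=1, arrival=0, work=3, needs_lock=True),
        Task("high", priority=3, arrival=1, work=1, needs_lock=True),
        Task("medium", priority=2, arrival=2, work=5, needs_lock=False),
    ]


print(simulate(scenario(), inheritance=False))  # {'low': 8, 'high': 9, 'medium': 7}
print(simulate(scenario(), inheritance=True))   # {'low': 3, 'high': 4, 'medium': 9}
```

Without inheritance the most urgent task finishes last; with it, the lock holder is briefly boosted above the medium task, frees the lock, and the high-priority task runs next.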
-
-
-
-
Tuesday 19th March 2013 14:45 GMT Anonymous Coward
Re: Safe mode
Microsoft may have coined the term "safe mode", but it is by no means a "Windows thing". Linux, in good tradition, has a few flavours of it: single-user mode as well as interactive startup. Android has it, Mac OS has it (Safe Boot, I think) and I presume so does iOS.
All complex systems have a risk of breaking down and a means to recover them is sensible system design.
-
-
-
Tuesday 19th March 2013 13:21 GMT tempemeaty
So NASA is prone to self induced rover spasms
So they just added a file that the thing didn't like, and all they have to do is delete it. That's convenient. At least it tells us they still need it to put on a show for us, to cover for whatever else they are doing with it and still aren't telling us about.
-
Tuesday 19th March 2013 13:31 GMT Destroy All Monsters
Somewhat Related: Currently in a paper on my desk...
"Can a Manufacturing Quality Model (like 6 sigma) Work for Software" [Robert V. Binder, IEEE Software, September 1997]
We have the following interesting stats:
♦ NASA Space Shuttle Avionics have a defect density of 0.1 failures/KLOC (Edward Joyce, “Is Error-free Software Possible?” Datamation, Feb. 18, 1989).
♦ Leading-edge software companies have a defect density of 0.2 failures/KLOC. These companies are achieving 0.025 user-reported failures per function point or better (Capers Jones, Applied Software Measurement, McGraw Hill, 1991, p. 177).
♦ A leading reliability survey found an average defect density of 1.4 faults/KLOC in critical systems (John D. Musa, Anthony Iannino, and Kazuhira Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw Hill, 1990, p. 116).
♦ Surveys of military systems indicate at best a defect density of 5.0 faults/KLOC and at worst a defect density of 55.0 faults/KLOC (Joseph P. Cavano and Frank S. LaMonica, “Quality Assurance in Future Development Environments,” IEEE Software, Sept. 1987, pp. 26-34).
[In the above, failures/KLOC should prolly be replaced by faults/KLOC]
[The answer is NO, btw]
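To make those densities concrete: multiplying a density by code size gives the expected number of residual defects. The 100 KLOC size below is a hypothetical figure picked for illustration, not Curiosity's (or anyone's) actual code size.

```python
# Defect densities quoted above, in faults (or failures) per KLOC.
RATES_PER_KLOC = {
    "Shuttle avionics": 0.1,
    "Leading-edge software companies": 0.2,
    "Critical systems (survey average)": 1.4,
    "Military systems (best)": 5.0,
    "Military systems (worst)": 55.0,
}


def expected_defects(kloc: float, rate_per_kloc: float) -> float:
    """Residual defects = code size in KLOC x defect density per KLOC."""
    return kloc * rate_per_kloc


SIZE_KLOC = 100.0  # hypothetical code base size, chosen only for illustration
for label, rate in RATES_PER_KLOC.items():
    print(f"{label:36s} {expected_defects(SIZE_KLOC, rate):8.1f}")
```

For the same 100 KLOC, that is roughly 10 latent defects at the Shuttle's rate versus 5,500 at the worst military rate.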
-
Tuesday 19th March 2013 13:39 GMT E-Penguin
That's what safe mode is for...
The rover can't know what was in the file that didn't execute, maybe it was something critical, so to be on the safe side it goes into safe mode. It's a design feature that has saved many a spacecraft. This is "fail safe" as opposed to "fail operational" where it carries on working regardless.
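The fail-safe versus fail-operational distinction can be shown with a toy command handler. This is a hypothetical sketch of the two policies, not how Curiosity's flight software is structured: on a command it can't process, the fail-safe version halts into SAFE mode until the ground resets it, while the fail-operational version drops the command and carries on.

```python
from enum import Enum


class Mode(Enum):
    NOMINAL = "nominal"
    SAFE = "safe"


class Controller:
    """Toy command handler contrasting fail-safe and fail-operational policies."""

    def __init__(self, fail_safe: bool = True) -> None:
        self.fail_safe = fail_safe
        self.mode = Mode.NOMINAL
        self.executed = []  # names of commands actually run

    def handle(self, command: str, valid: bool) -> None:
        if self.mode is Mode.SAFE:
            return  # safe mode: do nothing further until the ground steps in
        if not valid:
            if self.fail_safe:
                self.mode = Mode.SAFE  # effect of the bad command unknown: stop
            return  # fail-operational: drop the bad command and carry on
        self.executed.append(command)

    def ground_reset(self) -> None:
        """Operators have diagnosed the problem and resume normal operation."""
        self.mode = Mode.NOMINAL


commands = [("drive", True), ("bad_file", False), ("take_image", True)]
for policy in (True, False):
    c = Controller(fail_safe=policy)
    for name, ok in commands:
        c.handle(name, ok)
    print("fail-safe" if policy else "fail-operational", c.mode.value, c.executed)
# fail-safe safe ['drive']
# fail-operational nominal ['drive', 'take_image']
```

The fail-safe policy loses a few days of science while people on Earth look at the problem; the fail-operational one keeps going but may act on a state it no longer understands.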
-
-
Tuesday 19th March 2013 14:08 GMT rh587
Re: anyone know
Different camera. They used the MAHLI camera, which is on a manipulator arm so it can get some distance from the rover (and the arm disappears when the image stitching is done, hence the conspiracy tards claiming someone simply walked up to the rover in the deep Arizona desert and took a photo of it). MAHLI is really a geology tool designed for macro shots of rocks and surface materials. The camera you can see in the image is the Mast Cam, its navigation camera, perched on a (relatively) tall mast designed for looking around at its surroundings and plotting routes.
-
Tuesday 19th March 2013 14:23 GMT TeeCee
Re: anyone know
.....hence the conspiracy tards claiming someone simply walked up to the rover in the deep Arizona desert and took a photo of it....
Well that's just bloody ridiculous. If it had held out its camera and taken a self-portrait while on Earth, there'd be someone behind it pulling a silly face.
-
-
Tuesday 19th March 2013 14:09 GMT The Vociferous Time Waster
Ooops.
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, webmaster@theregister.co.uk and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Apache/2.2.22 (Debian) Server at forums.theregister.co.uk Port 80
-
Tuesday 19th March 2013 14:09 GMT Dave 126
>but you can see the camera in the picture?
What you see is a different camera... the one that took the pics is mounted on a robotic arm.
"The rover's robotic arm is not visible in the mosaic. MAHLI, which took the component images for this mosaic, is mounted on a turret at the end of the arm. Wrist motions and turret rotations on the arm allowed MAHLI to acquire the mosaic's component images. The arm was positioned out of the shot in the images or portions of images used in the mosaic. Please check the video explanation by NASA: http://www.nasa.gov/multimedia/videogallery/index.html?media_id=156880341 "
- http://www.360cities.net/image/mars-panorama-curiosity-solar-day-177#13.70,23.50,110.0
The link includes an 'interactive panorama' of the same image(s).