Do not use ".0" release. And if you do, you should know what you are doing. So, hats off to Pavel Goran for volunteering to run it, and then identifying a serious filesystem bug.
'Urgent data corruption issue' destroys filesystems in Linux 4.14
A filesystem-eating bug has been found in Linux 4.14. First reported last week by developer Pavel Goran, the problem struck bcache, a tool that lets one use a solid state disk drive as a read/write cache for another drive. bcache is often used to store data from a slow disk on faster media. Goran noticed the problem after …
COMMENTS
-
-
Wednesday 22nd November 2017 14:29 GMT Anonymous Coward
"Do not use ".0" release. And if you do, you should know what you are doing. "
He's a Gentooer (like myself but far more knowledgeable). You don't run Gentoo and shy away from .0 software. To be honest you normally embrace pre-release, let alone released. That's how bugs get found.
You have to repair your systems from time to time in new and amusing ways but Gentoo is great fun. In winter it will even keep you warm when you do an update so you can turn down the heating.
-
-
-
Wednesday 22nd November 2017 11:14 GMT Chris 155
Re: That's open source or you...
No it wouldn't.
This would be the case of an early adopter getting their data munched. This would have been found just as fast in closed source.
Possibly the cause was found earlier and a resolution released earlier because of open source. Possibly.
This is the kind of shit that shouldn't get through at all though.
-
Wednesday 22nd November 2017 12:35 GMT CAPS LOCK
"This would have been found just as fast in closed source."
Or would it? Here, for example, is a bug that affected me back when I used Win 7: https://social.technet.microsoft.com/Forums/windows/en-US/13a7426e-1a5d-41b0-9e16-19437697f62b/windows-7-64bit-corrupting-altering-large-files-copied-to-external-ntfs-drives?forum=w7itproperf
It's hard to say how long the bug existed for before being found, but a fix was a long time coming, in fact I don't know if this was ever fixed. It's part of the reason I moved to Linux.
-
-
Wednesday 22nd November 2017 17:51 GMT Captain DaFt
Re: "This would have been found just as fast in closed source."
It's part of the reason I moved to Linux.
Talk about throwing the baby out with the bath water.
There's a work-around - use a 3rd-party tool to copy a file!!
So, your advice is to fix an OS problem by using something else?
Uh, that's what he did, isn't it? ☺
-
-
-
-
-
-
Wednesday 22nd November 2017 10:21 GMT Sandtitz
Re: That's open source or you...
"This is another success story of open source."
Was that irony? If MS had this data corruption bug in Windows there would be dozens of commenters here telling how "Why isn't MS testing their crapware", or "MS is letting end users test their crap", with everyone upvoting each other.
-
Wednesday 22nd November 2017 10:28 GMT Anonymous Coward
Re: That's open source or you...
If MS had this data corruption bug in Windows
I'm a customer of Microsoft, I give them money. My company (among 1000s of others) shovels money into Microsoft's bank. They have $millions available for testing and QA.
And, if the report came externally, it would have been ignored until 10 more people reported it. The patch (if any) would be applied next month.
-
Wednesday 22nd November 2017 11:20 GMT Anonymous Coward
Re: That's open source or you...
@Sandtitz
So, you don't recognize that there are some big differences between the latest release of Linux and Windows? Linux does an excellent job of providing a variety of releases in various stages of development. Windows, the OS used in most business desktops/laptops it appears, has become a beta. If you use a significantly recent version of Linux you can expect similar. If you use a Linux release that is a little older you'll have greater stability. With Linux, whether you've chosen a newer or older release, you'll have software updates and bug fixes very regularly and for a whole range of things. With Windows, one gets the feeling that you get updates and bug fixes when convenient for Microsoft and sometimes, seemingly, only because of the embarrassment of OSS OSs having reported and fixed said issues already.
-
-
-
Thursday 23rd November 2017 12:52 GMT Chairman of the Bored
Re: This is another success story of open source.
@bitbeisser,
Respectfully disagree here. Professional software designers do test extensively; and believe me - open or closed source the devs are pros who take pride in their work.
Bugs in the wild though will happen due to the sheer complexity of the system - for any decently complex system a full factorial experiment of all potential decision paths is infeasible in any reasonable length of time. One is literally trying to prove a negative.
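To put rough numbers on that, here is a minimal sketch. It assumes every decision point is an independent binary branch and a hypothetical test rig that runs one million cases per second; both figures are assumptions for illustration, not measurements of any real kernel.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_paths(binary_decisions: int) -> int:
    """Number of distinct paths through n independent yes/no decisions."""
    return 2 ** binary_decisions


def years_to_test(binary_decisions: int, tests_per_second: float) -> float:
    """Years needed to run every path once at the given (assumed) test rate."""
    seconds = exhaustive_paths(binary_decisions) / tests_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical rate: one million test cases per second.
    rate = 1_000_000
    for n in (10, 30, 64, 100):
        print(f"{n:>3} decisions: {exhaustive_paths(n):.3e} paths, "
              f"{years_to_test(n, rate):.3e} years at {rate:,} tests/s")
```

Each extra branch doubles the count, so exhaustive coverage runs out of road long before a real code base runs out of branches.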
Suggested link for starters: https://users.ece.cmu.edu/~koopman/des_s99/sw_testing/
What separates the men from the boys is how you handle a bug or design flaw. Ten days cycle time on a single report is very good.
-
-
-
Wednesday 22nd November 2017 10:22 GMT Anonymous Coward
Re: That's open source or you...
bcache isn't something the majority of users use (or have even heard of)*
It took ~10 days for this bug to be discovered, fixed, and released. I think that's pretty good.
I'm not sure what you have against open source. If it's because you're a monkey-dancing Microsoft fan-boy then you should know Microsoft are GitHub's largest org, with the most contributors. If there are other reasons, then please let us know.
*it's used to turn an SSD into a cache for HDDs
-
-
-
-
Wednesday 22nd November 2017 09:43 GMT jake
Re: Slackware-current ...
I live on -stable for important stuff, I'm not an idiot. That would be kernel 4.4.88-smp at the moment. But I also run -current on a couple of spare boxen, and report any errata I run across, along with workarounds/fixes if I can. It's called giving back. Try it, you might like it. Or you can just bitch about those of us who do, if that makes you feel good about yourself.
-
-
-
Wednesday 22nd November 2017 10:15 GMT jake
Re: Slackware-current ...
It occurs to me that folks might not know how Slackware does things. Essentially, there is an LTS version called slackware-stable, with a very stable, solid software package (if not the most modern), and a "work in progress" version called slackware-current that is a kind of rolling release, aiming to be the next -stable. More at slackware.com/info/ and slackware.com/changelog/ if you're interested.
-
-
-
-
Thursday 23rd November 2017 14:04 GMT FIA
Re: Slackware-current ...
I'd dread to think what this industry would be like if it weren't for Linux
-
-
-
-
Wednesday 22nd November 2017 13:30 GMT BinkyTheMagicPaperclip
Crap, I think I am using that..
Not lost any data so far, but it's not that rare to see drives drop out of an mdraid for no discernible reason, including one time where a mirror completely failed to assemble despite one of the devices it used being a partition on the SSD the system had just booted from(!).
You say Windows isn't so great, but I've found its software RAID to be absolutely rock solid with sensible defaults. Not so Linux RAID, it's a pain in the arse. Going to put my backend file server on FreeBSD/ZFS..
-
-
Thursday 23rd November 2017 09:41 GMT BinkyTheMagicPaperclip
Re: Crap, I think I am using that..
How are you finding performance? I did a little reading around and there were some concerns over maintaining kernels and the level of performance.
I'm running Salix, so currently doing custom kernel builds..
If it is ok it would make a lot of sense to move to it, as medium term I want to use FreeBSD as a base once the functionality I need is included. I've plenty of ECC memory to spare..
-
-
Thursday 23rd November 2017 09:27 GMT Anonymous Coward
Re: Crap, I think I am using that..
Yeah, I've had random problems with mdraid too. But then h/w raid is far from perfect these days, especially when rebuilding arrays after a failure, now that 'disks' have become so large; it's likely to take a couple of days and then fail to successfully complete anyway. I much prefer JBOD based systems now, and make multiple backup copies (using standard os tools - not proprietary packages) frequently.
-
-
-
Wednesday 22nd November 2017 16:07 GMT BinkyTheMagicPaperclip
Yeah.. If you shut down a system with an mdraid RAID10, and on bootup one of the drives isn't there, it won't establish the RAID by default on the infinitesimal grounds that something might be corrupt, except for the fact it stopped and started in exactly the same state.
How is that remotely sensible? Not to mention that there is some combination of operations where it's possible for a mirror not to start up at all (one member offline, the other member has a very brief blip perhaps). That's not even getting started on the ridiculous rebuild times when a disk magically becomes slightly out of sync.
Windows is a little finicky about whether it does certain things on standard vs dynamic disks, but other than that it's quite straightforward.
Still it's possibly better than ESXi which can't be arsed to implement any software RAID at all..
-
-
Thursday 23rd November 2017 09:24 GMT BinkyTheMagicPaperclip
The shutdown/startup issue is 'working as designed', there's a kernel boot parameter to allow an array to start up in a degraded state. I don't agree with this, but it's easy to work around.
The long rebuild times, as far as I can make out, are normal.
As to the occasional dropouts and the mirror failing to establish itself, the latter has only happened once - nothing much in the logs. Haven't looked at increased logging for the other case.
-
-
-
-
Wednesday 22nd November 2017 21:34 GMT Anonymous Coward
I have just encountered a simple fault on AWS. Maybe something related to this.
I made a 1 character change to a file. The change was effective, and the timestamp for the file was correct. Went back the next day and it had reverted to the original file & timestamp. I have repeated this again today with the same result.
-
Friday 24th November 2017 04:47 GMT Henry Wertz 1
difference here though...
"If MS had this data corruption bug in Windows there would be dozens of commenters here telling how "Why isn't MS testing their crapware", or "MS is letting end users test their crap", with everyone upvoting each other.""
And rightfully so, usually; on Gentoo it's typical to run either very recent or bleeding edge versions of almost every package. You would not have seen this bug running any typical Linux distro. People tend to make comments regarding Microsoft's mistakes (and upvote them!) when people run the regular release of Windows, update it, and run into big problems; not when they are running something like the Win10 bleeding edge channel and run into problems.