Bug of the month: Cache flow problem crashes Samsung phone apps

It's not been a good summer for Samsung. It packed its Galaxy Note 7 smartphones with detonating batteries, sparking a global recall. And its whizzy Exynos 8890 processor, which powers the Note 7 and the Galaxy S7 and S7 Edge, is tripping up apps with seemingly bizarre crashes – from null pointers to illegal instruction …

  1. Korev Silver badge

    Mono

    Dumb question: is this the same Mono as in the open source .Net?

    1. Hans Neeson-Bumpsadese Silver badge

      Re: Mono

      Supplementary dumb question: if it is, then does this also apply to Xamarin, as that is basically an implementation of Mono as I understand it?

      1. Ben Tasker

        Re: Mono

        Yes and Yes.

        Xamarin bug is here - https://bugzilla.xamarin.com/show_bug.cgi?id=39859

        Edit: clicky

        1. Korev Silver badge
          Pint

          Re: Mono

          Ta

    2. Pirate Dave Silver badge
      Pirate

      Re: Mono

      Here's a dumber question (from a guy who learned a little assembly on the 8088) - why does the OS clear the CPU's cache? I thought the CPU was supposed to be in charge of stuff like that. But I admit my knowledge is quite dated.

      1. 21mhz

        Re: Mono

        Here it has to be done by the application itself, not even the OS. The reason is that modifying a program's own code is something that's "not normally done", so CPUs are not designed to automatically invalidate instruction cache lines when the associated memory gets modified. And even if they were, you would still need to ensure the newly written code is flushed through from the _data_ cache to RAM before you can safely jump to it on a multi-core system.
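
        For the curious, here's a minimal C sketch of what a JIT has to do before jumping to freshly written code, assuming a POSIX system and GCC/Clang, whose __builtin___clear_cache() performs the data-cache clean and instruction-cache invalidate described above. The emit_code() wrapper is illustrative, not Mono's actual code.

          #include <stddef.h>
          #include <string.h>
          #include <sys/mman.h>

          typedef int (*func_t)(void);

          /* Copy machine code into an executable buffer and synchronise the
           * caches so that it is safe to execute. */
          func_t emit_code(const unsigned char *code, size_t len)
          {
              void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (buf == MAP_FAILED)
                  return NULL;

              memcpy(buf, code, len);   /* the writes land in the data cache */

              /* Clean the D-cache and invalidate the I-cache for the range;
               * skip this on ARM and stale instructions may still execute. */
              __builtin___clear_cache((char *)buf, (char *)buf + len);

              return (func_t)buf;
          }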

        1. Pirate Dave Silver badge
          Pirate

          Re: Mono

          Ah, thanks for the explanation. That makes sense.

  2. DCFusor
    FAIL

    Self-modifying code

    Really? AFAIK, that's been deprecated since I learned to program on a PDP-8, well before Intel and of course ARM existed – and well before this cache business, which attempts to hide lousy memory latency and bandwidth from the CPU so manufacturers can deny how little progress there's been in RAM all these years.

    I'm sorry guys, you're not smart enough to do self-modifying code right. One really smart guy might pull it off on a small project, but no team will ever succeed at it.

    Heck, even overuse of globals for "spooky action at a distance" is bad practice. And this at the system level?

    1. ThomH

      Re: Self-modifying code

      I'm not sure I agree. This isn't a case of somebody deciding that a store absolute + a load absolute is six bytes and eight cycles while a store absolute + a load immediate is five bytes and six cycles, so they'll do the less readable thing. It's a compiler just like any other compiler, except that compilation happens just-in-time, and somebody didn't think hard enough to realise that constants you read from your processor may not actually be constant if the processors are heterogeneous.

      So as to the code generation itself, this is just a compiler doing exactly what compilers have always done. It isn't self-modifying: it's one actor outputting another – not modifying it, and not modifying itself.

      They've just messed up the announcement of completion.
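
      To spell out the failure mode: below is a hedged C/AArch64 sketch of the kind of flush loop that goes wrong. The CTR_EL0 register and its IminLine field are real; caching the value and the flush_range() wrapper are illustrative simplifications (a complete flush also cleans each D-cache line first, omitted here for brevity). Sample the line size on a core with 128-byte lines, migrate to a core with 64-byte lines, and the loop strides straight past every other line.

        #include <stddef.h>
        #include <stdint.h>

        static size_t cached_line_size;   /* sampled once and reused: the bug */

        static size_t icache_line_size(void)
        {
            uint64_t ctr;
            __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
            return 4u << (ctr & 0xF);     /* IminLine: log2(line size in words) */
        }

        void flush_range(char *start, char *end)
        {
            if (cached_line_size == 0)
                cached_line_size = icache_line_size();   /* whichever core runs first */

            /* Invalidate the I-cache line by line. With a 128-byte stride on
             * a 64-byte-line core, half the lines are never invalidated. */
            for (char *p = start; p < end; p += cached_line_size)
                __asm__ volatile("ic ivau, %0" :: "r"(p) : "memory");
            __asm__ volatile("dsb ish\n\tisb" ::: "memory");
        }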

    2. Byz

      Re: Self-modifying code

      I once had to look after self-modifying code in the 1980s. It was a real lesson: every time you listed the code it was different :)

      Making changes to the code that did the modifying was fun as well, since bugs tended to eat the whole system (you learnt the value of backups when developing).

      I was then tasked with writing a report generator that looked at the system, worked out where the relevant data was, and created the report. It is the only time in my career that I've had to use triple indirection (I've used double many times) and recursion together. I used to come home with stunning headaches and spend an hour the next day working out what I'd written the day before.

      After a few weeks I'd got it working and had written a user interface for selecting the data you wanted and laying out the report, if you wanted a new report type. All fully documented :)

      I left a few months later, and when I came back a few years after that I found that no-one had ever created another report type. The reason... you had to understand the data structures in the original system to build a report, and no one could be bothered to learn them. Some programmers tried building static programs to generate reports, but when the system modified itself they stopped working :)

    3. Loud Speaker

      Re: Self-modifying code

      I think you will find RAM is quite a bit faster than in the days of a PDP8. Did you ever use an 8S?

      And self-modifying code was actually quite useful in dedicated-purpose, low-power microprocessors. In fact, with FPGAs, we can also have self-modifying hardware.

      However, now that the average phone can outperform a Cray, what is worth implementing? One does have to wonder quite where sanity lies!

      Personally, I am not going to buy a smart watch till it has a SCSI interface and tape backup. (At the present rate, I expect that to be before 2020).

      1. Anonymous Coward
        Anonymous Coward

        Re: Self-modifying code

        "However, now that the average phone can outperform a Cray, what is worth implementing? One does have to wonder quite where sanity lies!"

        You should note that most of the guilty programs are emulators: in particular, emulators of some pretty recent systems. The general rule of thumb is that the host needs to outclass the emulated system by a factor of about 100 to emulate it reliably. You can reduce that, though, if you take advantage of newer CPU features, and emulation techniques like DynaRec and JIT also bring the ratio down.

      2. Crazy Operations Guy

        "I think you will find RAM is quite a bit faster than in the days of a PDP8."

        In terms of raw speed, yes, but not relative to processor core speed. Many of the old dinosaurs were equipped with RAM whose latency was short enough that the requested data would be at the processor before the next instruction even began to execute. In modern systems, you have to initiate the load, then wait 20+ cycles before the data is available for the processor to use.
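
        If you want to see that gap for yourself, here's a small self-contained C sketch (the sizes and step counts are arbitrary choices of mine, not from the post above). It chases a randomly permuted ring of pointers, so every load depends on the previous one and the prefetcher can't hide the round-trip latency.

          #include <stdio.h>
          #include <stdlib.h>
          #include <time.h>

          #define N     (1u << 24)   /* 16M pointers (~128 MB): far bigger than the caches */
          #define STEPS (1u << 26)

          int main(void)
          {
              void **ring = malloc(N * sizeof *ring);
              size_t *perm = malloc(N * sizeof *perm);
              if (!ring || !perm)
                  return 1;

              /* Build a random permutation, then link it into one big cycle. */
              for (size_t i = 0; i < N; i++)
                  perm[i] = i;
              for (size_t i = N - 1; i > 0; i--) {
                  size_t j = (size_t)rand() % (i + 1);
                  size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
              }
              for (size_t i = 0; i < N; i++)
                  ring[perm[i]] = &ring[perm[(i + 1) % N]];

              void **p = &ring[perm[0]];
              struct timespec t0, t1;
              clock_gettime(CLOCK_MONOTONIC, &t0);
              for (size_t i = 0; i < STEPS; i++)
                  p = *p;                   /* serially dependent loads */
              clock_gettime(CLOCK_MONOTONIC, &t1);

              double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                        + (double)(t1.tv_nsec - t0.tv_nsec);
              printf("%.1f ns per load (final p=%p)\n", ns / STEPS, (void *)p);
              return 0;
          }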

        1. martinusher Silver badge

          Re: "I think you will find RAM is quite a bit faster than in the days of a PDP8."

          There are a number of things that don't quite make sense here. I'd guess the real villain is the lack of a 'dirty' bit for the instruction cache that would automatically invalidate cache lines when they are written by something other than the processor executing the code. Leaving that out would make sense in a single-core design, but in a modern machine one processor's instructions may very well be another processor's data. Having the instruction cache manipulated manually by an application has 'workaround' or 'kludge' written all over it (or, as we all tend to find out the hard way, 'An Accident Waiting To Happen').

          1. Charles 9

            Re: "I think you will find RAM is quite a bit faster than in the days of a PDP8."

            " Having the instruction cache manipulated manually by an application as 'workaround' or 'kludge' written all over it (or as we all tend to find out the hard way, "An Accident Waiting To Happen")."

            That's why they sometimes call it the bleeding edge. The programs in question are trying to extract every last bit of performance from the CPU, because they're doing something pretty demanding – emulating a CPU and other hardware from less than ten years ago – and raw performance is the baseline that makes everything else possible.

  3. Anonymous Coward
    Anonymous Coward

    No one has asked...

    What idiot decided that having 2 different types of processor core on a single chip was a good idea? Here's an idea - use the same core type and underclock half of them. Problem solved.

    1. Crazy Operations Guy

      Re: No one has asked...

      Both sets of cores actually implement the same ARMv8 instruction set – they have to, or threads couldn't migrate between them – but with very different microarchitectures. The M1 is a big, wide, out-of-order design that gets through demanding work (video and audio decoding, accelerated 3-D graphics, heavy maths, encryption/decryption) far faster than an A53 can, but that complexity means it draws quite a bit more power, even when executing the same instructions. With this set-up, the M1 cores can stay powered off the majority of the time and only wake when the beefier cores become the more efficient option – for instance when playing DRM-protected video or a highly graphics-intensive game.
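
      What the two clusters do disagree on is what they report about themselves, which is the root of the bug in the article. Here's a hedged sketch (Linux/AArch64 with GCC or Clang; the loop and output format are my own illustration): pin the thread to each CPU in turn and print the I-cache line size that core advertises in CTR_EL0. On an Exynos 8890 the two clusters reportedly advertise 64 and 128 bytes – and, as I understand it, newer Linux kernels trap EL0 reads of CTR_EL0 on such systems precisely to paper over the mismatch.

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stddef.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <unistd.h>

        /* I-cache minimum line size, from CTR_EL0's IminLine field
         * (log2 of the line size in 4-byte words). */
        static size_t icache_line_size(void)
        {
            uint64_t ctr;
            __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
            return 4u << (ctr & 0xF);
        }

        int main(void)
        {
            long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
            for (long cpu = 0; cpu < ncpus; cpu++) {
                cpu_set_t set;
                CPU_ZERO(&set);
                CPU_SET((int)cpu, &set);
                /* Pin this thread to one core, then ask that core. */
                if (sched_setaffinity(0, sizeof set, &set) == 0)
                    printf("cpu %ld: I-cache line = %zu bytes\n",
                           cpu, icache_line_size());
            }
            return 0;
        }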

  4. DrBandwidth

    Seen this one before....

    When Apple switched from the Motorola PowerPC G4 to the IBM PowerPC 970 (aka "G5"), a similar problem occurred. All prior PowerPC processors had used 32-byte cache lines, and software was written with the expectation that the "DCBZ" (Data Cache Block Zero) instruction would zero 32 bytes. The PowerPC 970 used a 128-byte cache line, so the DCBZ instruction zeroed the 32 bytes that were expected, then carried on and zeroed the next 96 bytes as well. Sometimes that was data, sometimes it was text, but frequently it was a mess. IBM added a mode bit that caused the DCBZ instruction to operate on 32 bytes instead of the full cache line, and made that the default setting on the parts sent to Apple.
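
    A hedged sketch of the pattern that bit people (PowerPC inline assembly; the zero_buffer() helper and its 32-byte assumption are my illustration, not any particular codebase):

      #include <stddef.h>

      #define ASSUMED_LINE 32   /* true of every PowerPC before the 970 */

      /* Zero a line-aligned buffer using DCBZ as a fast block-zero primitive. */
      void zero_buffer(char *buf, size_t len)
      {
          for (size_t off = 0; off < len; off += ASSUMED_LINE)
              __asm__ volatile("dcbz 0,%0" :: "r"(buf + off) : "memory");
          /* On a 970 running with full-line DCBZ, each instruction above
           * zeroes the whole 128-byte line containing its target address –
           * the 32 bytes you meant, plus 96 neighbouring bytes. */
      }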

    1. David Roberts

      Re: Seen this one before....

      Well, yes, I was wondering about this as well.

      We have a report of a bug where only every other cache line gets flushed. What happens in the opposite case, when the code tries to clear twice as much cache as there is?

  5. John Smith 19 Gold badge
    Unhappy

    Self modifying code.

    I was taught about this in high school and it took years for me to find real cases.

    The ones I found were:

    The Apollo Guidance Computer. Used to extend the instruction set.

    The "Blit" bit mapped terminal developed by Bell Labs as the UI for the Plan 9 OS. Used to assemble optimal instruction streams on the stack for certain graphic functions.

    Both were workarounds for specific limitations of the architecture, either in terms of word length (allowing an extended instruction set) or processor speed.

    It looks like squeezing the last gram of performance out of an architecture remains the reason for doing this.

    But boy can it get messy.

    1. Anonymous Coward
      Anonymous Coward

      Re: Self modifying code.

      Common practice in the early 80's game programming to wring the most out of the feeble processors. It was also used for a lot of the copy protection of the time, which sometimes even relied on undocumented op-codes (the 6510 in the C64) to confuse debuggers, so the code looked like garbage when examined.

      Not really needed now as games are pretty much C/C++ or even standardized engines with scripting for game logic.

      1. Charles 9

        Re: Self modifying code.

        But if there's one place where wringing out the most performance still matters, it's something like a console emulator, and emulating a Wii or PSP counts as pretty much the pinnacle of console emulation for the time being (no one expects anything above those to be feasible anytime soon, as it was around that time that computer performance stopped climbing so rapidly).

      2. John Smith 19 Gold badge
        Thumb Up

        Common practice in the early 80's game programming to wring the most out of the feeble processors

        Yes, that would be exactly the sort of environment I'd expect it to get used.

        I've nothing against self-modifying code in principle, just as long as people recall Knuth's point that "premature optimization is the root of most (programming) evil" and leave it as a last resort, not a first resort. This is low-level stuff, and likely to defeat most of a compiler's attempts to optimize the code.

        1. Charles 9

          Re: Common practice in the early 80's game programming to wring the most out of the feeble processors

          Thing is, while DynaRec and JIT engines can themselves be written in a high-level language, self-modifying code usually isn't compiled but assembled, since to do it right you really need to go low-level and hand-tune everything. It would take a VERY specialised language to repeat the feat practically with a compiler.
