FYI: Processor bugs are everywhere – just ask Intel and AMD

In 2015, Microsoft senior engineer Dan Luu forecast a bountiful harvest of chip bugs in the years ahead. "We’ve seen at least two serious bugs in Intel CPUs in the last quarter, and it’s almost certain there are more bugs lurking," he wrote. "There was a time when a CPU family might only have one bug per year, with serious …

  1. Nate Amsden

    stay on top of firmware updates

    Semi-regularly anyway..

    Until I joined my present company and moved them out of public cloud onto hosted infrastructure (HP ProLiant) in 2011, firmware updates seemed problematic: difficult to keep track of and sometimes really difficult to apply.

    Enter the ProLiant Service Pack ISO image. Combined with iLO virtual media, it really changed the game for me: I can easily apply firmware updates and know what versions are installed, and I can just tell support I am on PSP 2016.10 or something like that. All firmware components get updated, whether it's BIOS, iLO (out-of-band management), power management firmware, network cards, storage controllers, disk drives, etc.

    Oh what a joy... in 2012 a flaw was discovered in the QLogic (HP OEM) NICs, and HP had me apply firmware updates to them. Those updates weren't available through the PSP (yet), so I had to build what I believe was a custom boot CD (FreeDOS or Linux, I forget) to apply the updates (ESX 4.1 was the server OS). It took me several hours just to build that; I hadn't done it in years, and my only access was remote over iLO virtual media. But I got it done. It was a harsh reminder of how firmware updates used to go for me. Those QLogic NICs eventually got replaced: manufacturing defect.

    At a previous company, in about 2009, they asked me to track down a performance issue on their Dell servers. It ended up being related to Seagate drives, and there was a firmware fix (prior to that I think I had never NEEDED to apply a firmware update to a hard disk connected to a server). However, the fix had to be applied via a DOS floppy boot disk (no fancy management on those servers), so the hardware guy had to go to each server and plug in a USB floppy to update the firmware. The update fixed the performance issue. Damn Dell and their multi-vendor setup: the servers had at least three different brands of disks in them (even those bought within the same batch of gear). The company had tried to troubleshoot the issue for a year prior to my arrival.

    Earlier than that, working with Supermicro gear... just forget it. They even used to (maybe still do) specifically say DON'T DO A FIRMWARE UPDATE unless you have a problem that support says is fixed by firmware. Not only that, but they often didn't even put a list of changes in the firmware files (as someone who had purchased about 400 servers of Supermicro kit in 2004-2005, I was pretty shocked). My last experience updating firmware on Supermicro was (ironically) on my own personal server at a colo. To update the out-of-band management firmware, the first step they say to do is reset the configuration to defaults (really never a viable option for remote management). So I did, and I lost connectivity immediately. That was probably two or three years ago now; fortunately there hasn't been a failure since, and I haven't gone on site to try to fix it. Next step is to replace the system, it is getting old.

    I know in fancier setups with blades and stuff the process is even simpler and more automated (even more so for VMware shops applying firmware and driver updates in the right order; fortunately I have never had an issue with driver/firmware versions). I have about 40 DL38x systems running about 1,300 VMs, nothing converged here, and I apply firmware updates typically once per year. Prior to the PSP, servers would typically only get firmware updates when they were first built (if that), or when there was a problem support said a firmware fix would solve.

    I know there were one or two issues with the PSP in the past year or so; HP recalled one of the PSPs, I think. It didn't affect me, as I never take the latest one right away: I always give it at least one to three months to bake in (which is on top of the time the updates take before they make it into the PSP).

    Recently, due to size constraints I guess, HP split the PSPs up, so instead of one ISO I have to use one for G7, one for G8, and one for G9/G10 (I only have G7-G9). Not that big a deal though.

    I had used HP gear back in 2003-2008, though as far as I recall there was no such easy PSP method for installing firmware at the time.

    1. NoneSuch Silver badge
      Coffee/keyboard

      An Added Dimension

      "Thanks to growing chip complexity, compounded by hardware virtualization, and reduced design validation efforts, Luu argued, the incidence of hardware problems could be expected to increase."

      Let's not forget that people are now actively looking for flaws instead of blindly believing Intel marketing blurbs.

    2. Anonymous Coward
      Anonymous Coward

      Re: stay on top of firmware updates

      It's a good thing too, seeing as HPE's firmware (and drivers) are bug-ridden crap!

    3. Anonymous Coward
      Anonymous Coward

      Re: stay on top of firmware updates

      @Nate Agree HP firmware process rocks now, except for the whole warranty registration thing. Dell PowerEdge servers are similarly easy - download the repo from Dell then point your server's lifecycle controller at it via FTP/SMB. Tell it to update everything and it does, all remotely via the iDRAC.

      Been a long time getting to this point but it's sure nice.

      1. Anonymous Coward
        Anonymous Coward

        Re: stay on top of firmware updates

        Wasn't KCL's data loss meltdown due to a firmware inconsistency when replacing a RAID/disk controller?

      2. Phil O'Sophical Silver badge

        Re: stay on top of firmware updates

        point your server's lifecycle controller at it via FTP/SMB. Tell it to update everything and it does, all remotely via the iDRAC.

        Been a long time getting to this point but it's sure nice.

        Until some miscreant uses this nice simple process to update your firmware and install a backdoor. There are advantages to making invisible software hard to update.

  2. karlkarl Silver badge

    Unfortunately we, the plebs, don't have any choice in the matter of terrible, terrible chips because there is yet to be one that is fully open-source and can be fabricated by someone other than Intel in Israel.

    1. YourNameHere

      Open source a chip with billions of transistors?

      Seriously... The bugs per million transistors is pretty low. If you have ever worked with large SW or HW projects like this, you know it takes a freaking army to do this. And oh, by the way, who's going to pay for and manage all of the multi-million dollar mask sets to actually build one, etc.? I know Bobby, your neighbor downstairs, says he's really good, but I'm not using something from him. You might want to go get some other cheese to go with your whine, as the cheese you're eating is not good for you...

      1. Anonymous Coward
        Anonymous Coward

        Re: Open source a chip with billions of transistors?

        Oh, another "computing is somehow more complex than any other human endeavour". Bollocks, and it always has been. Computing is one of the few science subjects where we are actually in control. Yes, there are physics problems, but there are in every other applied science too, and they do not have a "well, it is good enough for the sheep" philosophy.

        The truth is they have been getting away with selling crap for so long that they think it is their right. Well, it is not.

        The days when CPUs were designed on the back of cigarette papers are long gone; now there are plenty of electronic design aids that can minimuise logical errors. But since, I would suggest, layout, obscrufication and clockspeed are given a higher priority than functionality, bad design is seen as okay.

        Lastly, do not come at me with the "if you have ever worked with large SW or HW projects like this" line.

        Large-scale projects just need to be managed properly and have the right people doing the right jobs. It works for everything else except computing, where actually producing a professional and finished product is seen as optional. Due, I must add, to people like you pushing the idea that modular design is somehow completely different in computing than anywhere else it is applied. These problems are not down to the physics; they are down to sloppy design, and that is a purely human problem.

        1. DavCrav

          Re: Open source a chip with billions of transistors?

          "that can minimuise logical errors"

          I just think that's too good not to immortalize in another post. Not saying anything about your other points, so carry on.

        2. Anonymous Coward
          Anonymous Coward

          Re: Open source a chip with billions of transistors?

          @AC "The days where CPUs were designed on the back of cigarette papers are long gone, now there are plenty of electronic design aids that can minimuise logical errors but since I would suggest layout, obscrufication and clockspeed are given a higher priority than functionality then bad design is seen as okay"

          I think this kind of post just goes to show how little most people know about integrated circuit design and manufacture. But because they know how to use/fix an iPhone/laptop or are "in IT", they think they are suddenly chip design experts.

          1. Anonymous Coward
            Anonymous Coward

            Re: Open source a chip with billions of transistors?

            @AC2 "show how little most people know about integrated circuit design and manufacture"

            Yes, and that's why there is an article about the leading PC CPU manufacturer's failure to produce working components. Too many people were afraid to question Intel's secrecy, and the result is Intel sold crap as gold and told us they were doing us a favour.

            Favours do not typically cost thousands and then not work properly; that sort of favour is a sign that your "friend" doesn't actually like you at all.

            That sort of "friend" is one that no one needs, so don't try to tell me that is a thermometer and you are just checking my temperature. Everyone knows when they have been shafted.

    2. bazza Silver badge

      Er, there's OpenPOWER from IBM (current and open source) and Sun used to give away SPARC designs for free (I think Oracle still do).

      OpenPOWER is particularly attractive; there's an outfit called Raptor Engineering doing a completely open source machine (chips, board schematic, firmware and Linux) based on it. There are lots of reasons to buy one of those!

      1. Anonymous Coward
        Thumb Up

        @bazza

        Now that's intriguing. This is a problem that the crypto community wrestles with all the time. Definitely worth watching; be interesting to see the price-points once it's out of pre-order.

    3. Brian Miller

      the plebs don't have any choice in the matter of terrible, terrible chips

      "The plebs" never have any choice in chips because there aren't any convenient chip foundries to pop out just a few on a wafer. Seriously it's a big undertaking, closed or open design.

      The closest to any of that are the ARM designs, but then of course the licensing has to be paid, etc. And some of the designs are still vulnerable to Spectre.

      1. Dan 55 Silver badge

        "The plebs" never have any choice in chips because there aren't any convenient chip foundries to pop out just a few on a wafer. Seriously it's a big undertaking, closed or open design.

        RISC-V, it seems there are suppliers already.

        1. Warm Braw

          RISC-V, it seems there are suppliers already

          RISC-V is not a CPU design, it is the specification of an instruction set. You can't take it to a foundry and get them to produce you a CPU.

          1. Dan 55 Silver badge

            So you can't use SiFive's open-sourced designs based on RISC-V?

            (Maybe you can't, but that's what I've understood the news articles about SiFive to mean.)

            1. Warm Braw

              So you can't use SiFive's open-sourced designs based on RISC-V?

              There are a number of preliminary implementations of RISC-V (and really only the user-mode instruction set is fully settled at this stage), but you have to read what they mean by "open source" very carefully. RISC-V is being touted as a common instruction set that can be targeted by open source software, with the aspiration that this leads to a diversity of chip suppliers freed of licensing constraints on the ISA. That says nothing about licensing of the hardware design, though. It also says nothing about the large number of patents on processor design that might constrain any particular implementation seeking to be entirely open.

              I've not looked into it in detail, but I'm aware of only one significant project aiming to produce truly "open" hardware based on RISC-V. Although SiFive say they're "changing the way people buy IP", their hardware is not, as far as I can tell, "open source": there's still a licence agreement and a fee, as there would be, say, with ARM, although the process overhead is said to be much lower.

    4. gap

      Modern processors require large design teams and huge compute ranches for simulation and verification. The crazy things modern processors (particularly the CISC ones) do to obtain their performance are truly amazing, but as with software, the complexity comes at a cost: design defects.

      While you could get something fabricated yourself, it won't be a cutting-edge processor.

      1. Anonymous Coward
        Anonymous Coward

        "Modern processors require.... yadda yadda" says who? yes companies that sell individual chips for thousands can employ huge teams but clearly the masses of helpers failed to prevent these stupid errors in what was sold as finished products.

        CISC, yes additional inherant complexity but there should be nothing crazy included in the design process but clearly you are correct, Intel did put crazy inside their CPUs and their customers were crazy to buy them.

    5. Anonymous Coward
      Anonymous Coward

      >Unfortunately we, the plebs, don't have any choice in the matter of terrible, terrible chips because there is yet to be one that is fully open-source and can be fabricated by someone other than Intel in Israel.

      Amateur hour at the El Reg Commentard VLSI Design Center. Hilarious.

      Chip designs are not software. Those who treat them as such come unstuck.

      But that said, there are open source processor designs out there, e.g. LEON.

    6. imanidiot Silver badge

      @karlkarl

      Uhmm, Intel in Israel..... Really? *Checks memory of visit to a Chipzilla plant* Nope. Not really.

      Chipzilla has two plants in Israel (https://en.wikipedia.org/wiki/List_of_Intel_manufacturing_sites). They're mostly making exactly the same stuff that also comes out of the fabs in the USA, on the 45 and 22 nm nodes. Those plants exist to provide a backup that is not in the same general geographic region as the other plants, since both Hillsboro and Chandler could potentially be affected by the same large natural/political/man-made disaster.

      The location has nothing to do with it.

  3. Doctor Syntax Silver badge

    Stop making them faster

    Start making them better.

    Sometimes it's just better to spend time improving quality rather than adding features and speed.

    1. Anonymous Coward
      Anonymous Coward

      Re: Stop making them faster

      Problem is, technically this is already happening. Raw GHz is not increasing like it was. It's parallel and future branch prediction where it is all at.

      1. Doctor Syntax Silver badge

        Re: Stop making them faster

        "It's parallel and future branch prediction where it is all at."

        Which are aimed at making them faster in terms of computing power. And look where we've just discovered it's got us.

  4. StheD

    One bug per year? What has this guy been smoking? Any processor comes with an errata sheet. Some bugs get fixed in firmware; some are not serious enough to do anything about until the next release. Like the FDIV bug. Oops.

    Reduced design verification? Not that I've seen. But I'd like to see any DV methodology which includes attacks by hackers.

    I wonder what the bug rate is for the equivalent amount of Microsoft code. Remember, processors work pretty well without hardware patches every two weeks.

    1. Doctor Syntax Silver badge

      "Any processor comes with an errata sheet"

      And is that a good thing?

      1. Anonymous Coward
        Anonymous Coward

        Yes, because you know the list of flaws. They really aren't intended for public consumption; it is the people who design the PC hardware, write the BIOS/UEFI, write the operating systems and write the compilers who need to know that stuff. The average Joe who buys a PC with an Intel CPU doesn't need to see the list of two dozen errata for the stepping (which will grow over time as a few more are found). Most of them are corner cases of a corner case, and not worth worrying about (i.e. they'll find a way to mitigate them if they can, but a lot are basically marked "wontfix" because they don't matter in the real world).

        There aren't any chips that are errata-free, at least not anything much more complex than a 6502, so while in a perfect world chips would have no errata, it's the same perfect world where software has no bugs. It doesn't exist in the real world when humans design things.

        1. John Smith 19 Gold badge
          Unhappy

          you know the list of flaws. They really aren't intended for public consumption,

          Because the "public" might decide all microprocessor manufacturers are a bit s**t?

          Quelle f**king surprise.

        2. td97402

          Even the 6502

          The 6502 had a couple of well-known bugs. You just had to code around those. The early 16 bit chips like the 68000 had bugs. I doubt that there has ever been a bug-free processor. Ever.

          1. Anonymous Coward
            Anonymous Coward

            Re: Even the 6502

            I know for certain there are some bug-free formally verified cores, used for security roles. I don't know this for sure, but I'd bet the CPU core Apple is using for its secure enclave was formally verified. The L4 microkernel it is running is formally verified, but without the CPU it runs on also being formally verified, that's not really worth much.

            Any CPU you want to formally verify would have to be a very small, simple, in-order design with a single core. Once you go out-of-order or SMP, I'd have to think it would get too complicated for formal verification, even if you could automate most of it.

            1. Warm Braw

              Re: Even the 6502

              And even formal verification of the logic doesn't guarantee that you won't have problems like the Atom C2000 clock issue - translating it all into silicon is still a bit of a black art.

              1. Ken Hagan Gold badge

                Re: Even the 6502

                "And even formal verification of the logic doesn't guarantee that you won't have problems like ..."

                ...like Spectre? Let's not lose sight of the fact that Spectre is not a bug. The chip is doing exactly what its designers intended. It's just that, with hindsight, they wish they'd intended something less susceptible to side-channel attacks.
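
                For anyone who hasn't seen it spelled out, here is a minimal sketch of the kind of bounds-check-bypass gadget Spectre variant 1 abuses (my own illustration, not from the article; the function and array names are made up). Every line is architecturally correct C; the leak is purely microarchitectural. A real proof of concept also needs branch-predictor mistraining and a cache-timing step such as flush+reload, both omitted here.

                #include <stddef.h>
                #include <stdint.h>

                /* Hypothetical victim code, following the published Spectre v1 pattern. */
                uint8_t array1[16];          /* attacker supplies the index x          */
                uint8_t array2[256 * 4096];  /* probe array: one cache line per value  */
                size_t  array1_size = 16;

                uint8_t victim_fn(size_t x)
                {
                    /* Architecturally, an out-of-bounds x never reads array1[x].
                     * Microarchitecturally, a mistrained branch predictor lets the
                     * CPU run the body speculatively with x >= array1_size, so
                     * array1[x] can read an arbitrary byte... */
                    if (x < array1_size) {
                        /* ...and that byte selects which line of array2 is pulled
                         * into the cache. The speculative result is squashed, but
                         * the cache state survives, and timing loads from array2
                         * recovers the byte. */
                        return array2[array1[x] * 4096];
                    }
                    return 0;
                }

                int main(void)
                {
                    /* In-bounds call; the interesting behaviour needs an attacker
                     * driving x out of bounds after mistraining the predictor. */
                    return victim_fn(3);
                }

                Which is exactly the point above: every instruction does what the architecture manual promises; the "bug" lives in timing side effects the manual never said anything about.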

            2. Anonymous Coward
              Anonymous Coward

              Re: Even the 6502

              >I know for certain there are some bug-free formally verified cores, used for security roles.

              Wanna bet? Formal verification does not mean a design is bug free. Just that it matches the specified design intent.

              1. Anonymous Coward
                Anonymous Coward

                Re: Even the 6502

                Wanna bet? Formal verification does not mean a design is bug free.

                Yes it does guarantee the design is bug free. What it does not guarantee is that the actual device is bug free - i.e. when manufacturing issues rear their ugly head like they did with the Intel Atom C2000.

                1. Anonymous Coward
                  Anonymous Coward

                  Re: Even the 6502

                  DougS> Yes it does guarantee the design is bug free. What it does not guarantee is that the actual device is bug free

                  No. Test will discover on-chip/device issues. Formal verification, in a chip design flow context, verifies that your design-intent RTL/HDL matches your gate-level netlist after each flow step, right up to the final netlist generated by your place & route tool.

                  (Processor design doesn't necessarily follow SoC design methodology 100%, but the principles are there, and for certain processor IP implemented as ASICs they still apply.)

                2. Steve the Cynic

                  Re: Even the 6502

                  "Yes it does guarantee the design is bug free."

                  No, it doesn't guarantee that, either. It guarantees that the design matches the formal specification, which just shifts the location of the bug.

                  If the formal specification says that a read issued from user-mode code to a supervisor-only page will send the read to the actual memory before validating the permissions (and furthermore use the returned data in speculative execution that is discarded if the permissions fail), then the formal specification will validate a CPU that, nevertheless, has the Meltdown bug.
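
                  A toy illustration of that point, my sketch rather than anything from the thread: exhaustively checking a 4-bit ripple-carry adder against its specification. The check can pass for every input, and if the specification itself encodes the wrong behaviour, the "verified" design is faithfully, provably wrong.

                  #include <stdio.h>
                  #include <stdint.h>

                  /* "Specification": what the architects wrote down. */
                  static uint8_t spec_add(uint8_t a, uint8_t b) {
                      return (a + b) & 0xF;            /* 4-bit add, carry-out dropped */
                  }

                  /* "Implementation": a gate-level-style ripple-carry adder. */
                  static uint8_t impl_add(uint8_t a, uint8_t b) {
                      uint8_t sum = 0, carry = 0;
                      for (int i = 0; i < 4; i++) {
                          uint8_t x = (a >> i) & 1, y = (b >> i) & 1;
                          uint8_t s = x ^ y ^ carry;            /* full-adder sum bit */
                          carry = (x & y) | (carry & (x ^ y));  /* full-adder carry   */
                          sum |= (uint8_t)(s << i);
                      }
                      return sum;
                  }

                  int main(void) {
                      /* Exhaustive equivalence check: feasible only because the
                       * input space is 16 x 16; real cores need symbolic tools. */
                      for (uint8_t a = 0; a < 16; a++)
                          for (uint8_t b = 0; b < 16; b++)
                              if (spec_add(a, b) != impl_add(a, b)) {
                                  printf("mismatch at %d + %d\n", (int)a, (int)b);
                                  return 1;
                              }
                      puts("design matches spec for all 256 inputs");
                      /* ...which proves nothing if the spec itself is wrong, e.g.
                       * one that says "validate permissions after the read". */
                      return 0;
                  }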

              2. Nick Kew

                Re: Even the 6502

                Wanna bet? Formal verification does not mean a design is bug free. Just that it matches the specified design intent.

                My experience with formal verification[1] is that it leads to *more* bugs.

                The reason: the verification process is itself complex and therefore error-prone, and the longwinded processes involved provoke humans into taking their eyes off the ball and possibly even cutting corners.

                I recollect a very brief (between-client-projects) involvement with a former employer's formally verified satellite telemetry, tracking and control system. I made myself unpopular when I found an error which I tracked down to an off-by-one in the implementation of the formal tests. Whoever had produced the code in question had naturally concentrated on the hardest part of the job - getting it through the tests - and was evidently too distracted to apply the common sense to see that the outputs were wrong.

                [1] admittedly from sometime last century.

                1. Anonymous Coward
                  Joke

                  @Nick Kew - satellite telemetry, tracking and control system

                  You worked on the Hathaway project at Pacific Tech?

                  1. Nick Kew

                    Re: @Nick Kew - satellite telemetry, tracking and control system

                    @DougS - sorry, that reference goes right over my head. Googling "hathaway pacific" doesn't enlighten me, and I'm not going to spend time on trying to tweak search terms.

                    OK, I guess from the joke icon it's some kind of cultural reference to something I haven't read/seen/heard, rather than an actual project meeting my description. Though not a particularly famous one, 'cos I'm sure googling, say, Sirius Cybernetics would've turned up something :D

                    1. Anonymous Coward
                      Anonymous Coward

                      Re: @Nick Kew - satellite telemetry, tracking and control system

                      It's from the movie Real Genius.

            3. Anonymous Coward
              Headmaster

              Re: "I know there are some bug free cores"...

              The "Halting problem" may wish to have a conversation with you... or it may not. It's a bit annoying like that.

            4. MarcC

              Re: Even the 6502

              https://arstechnica.com/information-technology/2020/10/apples-t2-security-chip-has-an-unfixable-flaw/

          2. Anonymous Coward
            Anonymous Coward

            Re: Even the 6502 - The early 16 bit chips like the 68000 had bugs

            IIRC the early 68000 bug caused two sets of drivers on the internal bus to turn on, one all ones and one all zeroes, leading to a crack down the middle of the case and a smell of decomposing epoxy. Literally Meltdown.

            1. Loud Speaker

              Re: Even the 6502 - The early 16 bit chips like the 68000 had bugs

              That was the "Halt and Catch Fire" instruction.

              Very useful in military applications where you did not want your software leaking from chips with on-board ROM.

              1. ByeLaw101

                Re: Even the 6502 - The early 16 bit chips like the 68000 had bugs

                HCF was a 6809 instruction, not a 6502 :)

          3. analyzer

            Re: Even the 6502

            I won't downvote you for the inaccuracy because it is shockingly common.

            The 68000 was a 32 bit processor with a 16 bit data bus much like the 8088 was a 16 bit processor with an 8 bit bus.

            1. Anonymous Coward
              Anonymous Coward

              Re: Even the 6502

              "I won't downvote you for the inaccuracy because it is shockingly common.

              The 68000 was a 32 bit processor with a 16 bit data bus"

              It wasn't really.

              Although it had 32-bit registers, it only had a 16-bit ALU. The 8086 also has a 16-bit ALU and is considered a 16-bit processor. The RCA 1802 had 16-bit registers and an 8-bit ALU and is considered 8-bit. As a computer, the really important thing is the ALU width. When we benchmarked the then-current version of the 68000 against the NS 16032, which did have a 32-bit ALU, the 16032 absolutely wiped the floor with it on our 32-bit integer arithmetic test set. National Semi in their marketing always described the 16032 (later the 32016) as the first 32-bit microprocessor for this reason. Acorn used the 16032 as a coprocessor in some of their designs to give "workstation" performance, and it's said that while the series was a bit of a flop, some of its design features eventually made their way to the Pentium.

          4. Danny 2

            Re: Even the 6502

            The Z80A was bug free thanks to the sprites. My Spectrum also had the perfect keyboard. I keep telling potential employers that a fully secure business should only deploy Spectrums as one Sys Admin can listen if any modems or tape decks are being accessed.

          5. Anonymous Coward
            Anonymous Coward

            Re: Even the 6502

            Yes, the 6502 had logical errors, like ROR not passing the bit back in at the left, but at the time, and for its price, that was deemed acceptable until it was fixed in later versions.

            Low unit price was a big factor in the 6502's uptake, not to mention that Acorn used the 6502 as the model for the ARM.

            Then add the size of the 6502 design team and the lack of modern design aids to the low CPU price and you can understand why they kept making the faulty part. Basically, what worked on the 6502, for its price and availability, was enough for other people to sell computers designed around it.

            Intel, on the other hand, were selling what people had been told they wanted, at the highest price possible, and they had no excuse at all. More than enough money, people, time and resources to do it right.

          6. Nimby
            Pint

            Re: Even the 6502

            And let's not forget the infamous Pentium floating-point unit bug! I loved that one.
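
            For anyone who missed 1994: the FDIV erratum made certain divisions come back wrong from around the fourth significant digit. A quick sketch (mine, any C compiler will do) using the operand pair that was circulated everywhere at the time:

            #include <stdio.h>

            int main(void) {
                /* Widely circulated FDIV test case: the correct quotient is
                 * 1.333820449..., while a flawed Pentium returned 1.333739... */
                double x = 4195835.0, y = 3145727.0;
                double q = x / y;

                printf("4195835 / 3145727 = %.15f\n", q);
                printf("%s\n", q > 1.3338 ? "FDIV looks fine" : "hello, 1994");
                return 0;
            }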

            Oh for the days when these kinds of things were just "errata" and you took a sip of coffee, shrugged, and continued on with life. Maybe you patch your software. Maybe you flash in some new firmware. Maybe you actually bother to replace some hardware. Maybe not. But life went on.

            Not sure when this kind of thing became the end of the world, but me, I'm off to the pub to ignore it until it blows over and people forget all about it so that rational conversation can resume.

            Kudos to someone at Vulture Central for finally pointing out that increased complexity = more bugs, and that this isn't somehow magically limited to Intel.

      2. Hans 1
        Boffin

        @Dc SynTax

        Any processor comes with an errata sheet

        And is that a good thing?

        You might excel at syntax, but context is not your forte, right?

        One bug per year? What has this guy been smoking? Any processor comes with an errata sheet.

        Let me explain what the above means:

        Some MS guy comes along and pulls the following out of his backside:

        I predict the number of bugs in CPUs will increase [...] [to] one bug per CPU per year.

        The guy has never heard of processor errata sheets, which prove we have already long passed the one-bug-per-year milestone... more like 5 or 10, if you ask me. You don't, and that's fine.

        CPU errata sheets are better than "This update fixes an issue in Microsoft Windows" boilerplate patch descriptions we get in Windows Update.

        Books and scientific publications come with an ERRATA sheet and I think it is good, because honest.

        For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled.

        1. paul2718

          I don't think you read the Dan Luu post from 2016, recently updated.

          https://danluu.com/cpu-bugs/

          The guy knows what he's writing about and is worth paying attention to.

        2. Doctor Syntax Silver badge

          "Books and scientific publications come with an ERRATA sheet and I think it is good, because honest."

          Agreed. But the problem is the ready acceptance that they should be needed, particularly in this context. Would it not be better if development effort were concentrated on fixing the errata so as to eliminate them rather than adding more features which in turn add more errors?

        3. allthecoolshortnamesweretaken

          It's a good thing that errata sheets for chips are provided.

          It's not so good that they are a necessity in the first place.

          1. Charles 9

            "It's not so good that they are a necessity in the first place."

            Hey, we're only human. If you can find the perfect human, he/she probably wouldn't be human at all.

            Basically, if it's made by man, it's probably going to have mistakes, plain and simple.

  5. Notas Badoff

    We repeat ourselves

    A long time ago in a machine room far, far away as the codger walks, there was a four (!) CPU mainframe wondrous fast. It made seismic reflection squiggles shimmy so pretty the geophysicists danced. Migrations moved them to tears. All was happiness.

    Back then patches and/or "local code" required the whole OS be recompiled. Hundreds of large files of assembly massaged, then 'linked' into binaries. Long jobs taking much time away from happy-making production, thus much frowning.

    Occasionally the 'link' step would die horribly and mysteriously, losing the entire rebuild job. Then 'occasionally' seemed to become 'more frequently', and always very very mysteriously. "This shouldn't happen" became over time "this can't happen!". And that was the techs, not the customer managers, for indeed there was nothing wrong in the linker code or the data or the OS.

    One bright spark, wasting time over the rising stacks of dump listings, happened to notice that each dumped failure was running exactly the same section of tight code in the linker. Catching a scent, and verifying that, they then happened to notice it was always the same CPU. Same code, same CPU, every time.

    A feature of these machines was that the processors could retain up to four (!) instructions in an on-CPU cache. And if you could architect a loop in four or fewer instructions, wicked fast. Great for array and string processing.

    And the linker had such a four instruction loop, and for what the code was doing there, it was very very clever (and fast). But sometimes died mysteriously, on that one CPU, of hundreds of those world-wide.

    Cue more "shouldn't happen" and "can't happen", 'cept back at Engineering Central. We got to see a half-dozen luminaries - stars! - come to see us in the field, arriving from on high. (Minnesota)

    Much muttering and magic made as they tracked the bug down. Awed murmuring from us, watching hands flying over the engineering "front panel" 6 feet by 7 feet tall with hundreds of blinkin's and switches and rollers to access the dozens of internal registers. Awed tittering from us, as instructions were toggled into memory to replicate the bug on barest possible metal.

    Pure awe as they lifted "that one" card from the wall of 6" by 8" cards, from one of the four CPUs. Replacing it, they carefully packed the card away for interrogation later back at Engineering.

    We're repeating ourselves, without thinking things out. The circuits are ever so shrunk, but the instances are ever so multiplied. Shrunk by 6 magnitudes but multiplied by 8+ magnitudes?

    Anyway, after seeing that, when Aunt Edith complains that that one file in her collection of Fabio pictures crashes her tablet, I just smile and ask "have you tried using a different program?" I'm good, but I "can't handle the truth!"

    BTW: ECL keeps you warm on a cold day! I miss it just now...

  6. Snowy Silver badge
    Facepalm

    Ship it now...

    patch it later, increasingly a thing for both hardware and software.

  7. John Smith 19 Gold badge
    Unhappy

    Microprocessors really are getting just like mainframes,

    right down to how they handle a hardware design failure.

    Simple.

    Patch the microcode.

  8. Destroy All Monsters Silver badge
    WTF?

    Insane!

    reduced design validation efforts

    Why? Are these people utterly retarded? Does their risk analysis say that it is cheaper to give the customer a good one up the prostate from time to time rather than make sure the chip passes tests?

    There is more computing power than ever to check the design, either statistically or using formal methods. I suppose whatever stuff drops off the university conveyor belt is no longer as performant as it used to be.

    1. Anonymous Coward
      Anonymous Coward

      Re: Insane!

      Making sure that the chip passes the tests takes time, lots of it and therefore money.

      The pressure is on to ship

      1) When the CEO told the world that it would

      2) To meet the forecasts predicted by so-called experts whose names begin with 'anal'. Failure means that the stock price sinks like a stone tossed into water.

      Ship and be done with it. Or... ship and fuck the issues.

      That is the new mantra in this 'DevOps', (fr)Agile world.

      There is never any time to go back and fix problems/reduce technical debt. It is all about the bling, the shiny-shiny that 99.99999999% of users will never use or see, but it impresses the PHBs so it must be good.

  9. Teiwaz

    All Processors have bugs

    Question is...

    Would there be any fewer if these companies didn't try hare-brained schemes like IME and other bloat and complexity where there is little need?

    I'd prefer the hardware equivalent of Archlinux and i3wm - that way, I can be sure any bugs aren't introduced by fripperies.

    And the UEFI on my newly built system is not only no quicker to boot, it's several minutes slower.

    1. Charles 9

      Re: All Processors have bugs

      "Would there be any less if theses companies didn't try hare-brained schemes like IME or other bloat and complexities when there is little need."

      Do you think they would've stuffed such a thing into the chip if there wasn't a demand for things like long-distance remote administration? Remember, ME was demanded by admins.

      "I'd prefer the hardware equivalent of Archlinux and i3wm - that way, I can be sure any bugs aren't introduced by fripperies."

      That may be you, but you're in the minority. In the REAL real world, people just want to get crap done, using whatever tools are at hand.

      1. Teiwaz

        Re: All Processors have bugs

        Charles 9 said: That may be you, but you're in the minority. In the REAL real world, people just want to get crap done, using whatever tools are at hand.

        I'd prefer tools that aren't overcomplex up the wazoo, and the Arch/i3 combination was an example of that, not a hard line about interfaces, but I guess that went over your head.

        So 'cause admins want it, it was rammed down every throat. And I thought a lot of h/w vendors had a problem with business users using consumer tech.

        1. Charles 9

          Re: All Processors have bugs

          "So 'cause admins want it, it was rammed down every throat - and I thought a lot of h/w venders had a problem with business users using consumer tech"

          But what about the other way around, consumers using business tech, which means the foundry process gets simpler? And you know what they say about the KISS principle.

  10. Joerg

    AMD doesn't even publish its own CPU erratas... they are hiding the most.

    1. Dan 55 Silver badge

      Re: AMD doesn't even publish its own CPU erratas... they are hiding the most.

      Don't AMD call them Revision Guides? And they're published.

      1. Joerg

        Re: AMD doesn't even publish its own CPU erratas... they are hiding the most.

          And they list only a few of the real errata. Intel lists 90% or more, although not all, of the errata.

        1. Dan 55 Silver badge

          Re: AMD doesn't even publish its own CPU erratas... they are hiding the most.

          Wasn't that debunked a decade ago?

  11. Wolfclaw

    Will users outside the USA get compensated for a CPU they bought in good faith that must now lose x% performance? I personally believe they should be doing CPU exchanges, and if that also means a new motherboard and memory, so be it. Next time, they may actually check their products!

  12. Anonymous IV

    I'm sure you will be able to insist on a replacement CPU if you can prove that you were guaranteed a certain level of performance from your current one.

  13. JeffyPoooh
    Pint

    "...Microsoft senior engineer Dan Luu..."

    Such problems with hardware are the real reason that Microsoft bought Minecraft for billions and billions.

    A few years ago, Microsoft senior management had noticed all those people with too much time on their hands building up virtual CPUs made from those virtual Minecraft blocks. So they figured that if they could recursively run the Minecraft code base on a Minecraft-world virtual CPU, then they could build-up to running Windows on a virtual PC running within Minecraft which is itself recursively running within Minecraft, thus completely dispensing with the requirement for any pesky and unreliable hardware whatsoever. You may think that this is silly, but hey, recursion is well established and weird.

    Besides, this is the best explanation that you've ever seen as to why Microsoft would have bought Minecraft for billions and billions of dollars. See?

    1. Blue Pumpkin

      Re: "...Microsoft senior engineer Dan Luu..."

      These are not the Minecrafts you are looking for. They already have the technology, and that technology is PowerPoint:

      https://www.youtube.com/watch?v=uNjxe8ShM-8

      The accompanying paper is also a good read.

      1. allthecoolshortnamesweretaken

        Re: "...Microsoft senior engineer Dan Luu..."

        Brilliant!

  14. Anonymous Coward
    Anonymous Coward

    Re. C2000

    I don't feel so bad about my broken Atom machines now.

    Seems that consumer-grade netbooks and tablets used a similarly flawed chip, and I determined experimentally that under certain conditions it can be worsened by RAM larger than specified.

    E.g. a machine shipped with 512MB: put 1GB in and it might work for a while, but eventually it will fail POST with the infamous black screen, even if you put in the slowest possible DDR2.

    NB: this is the *32-bit* Atom; AFAIK the 64-bit parts are not seriously affected by this problem.

  15. Chairman of the Bored

    I don't think people realize how much it costs to fab

    Some numbers for you... Bear in mind I'm an analog guy so I cannot speak to very small process nodes. For me to do a mixed signal design at 180nm, costs break out as follows:

    Design NRE - $$$. Disciplines here are requirements development, requirements verification, functional decomposition, functional allocation, circuit synthesis (schematic capture), nonlinear circuit modeling, physical layout (chip artwork), and further modeling. So for a simple HV op-amp you've got about a man-year or two in before you talk to the fab. That's a quarter to half a million at burdened rates. If the design is digital we would do an FPGA implementation first (*).

    The cheapest way to fab is to use a "shuttle run", where you team up with other vendors and split the cost of the mask between yourselves. A mask exposes perhaps a 100mm x 100mm area; you might get 20mm x 20mm of this for your work, with a yield of perhaps 20-30 good die when it comes back. In one to two months. Cost at 180nm is around $25-50k. Faster? Pay more.

    Now you have to saw, package, test. Typically the first mask or two is a no-go. Full functional plus HALT/HASS testing will consume another man-year or so, and requires capital equipment. Call it an additional quarter of a million plus any additional mask sets - and this assumes the design is reasonably successful.

    So that's why a simple circuit can push up to the million dollar level quickly, and timelines are long compared to, say, software innovation.

    (*) Essentially any digital logic can be implemented in FPGA fabric. Microprocessor designs generally get prototyped and tested that way before a design goes beyond the prototype phase. There are numerous ARM and other "soft cores" you can license, tweak, and incorporate into your FPGA. One very interesting multicore microcontroller - the Parallax Propeller P8X32A - is wholly open source and you can use it in your own FPGA. See: https://www.parallax.com/microcontrollers

    Downsides? Vastly more power hungry than custom silicon. Typically slower. Really expensive. Large. But if you insist your micro is the one true micro to rule them all, that's where you start.

    Fun things to play with are the Xilinx CoolRunner CPLD and Cypress PSoC devices. A mere mortal can afford them, and quickly learn that doing custom digital is really, really hard.

    1. Anonymous Coward
      Anonymous Coward

      Re: I don't think people realize how much it costs to fab

      " mere mortal can afford them, and quickly learn that doing custom digital is really, really hard." hard compared to what? and what design aids were availible

      As to high cost of fabrication is this down to yield/material or patents, it would not suprise me to discover that the big boys have made getting a competing product produced as expensive as possible via patents on the fab processes.

      My local Univercity had it's own Semiconductor fab availible to researchers and a quick google shows that they are not alone. Perhaps you could get your small projects fabbed cheaper via your local academic institution, as you said sharing reduces costs and you already know some people who want to share.

      1. BinkyTheMagicPaperclip Silver badge

        Fabbing is really expensive

        Fabbing is really, really expensive. It's not some secret plot; to create market-leading fabs, even large companies have to work together these days to afford it.

        Yes, some universities and large companies have their own fab, but these are at a vastly larger process size than the market-leading fabs. Maybe they could churn out a Pentium III...

        As mentioned above, this is just really, really expensive and difficult.

        To quote from one blog post about Meltdown: 'The root cause of the problem is the combination of the following subsystems of a processor’s microarchitecture. Note that none of the individual components is to be blamed for the vulnerability - it is how they work together'

        Intel probably should have improved their architecture somewhat with Meltdown in mind. Spectre is even more obscure.

        Far worse than this happens all the time in software; the difference is that software can be patched.

        If I were to speculate: this is only going to get worse. Intel aren't going to turn their back on things such as speculative execution. I'd expect some sort of silicon to introduce a bit of jitter in the timing, and ideally some enhancements to what microcode can do.
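
        That jitter is roughly what the software side did in a hurry: browsers coarsened and fuzzed their timers so cache-timing gadgets lose their signal. A back-of-envelope sketch of why resolution matters (mine, assuming x86 with GCC or Clang; 'fuzzed_now' and its ~3000-cycle quantum are invented for illustration): a cached load and a DRAM load differ by a couple of hundred cycles, which a cycle timer sees trivially and a coarse, jittered timer does not.

        #include <stdio.h>
        #include <stdlib.h>
        #include <stdint.h>
        #include <x86intrin.h>   /* __rdtsc, _mm_clflush, _mm_mfence */

        static uint8_t target[64];

        /* Time one load at full cycle resolution: the side channel's eyes. */
        static uint64_t time_load(volatile uint8_t *p)
        {
            _mm_mfence();
            uint64_t t0 = __rdtsc();
            (void)*p;
            _mm_mfence();
            return __rdtsc() - t0;
        }

        /* Hypothetical mitigated clock: quantised and fuzzed, in the
         * spirit of what browsers did to performance.now(). */
        static uint64_t fuzzed_now(void)
        {
            const uint64_t quantum = 3000;              /* a few microseconds-ish */
            return (__rdtsc() / quantum) * quantum
                   + (uint64_t)rand() % quantum;        /* random jitter          */
        }

        /* The same load, timed with the fuzzed clock (signed: jitter can
         * make the end reading come out "before" the start one). */
        static int64_t time_load_fuzzed(volatile uint8_t *p)
        {
            uint64_t t0 = fuzzed_now();
            (void)*p;
            return (int64_t)(fuzzed_now() - t0);
        }

        int main(void)
        {
            (void)time_load(target);             /* warm the cache line */
            uint64_t hot = time_load(target);

            _mm_clflush((void *)target);         /* evict it to DRAM    */
            _mm_mfence();
            uint64_t cold = time_load(target);

            (void)time_load(target);             /* warm it again       */
            int64_t fhot = time_load_fuzzed(target);

            _mm_clflush((void *)target);
            _mm_mfence();
            int64_t fcold = time_load_fuzzed(target);

            /* hot is typically tens of cycles, cold a few hundred: easy to
             * tell apart at cycle resolution, lost once the clock only
             * ticks in jittered ~3000-cycle steps. */
            printf("cycle timer:  hot=%llu  cold=%llu\n",
                   (unsigned long long)hot, (unsigned long long)cold);
            printf("fuzzed timer: hot=%lld  cold=%lld  (just noise)\n",
                   (long long)fhot, (long long)fcold);
            return 0;
        }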

        1. Charles 9

          Re: Fabbing is really expensive

          "The root cause of the problem is the combination of the following subsystems of a processor’s microarchitecture. Note that none of the individual components is to be blamed for the vulnerability - it is how they work together"

          IOW, we have a "gestfault" here: something worse than the sum of the parts.

          As for solutions, why not find a way to solve the bottleneck of context switching so that it can be used more liberally to keep things more separated without killing your performance?

          1. BinkyTheMagicPaperclip Silver badge

            Re: Fabbing is really expensive

            Context switching is not the problem, OS designers have been working to reduce that time for years (although any hardware improvements would naturally be welcome).

            The problem is that because of these issues the cache is now getting dumped on context switches, and that kills performance.

            1. Charles 9

              Re: Fabbing is really expensive

              "Context switching is not the problem, OS designers have been working to reduce that time for years (although any hardware improvements would naturally be welcome)."

              So how come they only use TWO of the x64's available FOUR rings? Sounds to me like they're writing AROUND the problem rather than actively FIXING it since fixing it in this case requires a HARDWARE solution.
