Monday: Intel touts 28-core desktop CPU. Tuesday: AMD turns Threadripper up to 32

AMD this week promised to ship 32-core Ryzen Threadripper 2 processors in the third quarter of 2018 – one day after Intel bragged about a forthcoming 28-core part. On Monday, Intel, the dominant CPU maker in the worlds of desktop and server systems, touted an upcoming 28-core Core X-series aimed at workstations, gamers, and …

  1. Nate Amsden

    where's the innovation?

    Both of them are, at most, just tweaking server CPUs to run on workstations. I have a dual-socket HP Opteron workstation, maybe from 2009. Bought it refurb from HP, maybe 2012. Upgraded the CPUs from 4-core to 6-core about two years ago (12 cores total), after finally finding CPUs at a decent price. The CPUs were specifically for HP blades; I just discarded the blade heatsinks and reused what the workstation already had. Nothing new here. I don't use it for much anymore but it's still a pretty solid system.

    I've seen people claim the new Ryzen chips have forced Intel to compete more. I don't really see that myself. Ryzen fell far short of my own personal expectations on power usage anyway (not that Intel is much better now). Sad to see seemingly everyone running into manufacturing walls relative to the past.

    Where AMD forced Intel to innovate was when Intel came out with the Core series architecture.

    1. Anonymous Coward
      Anonymous Coward

      Re: where's the innovation?

      What sort of 'innovation' did you expect? Just because you want something doesn't mean it is going to come (whatever it is you think they should have 'innovated').

      1. Nate Amsden

        Re: where's the innovation?

        I wasn't expecting anything myself, I was commenting on the article:

        "[..]what matters is that someone is putting pressure on monopoly giant Intel, forcing it to innovate in the desktop "

    2. sharkando

      Re: where's the innovation?

      Where is the innovation?? The innovation is Infinity Fabric. Innovation is the 12nm node. Innovation is in the design of the intelligent decoder and branch prediction. Innovation is in providing 128 PCIe lanes. Innovation is making it all affordable.

      There's not much innovation from Intel, unfortunately, given that they still use a monolithic design that ends up being unaffordable, to the point that the 28-core demo you speak of was actually a scam: a Xeon Platinum Skylake cooled with a $1,000 sub-zero chiller.

    3. rav

      Re: where's the innovation?

      Not everybody is satisfied driving a Prius. Some folks do want a Maserati or Shelby Cobra.

      As for me, I will build one for computer chess. I can't wait to get my hands on TR2.

    4. rav

      Re: where's the innovation?

      Actually, the innovation is four 8-core/16-thread dies rather than a single massive and very expensive die.

      The innovation is also Infinity Fabric. Then there will be 7nm TR3!!!

      TR1 is now about $500. I expect TR2 to be released at less than $1200.

  2. TReko

    Gimme speed

    I don't want more cores on a workstation, I want fewer cores that I can clock higher.

    Most tasks are hard to parallelise - Gimme a 10GHz CPU.

    1. VeganVegan

      Re: Gimme speed

      That’s the case for most users, but I have genetic analyses that are highly parallel, so more cores scales better than more clock speed.

      For example, one of my jobs would typically take a month or so, using all 24 threads of a 12 core machine, running 24/7. I would love to have a 48 thread machine, because it should almost halve the time of the run. It is much harder, and gets into impossible territory, to scale clock speed in a similar way: A 2.5 GHz chip could possibly be sped up to 5GHz, but 10 or 20GHz? I do appreciate higher clock speeds, it’s just that more cores gives me more bang for the buck.

      When you get down to the basics, a GPU is a massively parallel chip, because much of the graphics task can be nicely parallelized. One can imagine (hope?) that other common tasks can better take advantage of being parallelized to the extent possible. Modern OSes already generate many threads to take advantage of the cores available.

      1. JeffyPoooh
        Pint

        Re: Gimme speed

        Vegan^2 noted, "...one of my jobs would typically take a month or so..."

        Unless it's already been done, there's very likely an order of magnitude (or maybe three) to be gained by optimizing the code.

        Hand-tuned assembler written by somebody who really understands exactly what they're doing can be stupidly fast. Even if it's just a few LoC in the innermost loop.

        1. Richard 12 Silver badge

          Re: Gimme speed

          ^JeffyPoooh

          Honestly, no. Modern compilers are really, really good.

          Hand-tuned assembler is an outdated concept. It's incredibly expensive - months or years - and risks the results being wrong. It is better to spend the time doing runs and making the simulation more accurate.

          Making it run faster is more generally done by two methods: Make it more efficiently parallel, and make it do less - figure out which parts of the simulation aren't actually necessary, and omit them.

          Eg making sure it does things in the best order (not waiting on memory or other tasks), finding early-exit cases etc.

          Domain knowledge is the best way to optimise. Only compiler writers should be looking at assembler.

          1. Joe Werner Silver badge

            Re: Gimme speed

            Plus you have to factor in the amount of time the rewrite costs. Honestly: unless you run it several times you don't bother. If it can be solved "quickly enough" it is good enough, and if it can be sped up by parallelised code, that's an order of magnitude right there. And that is "for free" (depends on the OS; really easy under any *nix-like system, i.e. Linux or macOS) if you just have to link against OpenBLAS instead of your bog-standard BLAS, in case you are doing a lot of linear algebra stuff. Or if you have stuff that is embarrassingly parallel (i.e. just several processes that do not need to communicate).

            One other thing: make sure that stuff is using the correct layout in arrays; there is some speed-up just from changing between column-major and row-major ordering (which is language dependent), as in the sketch below.
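            A minimal C sketch of that layout effect (the array size and the crude clock() timing are purely illustrative, not a proper benchmark): the first loop nest walks memory contiguously, the second strides across rows.

```c
/* Illustrative only: C stores 2-D arrays in row-major order, so the first
 * loop nest touches memory sequentially while the second jumps N doubles
 * per step and thrashes the cache. */
#include <stdio.h>
#include <time.h>

#define N 4096
static double a[N][N];               /* ~128 MB of zero-initialised data */

int main(void) {
    double sum = 0.0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)  /* inner loop over contiguous a[i][...] */
            sum += a[i][j];
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)  /* same work, but strided access */
            sum += a[i][j];
    clock_t t2 = clock();
    printf("row order %.2fs, column order %.2fs (sum=%g)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    return 0;
}
```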

            Don't bother about the rest. That's what the compiler should do and optimised libraries take care of.

          2. Ken Hagan Gold badge

            Re: Gimme speed

            ^ Richard 12

            I'd argue that assembler is the out-dated concept, not just the hand-coded variety. Modern processors don't really execute instructions anymore, they simply move data from place to place and sometimes the data is changed en route. The limiting factor on speed is nearly always how long it takes to shuffle the required data through the required list of places it has to visit, and making sure that when two pieces of data have to meet up in a place, that they do so at the same time.

            Assembly language isn't a particularly good way of expressing the requirements, but out-of-order processing allows the hardware to convert a sequence of instructions into a data-flow, on the fly. So we end up with people writing lists of instructions in a high-level language, which the compiler tries to turn into a data-flow but then has to write out as another sequence of (assembly) instructions, which the CPU tries to turn back into a data-flow for the most efficient execution.

            Maybe one day we'll figure out how to express non-embarrassingly-parallel algorithms directly. I'm not holding my breath, though. The academics have been looking for such languages for all of my life and no-one has found one. (I think the commercial incentives are now such that any successful solution to this problem would go mainstream within 2-3 years.)

            1. big_D Silver badge

              Re: Gimme speed

              On the other hand, GRC's Spectre checker (Inspectre) was written (quickly) in assembler and weighs in at 110KB, 96KB of which is the Windows requirement for a high definition icon.

              Some things can be written better in assembler, but very complex tasks are much easier to debug in high level languages. You can still optimize that code as well.

              BUT some jobs just take time. We don't know how well optimized that 1 month job is, it could be that they have got the runtime down to "just" 1 month. Speculating on optimizing it, just because it doesn't finish in milliseconds, without understanding what it is doing is tilting at windmills.

              I've worked on projects before, where analyses or reports took days or weeks to run. They had been well optimized over the years and the runtimes had come down with successive hardware generations. At some point, you can't optimize the algorithm any further and more processing power (whether raw speed or parallel processing) is the only way forward.

              1. Anonymous Coward
                Anonymous Coward

                Re: Gimme speed

                "I've worked on projects before, where analyses or reports took days or weeks to run."

                A new large data analysis worked - but took 11 hours to run. It already used doped vector arrays to allow easy insertion/sorting. Next day the time was brought down to 30 minutes by using a binary chop search to find things in the arrays.

                While the program was written in C - both doped vector arrays and binary chop searches were techniques I learned when writing in assembler many years ago. They both require an appreciation of low level data structures and memory allocation.
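                For anyone who hasn't met the technique, a minimal sketch of a "binary chop" (binary search) over a sorted array in C - the data and key are hypothetical, but the shape of the code is why a lookup drops from O(n) to O(log n):

```c
#include <stddef.h>

/* Binary chop: returns the index of key in sorted[0..n), or -1 if absent.
 * Assumes sorted[] is in ascending order. */
static long binary_chop(const long *sorted, size_t n, long key) {
    size_t lo = 0, hi = n;                 /* half-open search range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of (lo + hi) */
        if (sorted[mid] < key)
            lo = mid + 1;                  /* key can only be in the upper half */
        else if (sorted[mid] > key)
            hi = mid;                      /* key can only be in the lower half */
        else
            return (long)mid;              /* found */
    }
    return -1;
}
```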

                1. big_D Silver badge

                  Re: Gimme speed

                  @AC I also worked on an early e-retail site. It collapsed whenever the eBay newsletter went out. The MySQL database would seize up and the query to get the menu for the homepage would take over a minute to run. They had 4 front-end servers and a MySQL database.

                  I looked at the code and quickly worked out that it was written for humans to understand, not so that a computer could execute it quickly. By changing the execution of IF statements from negative to positive and changing the WHERE clauses to work optimally on the data (different indexes, and starting at the highest common denominator instead of the lowest, which had been more understandable for a human), the query time dropped from over 1 minute under load to around 12ms.

                  The next time the newsletter came out, the servers were doing well, going from 50 users per server (and collapsing) to 250 users per server with plenty of headroom still to spare.

                  That was something that could be optimized and showed significant results. But, as I said above, without knowing how much the original 1 month problem has been optimized, it is pointless to speculate about further optimization. If it originally took 3 months and they are now down to 1 month after optimization, you are at the limits of what the hardware can achieve. Maybe more cores or faster storage and memory are needed?

          3. Anonymous Coward
            Anonymous Coward

            Re: Gimme speed

            I believe many compilers also risk the result being wrong, at least in some cases. I read a recent thing on an optimization commonly done by C compilers which turns code with behaviour defined by the spec into code with behaviour not defined by it. Certainly we worry a lot about checking that code gives bit-reproducible answers at high optimization settings, which it does not always do, even in Fortran which has been much more carefully designed for optimization than C.

            1. Anonymous Coward
              Anonymous Coward

              Re: Gimme speed

              "I read a recent thing on an optimization commonly done by C compilers which turns code with behaviour defined by the spec into code with behaviour not defined by it."

              Had an application failing with a new version of the compiler. Turned out that the optimisation now recognised that several functions had the same calling parameters and the same code.

              It then generated one instance of the code - no problem. It also conflated all the different function entry points too - so they all had the same memory address. This caused the problem - the program differentiated the different calls elsewhere by their entry point addresses.

              1. Anonymous Coward
                Anonymous Coward

                Re: Gimme speed

                That's a nice example! If this was C, I wonder if the spec even says whether two functions which have different textual definitions but which are clones of each other must have different addresses?

                1. gnasher729 Silver badge

                  Re: Gimme speed

                  The compiler may put both functions at the same address, but &f1 == &f2 must be false. The compiler could for example put a few nops in front of the function, to have a few different addresses available, but always call the function without any nops.
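                  A small, hedged illustration of the point (my example, not anything from the original bug report): two textually identical functions whose addresses are compared. Identical-code folding may emit one body, but a conforming toolchain still has to make the comparison come out false.

```c
#include <stdio.h>

/* Two byte-for-byte identical functions. A linker doing identical-code
 * folding may keep only one copy of the machine code... */
static int add_a(int x, int y) { return x + y; }
static int add_b(int x, int y) { return x + y; }

int main(void) {
    int (*pa)(int, int) = add_a;   /* take the addresses so the functions */
    int (*pb)(int, int) = add_b;   /* can't simply be inlined out of existence */

    /* ...but the standard still requires distinct functions to compare
     * unequal, which is where over-aggressive folding breaks programs. */
    printf("distinct addresses? %s\n", (pa != pb) ? "yes" : "no (non-conforming)");
    return 0;
}
```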

                  1. Anonymous Coward
                    Anonymous Coward

                    Re: Gimme speed

                    Thanks!

                    1. Lee D Silver badge

                      Re: Gimme speed

                      Few things are CPU limited that can't work better with a bit of rejigging and some parallel processing (eg. GPU processing).

                      But things plateaued really quickly because they hit physical boundaries.

                      Nothing stopping people making a core without a consistent clock across it. It's perfectly viable, theoretically. But it would mean architecture changes, most likely. Or its performance for synchronous tasks would just fall back to "waiting for everything" and you would see no speed gain.

                      Heat and chip size are limiting... you need a very tiny, very hot chip, which is really bad for materials that you want to cool, where you just want everything to be spread out and cool. It's like putting a soldering iron bit on your motherboard, basically. Just because it's small doesn't mean you can stop it destroying itself / its surroundings by blowing a fan near it.

                      I think we'd see much bigger gains, anyway, from things like memory that's closer to the chip without relying on tiny local caches to keep the CPU fed (isn't that the problem with things like Rowhammer, etc. too?). If we could bring the RAM into the CPU, and things like persistent RAM, then you'll probably see greater performance increases as the 3GHz CPU will always be kept busy as opposed to a 5GHz CPU that's constantly waiting on the RAM for data.

                      To be honest, I'm at the point where - despite as a kid looking at a 4.77MHz chip and being unable to imagine the speed of 1GHz, and then achieving it in only a few years - I look at the top-of-the-line chip frequencies and don't see them changing anywhere near as much in the next decade or so.

                      With virtualisation, parallelisation, etc. however it won't matter much for almost any "ordinary" workload. And HPC is moving towards GPGPU, custom chips etc. anyway. We'll see a quantum computer before we'll see a 10GHz home machine.

                      I think I'd rather my servers had 100 cores idle at 3GHz than anything else anyway. VM running slow? Add another half-dozen cores and some more RAM into it. Pretty much normal stuff (SQL, etc.) will scale just fine.

                      The problem there is the licensing is going to become insane unless revised (but I run Windows Server Datacenter anyway, so I don't particularly care for most things!).

                      It will lead to the point, though, where one server could in theory allocate 10 cores per client (to things like terminal services, etc.) and be just as fast as anything you could do locally, and at that point you might see a push towards thin-stuff again. Until the next fad-cycle, of course.

                      1. Martin an gof Silver badge

                        Re: Gimme speed

                        Nothing stopping people making a core without a consistent clock across it. It's perfectly viable, theoretically. But it would mean architecture changes, most likely.

                        What, do you mean like the AMULET?

                        M.

                  2. Ken Hagan Gold badge

                    Re: &f1 == &f2

                    Are we using the correct tense here?

                    I think I first encountered a discussion of this point about 20 years ago, in the context of C++ templates producing *many* byte-level-identical functions and then being pretty much obliged (for sanity's sake) to eliminate all but one as a linker optimisation. Once identified, the problem was easily fixed because the compiler can see whether function addresses are ever used as a proxy for identity. I can't say I've heard anyone mention it in the intervening decade or two.

              2. Gene Cash Silver badge

                Re: Gimme speed

                > the program differentiated the different calls elsewhere by their entry point addresses.

                Wait, what? Why the hell would it do that?

              3. Richard 12 Silver badge

                Re: Gimme speed

                @AC with the funky function calls...

                If that was C or C++, you were relying on Undefined Behaviour.

                Compilers are free to demonize your nasal passages if you do that.

            2. big_D Silver badge

              Re: Gimme speed

              @tfb optimization has always been a problem. We had a demo mainframe delivered and the sales guy gave us a tape with source code for our VAX cluster. He told us how wonderful his mainframe was, and how fast. We should compile the code with all optimization on the VAX and let it run and run the same code on his mainframe. We should call him back in a week, once the mainframe was finished, the VAX would need a month!

              There was a note waiting for him by the time he had returned to the office (those were the days before mobile phones). The VAX was finished.

              It had taken the source code, analysed it and come to the conclusion: no input, a lot of calculations, no output = nothing to do. The program created a huge multi-dimensional array, filled it with random numbers, performed some calculations on the random numbers and dumped the array when it was finished. The mainframe dutifully compiled it and executed it; the VAX made a small .exe that finished in a fraction of a second.

            3. Anonymous Coward
              Anonymous Coward

              Optimization

              A (standards) conforming compiler is not permitted to change the behaviour of a strictly-conforming program (one that follows the rules).

              The problem is that a lot of C code contains instances of undefined behaviour. These are often benign at low (or no) optimization levels, but do cause some real surprises at higher optimization levels. The programmer is basically required to honour a contract specified within the standard, and the optimizer assumes that is the case.

              The important thing here is it is the code that is broken, not the compiler. Unfortunately, it is fairly easy to introduce undefined behaviour into the code accidentally - and there is no requirement for the compiler to issue a diagnostic.
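              A classic, hedged example of the sort of thing meant above (my own illustration, not one from the thread): signed integer overflow is undefined behaviour in C, so at higher optimization levels the compiler is allowed to assume it never happens and quietly delete a check that looks perfectly reasonable at -O0.

```c
#include <limits.h>
#include <stdio.h>

/* Intended as an overflow check. Because signed overflow is undefined
 * behaviour, an optimizer may treat (x + 1 > x) as always true and fold
 * the else-branch away entirely at -O2, even though the unoptimized
 * build appears to "work". */
static void check(int x) {
    if (x + 1 > x)
        printf("%d + 1 did not overflow\n", x);
    else
        printf("%d + 1 wrapped around\n", x);
}

int main(void) {
    check(42);
    check(INT_MAX);   /* undefined behaviour: output depends on the compiler */
    return 0;
}
```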

          4. Anonymous Coward
            Anonymous Coward

            Re: Gimme speed

            Massively parallel computation is the best way forwards. The Condor CPU-sharing system is a very good exemplar of this; a researcher at a certain university which uses Condor said that a simulation run that would have taken six months on his own research cluster completed on the early experimental Condor system over one normal weekend. The university has, of course, expanded their Condor implementation hugely since those early days.

          5. Claptrap314 Silver badge

            Re: Gimme speed

            "Only compiler writers"--well, them and low-level drivers. And folks doing validation of the processors. :P

            1. onefang

              Re: Gimme speed

              '"Only compiler writers"--well, them and low-level drivers. And folks doing validation of the processors.'

              And people working with tiny microcontrollers.

          6. JeffyPoooh
            Pint

            Re: Gimme speed

            Richard12 offered, "...Hand-tuned assembler is....incredibly expensive - months or years..."

            You've failed to read to the end of my post, where I specifically suggested, "Even if it's just a few LoC in the innermost loop."

            Modern compilers can be good, but if a run takes a full month then it seems clear that he's almost certainly running his program as a crappy and inefficient high level script.

            Room for improvement is inevitable.

        2. Anonymous Coward
          Anonymous Coward

          Re: Gimme speed

          Admittedly, there's never any harm in pulling out some profiling tools to see what can be done, but I doubt that the potential speed benefits from unrolling a loop or two would ever outweigh the costs of implementing, testing and maintaining the changes, especially when the odds are good that the heavy lifting is being done by an industry standard library - and it's quite probably running on a souped up GPU via Cuda/OpenCL or similar.

          Writing good code is hard. Writing good assembler is much harder. Writing good parallelisable assembler is several orders of magnitude harder still! And proving that what you've written is functionally correct is an absolute nightmare.

          If you think you can do a better job than the geniuses who write the compilers, or the academics who wrote the libraries, then have at it! But, y'know, you're probably not - and the time you spend hacking away at something which is Known Good is probably best spent on something else.

          I've been reminded of this recently, when trying to address some performance issues in a large lump of legacy code. It's the kind of code which makes your eyes bleed, having organically "evolved" over the last ten years with the aid of a large number of developers with highly varying abilities. And in the last few years, people have just tended to hack in changes wherever it was easiest.

          The result is fundamentally unmaintainable: it's virtually impossible to get a clear view of the overall business logic, or of all the special cases which are being handled, especially as many are implicit or conflated with other special cases. As a result, even small changes can act like a crazed chaos butterfly, causing issues in places you'd swear couldn't be impacted.

          Thankfully, we now have buy in from the business to bite the bullet and make a start on cleaning things up. But realistically, it may take weeks or even months before we can be confident that the new code is functionally equivalent!

          (blah blah unit tests. blah blah specifications. blah blah test plans. As ever, the Holy Unbalanced Tripod rule comes into play: you can have it delivered quickly, you can have it well tested, and you can have it at low cost. But at best, you can only ever get two of the three...)

      2. Korev Silver badge
        Boffin

        Re: Gimme speed

        That’s the case for most users, but I have genetic analyses that are highly parallel, so more cores scales better than more clock speed.

        I'm not familiar with your situation, but why don't you run on a cluster? If you're doing something like sequence alignment to a genome then it's pretty easy to scale the number of jobs up to the number of files or file pairs (for PE reads).

    2. Ken Hagan Gold badge

      Re: Gimme speed

      "Gimme a 10GHz CPU."

      CPU frequencies have hardly moved in over ten years. The wavelength of light at 10GHz is smaller than the die size of aforesaid CPUs. It is quite plausible that you will not live long enough to see a 10GHz part in normal commercial channels. (And no, I have no idea how old you are.)

    3. Anonymous Coward
      Anonymous Coward

      Re: Gimme speed

      "Most tasks are hard to parallelise"

      Most problems are inherently parallel, but writing parallel code "is hard" ...

    4. HPCJohn

      Re: Gimme speed

      Not possible. Heat dissipation goes up with the square of the frequency.

      https://en.wikipedia.org/wiki/CPU_power_dissipation

      A 10GHz chip would be as hot as the surface of the sun, or something like that.
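      For what it's worth, the textbook relation behind that rule of thumb (a sketch, not vendor data) is the dynamic CMOS power equation:

          P_dyn ≈ α · C · V² · f

      where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock. Power is only linear in f on its own, but pushing f up also forces V up so the transistors can switch in time, which is why power in practice climbs with roughly the square of the clock (or worse) on a fixed process.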

      1. DropBear
        Joke

        Re: Gimme speed

        Unacceptable! If nine women can deliver in one month the same baby that one woman can in nine months, there's no reason we shouldn't expect CPUs to get with the program too and start getting much faster again!

      2. Anonymous Coward
        Anonymous Coward

        Re: 10Ghz as hot as the sun?

        Kit hit 7GHz this week... granted, on liquid nitrogen, but still, 7GHz! We were hitting 3 or 4 on LN2 in the past; now that's run on air or water cooling easily.

        10GHz is possible, may take a looooong time though.

        1. Tom 7

          Re: 10Ghz as hot as the sun?

          Last time I checked out a rig like that the rig cost far more than buying another computer and provided less performance increase. Pretty coloured tubes though.

      3. Claptrap314 Silver badge

        Re: Gimme speed

        For a fixed technology. A long time ago, we were facing this problem with bipolar processes (ie: transistors.) Folks were looking for alternatives to MOSFET more than a decade ago because they saw this coming. Depressing that nothing has been found so far.

      4. Anonymous Coward
        Anonymous Coward

        Heat dissipation goes up with the square of the frequency

        It does - for the same device geometry.

        Reduce the size (and hence things like capacitance) and the power requirement goes down about the same as the reduction in the transistor surface area.

    5. Michael 47

      Re: Gimme speed

      Sadly, given the way processors work at the moment, it is physically impossible to go above about 5GHz, because the pulse literally can't propagate fast enough. If we assume the clock pulse can propagate at the speed of light, at 5GHz the pulse can propagate about:

      (1 / 5×10⁹ Hz) × 3×10⁸ m/s = 0.06 m

      or about 6cm, which is roughly the size of the CPU, so if you clock it much faster than that the next tick will happen before the previous one has even reached some of the components in the CPU. I wouldn't say never, because they continue to amaze me with the innovations they make, but in their current configuration I don't believe we will ever see a CPU that can be clocked much above 5GHz

      1. UncleNick

        Re: Gimme speed

        "but in their current configuration I don't believe we will ever see a CPU that can be clocked much above 5GHz"

        You might want to Google up the overclocking world records...

      2. Claptrap314 Silver badge

        Re: Gimme speed

        You might want to look into clock distribution methodologies before making that claim. You might even want to dig into the details of pipelining. I have no doubt that any of the majors could ship a chip with a 20GHz clock speed in a quarter or so. These "new" chips would have pipelines four times as long as current ones, however. Also, their performance would almost certainly degrade a bit.

        Clock speed has been effectively meaningless since AMD introduced the K5. It's performance that matters, everything else is marketing. Unfortunately, performance is a per-job thing. All of which is WAY too complicated for consumer marketing. So, people talk about clock speed as if it matters.

        What is a clock cycle inside a microprocessor? It's the frequency of the latch sampling at the end of a chain of unlatched gates. If you want to increase cycle speed, you can simply reduce the number of gates between the latches. (To a point.)

    6. HPCJohn

      Re: Gimme speed

      Probably not that relevant to this discussion, but regarding performance and compilers you should look at the Julia language for scientific and technical computing. Looks like Python, runs like C - it is as fast as C in many instances. It uses multiple dispatch:

      https://en.wikipedia.org/wiki/Multiple_dispatch

      And as this is a UK based website, worth flagging up that the Julia conference this year comes to London in August.

    7. Daniel von Asmuth
      Gimp

      Re: Gimme speed

      A 10 GHz CPU? Try to revive the old Alpha AXP architecture. Forget about complex instructions and speculative execution or hyperthreading; say hello to even longer pipelines. Use photonic chip interconnections. Focus on low latency and smile when you see fewer TFLOPS in benchmarks, and expect huge cooling requirements.

      Amdahl's Law will tell you that most programs will receive less than a 100-fold speed-up if you run them on 100 cores (instead of 1), but most tasks can be parallelised to some degree.
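      To put a rough number on that (the 95% figure is purely illustrative): Amdahl's Law gives a speed-up of S = 1 / ((1 - p) + p/N) for a program whose fraction p is parallelisable across N cores. With p = 0.95 and N = 100, S = 1 / (0.05 + 0.0095) ≈ 16.8, so even a program that is 95% parallel gets nowhere near a 100-fold speed-up on 100 cores.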

    8. rav

      Re: Gimme speed

      More cores allow you to browse, play music, use Excel, and so on.

      You are right most folks just may not need 32 threads. So what!!!! That does not keep them from wanting 32 cores!!!

      Besides Intel has for years been touting their leadership with multicore performance. Now they are on the other end of that stick and some folks just do not like that!!!

  3. Anonymous Coward
    Anonymous Coward

    Intel was fudging

    They took an existing Xeon part and needed a -10°C chiller and a 1000W power supply just to overclock it to 5GHz.

    AMD showed production sample silicon that was air cooled.

    1. Piro Silver badge

      Re: Intel was fudging

      I believe it was a 1600W PSU.

      The whole Intel demo was an utter joke. Cooler + PC consuming north of 2kW in a very desperate move.

      This doesn't lead to a consumer product.

      1. Anonymous Coward
        Anonymous Coward

        Re: Intel was fudging

        Intel was fudging, but the press ate it up. That's all they cared about, because stock analysts will have seen Intel's demo and think "Intel is still comfortably ahead of AMD" instead of "AMD is hot on Intel's heels, we should lower our price targets on Intel!"

    2. RancidRodent

      Re: Intel was fudging

      Wow, they needed a chiller to (fail to) match a stock IBM mainframe CPU clockspeed!

      1. Peter Gathercole Silver badge

        Re: Intel was fudging

        Yes, but even IBM has backed off from pushing the clock speed to add more parallelism.

        The Power6 processor had examples being clocked at 4.75GHz, but the following Power7 clock speed was reduced to below 4GHz (but the number of SMT threads went from 2 to 4, and more cores were put on each die, again 2 to 4). Power8 kept the speed similar, but again increased both the SMT and cores per die.

        In order to drive the high clock speeds in Power6, they had to make the processor perform in-order execution of instructions. For most workloads, putting in more execution units, reducing the clock speed, and putting out-of-order execution back into the equation allowed the processors to do more work, but could be slower for single-threaded processes.

        The argument about compiler optimization really revolves around how well the compiler knows the target processor. Unfortunately, compilers generally produce generic code that will work on a range of processors in a particular family, rather than a specific model, and then relies on run-time hardware optimization (like OoO execution) to actually use the processor to the best it can.

        In order to get the absolute maximum out of a processor, it is necessary to know how many and what type of execution units there are in the processor, and write code that will keep them all busy as much of the time as possible. Knowing the cache size(s) and keeping them primed is also important. SMT or hyperthreading is really an admission that generic code cannot keep all of the execution units busy, and that you can get useful work done by having more than one thread executing in a core at the same time.
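        As a concrete (if simplified) illustration of the cache-priming point, here is a sketch of loop tiling in C - the block size is a hypothetical tuning parameter you would pick per processor, which is exactly the kind of model-specific knowledge being talked about:

```c
#include <stddef.h>

#define N   1024
#define BLK 64   /* hypothetical tile size: pick so ~3*BLK*BLK doubles fit in cache */

/* Tiled matrix multiply, C += A * B.
 * Assumes C[][] is zero-initialised by the caller and BLK divides N.
 * Working on one cache-sized tile at a time keeps the operands hot in
 * cache instead of streaming them from RAM on every pass. */
void matmul_tiled(const double A[N][N], const double B[N][N], double C[N][N]) {
    for (size_t ii = 0; ii < N; ii += BLK)
        for (size_t kk = 0; kk < N; kk += BLK)
            for (size_t jj = 0; jj < N; jj += BLK)
                for (size_t i = ii; i < ii + BLK; i++)
                    for (size_t k = kk; k < kk + BLK; k++) {
                        double aik = A[i][k];
                        for (size_t j = jj; j < jj + BLK; j++)
                            C[i][j] += aik * B[k][j];
                    }
}
```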

        I will admit that a very good compiler, targeting a specific processor model that it knows about in detail, is likely to be able to produce code that is a good fit. But often the compiler is not this good. You might expect the Intel compilers to reflect all Intel processor models, but my guess is that there is a lead time for the compiler to catch up with the latest members of a processor family.

        I know a couple of organizations that write hand-crafted Fortran (which generates very deterministic machine code - which is examined) where the compiler optimizer rarely makes the code any faster, and is often turned off so that the code executes exactly as written. This level of hand optimization is only done on code that is executed millions of times, but the elimination of just one instruction in a loop run thousands of millions of times can provide useful savings in runtime.

        As long as an organization believes that hand-written code delivers better executables, it may justify the expense of doing it. It's their choice, and a generalization about the efficiency of compiler-generated code is not a reason to stop when faced with empirical evidence to the contrary. Sometimes, when pushing the absolute limits of a system, you have no choice but to make the code as efficient as possible using whatever means are available.

        1. RancidRodent

          Re: Intel was fudging

          Yeah, but I was talking about mainframe - not Power - OK, only two threads per core (for speciality engines), but they're clocked at 5.2GHz. Yes, the z14 has built-in water coolers, but these are just that - water coolers - not chillers. The previous z13 ran at 5GHz and the one before that (zEC12) a staggering 5.5GHz.

          1. Peter Gathercole Silver badge

            Re: Intel was fudging

            I think you would be surprised about how closely related the Power and Mainframe processors are nowadays.

            With the instruction set micro- and millicoded, the underlying execution engines rely on some very similar silicon.

            Oh, and there have been relatively recent Power6 and Power7 water-cooled systems, the 9125-F2A and -F2C systems, but only a relatively small number of people either inside or outside of IBM will have worked on them (I am privileged to be one of them). These were water-cooled to increase the density of the components rather than to push the ultimate clock speed. The engineering was superb.

            And... they were packaged and built by Poughkeepsie, next to the zSeries development teams, and use common components like the WCU and BPU from their zSeries cousins.

            There was no Power8 system in the range, because of the radical change to the internal bus structures in the P8 processor. I don't know whether there will be a Power9 member of the family, because I'm no longer working in that market segment.

        2. Claptrap314 Silver badge

          Re: Intel was fudging

          When I was at IBM, the compiler team was really proud that their code could outperform gcc by 15%. Nobody liked it when I pointed out that their code became available 18 months after silicon came out, and that intervening hardware performance improvements were more than that...

          But, yes. Preservation of pain applies. Unless the compiler has the data to make a near-cycle-accurate simulation, there will be significant amounts of performance left on the table. As we are hitting the wall for MOSFET, however, expect compilers to start focusing more on this.

  4. Anonymous Coward
    Anonymous Coward

    Perhaps AMD will be kind enough

    to swap out and replace my defective (insecure) 16-core Threadripper for a nice new shiny 32-core one without the security bugs?

    1. defiler

      Re: Perhaps AMD will be kind enough

      Oh for fuck's sake. I'm getting tired of the bitching around here regarding "oh noes my CPU is not secure!!"

      You know the stuff that you're talking about is (for the most part) ludicrously complex and obscure. The CPU designers have had a shake and realised that this is a valid attack vector. They will now proceed to figure out ways of plugging these holes, and we'll all be happy if they can do it without dropping speed. The milk has been spilt, and the tearful little boy has been told to be careful in future - there's no point in dragging the fucking thing on and on. I really hope you don't have kids before you learn this.

      Also, I will bet pounds to pennies that when you went CPU shopping you had a pretty short list of requirements:

      1) Will it run my software?

      2) Is it cheap enough for me?

      3) Is it the fastest I can get for the money?

      4) Is it going to burn my house down or make the electricity meter spin like a top?

      5) Will it work with all the other bits I've got?

      I bet security did not even factor into it for even one second. But now there's a public hoo-ha? Give it a rest.

    2. Anonymous Coward
      Anonymous Coward

      Re: Perhaps AMD will be kind enough

      to swap out and replace my defective (insecure) 16-core Threadripper for a nice new shiny 32-core one without the security bugs?

      You should have actually bought Intel (which is actually more "insecure").

      Then installed Windows 10 OEM.

      From some vendor which delivers Superfish with it.

  5. Drone Pilot

    Maths co-processor?

    Does it have one like my DX or do I need to fill the empty socket like my friend's SX?

    Kids today don't appreciate the difficult choices we had to make.

    1. Voland's right hand Silver badge

      Re: Maths co-processor?

      Get yer mobility scooter off my lawn whipper-snapper.

      You did not have to try to find a coprocessor that will survive the blazing 25MHz of main CPU provided by the Harris 286C on a VLSI with memory interleaving.

      1. DropBear
        Trollface

        Re: Maths co-processor?

        There was no need. If it started to glow red hot, we could any time just hit the turbo button to scale it back down...

      2. Alistair
        Windows

        Re: Maths co-processor?

        25MHz? Some of us had a blast doubling up our throughput with a 2MHz to 2.48MHz replacement crystal.

    2. defiler

      Re: Maths co-processor?

      Harumph. Some of us had to pry out our ARM2 CPUs and put ARM3 daughterboards in to get the FPA socket before we could even think of floating point copros... With the speed of FPEm, is it any wonder I learned to use integers with a liberal sprinkling of LSL and LSR?

      Oh, and can I borrow that mobility scooter? My UPS needs new batteries.

    3. phuzz Silver badge

      Re: Maths co-processor?

      All this worrying about daughterboards when you should have just bought a computer built with expandability in mind, like my Amiga with its internal expansion port that lets me plug in a board with a faster CPU and an FPU on it!

      (Or these days you'd whack in a Vampire accelerator, which uses an FPGA to emulate a 68060 faster than anything Motorola ever built, along with all the gubbins necessary for hi-res output via HDMI. Shame they're almost impossible to find in stock.)

      1. gregthecanuck
        Thumb Up

        Re: Maths co-processor?

        Thumbs-up for getting a reference to the Vampire accelerator series in this discussion. :)

      2. defiler

        Re: Maths co-processor?

        a computer built with expandability in mind, like my Amiga

        Don't get me wrong, the Amiga was indeed an awesome computer in its day. My Archimedes could do most of what the Amiga did in hardware, but in software. But that's only most. The thing that (to my understanding) broke the Amiga was that Workbench had some nasty bugs on memory allocation. It wouldn't check if the memory was available before allocating it (or something like that), so you had to verify it and constantly micromanage malloc(). So people just dropped Workbench and programmed to the metal.

        So far so good, but when the metal changed (AGA, anyone?) half the bloody software stopped working. The software worked perfectly on only one iteration of the hardware. So when better silicon became available (and the support chips in the Amiga had some fairly tight memory limits, for example), they just couldn't deploy it without breaking the user base. So whilst it may have been expandable in some directions, it wasn't easily upgradable.

        At least when the MEMC1a allocated RAM it was your bloody RAM to use. :D

        Shame really. I never had an Amiga myself, but sometimes I'd quite like one. And most of the best Archimedes games were Amiga ports :)

        1. Anonymous Coward
          Anonymous Coward

          Re: Maths co-processor?

          Well, the Amiga was OK, I guess.

          But the Atari ST had a higher clock speed.

          1. phuzz Silver badge

            Re: Maths co-processor?

            "But the Atari ST had a higher clock speed"

            Shame it lagged behind in every other respect...

            What do you mean we should have stopped having these Amiga vs Atari arguments twenty years ago (or more)?

        2. Korev Silver badge
          Pint

          Re: Maths co-processor?

          >My Archimedes could do most of what the Amiga did in hardware, but in software.

          Speaking of Acorn... The BBC micro had "Tubes", daughterboards for external CPUs like a Z80, ARM or even x86.

          One for the Acorn folks whose computer started my career -->

          1. Martin an gof Silver badge

            Re: Maths co-processor?

            Speaking of Acorn... The BBC micro had "Tubes"

            The Tube was an external bus so expansions weren't really daughterboards (except in the Master, where there was an internal as well as an external connector), but it was a great system and the thing that really set the Acorns (6502 and ARM-based) apart from other computers of the day was what today we'd call the API. Software which correctly followed the API could be ported from machine to machine, from base configuration to co-processor and (aside from any actual assembler) would "just work".

            As for FPU etc, the Archimedes did have the "podule" bus - wasn't it possible to fit an FPU in the first slot of that, or am I mis-remembering?

            Then of course, the RiscPC. I still use one every day. Not just a podule bus, but a co-processor slot allowing the fitting of two processors. I had a '486SX in the slot next to my original ARM600, then I swapped the 600 for a StrongARM.

            There was even a daughterboard which fitted into the processor slot and could hold five processor boards, and something called "Kinetic" which I coveted...

            M.

            1. Peter Gathercole Silver badge

              Re: Maths co-processor?

              The Tube wasn't even really a bus. It was a fast synchronous I/O port that kept the original BBC Micro running, but as a dedicated I/O processor handling the screen, keyboard, attached storage and all the other I/O devices the BEEB was blessed with, while the processor plugged into the Tube did all of the computational work without having to really worry about any I/O. All of the documented OSCLI calls (which included storage, display and other control functions) worked correctly across the Tube, so if you wrote software to use the OSCLI vectors, it just worked.

              When a 6502 second processor was used, it gave access to almost the whole 64KB of available memory, and increased the clockspeed from 2MHz to 3MHz(+?) IIRC. Elite was written correctly, and ran superbly in mode 1 without any of the screen jitter that was a result of the mid-display scan mode change (the top was mode 4 and the bottom was mode 5 on a normal BEEB, to keep the screen down to 10KB of memory). Worked really well, and even better with a BitStik as the controller.

              I also used both the Acorn and Torch Z80 2nd processors, and I know that there were Intel 80186 running DOS, NS32016 running UNIX (used in the Acorn Business Computer range) and ARM 2nd processors built as well.

        3. gregthecanuck

          Re: Maths co-processor?

          "Shame really. I never had an Amiga myself, but sometimes I'd quite like one. And most of the best Archimedes games were Amiga ports :)"

          There is a stand-alone Vampire V4 coming out some time... hopefully in 2018. Fully upgraded with 24 bit graphics, 512MB memory, 16 bit audio, AMMX, fast FPU, HDMI output.... 68080 processor (equivalent to ~250MHz 68060). Very cool indeed. You might want to keep an eye out.

  6. RancidRodent

    Great, but time for windoze to catch up.

    How about lifting the Windows 10 32 MCSS thread limit, eh, Microsoft? What use are all those cores to professional musicians if we can't use them?

    1. Anonymous Coward
      Anonymous Coward

      Re: Great, but time for windoze to catch up.

      >How about lifting the Windows 10 32 MCSS thread limit eh Microsoft?

      As it's Windows 10 SaaS, of course you can - we'll be making that available as an in-OS purchase through the Microsoft Store soon.

      Have a nice day, kerching.

      1. Anonymous Coward
        Anonymous Coward

        Re: Great, but time for windoze to catch up.

        Oh, you want a new feature? Fine!

        We'll do the dev up front for nothing and we'll sell the product through the store so only those that want it pay.

        What an evil, evil thing to do.

        1. RancidRodent

          Re: Great, but time for windoze to catch up.

          "Oh, you want a new feature? Fine!

          We'll do the dev up front for nothing and we'll sell the product through the store so only those that want it pay."

          Er, the multimedia thread limit was INTRODUCED with Windows 10 - previous versions had no such limit - so it looks like Microsoft took something away without asking and now wants to charge us to get the functionality back.

          1. Anonymous Coward
            Anonymous Coward

            Re: Great, but time for windoze to catch up.

            >so it looks like Microsoft took something away without asking and now want to charge us to get the functionality back.

            Game of Solitaire or Minesweeper anyone ?

  7. ToddRundgrensUtopia

    Can anyone explain more about AMD's inter-die fabric? How fast is it, and what are the latencies between each die and between the level 1, 2 and 3 caches?

    1. Michael Duke

      An article on Ryzen Gen 1 is available at Tom's Hardware:

      https://www.tomshardware.com/reviews/amd-ryzen-threadripper-1950x-game-performance,5207-2.html

    2. HamsterNet

      Depends upon your memory

      The clock speed of the Infinity Fabric is set to that of the memory it's controlling. The fabric on Vega will run to 1300, but I haven't tried higher yet as the HBM struggles to get much past 1250 stable; still, you're looking at over 600GB/s at that speed.

      More on IF here

      https://en.wikichip.org/wiki/amd/infinity_fabric

  8. adam payne
    Coffee/keyboard

    Xeon was great. So was coal.

    Awesome, that made me laugh.

    1. Chris G

      Xeon was great. So was coal.

      I was wondering if that was also a slight dig at the Trumpster.

      1. Alistair
        Windows

        @Chris G.

        I don't recall Xeons shipping in orange boxes. But that could just be an unknown unknown.

  9. Craig 2

    "Whether Intel was trying to spoil AMD's announcement, or vice versa, it doesn't really matter: what matters is that someone is putting pressure on monopoly giant Intel, forcing it to innovate in the desktop world rather than burn millions of dollars on trendy side projects it later abandons or neglects."

    This here is why I love AMD, who cares what they produce* as long as they keep Intel (relatively) honest.

    *I love AMD as much as the next person, no fanboi downvotes please ;)

  10. Cuddles

    Meh

    20+ core CPUs have been around for a while now. But there's a reason hardly anyone uses them, and we've been stuck at 2-4 cores for consumer parts for so long - most regular tasks just don't benefit from being massively parallel. Indeed, many programs still only run on a single thread because there's so little benefit from trying to use more. Even for workstations there are very few cases where throwing as many cores as possible at a task is actually the best approach; you're almost always better off with fewer, faster cores, and will often end up limited by RAM or drive I/O anyway.

    People complain about Intel not increasing cores until pushed by AMD, but there were 6 and 8 core i7 parts around nearly a decade ago - no-one actually wanted them, so they stopped making them. Meanwhile Xeons have been happily in the teens and into the 20s, and even those were never anywhere near as popular as those in the 10-12 core range. AMD haven't pushed Intel to do anything useful, they've just started yet another pointless willy-waving exercise where everyone tries to boast about how big their number is with no regard for whether it's actually useful. Which can be clearly seen by the mention of gamers - there isn't a game on the market that will actually benefit from having a 32-core CPU (for the most part they're GPU-limited anyway, and as long as you're not doing something stupid like pairing a budget mobile CPU with a 1080 Ti they won't even notice what CPU you have); it's just a big number for people who don't understand what they're doing but want to have a big number.

    1. DropBear

      Re: Meh

      Well, there's at least one game that has been, is, and will be for the indeterminate future* definitely CPU-limited - a certain space sim in development**. Nothing else you throw at it can make it run as anything other than a slide show. Of course, whether it would benefit from a 32-core CPU or not is anyone's guess - but I'm more than happy to test it if you sponsor me with a test rig...

      * amazingly, it's scheduled to deliver some semblance of a solution to this issue exactly at the same time as El Reg switches to IPv6: "Soon". Or, reportedly, two releases from now which, considering their track record regarding deadlines and promises, is actually precisely equivalent to "Soon" down to at least twenty decimals.

      ** geologists reportedly found evidence somewhere below the Permian layer indicating that the game wasn't always in development; based on this, several cults - widely shunned by the scientific community at large, which generally agrees that the Sun will go nova first - believe the game will likewise exit the development stage some day, achieving what they call "Release". The subject remains poorly explored, as several ethnographic expeditions setting out to study the particularly vicious attitude of these groups towards non-cult-members have gone missing to date, never to be heard from again.

    2. HamsterNet

      Re: Meh

      I/O is the major strength of the Threadripper setup, giving 64 PCIe 3.0 lanes on gen 1. Not seen how many gen 2 will give, but it could be 128, as you get 16 per Ryzen chip and gen 2 has 4 of the beauties.

    3. Glenn Sammes

      Re: Meh

      Love the BIG numbers and also love the smaller $ numbers this competition brings about.

  11. Anonymous South African Coward Bronze badge

    Microsoft will come to the party sooner or later - they'll have a "basic" version of their product restricted to 10 cores.

    Want access to an extra 10 cores? Buy an extra licence. etc etc etc...

  12. elvisimprsntr

    So we need a 28+ core CPU in a desktop to make up for the reduction in performance due to Meltdown and Spectre mitigation? Nice!

  13. Nimby
    Facepalm

    But ... I don't want more cores.

    These days, I try to build silent PCs. I'm tired of my desk sounding like an airport with planes taking off every few minutes. I need to rock out to the rhythmic beats of my clackity-clacking mechanical keyboard. My priorities are electrical input, thermal output, and PCIe lanes first, GHz second, number of cores last. Which means that right now, both AMD and Intel have serious issues preventing me from having a preference.

    This whole number of cores race is like the stupid GHz race of years past. In the end, no one wins because CPUs get "optimized" into becoming their own bottlenecks.

    And whatever happened to adding more execution units and/or making complex executions require fewer cycles? Who cares about the number of sleeping cores and unused cycles you have?

    1. Anonymous South African Coward Bronze badge

      Re: But ... I don't want more cores.

      Heh, reminds me of something back in the OS/2 days - somebody soundproofed their PC - then put a resistor over the terminals of the CPU fan to keep it slow as OS/2 doesn't cook your CPU.

      Was a really quiet PC.

      Nowadays it is like McDonald's farm, a fan-fan here, a fan-fan there all fan-fan-fanning.

      You get fans with LEDs and all sorts of blinkenlights. Why?

      And some components (esp RAID cards) with passive heatsinks really get hot, I prefer to have these cooled with active cooling. More noise. Plus dust is the bane of fans.

      In the end it may be better to just get a bar fridge, stick everything inside with a couple of fans and have a really quiet PC (until the fridge compressor kicks in). And have a filter inside which'll filter the air continuously.

    2. Destroy All Monsters Silver badge
      Holmes

      Re: But ... I don't want more cores.

      > And whatever happened to adding more execution units

      Isn't that about adding moar coares?

      > and/or making complex executions require fewer cycles

      How.

      > Who cares about the number of sleeping cores and unused cycles you have

      As with programming languages, "it depends on the problem".

    3. RancidRodent

      Re: But ... I don't want more cores.

      If you merely require a desktop computer for admin tasks, OpenOffice etc., just use a Raspberry Pi! No noise if you run off a large flash card or SSD.

  14. Anonymous Coward
    Anonymous Coward

    Fuck Everything, We're Doing Thirty-Two Cores.

    I like how Gillette showed themselves to be beyond parody by actually introducing a five-bladed razor a couple of years after that was written. "Make the blades so thin they're invisible. Put some on the handle. I don't care if they have to cram the fifth blade in perpendicular to the other four, just do it!" Stick it on the back? Yeah, that's good.

    1. Anonymous Coward
      Anonymous Coward

      Reminds me of the 1950s USA cars that tried to outdo each other in adding fins etc.

  15. 89724102371719531892724I9755670349743096734346773478647852349863592355648544996313855148583659264921

    Intel because they run cooler and die less often.

  16. Oberoth

    This is another 10GHz Pentium 4!

    I find it virtually impossible to believe that Intel will be able to make a 28-core CPU run at 5GHz on all cores within 6 months; they can't even do that with 6 cores today. And if it was being cooled by liquid nitrogen, this would be exceptionally misleading. I see only 4 explanations for this announcement:

    1) It is a misunderstanding or vapourware, maybe he meant by the end of the decade which I still wouldn't believe.

    2) They use a totally different type of core which isn't very complex but can clock quite high, kind of like Larrabee.

    3) The TDP is like 1000 Watts and you need specialist high flow water cooling system to run it which comes pre-bonded direct to the die.

    4) Intel has been sandbagging us that 10nm wasn't working properly and it's actually a magical node that defies the known laws of thermodynamics.

    My money is on this being vapourware and just Intel trying to get people's attention, in the hope that if people are waiting for this to come out then they are not buying AMD CPUs.

    1. diodesign (Written by Reg staff) Silver badge

      Re: This is another 10GHz Pentium 4!

      The 28-core Core X-series part that's due out this year is 14nm, and will likely be a de-featured Xeon, and will not run at 5GHz unless massively overclocked.

      Given Intel's other SKUs, 28 cores isn't ridiculous - but it smells like a stunt to steal the thunder from Threadripper 2.

      C.

  17. Oberoth

    Should AMD have waited for 7nm?

    I am loving the specs of this chip, but I'm a little worried about, firstly, whether I have any programs that will get close to using all those cores and, secondly, what they will clock at (once thermal equilibrium is reached) if they are all at 100% load.

    Also, if they are not coming out until Q3, this means 7nm versions will not be out anytime soon. Given 7nm Epyc is almost ready to sample, I would have estimated that 7nm Threadripper could have been released Q1 2019 at a push, but I can't see it coming out until Q3 2019 now with these 12nm parts being so late to the party - bearing in mind a Q3 release could mean they are still an agonising 4 months away from hitting the shelves.

    Personally I would have preferred to wait a little while longer and get a much cooler, much higher clocked 7nm version. But I guess if 7nm wouldn't be ready until Q2/Q3 regardless, then the 12nm stop-gap is a good move; if this chip has delayed 7nm TR by 6+ months, though, I would say that was the wrong move by AMD.

  18. Anonymous Coward
    Anonymous Coward

    Complete Kalabash

    Complete Kalabash from Intel... a jerry-rigged (Intel DE) motherboard with 19-phase VRMs and the water cooling hand-soldered to the edges of the board, and plenty of mops and buckets ready for when it leaks.

    Water sports... no thanks. AND no RGB anywhere on the board - how on earth is that going to sell?

  19. darklord

    So when will Windows licensing catch up and start per-core licences for the desktop?

    And when it happens, how much will that fancy new desktop cost you to run?
