Re: Intel was fudging
Wow, they needed a chiller to (fail to) match a stock IBM mainframe CPU clockspeed!
AMD this week promised to ship 32-core Ryzen Threadripper 2 processors in the third quarter of 2018 – one day after Intel bragged about a forthcoming 28-core part. On Monday, Intel, the dominant CPU maker in the worlds of desktop and server systems, touted an upcoming 28-core Core X-series aimed at workstations, gamers, and …
Yes, but even IBM has backed off from pushing the clock speed to add more parallelism.
The Power6 processor had examples clocked at 4.75GHz, but the following Power7's clock speed was reduced to below 4GHz (although the number of SMT threads went from 2 to 4, and more cores were put on each die, again 2 to 4). Power8 kept the speed similar, but again increased both the SMT and the cores per die.
In order to drive the high clock speeds in Power6, they had to make the processor perform in-order execution of instructions. For most workloads, putting in more execution units, reducing the clock speed, and putting out-of-order execution back into the equation allowed the processors to do more work, though it could be slower for single-threaded processes.
The argument about compiler optimization really revolves around how well the compiler knows the target processor. Unfortunately, compilers generally produce generic code that will work on a range of processors in a particular family, rather than a specific model, and then rely on run-time hardware optimization (like OoO execution) to actually use the processor to the best of its ability.
In order to get the absolute maximum out of a processor, it is necessary to know how many and what type of execution units there are in the processor, and write code that will keep them all busy as much of the time as possible. Knowing the cache size(s) and keeping them primed is also important. SMT or hyperthreading is really an admission that generic code cannot keep all of the execution units busy, and that you can get useful work done by having more than one thread executing in a core at the same time.
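To make the "keep every execution unit busy" point concrete, here's a minimal C sketch (my own illustration, not from the post): summing with several independent accumulators breaks the serial dependency chain, so a superscalar core with multiple FP adders can overlap the additions.

```c
#include <stddef.h>

/* Naive reduction: each add depends on the previous result, so only
 * one FP execution unit is ever busy at a time. */
double sum_naive(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the adds in one iteration have no
 * mutual dependency, so a core with several FP adders can issue them
 * in parallel. */
double sum_unrolled(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)   /* tail elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Whether four accumulators (rather than two, or eight) is the right number depends on exactly how many FP units the specific model has and their latency, which is precisely the commenter's point about needing to know the target processor.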
I will admit that a very good compiler, targeting a specific processor model that it knows about in detail is likely to be able to produce code that is a good fit. But often the compiler is not this good. You might expect the Intel compilers to reflect all Intel processor models, but my guess is that there is a lead time for the compiler to catch up to the latest members on a processor family.
I know a couple of organizations that write hand-crafted Fortran (which generates very deterministic machine code - which is examined) where the compiler optimizer rarely makes the code any faster, and is often turned off so that the code executes exactly as written. This level of hand optimization is only done on code that is executed millions of times, but the elimination of just one instruction in a loop run thousands of millions of times can provide useful savings in runtime.
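A back-of-envelope check on that last claim, in C; the 5GHz clock and one-cycle-per-instruction figures are my illustrative assumptions, not from the post.

```c
/* Seconds saved by removing `cycles` cycles per iteration from a loop
 * executed `iters` times on a core running at `hz` Hz. */
double seconds_saved(double cycles, double iters, double hz) {
    return cycles * iters / hz;
}
```

One instruction (roughly one cycle here) removed from a loop run a thousand million times on an assumed 5GHz core saves about 0.2 seconds per run — small, but it adds up when the job itself is run millions of times.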
As long as an organization believes that hand-written code delivers better executables, it can justify the expense of producing it. It's their choice, and a generalization about the efficiency of compiler-generated code is no reason to stop when the empirical evidence says otherwise. Sometimes, when pushing the absolute limits of a system, you have no choice but to make the code as efficient as possible using whatever means are available.
Yeah, but I was talking about mainframe - not Power. Okay, only two threads per core (for speciality engines), but they're clocked at 5.2GHz. Yes, the z14 has built-in water coolers, but these are just that - water coolers - not chillers. The previous z13 ran at 5GHz, and the one before that (the zEC12) at a staggering 5.5GHz.
I think you would be surprised about how closely related the Power and Mainframe processors are nowadays.
With the instruction set micro- and millicoded, the underlying execution engines rely on some very similar silicon.
Oh, and there have been relatively recent Power6 and Power7 water-cooled systems, the 9125-F2A and -F2C systems, but only a relatively small number of people either inside or outside of IBM will have worked on them (I am privileged to be one of them). These were water-cooled to increase the density of the components rather than to push the ultimate clock speed. The engineering was superb.
And... they were packaged and built by Poughkeepsie, next to the zSeries development teams, and used common components like the WCU and BPU from their zSeries cousins.
There was no Power8 system in the range, because of the radical change to the internal bus structures in the P8 processor. I don't know whether there will be a Power9 member of the family, because I'm no longer working in that market segment.
When I was at IBM, the compiler team was really proud that their code could outperform gcc by 15%. Nobody liked it when I pointed out that their code became available 18 months after silicon came out, and that intervening hardware performance improvements were more than that...
But, yes. Preservation of pain applies. Unless the compiler has the data to make a near-cycle-accurate simulation, there will be significant amounts of performance left on the table. As we are hitting the wall for MOSFET, however, expect compilers to start focusing more on this.
Oh for fuck's sake. I'm getting tired of the bitching around here regarding "oh noes my CPU is not secure!!"
You know the stuff that you're talking about is (for the most part) ludicrously complex and obscure. The CPU designers have had a shake and realised that this is a valid attack vector. They will now proceed to figure out ways of plugging these holes, and we'll all be happy if they can do it without dropping speed. The milk has been spilt, and the tearful little boy has been told to be careful in future - there's no point in dragging the fucking thing on and on. I really hope you don't have kids before you learn this.
Also, I will bet pounds to pennies that when you went CPU shopping you had a pretty short list of requirements:
1) Will it run my software?
2) Is it cheap enough for me?
3) Is it the fastest I can get for the money?
4) Is it going to burn my house down or make the electricity meter spin like a top?
5) Will it work with all the other bits I've got?
I bet security did not even factor into it for even one second. But now there's a public hoo-ha? Give it a rest.
to swap out and replace my defective (insecure) 16-core ThreadRipper for a nice new shiny 32-core one without the security bugs?
You should have actually bought Intel (which is actually more "insecure").
Then installed Windows 10 OEM.
From some vendor which delivers Superfish with it.
Harumph. Some of us had to pry out our ARM2 CPUs and put ARM3 daughterboards in to get the FPA socket before we could even think of floating point copros... With the speed of FPEm, is it any wonder I learned to use integers with a liberal sprinkling of LSL and LSR?
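The LSL/LSR trick sketched in C (the 16.16 format is my choice for illustration; the original ARM code would have used whatever precision fit the problem): shifts turn plain integer arithmetic into cheap fixed-point arithmetic, sidestepping the glacial floating-point emulator entirely.

```c
#include <stdint.h>

typedef int32_t fix16;            /* 16.16 fixed point: 16 integer bits,
                                     16 fraction bits */

#define FIX_ONE (1 << 16)         /* the value 1.0 in 16.16 */

/* LSL #16 on ARM; shown here for non-negative values to stay portable. */
static fix16 fix_from_int(int32_t i) { return i << 16; }

/* ASR #16 on ARM (LSR for unsigned); arithmetic shift on real compilers. */
static int32_t fix_to_int(fix16 f)   { return f >> 16; }

/* Multiply: widen to 64 bits, then shift the extra 16 fraction bits out. */
static fix16 fix_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> 16);
}
```

On an ARM2 with no hardware multiply-accumulate worth speaking of and FP done in software, this sort of thing was routinely the difference between interactive and unusable.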
Oh, and can I borrow that mobility scooter? My UPS needs new batteries.
All this worrying about daughterboards when you should have just bought a computer built with expandability in mind, like my Amiga with its internal expansion port that lets me plug in a board with a faster CPU and an FPU on the same board!
(Or these days you'd whack in a Vampire accelerator, which uses an FPGA to emulate a 68060 faster than anything Motorola ever built, along with all the gubbins necessary for hi-res output via HDMI. Shame they're almost impossible to find in stock.)
a computer built with expandability in mind, like my Amiga
Don't get me wrong, the Amiga was indeed an awesome computer in its day. My Archimedes could do most of what the Amiga did in hardware, but in software. But that's only most. The thing that (to my understanding) broke the Amiga was that Workbench had some nasty bugs on memory allocation. It wouldn't check if the memory was available before allocating it (or something like that), so you had to verify it and constantly micromanage malloc(). So people just dropped Workbench and programmed to the metal.
So far so good, but when the metal changed (AGA, anyone?) half the bloody software stopped. The software worked perfectly on only one iteration of the hardware. So when better silicon became available (and the support chips in the Amiga had some fairly tight memory limits, for example), they just couldn't deploy it without breaking the user base. So whilst it may have been expandable in some directions, it wasn't easily upgradable.
At least when the MEMC1a allocated RAM it was your bloody RAM to use. :D
Shame really. I never had an Amiga myself, but sometimes I'd quite like one. And most of the best Archimedes games were Amiga ports :)
Speaking of Acorn... The BBC micro had "Tubes"
The Tube was an external bus so expansions weren't really daughterboards (except in the Master, where there was an internal as well as an external connector), but it was a great system and the thing that really set the Acorns (6502 and ARM-based) apart from other computers of the day was what today we'd call the API. Software which correctly followed the API could be ported from machine to machine, from base configuration to co-processor and (aside from any actual assembler) would "just work".
As for FPU etc, the Archimedes did have the "podule" bus - wasn't it possible to fit an FPU in the first slot of that, or am I mis-remembering?
Then of course, the RiscPC. I still use one every day. Not just a podule bus, but a co-processor slot allowing the fitting of two processors. I had a '486SX in the slot next to my original ARM600, then I swapped the 600 for a StrongARM.
There was even a daughterboard which fitted into the processor slot and could hold five processor boards, and something called "Kinetic" which I coveted...
M.
The Tube wasn't even really a bus. It was a fast synchronous I/O port that kept the original BBC micro running, but as a dedicated I/O processor handling the screen, keyboard, attached storage and all the other I/O devices the BEEB was blessed with, while the processor plugged into the Tube did all of the computational work without having to really worry about any I/O. All of the documented OSCLI calls (which included storage, display and other control functions) worked correctly across the Tube, so if you wrote software to use the OSCLI vectors, it just worked.
When a 6502 second processor was used, it gave access to almost the whole 64KB of available memory, and increased the clockspeed from 2MHz to 3MHz(+?) IIRC. Elite was written correctly, and ran superbly in mode 1 without any of the screen jitter that was a result of the mid-display scan mode change (the top was mode 4 and the bottom was mode 5 on a normal BEEB, to keep the screen down to 10KB of memory). Worked really well, and even better with a BitStik as the controller.
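The screen-memory arithmetic behind that 10KB figure, as a quick C check (mode geometries are the standard BBC Micro ones: mode 4 is 320×256 in 2 colours, mode 5 is 160×256 in 4 colours, mode 1 is 320×256 in 4 colours):

```c
/* Bytes of screen memory for a BBC micro display mode:
 * width x height pixels at the given bit depth. */
unsigned screen_bytes(unsigned width, unsigned height,
                      unsigned bits_per_pixel) {
    return width * height * bits_per_pixel / 8;
}
```

Modes 4 and 5 both come out at 10KB, which is why Elite split the screen between them on a standard BEEB; mode 1 needs 20KB, affordable only once the second processor freed up the host's memory.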
I also used both the Acorn and Torch Z80 2nd processors, and I know that there were Intel 80186 running DOS, NS32016 running UNIX (used in the Acorn Business Computer range) and ARM 2nd processors built as well.
"Shame really. I never had an Amiga myself, but sometimes I'd quite like one. And most of the best Archimedes games were Amiga ports :)"
There is a stand-alone Vampire V4 coming out some time... hopefully in 2018. Fully upgraded with 24 bit graphics, 512MB memory, 16 bit audio, AMMX, fast FPU, HDMI output.... 68080 processor (equivalent to ~250MHz 68060). Very cool indeed. You might want to keep an eye out.
"Oh, you want a new feature? Fine!
We'll do the dev up front for nothing and we'll sell the product through the store so only those that want it pay."
Er, the multimedia thread limit was INTRODUCED with Windows 10 - previous versions had no such limit - so it looks like Microsoft took something away without asking and now wants to charge us to get the functionality back.
The clock speed of the Infinity Fabric is set to that of the memory it's controlling. The fabric on Vega will run to 1300MHz, but I haven't tried higher yet as the HBM struggles to get much past 1250MHz stable; still, that's over 600GB/s at that speed.
More on IF here
https://en.wikichip.org/wiki/amd/infinity_fabric
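That 600GB/s figure checks out from the bus arithmetic; a quick C sketch (the 2048-bit HBM2 bus width is Vega's, the 1250MHz memory clock is the number from the post above, and the ×2 is HBM's double-data-rate signalling):

```c
/* Peak bandwidth in GB/s for a double-data-rate memory bus:
 * bytes per transfer x transfers per second. */
double peak_gbps(unsigned bus_bits, double mclk_mhz) {
    return (bus_bits / 8.0) * mclk_mhz * 1e6 * 2.0 / 1e9;
}
```

A 2048-bit bus at 1250MHz DDR gives 640GB/s peak, comfortably "over 600GB/s".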
"Whether Intel was trying to spoil AMD's announcement, or vice versa, it doesn't really matter: what matters is that someone is putting pressure on monopoly giant Intel, forcing it to innovate in the desktop world rather than burn millions of dollars on trendy side projects it later abandons or neglects."
This here is why I love AMD, who cares what they produce* as long as they keep Intel (relatively) honest.
*I love AMD as much as the next person, no fanboi downvotes please ;)
20+ core CPUs have been around for a while now. But there's a reason hardly anyone uses them, and we've been stuck at 2-4 cores for consumer parts for so long - most regular tasks just don't benefit from being massively parallel. Indeed, many programs still only run on a single thread because there's so little benefit from trying to use more. Even for workstations there are very few cases where throwing as many cores as possible at a task is actually the best approach; you're almost always better off with fewer, faster cores, and will often end up limited by RAM or drive I/O anyway.
People complain about Intel not increasing cores until pushed by AMD, but there were 6- and 8-core i7 parts around nearly a decade ago - no-one actually wanted them, so they stopped making them. Meanwhile Xeons have been happily in the teens and into the 20s, and even those were never anywhere near as popular as those in the 10-12 core range. AMD haven't pushed Intel to do anything useful, they've just started yet another pointless willy-waving exercise where everyone tries to boast about how big their number is with no regard for whether it's actually useful. Which can be clearly seen by the mention of gamers - there isn't a game on the market that will actually benefit from having a 32-core CPU (for the most part they're GPU-limited anyway, and as long as you're not doing something stupid like pairing a budget mobile CPU with a 1080 Ti they won't even notice what CPU you have), it's just a big number for people who don't understand what they're doing but want to have a big number.
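The "fewer, faster cores" point is just Amdahl's law; here's a hedged C sketch with made-up numbers (a 70% parallelisable workload is my illustration, not a measured figure):

```c
/* Amdahl's law: overall speedup for a workload whose fraction p is
 * parallelisable, run on n cores, with an optional per-core clock
 * multiplier relative to the baseline part. */
double speedup(double p, double n, double clock_mult) {
    return clock_mult / ((1.0 - p) + p / n);
}
```

With these assumed numbers, eight cores clocked 1.25× higher come out ahead of 32 slower ones (roughly 3.2× vs 3.1×), and no number of cores can ever beat 1/(1-p) ≈ 3.3× - the serial fraction dominates long before the core count does.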
Well, there's at least one game that has been, is, and will be for the indeterminate future* definitely CPU-limited - a certain space sim in development**. Nothing else you throw at it can make it run as anything other than a slide show. Of course, whether it would benefit from a 32-core CPU or not is anyone's guess - but I'm more than happy to test it if you sponsor me with a test rig...
* amazingly, it's scheduled to deliver some semblance of a solution to this issue exactly at the same time as El Reg switches to IPv6: "Soon". Or, reportedly, two releases from now which, considering their track record regarding deadlines and promises, is actually precisely equivalent to "Soon" down to at least twenty decimals.
** geologists reportedly found evidence somewhere below the Permian layer indicating that the game wasn't always in development; based on this, several cults - widely shunned by a scientific community that generally agrees the Sun will go nova first - believe the game will likewise exit the development stage some day, achieving what they call "Release". The subject remains poorly explored, as the several ethnographic expeditions that set out to study the particularly vicious attitude of these groups towards non-cult-members have all gone missing, never to be heard from again.
These days, I try to build silent PCs. I'm tired of my desk sounding like an airport with planes taking off every few minutes. I need to rock out to the rhythmic beats of my clackity-clacking mechanical keyboard. My priorities are electrical input, thermal output, and PCIe lanes first, GHz second, number of cores last. Which means that right now, both AMD and Intel have serious issues preventing me from having a preference.
This whole number of cores race is like the stupid GHz race of years past. In the end, no one wins because CPUs get "optimized" into becoming their own bottlenecks.
And whatever happened to adding more execution units and/or making complex executions require fewer cycles? Who cares about the number of sleeping cores and unused cycles you have?
Heh, reminds me of something back in the OS/2 days - somebody soundproofed their PC - then put a resistor over the terminals of the CPU fan to keep it slow as OS/2 doesn't cook your CPU.
Was a really quiet PC.
Nowadays it is like Old MacDonald's farm: a fan-fan here, a fan-fan there, all fan-fan-fanning.
You get fans with LEDs and all sorts of blinkenlights. Why?
And some components (esp RAID cards) with passive heatsinks really get hot, I prefer to have these cooled with active cooling. More noise. Plus dust is the bane of fans.
In the end it may be better to just get a bar fridge, stick everything inside with a couple of fans and have a really quiet PC (until the fridge compressor kicks in). And have a filter inside which'll filter the air continuously.
> And whatever happened to adding more execution units
Isn't that about adding moar coares?
> and/or making complex executions require fewer cycles
How.
> Who cares about the number of sleeping cores and unused cycles you have
As with programming languages, "it depends on the problem".
Fuck Everything, We're Doing Thirty-Two Cores.
I like how Gillette showed themselves to be beyond parody by actually introducing a five-bladed razor a couple of years after that was written. "Make the blades so thin they're invisible. Put some on the handle. I don't care if they have to cram the fifth blade in perpendicular to the other four, just do it!" Stick it on the back? Yeah, that's good.
I find it virtually impossible to believe that Intel will be able to make a 28 core CPU run at 5GHz on all cores within 6 months, they can't even do that with 6 cores today. And if it was being cooled by liquid nitrogen this would be exceptionally misleading. I see only 4 explanations for this announcement:
1) It is a misunderstanding or vapourware, maybe he meant by the end of the decade which I still wouldn't believe.
2) They use a totally different type of core which isn't very complex but can clock quite high, kind of like Larrabee.
3) The TDP is like 1000 Watts and you need specialist high flow water cooling system to run it which comes pre-bonded direct to the die.
4) Intel has been sandbagging us that 10nm wasn't working properly and it's actually a magical node that defies the known laws of thermodynamics.
My money is on this being vapourware: it's just Intel trying to get people's attention in the hope that if people are waiting for this to come out, they're not buying AMD CPUs.
The 28-core Core X-series part that's due out this year is 14nm, and will likely be a de-featured Xeon, and will not run at 5GHz unless massively overclocked.
Given Intel's other SKUs, 28 cores isn't ridiculous - but it smells like a stunt to steal the thunder from Threadripper 2.
C.
I am loving the specs of this chip but I'm a little worried that firstly, do I have any programs that will get close to using all those cores and secondly, what will they clock at (once thermal equilibrium is reached) if they are all at 100% load?
Also, if they are not coming out until Q3, this means 7nm versions will not be out anytime soon. Given 7nm Epyc is almost ready to sample, I would have estimated that 7nm Threadripper could have been released Q1 2019 at a push, but I can't see it coming out until Q3 2019 now with these 12nm parts being so late to the party - bearing in mind a Q3 release could mean they are still a good 4 months away from hitting the shelves.
Personally I would have preferred to wait a little while longer and get a much cooler, much higher-clocked 7nm version. But I guess if 7nm wouldn't be ready until Q2/Q3 regardless, then the 12nm stopgap is a good move - though if this chip has delayed 7nm TR by 6+ months, I would say that was the wrong move by AMD.