AMD does an Italian job on Intel, unveils 32-core, 64-thread 'Naples' CPU

Silver badge

About effing time

We desperately need some competition in CPU space. Anything else aside, we need it so Intel does not price-gouge while sitting on its laurels.

If at least some of that trickles down into desktop space, I am going to update my development desktop :)

53
0

Re: About effing time

This is good for servers, but I still want my 16-core / 32-thread RyZen for my home machine. Or at the very least, a dual RyZen CPU mobo.

15
0

Re: About effing time -- But what about performance per watt?

It's great to see AMD finally stepping up, but what about performance per watt? I've yet to see any claim on how well it performs against Intel on a per watt basis.

Energy consumption in data centers plays a huge part in hardware selection, so it's surprising to see this left out.

13
1
Silver badge
Happy

Re: About effing time

In a raid on Intel's x86 server heartland, AMD has unveiled its next shot at server market glory with Naples,...

In Italian culture, we used to call Naples "Covo di Ladri", or "Den of Thieves".

There must be an Italian at AMD with a great sense of humor!

4
0
Silver badge

Re: About effing time - dual RyZen

I'm more concerned about the RAM and PCIe limits of the desktop RyZen boards. I want 64GB RAM and the potential for quad GPU and more of the new storage. That's the server (workstation) chips for me then. The price is going to sting.

As long as I'm posting, though... The other shoe that has yet to drop is watts. I think that in terms of compute per watt, Intel's about to get slaughtered here. That's actually the biggest deal of all in the datacenter space. Can't wait to see those specs.

4
0
Silver badge

Re: About effing time -- But what about performance per watt?

>what about performance per watt?

Overall system (power) performance and pricing may be more important.

A few watts difference on the CPU may mean little if you are doubling up on systems. Twice the 40G adapters, power supplies, cabling, 40G switch ports and anything else which goes in.

2
0
Bronze badge

Re: About effing time -- But what about performance per watt?

I saw somewhere else that these are clocked pretty low (1.4GHz base) and rated around 150W.

Still, with all those PCIe lanes and DIMMs, the socket itself must be pretty damn big.

1
0
WTF?

and The Register chooses to illustrate it with an American style pizza?

17
0
Silver badge

Agreed

I'm sure that there is a picture of some Neapolitan Ice Cream that El Reg could have used.

Alternatively one of Pompeii/Vesuvius would do nicely as it is not that far away.

5
0
Silver badge

@Ian

I agree.

El Reg pictures hardly ever add value to the articles. At best it's a generic, heavily re-used stock picture with no relevance, or a one-word match as in this case.

It's hard to see how this enhances the site, as the technical content is what most come here for. Perhaps it's time for us all to switch to Lynx!

12
6
Silver badge

"and The Register chooses to illustrate it with an American style pizza?"

Hey, Pizza Hut made 'em an offer they couldn't refuse. Capiche?

6
2
Silver badge
Pint

@Titter

Sorry mate, you get downvoted.

In Amerika, it's Pizza Slut, not Pizza Hut.

In terms of 'strong arm' marketing, it's 'Papa John's' (see their tie-in with the NFL).

But here in Chicago, if you want good pizza, you have a lot more options. Thick or thin, or even as a Calzone or Oven Grinder. So call me a snob.

BTW, the slice looks more like it came from a DiGiorno commercial than a pizza chain.

Pizza and Beer is what makes technology go around and my gut hang over my belt.

(Hence the beer icon)

3
8
Anonymous Coward

> I agree.

>

> El Reg pictures hardly ever add value to the articles

They are just one of the items AdBlocker removes from El Reg pages for me, the other being those irritating "badges" in the comments (ooh, look at me, see how many posts I've made) ... and I'm sure there's something else that it blocks for me :-)

4
8
Silver badge

Re: @Titter

But here in Chicago, if you want good pizza, you have a lot more options. Thick or thin, or even as a Calzone or Oven Grinder. So call me a snob.

You can be a pizza snob?! I gave you a downvote solely due to my recollection of eating a Chicago-style deep dish at Pizzeria Uno on Wabash, being unable to finish half a "small", and spending the next 16 hours belching from reflux due to all the cheese.

Put me off pizza for months.

6
0

Re: @Titter

I've eaten several times at that original Pizzeria Uno. Other than the 45 minute wait (more reason to get another cool one), it is great. Deep dish pizza isn't for everyone, especially those with weak constitutions - so to speak. I can see indigestion being the result.

4
0

Re: @Titter

Pizzeria Uno hasn't been Pizzeria Uno for a couple of decades now (this is true of all of the formerly great Chicago pizza places, as they got bought by "entrepreneurs" and immediately lost all of their individuality and unique recipes).

Plus, deep dish pizza, despite being introduced in Chicago, is hardly what I would call Chicago-style.

(Currently buying my pizzas from Apart, FWIW).

1
0

Re: @Titter

I never wasted time on pizza in Chicago... Too many rib joints to visit.

Chicago does the best spare ribs.

2
0

Re: @Titter

Deep dish is actually akin to a quiche, a French dish, so it's suited for less subtle barbarians.

0
1
Anonymous Coward

"El Reg pictures hardly ever add value to the articles. At best its a generic heavily re-used stock picture with no relevance or a one-word match as in this case"

Bullshit. I could point to a lot of biased shit that pours off this site, but back off the pics! The pictures here sometimes tell the whole story, regardless of who wrote it or even where you read it. They are glyphs, which isn't a recent invention (or is, if you're a rock).

I'd like to see them just post pics and then ask which historical IT event each describes. For instance, what one pic would describe the entire SCO vs. IBM sun dance?

1
2
Silver badge
Boffin

@Tom Re: @Titter

You should have gone to Lou's a couple of blocks away if you wanted a good deep dish.

And if Deep isn't your thing, Lou's does a good thin too.

0
0
Silver badge

Re: @Titter

Yeah, I agree Uno and Due are tourist traps. Uno franchised, I think...

Depending on your style... Roots out on Chicago Ave is one nice place, or, if you like, Piece Pizza up in Wicker Park.

Apart is too far North.

Chicago Pizza and Oven Grinders in Lincoln Park is ok too.

There are others and everyone has their favorites.

1
0
Silver badge

I typically browse with images disabled, as I'm normally visiting sites for content, not pretty pictures.

1
0

There is no need to go that far. I use a snippet in the Stylish extension to disable El Reg’s headline pics.

@-moz-document domain("www.theregister.co.uk") {
    div.article_head img.article_img { display: none; }
}
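
If you use uBlock Origin instead of Stylish, a roughly equivalent cosmetic filter (untested here, and assuming the same selector still matches the page) would be:

www.theregister.co.uk##div.article_head img.article_img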

2
0
Silver badge
Happy

Sod the picture, it's what's underneath that's important!

AND? It is about time AMD Zen reached the market... Intel will have to catch up; it never learned the lesson last time and was caught resting on its laurels again... "It's Opteron once more, shalalalala!"

Ryzen and Naples kick Intel's backdoors!

Competition? I love it!

20
0
Bronze badge

Competition is good & Ryzen looks great

Not sure I'd trust such an immature platform for a home PC upgrade just yet, though.

1
2

Re: Competition is good & Ryzen looks great

I'm building a few (cheap) home PCs at present and Ryzen is yet to arrive; the best-value CPU this week is, surprisingly, the Intel Pentium dual-core/4-thread Kaby Lake G4600 @ 3.60GHz - combined with a KFA2-GTX1060OC 3GB and a few other bits.

0
3
Bronze badge

Re: Competition is good & Ryzen looks great

AMD looks to have provided decent performance across the board with Ryzen, which should put some pressure on Intel prices.

Happier times, given the value of the pound, etc...

4
0
Silver badge

@Hans1 No catch up from Intel.

I think Moore's Law failed because there was no need to innovate in terms of core performance - just shrink the die and reduce power draw and TDP.

The question is which chip will better support virtualization and open source software like Hadoop. (Intel paid $$$$$$$ for a chunk of Cloudera)

Looking at AMD entails risk in an environment that is supposed to be risk-averse, so AMD has to offer something over and above that risk. Can they do it? We'll have to see what SuperMicro or some of the other well-known server board / white-box manufacturers do.

I wish them luck, we need that competition to keep things evolving.

1
2
Silver badge
Gimp

OMG! This is seriously good news.

I hoped AMD would have something like this at some point in time. But this good and this quick? Wow!

OK, fanboi icon ----->

6
0

Incredible achievement.

I wonder what their yield is...

2
0

I need a dual 32 core

I need a dual 32-core if only to have 128 threads. Why, I don't know, but there is always use for more power, better economy, and competition in the CPU market.

3
0

Ryzen

So why do we not have any articles on Ryzen, which was released last week?

7
0

Re: Ryzen

This has already been asked!! And somewhat answered.

https://forums.theregister.co.uk/forum/1/2017/02/28/Known_Hero_Lacklustre_reporting/

2
0
Silver badge
Coat

Re: Ryzen

No good Ryzen?

5
0
Silver badge

Probably

because the hardware section has been replaced by the devops section.

4
0
Silver badge
Windows

Hrrrm

looks at HDFS cluster.

Can I get that creature in a quad-CPU board, please? Hell yes, that would help out with Harduup. <sic, dev somewhere in the Philippines>

I'd go for a refresh of the 35 original nodes with that and move them from fifteen 4TB drives to twenty-four 8TB drives.
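
A quick back-of-envelope on that refresh in Python (drive counts and sizes as stated above; raw capacity only, ignoring HDFS replication):

old_node = 15 * 4    # fifteen 4TB drives -> 60 TB raw per node
new_node = 24 * 8    # twenty-four 8TB drives -> 192 TB raw per node
print(old_node * 35, new_node * 35)    # 2100 TB vs 6720 TB raw across 35 nodes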

And the namenodes would *LOVE* the extra cores.

2
0
Silver badge

No jokes

about blowing bloody doors off?

4
0
Silver badge

Multicore Performance Improvement for the PC ?

Hi,

Not sure, as I'm not a computer science graduate, but will the operating system be able to benefit from all those cores?

I assume that the main RAM is still single access per fetch/put - so is it the law of diminishing returns?

Will virtualisation benefit from more cores - even 8 cores? (for home use)

Or is more RAM the overriding requirement?

Thanks and regards,

Shadmeister.

0
0
Bronze badge

Re: Multicore Performance Improvement for the PC ?

In a standalone server/PC, probably not, unless you take special cases such as high-load web servers or other tasks that are well threaded.

From a server standpoint, it's more capacity to spin up either more or bigger VMs on a single server to use all those cores. Or stick to cheaper 2-socket platforms instead of moving to 4 sockets.

3
0
Silver badge

Re: Multicore Performance Improvement for the PC ?

I was thinking about this as I read the article. Although this new AMD offering has better memory bandwidth than Intel's, the bandwidth per core is less. It's similar to the situation with AMD's desktop range: decent compute power, but not quite as good memory bandwidth (which matters for, eg, games that need to transfer a lot of texture data). At least that was the case when I bought my at-the-time top-of-the-range AMD A10-7870K.

Of course, it's all about making engineering trade-offs, and I think that this is something that AMD does very well. Depending on the amount of L1 cache (L2 is usually* shared across cores, so it doesn't have quite as big an impact, but is still important) and the particular workload, it should be possible for this new offering to out-perform the Intel part quite a lot of the time, as AMD is claiming with their "seismic data" chart.

As for the OS, I know that Linux (can't speak for Windows or others) scales pretty well when you throw extra cores at it. The main overheads will come from the algorithms used in applications: their memory access requirements, inter-thread/-process synchronisation patterns and whether they're written to be cache-aware.

* Actually, following a link to a previous article here, it appears that the L2 is 512KB per core, not shared. There's also 8MB of L3 (shared), so I could imagine using this sort of system to run a Docker farm (probably not the right word). The base OS + Docker + shared guest binaries could easily fit in L3. With a more heterogeneous collection of components (different VMs with different guest OSes, effectively like a shared-hosting scenario), there won't be as much duplicated code (or static data), so there'll be many more cache misses needing to access real memory.

5
0
Silver badge

Re: Multicore Performance Improvement for the PC ?

Hi All,

Thanks for the replies - much appreciated.

Regards,

Shadmeister.

0
0
Silver badge
Devil

Re: Multicore Performance Improvement for the PC ?

"Not sure as not a computer science graduate, but will the operating system be able to benefit from all those cores ?"

Probably NOT if it's a Microshaft OS. However, I know for a _FACT_ that FreeBSD would benefit, depending on what you're doing. And most likely Linux as well.

Example: use 'make -j n' where 'n' is the # of simultaneous 'jobs' you want to run. I usually pick a value that's at least 50% more than the # of cores I'm running (so that it takes advantage of idle time waiting for I/O and stuff like that). It can make builds go significantly faster, up to 'm' times faster where 'm' is the # of cores you have (yeah, duh).
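
A minimal Python sketch of that rule of thumb (the 1.5x multiplier is just the comment's suggestion, not a hard rule):

import os

cores = os.cpu_count() or 1    # detected core count, falling back to 1
jobs = cores + cores // 2      # roughly 50% more jobs than cores, to cover I/O waits
print(f"make -j{jobs}")        # e.g. prints "make -j48" on a 32-core box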

I understand that there are MPEG encoding libraries now that can ALSO take advantage of multiple cores. I do not know if MPEG DEcoding can use multi-core, but it wouldn't surprise me.

And, not to forget mention of, GAMES. But they're generally OS-agnostic as far as how the game maker wants to implement things.

Worth pointing out, CLASSIC X11 [and _NOT_ Wayland] is a CLIENT/SERVER model, which theoretically runs the graphics in one core, and the application in another one. So by design, an X11 system is _ALREADY_ configured to benefit from multi-core, though the total # of cores that give you a measurable benefit seems to be small (like, 2, maybe?)

Linux also has kernel threads (BSD as well) and whenever you have multiple threads and multiple cores, possibly processing multiple simultaneous I/O requests without blocking one another, you get performance benefit from multi-core. 32 cores, as compared to 4 or even 2, might not make much of a difference, though.

Anyway, aside from algorithms specifically written to leverage multi-core (you can do it with a number of them, from DFT to qsort), most operating systems (inherently) will probably NOT have much of a performance boost between 2 or 4 cores, and 32 cores. That's my $.10 worth, anyway...

0
9

Re: Multicore Performance Improvement for the PC ?

Whether the OS can utilize all these threads depends on the workload. We distinguish between two different kinds of scaling: scale-up and scale-out.

- Scale-out workloads run all in parallel (embarrassingly parallel workloads); there is not much communication going on between the threads. This is HPC cluster number-crunching territory. Typically they run a tight for loop over the same grid of points, solving the same PDE over and over again, integrating in time. Everything fits in the cache. All these servers are clusters, such as the SGI UV3000, supercomputers, etc. These clusters have 10,000s of cores, as they are a bunch of PCs sitting on a fast switch. They are cheap: if you buy a large cluster, you just pay the price of an individual PC x the number of nodes.

Because the whole workload fits into cache, you never go out to RAM. CPU cache is ~10 ns, and RAM is ~100 ns. Typically one scientist starts up a huge HPC task which takes several days to complete, so it's one user at a time.

- Scale-up workloads have a lot of communication going on. They typically run business ERP workloads, such as SAP, databases, etc. These workloads always serve many users at the same time, thousands of users or more. One user might do accounting, another payroll, etc. This means these thousands of separate users' data cannot fit into a CPU cache, so business workloads always go out to RAM. That means 100 ns latency or so.

Say the CPU runs at 2 GHz. If you have 100 ns latency because you always go out to RAM, the 2 GHz CPU effectively slows to the order of 10-20 MHz. I don't know if you remember those 20 MHz CPUs, but they were quite slow. So business workloads (communicating a lot, waiting for other threads to sync), serving thousands of users, have large problems scaling up. Business servers max out at 16 or 32 sockets. Every CPU needs a connection to every other CPU for fast access, and with 16 or 32 CPUs there will be a lot of connections. Say you have 32 sockets: then you need (32 choose 2) = 32x31/2 = 496 connections, and that grows quadratically. That is very messy. Going above 32 sockets is not doable if you require that every CPU connects to every other (which you do, for fast access). Look at all the connections in this 32-socket SPARC server:

https://regmedia.co.uk/2013/08/28/oracle_sparc_m6_bixby_interconnect.jpg
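
For what it's worth, a quick Python sketch of those two back-of-envelope numbers (the latency and clock figures are the ones quoted above, not measurements):

import math

# Effective instruction rate if every access stalls ~100 ns on RAM (very rough)
cycles_per_miss = 100e-9 * 2e9        # 100 ns at 2 GHz = 200 stalled cycles
effective_hz = 2e9 / cycles_per_miss  # ~10 MHz, the order of magnitude quoted above

# Links needed for a full point-to-point mesh of 32 sockets
links = math.comb(32, 2)              # 496, growing quadratically with socket count

print(effective_hz, links)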

So large business servers max out at 16 or 32 sockets. Clusters cannot run business workloads; the reason is that clusters have far too few connections. Clusters typically have 100s or 1000s of CPUs. You cannot have a direct connection from every CPU to every other with that many CPUs, so you cheat: one CPU connects to a group of other CPUs. Getting from one CPU to another then takes a long time, because you need to locate the correct group, then go to another CPU, and another, etc., until you find the correct one. There are many hops. And if you try to run business workloads on a cluster, performance will drop far below 20 MHz, maybe down to 2 MHz. And that is not doable.

So, clusters are scale-out servers typically having 10,000s of cores and 128 TB of RAM or so. They are used almost exclusively for HPC workloads. Supercomputers belong to this arena. They typically run Linux.

Scale-up business servers typically have 16 sockets or so. This arena belongs to RISC - SPARC / POWER / mainframe - running Solaris, AIX, or IBM z/OS. There is no Linux nor x86 here; the reason is that neither Linux nor x86 scales well. The largest x86 business server was, until recently, 8 sockets. Look at all the business benchmarks, such as the official SAP results: all the top SAP spots belong to SPARC, and x86 comes far, far below. Business workloads scale badly, so you need extraordinary servers to handle them, such as old and mature RISC servers. RISC has scaled to 32 sockets for decades; x86, not so. The largest scale-up business server on the market is the Fujitsu M10-4S, which is a 64-socket Solaris SPARC server.

Linux does not scale well on business workloads because, until recently, large business servers beyond 8 sockets did not exist - so how can Linux scale well when there are no large x86 business servers?

The business arena belongs to RISC and Unix. One IBM P595 POWER6 server cost $35 million. Yes, one single server. Business servers are very lucrative and cost a great deal. Scalability is very, very difficult and you have to pay a hefty premium. Business servers do not cost 1 PC x 32 nodes. No, the cost ramps up quadratically, because it becomes quadratically harder to scale.

2
0

Article misses a factor of 2 on memory BW

The article states: "Also "Naples" supports up to 21.3GBps per channel with DDR4-2667 x 8 channels (total 170.7GBps), versus the E5-2699A v4 processor's implied 140GBps."

This is based on a faulty interpretation of the statement that the AMD processor has "122% more memory bandwidth". The author apparently interpreted this as 1.22x as much memory bandwidth, to compute 170.7GB/s/1.22 = 140 GB/s. The correct interpretation of "122% more" is "2.22x", yielding 170.7 GB/s vs 76.8 GB/s. This implies 4 channels of DDR4-2400 on each socket of the Xeon E5-2699 v4, which is consistent with Intel's published specifications. The use of "1866 MHz" for the Xeon E5-2699A v4 in an earlier slide may be correct for a configuration with multiple DIMMs per channel -- the details vary by product and are not easy to look up.
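
A worked version of that arithmetic in Python (all figures are from the comment and the article, not vendor data):

naples_bw = 170.7              # 8 channels of DDR4-2667 at ~21.3 GB/s each
misread = naples_bw / 1.22     # "122% more" read as 1.22x -> ~139.9 GB/s (the article's 140)
correct = naples_bw / 2.22     # "122% more" means 2.22x -> ~76.9 GB/s
print(round(misread, 1), round(correct, 1))
# ~76.9 GB/s lines up with 4 channels of DDR4-2400 (4 x 19.2 = 76.8 GB/s) per Xeon socket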

9
0
Silver badge
Thumb Up

Very interesting

Currently testing a new MPI algorithm for seriously big images (max 38.6 Gpixel and growing) on our cluster, and getting some pretty good speed-up up to 64 and even 128 and 256 processes (reaching rates of >300 Mpixel/s for a complex image processing task). A cluster of beasts like this would be very nice to test this on further.

Very nice indeed

5
0
Silver badge
Boffin

Re: Very interesting

Would you be able to share with us some more details about what you're doing? We have some similar size images from Digital Pathology and are always on the lookout for better ways to process them.

1
0

Oh, I wish, I wish, I could believe those numbers you show us, Lisa Su.

But you have proven over and over again to be dishonest and an outright liar about AMD's very selective performance benchmarks.

Fool us once, Lisa Su, shame on us. Fool us twice...

0
11

Check out the Ryzen Benchmarks

You will see that in most workloads except gaming, Ryzen pretty much equals Intel and beats Broadwell-E.

It looks to be a very good multithreader; even with a little less IPC, it appears to be far more efficient in multi-threading workloads than Intel.

5
0
Silver badge
Meh

We'll see . . .

Many years ago, we deployed Opteron servers on the basis of their superior core count and performance per watt, but we discovered that real-world performance was worse than Xeons even of the previous generation, excepting certain narrow graphical workloads. While this current batch looks amazing, I would want to see how they actually perform before committing to large-scale deployment.

2
2

The current numbers might look good for CPU performance, but what about bus performance?

While the current CPU benches we are being fed look good, it really makes me wonder how strong the PCIe bus performance is - not to mention performance outside the main interconnect in general.

The APUs from AMD have had great internal interconnect performance, but once you leave that space and start trying to add discrete cards or components to the mix, it goes into the toilet fast. Not to mention stability problems.

I just don't trust AMD at this point, given how far behind they have been and the attitude of "well, we will fix this in the next chip", or "we left this out because it cost us too much", despite the demand for it or it already being implemented by competitors.

If the real-world performance ends up fitting the application, and it's stable, then by all means I will suggest/implement it as a solution. But at this point I just don't have the trust that I once did in AMD.

1
1
