AMD does an Italian job on Intel, unveils 32-core, 64-thread 'Naples' CPU

In a raid on Intel's x86 server heartland, AMD has unveiled its next shot at server market glory with Naples, a 32-core, 64-thread CPU based on its Zen microarchitecture. Naples targets the high-performance server market and confirms AMD wants to be a significant data centre CPU player. It features: Scalable, 32-core System …

  1. Voland's right hand Silver badge

    About effing time

    We desperately need some competition in CPU space. Anything else aside we need it so Intel does not price gouge while sitting on its laurels.

    If at least some of that trickles down into desktop space I am going to update my development desktop :)

    1. NoneSuch Silver badge
      Alert

      Re: About effing time

      This is good for servers, but I still want my 16-core / 32-thread RyZen for my home machine. Or at the very least, a dual RyZen CPU mobo.

      1. BillG
        Happy

        Re: About effing time

        In a raid on Intel's x86 server heartland, AMD has unveiled its next shot at server market glory with Naples,...

        In the Italian culture, we used to call Naples "Covo di Ladri", or "Den of Thieves".

        There must be an Italian at AMD with a great sense of humor!

      2. Mikel

        Re: About effing time - dual RyZen

        I'm more concerned about the RAM and PCIe limits of the desktop RyZen boards. I want 64GB RAM and the potential for quad GPU and more of the new storage. That's the server (workstation) chips for me then. The price is going to sting.

        As long as I'm posting though... The other shoe that has yet to drop is watts. I think that in terms of compute per watt Intel's about to get slaughtered here. That's actually the biggest deal of all in the datacenter space. Can't wait to see those specs.

    2. brainbone

      Re: About effing time -- But what about performance per watt?

      It's great to see AMD finally stepping up, but what about performance per watt? I've yet to see any claim on how well it performs against Intel on a per watt basis.

      Energy consumption in data centers plays a huge part in hardware selection, so it's surprising to see this left out.

      1. P. Lee

        Re: About effing time -- But what about performance per watt?

        >what about performance per watt?

        Overall system (power) performance and pricing may be more important.

        A few watts difference on the CPU may mean little if you are doubling up on systems. Twice the 40G adapters, power supplies, cabling, 40G switch ports and anything else which goes in.

        1. Tom 64

          Re: About effing time -- But what about performance per watt?

          I saw somewhere else that these are clocked pretty low (1.4GHz base) and rated around 150W.

          Still with all those PCIe lanes and DIMMs, the socket itself must be pretty damn big.

  2. Ian Thomas
    WTF?

    and The Register chooses to illustrate it with an American style pizza?

    1. Steve Davies 3 Silver badge

      Agreed

      I'm sure that there is a picture of some Neapolitan Ice Cream that El Reg could have used.

      Alternatively one of Pompeii/Vesuvius would do nicely as it is not that far away.

    2. Dwarf

      @Ian

      I agree.

      El Reg pictures hardly ever add value to the articles. At best it's a generic, heavily re-used stock picture with no relevance, or a one-word match as in this case.

      It's hard to see how this enhances the site, as the technical content is what most come here for. Perhaps it's time for us all to switch to Lynx!

      1. Anonymous Coward
        Anonymous Coward

        > I agree.

        >

        > El Reg pictures hardly ever add value to the articles

        They are just one of the items AdBlocker removes from ElReg pages for me the other being those irritating "badges" in the comments (ooh, look at me, see how many posts I've made) ... and I'm sure there's something else that it blocks for me :-)

      2. Anonymous Coward
        Anonymous Coward

        "El Reg pictures hardly ever add value to the articles. At best its a generic heavily re-used stock picture with no relevance or a one-word match as in this case"

        Bullshit. I can state a lot of biased shit that pours off this site, but back off the pics! The pictures here sometimes tell the whole story regardless of who wrote it or even where you read it. They are glyphs, which isn't a recent imaginary (or is if you're a rock).

        I'd like to see them just post pics and then ask which historical I.T. event it describes. For instance, what 1 pic would describe the entire SCO vs. IBM sun dance?

      3. tiggity Silver badge

        I typically browse with images disabled, as normally visiting sites for content, not pretty pictures

      4. petef

        There is no need to go that far. I use a snippet in the Stylish extension to disable El Reg’s headline pics.

        /* Stylish user style: hide The Register's headline images */
        @-moz-document domain("www.theregister.co.uk") {
          div.article_head img.article_img { display: none; }
        }

    3. TitterYeNot

      "and The Register chooses to illustrate it with an American style pizza?"

      Hey, Pizza Hut made 'em an offer they couldn't refuse. Capiche?

      1. Ian Michael Gumby
        Pint

        @Titter

        Sorry mate, you get down voted.

        In Amerika, it's Pizza Slut not Pizza Hut.

        In terms of 'strong arm' marketing, it's 'Papa John's' (see their tie-in with the NFL).

        But here in Chicago, if you want good pizza, you have a lot more options. Thick or thin, or even as a Calzone or Oven Grinder. So call me a snob.

        BTW, the slice looks more like it came from a DiGiorno commercial than a pizza chain.

        Pizza and Beer is what makes technology go around and my gut hang over my belt.

        (Hence the beer icon)

        1. Tom 38

          Re: @Titter

          But here in Chicago, if you want good pizza, you have a lot more options. Thick or thin, or even as a Calzone or Oven Grinder. So call me a snob.

          You can be a pizza snob?! I gave you a downvote solely due to my recollection of eating a Chicago-style deep dish at Pizzeria Uno on Wabash, being unable to finish half a "small", and spending the next 16 hours belching from reflux due to all the cheese.

          Put me off pizza for months.

          1. Rob Isrob

            Re: @Titter

            I've eaten several times at that original Pizzeria Uno. Other than the 45 minute wait (more reason to get another cool one), it is great. Deep dish pizza isn't for everyone, especially those with weak constitutions - so to speak. I can see indigestion being the result.

          2. John Gamble

            Re: @Titter

            Pizzeria Uno hasn't been Pizzeria Uno for a couple of decades now (this is true of all of the formerly great Chicago pizza places, as they got bought by "entrepreneurs" and immediately lost all of their individuality and unique recipes).

            Plus, deep dish pizza, despite being introduced in Chicago, is hardly what I would call Chicago-style.

            (Currently buying my pizzas from Apart, FWIW).

            1. majorursa

              Re: @Titter

              Deep dish is actually akin to a quiche, a French dish, so suited for less subtle barbarians.

            2. Ian Michael Gumby

              Re: @Titter

              Yeah, I agree Uno and Duo are tourist traps. Uno franchised I think...

              Depending on your style... Roots out on Chicago Ave is one nice place, or if you want Piece Pizza up in Wicker Park.

              Apart is too far North.

              Chicago Pizza and Oven Grinders in Lincoln Park is ok too.

              There are others and everyone has their favorites.

          3. Ian Michael Gumby
            Boffin

            @Tom Re: @Titter

            You should have gone to Lou's a couple of blocks away if you wanted a good deep dish.

            And if Deep isn't your thing, Lou's does a good thin too.

        2. Slackness

          Re: @Titter

          I never wasted time on Pizza in Chicago... Too many rib joints to visit.

          Chicago do the best spare ribs.

  3. Hans 1
    Happy

    Sod the picture, it's what's underneath that's important!

    AND? it is about time AMD Zen reached the market ... Intel will have to catch up, never learned the lesson last time, was caught sleeping on its laurels again ... "It's Opteron once more, shalalalala!"

    Ryzen and Naples kick Intel's backdoors !

    Competition ? I love it!

    1. ArrZarr Silver badge

      Competition is good & Ryzen looks great

      Not sure I'd trust such an immature platform for a Home PC upgrade just yet though.

      1. An nonymous Cowerd

        Re: Competition is good & Ryzen looks great

        I'm building a few (cheap) home PCs at present and Ryzen is yet to arrive; best value CPU chip this week is surprisingly the Intel Pentium dual-core/4-thread Kaby-Lake G4600 @ 3.60GHz - combined with a KFA2-GTX1060OC 3GB and a few other bits.

        1. theblackhand

          Re: Competition is good & Ryzen looks great

          AMD looks to have provided decent performance across the board with Ryzen which should put some pressure on Intel prices.

          Happier times given the value of the pound etc...

    2. Ian Michael Gumby

      @Hans1 No catch up from Intel.

      I think Moore's law failed because there was no need to innovate in terms of core performance, just shrink the die and reduce the power and TDP output.

      The question is which chip will better support virtualization and open source software like Hadoop. (Intel paid $$$$$$$ for a chunk of Cloudera)

      Looking at AMD entails risk in an environment that is supposed to be risk averse. So AMD has to offer something over and beyond that risk. Can they do it? We'll have to see what SuperMicro or some of the other well known server board / white box manufacturers do.

      I wish them luck, we need that competition to keep things evolving.

  4. GrumpenKraut
    Gimp

    OMG! This is seriously good news.

    I hoped AMD would have something like this at some point in time. But this good and quick, wow!

    OK, fanboi icon ----->

  5. W Donelson

    Incredible achievement.

    I wonder what their yield is...

  6. Anonymous Coward
    Anonymous Coward

    I need a dual 32 core

    I need a dual 32 core if only to have 128 threads, why I don't know but there is always use for more power, better economy, and competition in the CPU market.

  7. Fenton

    Ryzen

    So why do we not have any articles on Ryzen that was released last week?

    1. Known Hero

      Re: Ryzen

      This has been already asked !! and somewhat answered.

      https://forums.theregister.co.uk/forum/1/2017/02/28/Known_Hero_Lacklustre_reporting/

    2. David Roberts
      Coat

      Re: Ryzen

      No good Ryzen?

    3. Francis Boyle Silver badge

      Probably

      because the hardware section has been replaced by the devops section.

  8. Alistair
    Windows

    Hrrrm

    looks at HDFS cluster.

    Can I get that creature in a quad CPU board please. Hell Yes that would help out with Harduup. <sic, dev somewhere in Philippines>

    I'd go for a refresh of the 35 original nodes with that and move them from 15 4Tb drives to the 24 8Tb drives.

    And the namenodes would *LOVE* the extra cores.

  9. Solarflare

    No jokes

    about blowing bloody doors off?

  10. This post has been deleted by its author

    1. theblackhand

      Re: Multicore Performance Improvement for the PC ?

      In a standalone server/PC, probably not unless you take special cases such as high load web servers or other tasks that are well threaded.

      From a server standpoint, it's more capacity to spin up either more or bigger VMs on a single server to use all those cores. Or stick to cheaper 2 socket platforms instead of moving to 4 sockets.

    2. Frumious Bandersnatch

      Re: Multicore Performance Improvement for the PC ?

      I was thinking about this as I read the article. Although this new AMD offering has better memory bandwidth than Intel's, the bandwidth per core is less. It's similar to the situation with AMD's desktop range: decent compute power, but not quite as good memory bandwidth (which matters for, eg, games that need to transfer a lot of texture data). At least that was the case when I bought my at-the-time top-of-the-range AMD A10-7870K.

      Of course, it's all about making engineering trade-offs, and I think that this is something that AMD does very well. Depending on the amount of L1 cache (L2 is usually* shared across cores, so it doesn't have quite as big an impact, but is still important) and the particular workload, it should be possible for this new offering to out-perform the Intel part quite a lot of the time, as AMD is claiming with their "seismic data" chart.

      As for the OS, I know that Linux (can't speak for Windows or others) scales pretty well when you throw extra cores at it. The main overheads will come from the algorithms used in applications: their memory access requirements, inter-thread/-process synchronisation patterns and whether they're written to be cache-aware.

      * Actually, following a link to a previous article here, it appears that the L2 is 512KB per core, not shared. There's also 8MB of L3 (shared), so I could imagine using this sort of system to run a Docker farm (probably not the right word). The base OS + Docker + shared guest binaries could easily fit in L3. With a more heterogeneous collection of components (different VMs with different guest OSes, effectively like a shared-hosting scenario), there won't be as much duplicated code (or static data) so there'll be many more page faults needing to access real memory.

      1. This post has been deleted by its author

    3. bombastic bob Silver badge
      Devil

      Re: Multicore Performance Improvement for the PC ?

      "Not sure as not a computer science graduate, but will the operating system be able to benefit from all those cores ?"

      Probably NOT if it's a Microshaft OS. However, I know for a _FACT_ that FreeBSD would benefit, depending on what you're doing. And most likely Linux as well.

      For example, use 'make -j n' where 'n' is the # of simultaneous 'jobs' you want to run. I usually pick a value that's at least 50% more than the # of cores I'm running (so that it takes advantage of idle time waiting for I/O and stuff like that). It can make builds go significantly faster, up to 'm' times faster when 'm' is the # of cores you have (yeah, duh).
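The rule of thumb in this comment can be sketched in Python. This is only an illustration under stated assumptions: the 1.5x factor is taken from the "at least 50% more" figure above, `suggested_jobs` and `make_command` are hypothetical helper names, and the command is built but never run.

```python
import math
import os


def suggested_jobs(cores: int, oversubscribe: float = 1.5) -> int:
    """Return a job count roughly 50% above the core count (assumed factor)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return math.ceil(cores * oversubscribe)


def make_command(cores: int) -> list[str]:
    """Build (but do not run) a parallel make invocation for this core count."""
    return ["make", "-j", str(suggested_jobs(cores))]


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to a single core.
    cores = os.cpu_count() or 1
    print(" ".join(make_command(cores)))
```

Whether oversubscribing helps depends on how much of the build waits on I/O, so the factor is worth measuring on the machine in question.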

      I understand that there are mpeg encoding libraries now that can ALSO take advantage of multiple cores. I do not know if mpeg DEcoding can use multi-core, but it wouldn't surprise me.

      And, not to forget mention of, GAMES. But they're generally OS-agnostic as far as how the game maker wants to implement things.

      Worth pointing out, CLASSIC X11 [and _NOT_ Wayland] is a CLIENT/SERVER model, which theoretically runs the graphics in one core, and the application in another one. So by design, an X11 system is _ALREADY_ configured to benefit from multi-core, though the total # of cores that give you a measurable benefit seems to be small (like, 2, maybe?)

      Linux also has kernel threads (BSD as well) and whenever you have multiple threads and multiple cores, possibly processing multiple simultaneous I/O requests without blocking one another, you get performance benefit from multi-core. 32 cores, as compared to 4 or even 2, might not make much of a difference, though.

      Anyway, aside from algorithms specifically written to leverage multi-core (you can do it with a number of them, from DFT to qsort), most operating systems (inherently) will probably NOT have much of a performance boost between 2 or 4 cores, and 32 cores. That's my $.10 worth, anyway...
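One common way to frame the diminishing returns described here is Amdahl's law, which the comment does not name; it is brought in purely as an illustration. The sketch below assumes a workload splits cleanly into a serial part and a perfectly parallel part, and the parallel fractions in the loop are made-up values, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only: a mostly serial task, a well-threaded
    # task, and a nearly perfectly parallel one (such as a large build).
    for fraction in (0.5, 0.9, 0.99):
        for cores in (2, 4, 32):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.2f} cores={cores:2d} speedup={speedup:.2f}")
```

Under this model a half-serial workload can never reach a 2x speedup regardless of core count, which matches the comment's point that 32 cores may not beat 4 by much for general OS activity.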

    4. PlinkerTind

      Re: Multicore Performance Improvement for the PC ?

      It depends on the workload if the OS can utilize all these threads. We distinguish between two different scaling: scale-up and scale-out.

      -Scale-out workloads run all in parallel (embarrassingly parallel workloads); there is not much communication going on between the threads. This is HPC cluster number crunching territory. Typically they run a tight for loop on the same grid of points, solving the same PDE over and over again, integrating in time. Everything fits in the cache. All these servers are clusters, such as SGI UV3000, supercomputers, etc. These clusters have 10,000s of cores, as they are a bunch of PCs sitting on a fast switch. They are cheap: if you buy a large cluster, you just pay the price for an individual PC x the number of nodes.

      Because all the workload fits into a cache, you never go out to RAM. Cpu cache is 10ns, and RAM is 100ns. Typically one scientist starts up a huge HPC task which takes several days to complete. So one user at a time.

      -Scale-up workloads have a lot of communication going on. They typically run business ERP workloads, such as SAP, Databases, etc. These workloads always serve many users at the same time, thousands of users or more. One user might do accounting, another payroll, etc. This means the data of all these thousands of separate users can not fit into a cpu cache. So business workloads always go out to RAM. That means 100 ns latency or so.

      Say the cpu runs at 2 GHz. If you have 100 ns latency as you always go out to RAM, that means the 2GHz CPU slows down to 20 MHz. I don't know if you remember those 20 MHz cpus, but they are quite slow. So business workloads (communicating a lot, waiting for other threads to synch) serving thousands of users have large problems with scaling up. Business servers max out at 16 or 32 sockets. Every cpu needs a connection to the other cpus for fast access, and with 16 or 32 cpus, there will be a lot of connections. Say you have 32 sockets, then you need (32 choose 2) connections. That is 32*31/2 = 496 connections, which is quadratic growth. That is very messy. Going above 32 sockets is not doable if you require that every cpu connects to every other (which you do, for fast access). Look at all the connections for this 32-socket SPARC server:

      https://regmedia.co.uk/2013/08/28/oracle_sparc_m6_bixby_interconnect.jpg
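The link count for a fully connected topology can be checked with a short Python sketch. It assumes exactly one direct point-to-point link per pair of sockets, which is the "every cpu connects to every other" requirement described in the comment; real interconnects may use fewer links plus extra hops. `full_mesh_links` is a hypothetical helper name.

```python
def full_mesh_links(sockets: int) -> int:
    """Number of direct links needed so every socket reaches every other in one hop."""
    if sockets < 0:
        raise ValueError("sockets must not be negative")
    # Each of the n sockets pairs with n - 1 others; every link is shared by two.
    return sockets * (sockets - 1) // 2


if __name__ == "__main__":
    for sockets in (2, 4, 8, 16, 32, 64):
        print(f"{sockets:2d} sockets -> {full_mesh_links(sockets)} links")
```

Doubling the socket count roughly quadruples the number of links, which is the quadratic growth the comment refers to.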

      So large business servers max out at 16 or 32 sockets. Clusters can not run business workloads. The reason is clusters have far too few connections. Clusters typically have 100s of cpus, or 1000s. You can not have a direct connection between cpu to cpu with that many cpus. So you cheat, one cpu connects to a group of other cpus. So getting from one cpu to another takes a long time, because you need to locate the correct group, and then go to another cpu, and another, etc until you find the correct cpu. There are many hops. And if you try to run business workloads on a cluster, performance will drop far below 20 MHz. Maybe down to 2MHz. And that is not doable.

      So, clusters are scale-out servers typically having 10,000s of cores and 128 TB RAM or so. They are exclusively used for HPC workloads. Supercomputers belong to this arena. They typically run Linux.

      Scale-up business servers typically have 16 sockets or so. This arena belongs to RISC such as SPARC / POWER / Mainframe running Solaris, AIX, or IBM z/OS. There is neither Linux nor x86 here. The reason is Linux does not scale well, and x86 does not scale well either. The largest x86 business server was until recently 8 sockets. Look at all the business benchmarks, such as official SAP. All top SAP spots belong to SPARC. x86 comes far far below. Business workloads scale badly, so you need extraordinary servers to handle them, such as old and mature RISC servers. RISC has scaled to 32 sockets for decades. x86 not so. The largest scale-up business server on the market is Fujitsu M10-4S, which is a 64-socket Solaris SPARC server.

      Linux does not scale well on business workloads, because until recently there did not exist large business servers beyond 8-sockets - so how can Linux scale well when there does not exist large x86 business servers?

      The business arena belongs to RISC and Unix. One IBM P595 POWER6 server cost $35 million. Yes, one single server. Business servers are very lucrative and cost very much. Scalability is very very difficult and you have to pay a hefty premium. Business servers do not cost 1 PC x 32 nodes. No, the cost ramps up quadratically, because it becomes quadratically difficult to scale.

  11. DrBandwidth

    Article misses a factor of 2 on memory BW

    The article states: "Also "Naples" supports up to 21.3GBps per channel with DDR4-2667 x 8 channels (total 170.7GBps), versus the E5-2699A v4 processor's implied 140GBps."

    This is based on a faulty interpretation of the statement that the AMD processor has "122% more memory bandwidth". The author apparently interpreted this as 1.22x as much memory bandwidth, to compute 170.7GB/s/1.22 = 140 GB/s. The correct interpretation of "122% more" is "2.22x", yielding 170.7 GB/s vs 76.8 GB/s. This implies 4 channels of DDR4-2400 on each socket of the Xeon E5-2699 v4, which is consistent with Intel's published specifications. The use of "1866 MHz" for the Xeon E5-2699A v4 in an earlier slide may be correct for a configuration with multiple DIMMs per channel -- the details vary by product and are not easy to look up.
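The arithmetic in this comment can be reproduced with a small Python sketch. It assumes the usual theoretical-peak formula for DDR memory (transfers per second times 8 bytes per 64-bit channel times the number of channels) and decimal gigabytes; `peak_bandwidth_gbps` is a hypothetical helper name, and sustained bandwidth on real systems will be lower.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel is 64 bits wide


def peak_bandwidth_gbps(mega_transfers: float, channels: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given DDR speed grade and channel count."""
    return mega_transfers * BYTES_PER_TRANSFER * channels / 1000.0


# Figures quoted in the comment: 8 channels of DDR4-2667 vs 4 channels of DDR4-2400.
naples = peak_bandwidth_gbps(2667, 8)
xeon = peak_bandwidth_gbps(2400, 4)

# "122% more" means a factor of 1 + 1.22 = 2.22, not 1.22.
implied_by_122_percent_more = naples / 2.22
implied_by_misreading = naples / 1.22

if __name__ == "__main__":
    print(f"Naples peak:               {naples:.1f} GB/s")
    print(f"Xeon peak (4 x DDR4-2400): {xeon:.1f} GB/s")
    print(f"Implied by '122% more':    {implied_by_122_percent_more:.1f} GB/s")
    print(f"Implied by 1.22x reading:  {implied_by_misreading:.1f} GB/s")
```

The two interpretations differ by almost a factor of two, and only the 2.22x reading lands close to a four-channel DDR4-2400 configuration.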

  12. Michael H.F. Wilkinson Silver badge
    Thumb Up

    Very interesting

    Currently testing a new MPI algorithm for seriously big images (max 38.6 Gpixel and growing) on our cluster, and getting some pretty good speed-up up to 64 and even 128 and 256 processes (reaching rates of >300 Mpixel/s for a complex image processing task). A cluster of beasts like this would be very nice to test this on further.

    Very nice indeed

    1. Korev Silver badge
      Boffin

      Re: Very interesting

      Would you be able to share with us some more details about what you're doing? We have some similar size images from Digital Pathology and are always on the lookout for better ways to process them.

  13. retired_in_london

    Oh, I wish, I wish, I could believe those numbers you show us, Lisa Su.

    But you have proven over, and over again to be dishonest and an outright liar about AMD's very selective performance benchmarks.

    Fool us once Lisa Su, shame on us. Fool us twice...

    1. Fenton

      Check out the Ryzen Benchmarks

      You will see that in most workloads except Gaming the Ryzen pretty much equals Intel and beats Broadwell-E.

      It looks to be a very good multithreader, even with a little less IPC it appears to be far more efficient in multi-threading workloads than intel.

  14. Throatwarbler Mangrove Silver badge
    Meh

    We'll see . . .

    Many years ago, we deployed Opteron servers on the basis of their superior core count and performance per watt, but we discovered that real-world performance was worse than Xeons even of the previous generation, excepting certain narrow graphical workloads. While this current batch looks amazing, I would want to see how they actually perform before committing to large-scale deployment.

  15. Tsunamijuan

    The current numbers might look good for cpu performance but what about bus performance.

    While the current cpu benches we are being fed look good for performance. It really makes me wonder how strong the pcie bus performance is. Not to mention performance outside the main interconnect in general.

    The APU's from AMD have had great internal interconnect performance. But once you leave that space and start trying to add discrete cards or components to the mix it went into the toilet fast. Not to mention stability problems.

    I just don't trust AMD at this point with how far behind they have been, and the attitude of well we will fix this next chip. Or we left this out cause it cost us too much despite the demand for it, or already being implemented by our competitors.

    If the real world performance ends up fitting the application, and it's stable, then by all means I will suggest it/implement it as a solution. But at this point I just don't have the trust that I once did in AMD.

  16. Alistair
    Windows

    I'm asking hardware vendor reps if I can get hands on test boxes. (HP and IBM).

    Trevor: any word from your white box rep?

  17. Roj Blake Silver badge

    But What About Purley?

    So the AMD processor due out in the Summer outperforms Intel's current offering.

    Fair enough. Well done AMD.

    But the real question is: how will it compare against Intel's Purley series, also due out in the Summer?

    Hint: Purley will wipe the floor

    1. Fenton

      Re: But What About Purley?

      I assume you mean the Skylake server release?

      Not a massive jump in IPC from Broadwell to Skylake and Skylake to Kaby Lake even less so.

      The E5 2699v5 is rumored to only have 28 cores (although a 32core rumor has also been spotted).

      But the clock speed rumor is only 1.8GHz, so I think this will be neck and neck: Skylake with a little higher IPC but fewer cores. It may just come down to multi-threading efficiency and frequency.

      1. Roj Blake Silver badge

        Re: But What About Purley?

        Yes, I'm talking about the Skylakes. I'm not sure what rumours you've heard, but the road map that I may or may not have seen has no mention of an E5-2699v5, mainly because they're merging E5 and E7 and completely changing the nomenclature.

        1. Fenton

          Re: But What About Purley?

          From what I have read, they are creating a common socket for E5/E7. Yes, they may get a different name, but the E5 equivalent will have the AVX accelerator and the E7 the XML accelerator. Personally I'd love them to merge the lines, as currently the software I deal with is only certified on E7 and the associated servers carry a large price premium.

          1. petef

            Re: But What About Purley?

            In that video the unreleased Naples was tested against Broadwell-EP which has been out for about a year. It will be fairer to compare against Skylake-EP when it is released. That will have AVX-512 and presumably the other performance lifts we expect with a new generation.

            1. Fenton

              Re: But What About Purley?

              We can get a good indication of the performance differences when we look at the Ryzen benchmarks against the desktop skylake processors that include the new AVX instructions.

              As an example in Cinebench

              Ryzen 1800X achieves a single threaded score of 162 and a multithreaded score of 1637 (16 threads)

              10:1 MT/ST ratio

              Intel 6900K Also achieves 162 and a multithreaded score of 1490

              9.2:1 MT/ST ratio

              Intel i7-7700K (skylake) achieves 201 and 985 (with 8 threads only)

              Let's assume a 16 thread skylake will run at the same 4.2-4.5GHz and has double the multithreaded performance of the i7-7700K; the multithreaded result would be 1970

              9.8 MT/ST ratio

              But are we likely to see a 16 thread skylake run at that frequency range?
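The multi-thread to single-thread ratios in this comment can be recomputed with a few lines of Python. The scores are the Cinebench figures quoted above; the 16-thread Skylake entry is the commenter's hypothetical (double the 8-thread score), not a measured result, and `mt_st_ratio` is a hypothetical helper name.

```python
def mt_st_ratio(multi_score: float, single_score: float) -> float:
    """Ratio of multithreaded to single-threaded benchmark score."""
    if single_score <= 0:
        raise ValueError("single_score must be positive")
    return multi_score / single_score


# (single-threaded score, multithreaded score) as quoted in the comment.
scores = {
    "Ryzen 1800X (16 threads)": (162, 1637),
    "Intel 6900K (16 threads)": (162, 1490),
    # Hypothetical: the 8-thread result of 985 doubled, per the comment's assumption.
    "Assumed 16-thread Skylake": (201, 2 * 985),
}

if __name__ == "__main__":
    for name, (single, multi) in scores.items():
        print(f"{name}: {mt_st_ratio(multi, single):.2f}:1")
```

A higher ratio suggests better scaling across threads, though it says nothing about absolute performance, which also depends on clock speed and IPC.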

  18. Mikel

    Pics

    Didn't mind netbook girl at all.

  19. Coolbikerdad

    Wrong city

    <pedant> "Italian Job"? The classic movie was set in Turin, not Naples </pedant>
