16TiB of RAM? Is that all?
*Twitch*
Amazon Web Services is working on new instance types that will offer either eight or 16 terabytes of random access memory. You read that right: 8TB or 16TB of RAM, with the target workload being in-memory computing and especially SAP HANA. The cloud colossus is also working on HANA clusters that span 17 nodes and pack a …
I know for sure IBM can do 32TB in a single system, Sun/Oracle goes to at least 24TB, HPE - ah, Itanium. OK, that's sad. But if you really need more than 20TB of RAM you could do better than one or two dozen nodes. Beancounters might disagree, but since you're here anyway, your time is free, right?
SPARC M10-4S goes up to 64TB of RAM. And 64 sockets. It is not a cluster, it is a business server.
https://www.oracle.com/servers/sparc/fujitsu-m10-4s/index.html
And remember, the Linux SGI UV3000 servers are clusters, as they only run clustered workloads. The latency to far-away nodes is terrible, which means they can only do HPC number-crunching work with little to no communication between nodes. Business workloads are the opposite, communicating heavily between nodes - so business workloads stop at 16- or 32-socket servers, because latency will kill them if the server has more CPUs.
Yes, I thought Sun had a larger machine. And I completely agree on the scalability issues. This was (one of) the point(s) I was trying to make about the 17-node HANA clusters AWS now offers. Judging by the downvotes this did not go down well with the general audience :)
HPE Itanium systems... yeah, only 8TB in a Superdome 2 server, but then that's likely constrained by demand rather than capability - why spend the time certifying bigger DIMM sizes when folks aren't asking for it?
x86 systems (which is where the action really is)? HPE Superdome X goes to 24TB with 64GB DIMMs and 48TB with 128GB DIMMs.
Of course the prototype of "The Machine" has 160TB, and that will eventually be persistent memory as well (though maybe not yet).
UNIXland does it big, but honestly (and it pains me as an ex-UNIX admin to say this), it's just not relevant any more.
@dedmonst
When you compare how x86 scales versus Unix, x86 falls flat. It is only recently that the first 16-socket x86 servers have arrived on the market; before that, the largest x86 server was an ordinary 8-socket Oracle/Dell/HP server. When you benchmarked x86 Linux against Solaris on similar hardware, the Solaris server was faster despite using lower-clocked CPUs (check, for instance, the official SAP benchmarks). Linux scales badly on x86 with more than 8 sockets.
My point is that these large x86 servers are first generation and do not perform well. For instance, Solaris 11 and IBM AIX had their memory systems rewritten to handle large-RAM servers. I doubt Linux has done that; even though you can put 32TB in a server, Linux cannot drive it. Linux scales badly CPU-wise too, and cannot utilize 16 sockets well. Why should it utilize 32TB of RAM well?
So, I say that Unix/RISC scales far better than Linux/x86. Just check the benchmarks - x86 typically scales badly. Take the SAP benchmarks: all the top SAP and TPC scores belong to Unix/RISC, while Linux on 16-socket x86 is far below and nowhere near the top. So I don't agree. Linux/x86 performs too poorly on large 16-socket servers with large RAM; it is no match for Unix/RISC.
"A humble old 8086 would take about 6 months just to zero that lot."
Too often people forget this. The current generation of in-memory systems that this box is aimed at can quite easily become memory-bandwidth limited. There's a tendency to assume in-memory will be fast and that you can use simple data structures (looking at you, SQL Server and TimesTen), but at this scale that's not necessarily true. That Intel CPU will do 85GB/sec according to the spec sheet (not sure if that's bytes or bits) so it'll take a minute and a half to table scan 8TB - which is a long time for someone expecting interactive performance.
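The back-of-the-envelope maths, for anyone who wants to check it - a minimal sketch assuming 85GB/sec means gigabytes (not gigabits) and a perfectly sequential, uncontended scan:

    # rough scan-time estimate, assuming 85 GB/s is bytes per second
    # and the scan is perfectly sequential with no other overhead
    ram_bytes = 8 * 1000**4          # 8 TB
    bandwidth = 85 * 1000**3         # 85 GB/s from the spec sheet
    print(ram_bytes / bandwidth)     # ~94 seconds, i.e. about a minute and a half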
"That Intel CPU will do 85GB/sec according to the spec sheet (not sure if that's bytes or bits) so it'll take a minute and a half to table scan 8TB - which is a long time for someone expecting interactive performance."
The memory would be divided between several CPUs. A single Intel E7-8880 v3 can address a maximum of 1.54TB of memory - scanning even that would still take about 20s at 85GB/s.
On an 8-way system you'd need to fire that scan with threads running on each CPU - then that 8TB would take just 11 seconds (under optimal settings). If you scanned all 8TB from a single thread, pulling remote memory over QPI (NUMA), it would take over 2 minutes (57.6GB/s with 3x QPI).
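Roughly how those numbers fall out - a sketch under the assumptions above (decimal terabytes, 85GB/s of local bandwidth per socket, 57.6GB/s aggregate over 3x QPI; NUMA interleaving and cache effects will shift them in practice):

    # arithmetic behind the per-socket vs. cross-QPI scan times
    TB, GB = 1000**4, 1000**3
    local_bw, qpi_bw = 85 * GB, 57.6 * GB

    one_socket     = 1.54 * TB / local_bw      # ~18-20 s to scan one socket's 1.54 TB locally
    eight_parallel = (8 * TB / 8) / local_bw   # ~12 s: each of 8 sockets scans its own ~1 TB
    single_thread  = 8 * TB / qpi_bw           # ~139 s (over 2 minutes) pulling it all over QPI

    print(one_socket, eight_parallel, single_thread)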
According to STREAM RAM bandwidth benchmarks, Intel typically does 60GB/sec in real-life runs; 85GB/sec is more of a theoretical limit.
https://blogs.oracle.com/bestperf/memory-and-bisection-bandwidth:-sparc-t7-and-m7-servers-faster-than-x86-and-power8
SPARC M7 does 130GB/sec in the same benchmarks.
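If you want to sanity-check your own box, here's a crude sketch - not a substitute for STREAM (the quoted 60 and 130GB/sec figures come from the proper compiled, multi-threaded benchmark), and being single-threaded NumPy it will land well below the per-socket numbers above:

    # crude single-threaded memory bandwidth check (illustrative only;
    # use the real STREAM benchmark for comparable numbers)
    import time
    import numpy as np

    n = 512 * 1024**2 // 8          # 512 MiB of float64, large enough to defeat caches
    a = np.random.rand(n)
    b = np.empty_like(a)

    t0 = time.perf_counter()
    np.copyto(b, a)                 # STREAM-style "copy": read a, write b
    elapsed = time.perf_counter() - t0

    moved = 2 * a.nbytes            # bytes read plus bytes written
    print(f"~{moved / elapsed / 1e9:.1f} GB/s sustained copy bandwidth")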
I think there is a much cheaper optimization that a lot of people tend to forget: look at which part of that data is actually needed in real time.
I worked on a few SAP HANA enterprise projects recently and managed to convert two supposedly 4TB in-memory databases into 512GB in-memory databases, simply by asking the business which data was needed for interactive analysis and cross-referencing that against what the IT side of the house was trying to load into memory. 512GB is still a good chunk of RAM, but now I have a DB that fits in any decent x86 box rather than one of these enterprise models that eat a whole SMB shop's server budget per unit...