* Posts by Dusk

17 publicly visible posts • joined 14 Nov 2014

Hitachi exits mainframe hardware but will collab with IBM on z Systems

Dusk

Re: HDS were the only other makers of IBM plug compatible mainframes outside Amdahl.

Unisys moved to 100% emulation in 2015. Libra 880 and Dorado 880 were the last true Unisys mainframe hardware, and have been removed from marketing.

Bull GCOS is also available as an emulated product (Novascale Gcos family of x86 servers, running V7000 for emulation.) So is ICL VME, courtesy of the Fujitsu "Supernova" emulator.

Dusk

Re: HDS were the only other makers of IBM plug compatible mainframes outside Amdahl.

Good point, although BS2000 isn't particularly IBM-compatible - it's on 390-based hardware (the same CPUs as GS21 actually) but the software environment is totally different, unlike GS21 where MSP looks like alternate-universe MVS and is kinda-sorta compatible.

Outside of IBM and Fujitsu, there's also NEC's domestic mainframe business, although it's not IBM-compatible (it's forked from Bull's GCOS 7 mainframe line.) Still hard to believe there's only three mainframe hardware vendors still standing...

No, Microsoft is not 'killing Windows 10 Mobile'

Dusk

Not quite...

"Sometime this year (or next), Microsoft will have the opportunity to tell a better story, as new ARM processors begin to support x86 instructions."

"New ARM processors" have nothing to do with it. The change is that Microsoft is going to be using a software layer to translate x86 ops to ARMv8-A ops, not that something about the processors themselves is changing.

Oracle and Fujitsu SPARC up M12 big iron

Dusk

Re: SpecCPU claims with nothing to back them up

Except a) it was obvious to anyone that "here's SPECint and SPECfp results:" followed by a list of clearly marked _rate scores referred to the rate, as confirmed by the Register; b) for a 32-socket machine, most users are going to care about SPECrate more than SPECspeed; and c) the use of autoparallelization in SPECspeed numbers - which is fully legal under SPEC rules - means it's not measuring "single-threaded code running on a single CPU" at all. When an Intel system with two CPUs and 8 cores runs SPEC subtests with OMP_NUM_THREADS set to 16 and invokes icc with -parallel, nothing about it is "single CPU."

Your own i7 link shows no sign of setting core affinity with taskset or numactl, which is generally marked in the result and NOT done by the SPEC run harness itself. In addition, it uses autopar and sets OMP_NUM_THREADS.

Dusk

Re: SpecCPU claims with nothing to back them up

There's a couple weeks lead time between hardware release and showing up on spec.org, IME.

Anyway, I think ST's larger complaint was that the article said SPECint when it meant SPECint_rate, which is the same benchmark run differently (with multiple parallel copies, to benchmark whole systems.) It's a minor thing, but evidently important to him.
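As a rough illustration of the speed/rate distinction, here is a minimal sketch assuming the CPU2006 definitions as commonly described: a speed ratio is reference time over run time for one copy, a rate ratio multiplies that by the number of concurrent copies, and the headline score is the geometric mean over the subtests. The function names and all timings below are made up for illustration and are not taken from any published result.

```python
from math import prod


def speed_ratio(reference_seconds, run_seconds):
    """Ratio for a single copy of one subtest (SPECspeed-style)."""
    return reference_seconds / run_seconds


def rate_ratio(copies, reference_seconds, run_seconds):
    """Throughput ratio when `copies` instances run concurrently (rate-style)."""
    return copies * reference_seconds / run_seconds


def geometric_mean(values):
    """Overall score: geometric mean of the per-subtest ratios."""
    return prod(values) ** (1.0 / len(values))


# Hypothetical subtest: reference time 9000 s.
# One copy finishes in 300 s; 64 concurrent copies each finish in 600 s.
single_copy = speed_ratio(9000, 300)
whole_system = rate_ratio(64, 9000, 600)
```

With these invented timings each copy runs half as fast under load, yet the rate ratio is far larger than the speed ratio, which is why the two numbers cannot be compared directly.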

Dusk

Re: SpecCPU claims with nothing to back them up

I emailed the writer of the article. This was the reply:

"Hi Kira, There is a table from an Oracle Fujitsu doc in the article and that lists the SPECint and SPECfp benchmarks. I've attached it here;

They are 5 SPECxx _rate 2006 ones.

Cheers,

Chris.

Chris Mellor Storage writer at The Register."

It confirms that they're just referring to the _rate scores after all, as I stated.

I humbly ask for your apology for the "Alternative Facts" remark, as it was not me introducing them.

Dusk

Re: SpecCPU claims with nothing to back them up

The article said "Oracle has provided SPECint and SPECfp benchmark information, saying these servers have set records:" followed by a table of _rate figures. It's very obvious the Register is referring to those _rates, not to some magical reference to SPECspeed numbers that neither Oracle nor Fujitsu seems to be claiming. You're being very pedantic, and it's frankly a bit bizarre.

Dusk

Re: SpecCPU claims with nothing to back them up

> Is that your assertion, or do you have publicly available documentation to back it up?

Look it up. No other result currently surpasses M7 on SPECint_rate, WHICH IS WHAT I SAID - "M7 is currently the highest-scoring rate result."

By the way, it's what Oracle has claimed as well - let me know if you can find a single example of Oracle claiming to have a non-rate record. THIS is what they claim: "The single-processor, SPARC M7-based system set new records for SPECint_rate2006, SPECint_rate_base2006, SPECfp_rate2006, and SPECfp_rate_base2006, demonstrating that SPARC T7-1 can complete a job very quickly and is able to process faster than any other single-socket system." The Register was clearly not referring to single-copy SPECcpu here. Your increasingly-pathetic nitpicking aside, M7 holds the per-processor record for int_rate. I have no fucking clue how it performs on single-thread; my guess is somewhere in the 25-30 int specspeed range, but I haven't run SPEC on M7 myself, so that's an estimate.

Dear God, you're like a reverse Kebabbert.

Dusk

Re: SpecCPU claims with nothing to back them up

I'm not comparing SPECint to SPECint_rate. I'm comparing rate to rate. M7 is currently the highest-scoring rate result. Therefore, it holds the record. Period. I'm not saying it's the Second Coming. It's ungodly expensive and primarily runs a doomed OS.

And yes, non-rate SPECint and SPECfp aren't currently relevant, due to broken subtests (462.libquantum, most egregiously) and abuse of autoparallelization (2006 reporting rules allow autopar to an extent SPEC2000 reporting rules did not; this was a mistake). No non-x86 vendor has published a non-rate SPEC result in several years, as far as I know. Or do you seriously think that Intel processor performance has improved since Core2 by the many thousands of percent implied by their libquantum scores?
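To show how much one runaway subtest can move the aggregate, here is a small sketch. It assumes the overall score is a geometric mean over twelve integer subtests; the ratios are invented for illustration (eleven ordinary subtests at 30 and one at 3000) and are not real SPEC results.

```python
from math import prod


def geometric_mean(values):
    """Geometric mean of the per-subtest ratios."""
    return prod(values) ** (1.0 / len(values))


# Invented ratios: twelve ordinary subtests, versus eleven ordinary
# subtests plus one that autoparallelization has inflated 100x.
typical = [30.0] * 12
with_outlier = [30.0] * 11 + [3000.0]

# How far the single outlier lifts the overall score.
inflation = geometric_mean(with_outlier) / geometric_mean(typical)
```

Under these assumptions a single subtest lifts the headline number by roughly 47% without any of the other eleven getting faster.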

SPEC2006 non-rate scores are pretty useless at this point, and a new SPEC revision can't come soon enough. (Among other things, to fix how weirdly their toolchain tends to behave when not using the prebuilt spectools...)

Dusk

Re: SpecCPU claims with nothing to back them up

First off, that's a seven-year-old result. Second, it's a single-threaded score, not a throughput score, so your complaints about "less than half the performance of a consumer system with only one chip!" are nonsensical.

I hate Oracle as much as the next girl, but their SPEC record is legitimate. Please look properly before going off on rants...

https://spec.org/cpu2006/results/res2015q4/cpu2006-20151026-37722.html

To the best of my knowledge, this is currently the highest-performing processor on SPECint_rate. High-end Power and Xeon, last I looked, top out at 900-1000 rate per socket. Oracle, of course, conveniently avoids publishing single-threaded numbers - but my guess is that they range between "mediocre" and "okay, I guess."

IBM to launch cheap 'n' cheerful Power server for i and AIX userbase

Dusk

Re: Performance

I suspect I'm replying to Kebabbert, which never goes well, but here goes...

First off, these systems are for companies that are already in the ecosystem. iSeries users tend to have minimal compute needs (a huge percentage of the iSeries installed base is on one-socket Power5 machines; some are on older systems) and a single-core iSeries-capable P8 lets them get a fully-licensed turnkey machine for a relatively low price, while still getting a performance upgrade over legacy machines. They can then keep up with newer OS releases and the like. In general, folks moving to a single-core iSeries machine are not going to be greenfield customers.

As to Power8 in general - we run Linux on Power8 because the perf/$ is actually pretty damn good. 8-core S812LC starts under US$5k. For our workload, this Power8 CPU config beats an Intel Xeon E5-2650v4, partially due to the much larger caches and extremely high (over 90GB/s STREAM Triad!) amount of memory bandwidth. There's also some really nice edge features (memory compression) that are kinda cool, even though their support in Linux can be kludgey - for instance, Power8 memory compression basically runs as a hardware assist to zswap, as opposed to on AIX, where the system actually sees more available RAM. We'll probably grab Power9 too, since early performance estimates look superb, depending on whether IBM improves their horrendously unpleasant procurement process for OpenPower systems.
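For readers unfamiliar with the STREAM Triad figure quoted above, here is a minimal sketch of the kernel and of how a bandwidth number is derived from it. It assumes 8-byte elements and the usual STREAM accounting of two arrays read and one written per element; the function names are hypothetical, and pure Python will come nowhere near hardware bandwidth, so this only shows the arithmetic.

```python
def triad(a, b, c, scalar):
    """STREAM-style Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes_moved(n, element_bytes=8):
    """Bytes counted per pass: two arrays read, one written."""
    return 3 * element_bytes * n


def bandwidth_gb_per_s(n, seconds, element_bytes=8):
    """Reported bandwidth in GB/s (10^9 bytes) for one timed pass."""
    return triad_bytes_moved(n, element_bytes) / seconds / 1e9
```

For example, a pass over one billion elements that took a quarter of a second would be reported as 96 GB/s under this accounting.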

I'm not saying the Big UNIX stuff is competitively priced with x86, because it isn't - but then, neither is anybody else's. But for OpenPower machines, pricing really isn't terrible.

'Neural network' spotted deep inside Samsung's Galaxy S7 silicon brain

Dusk

Re: Most Surprised

Spectacularly misinformed post...

The vast majority of high-performance ARM processors - including Apple's - use all the features you're bitching about. Branch prediction is basically an absolute necessity for any high-performance design - high clock requires a long pipeline; without a branch predictor, a bubble is created in the pipeline which leads to a stall during branch resolution. This is a major performance issue, and one that a branch predictor with high accuracy resolves. As for your comment about real-time applications, a worst-case time is not impossible to predict; microarchitectures have documented branch mispredict recovery times, usually on the order of 10-20 cyc. This, by the way, is basically no less deterministic than cores with caches, which you seem to have no problem advocating for - if a load hits cache, it might take 5 cyc to complete; if it misses cache and hits main memory, it might take 150 cyc.
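To put rough numbers on the stall argument, here is a minimal sketch of a textbook two-bit saturating-counter predictor applied to a single loop branch. The 15-cycle penalty and the branch pattern are assumptions chosen for illustration, not figures from any particular core, and the no-predictor case simply assumes every branch stalls for the full penalty.

```python
def count_mispredicts(outcomes):
    """Mispredictions for one branch using a 2-bit saturating counter.

    Counter values 0-1 predict not-taken, 2-3 predict taken.
    """
    counter = 2  # start at "weakly taken" (an arbitrary choice)
    mispredicts = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            mispredicts += 1
        # Move toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return mispredicts


def stall_cycles(stalling_branches, penalty_cycles):
    """Total cycles lost if each stalling branch costs a fixed penalty."""
    return stalling_branches * penalty_cycles


# A loop branch: taken nine times, then falls through, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
PENALTY = 15  # assumed mispredict recovery time in cycles

with_predictor = stall_cycles(count_mispredicts(loop_branch), PENALTY)
# No predictor: every branch waits for resolution. This is also the
# worst-case bound with a predictor, since it cannot mispredict more
# often than once per branch.
without_predictor = stall_cycles(len(loop_branch), PENALTY)
```

The worst case stays bounded either way: it is the branch count times the documented recovery penalty.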

Decode/microcode: Decode doesn't mean what you think it means; it's an essential part of any CPU design, RISC or CISC, as decode controls things like "what functional unit does this op go to?" and "what operands does this op use?" Microcode was mentioned nowhere. I suspect you're confusing use of micro-ops - ie, internal basic operations in a long fixed-length format - with microcode, ie lookup of certain complex operations in a microcode ROM at decode time. The first does not imply the second. Most fast processors have a complex decoder for operations that are more efficient to break into 2-3 uops, and this doesn't hit microcode. The M1 core may or may not have microcode - since it doesn't mention a ucode engine in the decode slides, and it wasn't mentioned in the presentation (I was there) I suspect it does not. Even in ARM there are ops that can be beneficial to crack into multiple uops - reg+reg addressing for instance (one uop for the reg+reg calculation, one uop for the load/store.) There are even more examples in other RISC ISAs - take a look at the manual for a modern PowerPC core, for instance, and check out the number of ops that are cracked or microcoded!
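The reg+reg cracking case can be sketched as a toy decoder. The tuple encoding, the "agen" micro-op name, and the "tmp" register are hypothetical and only illustrate the idea; real decoders are hardware, and no microcode ROM lookup is involved in this path.

```python
def crack(instruction):
    """Split a toy instruction into one or more micro-ops.

    Hypothetical encoding: (opcode, dest, base, index), where index is
    None for a memory op that uses a single base register.
    """
    opcode, dest, base, index = instruction
    if opcode in ("load", "store") and index is not None:
        # reg+reg addressing: one uop computes the address, a second
        # uop performs the memory access using that address.
        return [
            ("agen", "tmp", base, index),
            (opcode, dest, "tmp", None),
        ]
    # Everything else passes through as a single uop.
    return [instruction]
```

A load from r1+r2 comes out as two uops, while a plain register add stays as one.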

As for out-of-order execution, it's an extremely helpful technique for exposing memory-level parallelism (by, for instance, continuing to run code during a cache miss) for surprisingly little additional overhead. Additionally, it takes the number of architectural registers out of the equation by renaming them onto a broad set of physical registers - as a result, in an OoO machine, architectural register count is almost never a hard limitation; false dependencies are eliminated and instructions run when their operands become available, not when a previous non-dependency operation completes so its scratch register can be used. This can improve power efficiency at a given performance target, because an in-order machine generally has to clock higher to get the same level of performance.
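Register renaming can be sketched in a few lines. This is a simplified model that assumes an unlimited pool of physical registers and never frees any; the (dest, src1, src2) instruction format and the rN/pN naming are hypothetical.

```python
def rename(instructions, num_arch_regs):
    """Rename (dest, src1, src2) instructions onto physical registers."""
    # Initially architectural register rN maps to physical register pN.
    mapping = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}
    next_physical = num_arch_regs
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping, so true dependencies survive.
        sources = (mapping[src1], mapping[src2])
        # Every write gets a fresh physical register, which removes
        # write-after-read and write-after-write (false) dependencies.
        mapping[dest] = f"p{next_physical}"
        next_physical += 1
        renamed.append((mapping[dest],) + sources)
    return renamed


program = [
    ("r1", "r2", "r3"),  # writes r1
    ("r4", "r1", "r2"),  # reads r1: a true dependency on the first op
    ("r1", "r5", "r6"),  # reuses r1 as scratch: a false dependency
]
renamed_program = rename(program, num_arch_regs=8)
```

After renaming, the third instruction writes a different physical register from the one the second instruction reads, so it can issue as soon as its own operands are ready.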

Again, Apple does these things too - they have an aggressively out-of-order machine with branch prediction and register renaming (in fact, more aggressively out-of-order than the M1 in the article!) http://www.anandtech.com/show/9686/the-apple-iphone-6s-and-iphone-6s-plus-review/4 has a nice summary of Apple's current uarch.

Please do more research before making this kind of post...

US Senate strikes down open-access FBI hacking warrant by just one honest vote

Dusk

Sixty votes are normally required in the Senate due to cloture rules. It got 58; McConnell opted to vote against it for procedural reasons, or it would have been 59.

Unisys releases its ClearPath MCP OS for VMs or x86

Dusk

Re: Neat, but how useful is this?

Fujitsu, NEC, and Hitachi continue development of proprietary mainframe processors and operating systems. Stratus continues development of the ftServer V line for their VOS operating system - while x86-based, it is far from vanilla PC-compatible.

Not just IBM.

Deutsche Bank to axe 'excessively complex' IT, slash 9,000 jobs

Dusk

Just sticking to systems with reasonable share in Europe...

IBM z/OS

IBM z/VM

IBM z/VSE

Bull GCOS 7

Bull GCOS 8

Fujitsu BS2000 (almost its entire customer base is in Germany)

Fujitsu VM2000

Unisys OS 2200

Reliant UNIX (SINIX)

Tru64

AIX

HP-UX

Solaris

MP-RAS

Linux

HP Nonstop

Stratus VOS

IBM iSeries

Microsoft Windows

Fujitsu VME (unlikely in Germany tho)

HP VMS

Nixdorf Niros family

UXP/DS

MPE/iX

And that's where I start to run out of systems that are likely to be common in a bank.

Samsung grows 'custom ARM' brains to outsmart arch-nemesis Apple

Dusk

"Other mobile chippery firms have experimented with custom application cores, too, but none has made them in the quantities that Qualcomm has. Huawei produces Kirin cores, for example, while LG Electronics and Mediatek have brought us Nuclun and Helio, respectively."

No, not really. All of the above are licensed ARM cores on custom SoCs. Nuclun is Cortex-A15+A7[1], Helio is a high-clocked A53[2], and Kirin has been several things, all of them licensed. [3][4][5]

What Qualcomm and Apple have historically done, and what Samsung is on its way to doing, is using a custom ARM-compatible core design, rather than one licensed directly from ARM Ltd.

[1] http://www.lgnewsroom.com/newsroom/contents/64743

[2] http://www.gsmarena.com/mediatek_helio_is_a_new_family_of_high_end_mobile_chipsets-news-11731.php

[3] http://www.gsmarena.com/huawei_ascend_p7_pops_in_antutu_kirin_910_chipset_gets_tested-news-8345.php

[4] http://www.gsmarena.com/octacore_huawei_kirin_920_chipset_goes_official-news-8720.php

[5] http://www.androidheadlines.com/2015/04/report-huaweis-hisilicon-kirin-930-processor-built-using-28nm-process.html

Exit the dragon: US govt blows $325m on China-beating 300PFLOPS monster computer

Dusk

Source?

What's the source for this using Power9, rather than a Power8 rev with NVLink? I don't see any reference to Power9 in relation to this project from either IBM or Nvidia, and none of the other sources reporting about the new machines mention it.