A few notes from pre-Windows history.
Both the Amiga and the earlier Commodore boxes had *substantial* ASICs to offload tasks like video, memory management and sound (as did the Archimedes). Anyone remember what *standard* sound support in a PC was in the late 80s? Not much in the way of "offloading" going on there. Even the Mac had a fair bit of ASIC support, *despite* the way Apple liked to talk about it as "A processor and a bitmap".
I don't think Motorola *ever* supplied an MMU to support the baseline M68000.
ARM's designers were able to check out most of the high-end processors of the day by hanging boards off the "Tube" interface on the BBC Micro (the second-processor bus). Their conclusions: the money being charged just did not give you the kind of performance boost they *expected* for the clock rate, and the 16-bit 6502 they wanted was *years* late. They had some experience of VLSI's design tools and reckoned they were up to the challenge.
The CISC/RISC example in the article is *very* poor. Historically RISC has been *strong* on *internal* data movement between registers, limiting traffic to the outside world to a few *specific* load/store instructions.
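For what it's worth, here's a rough sketch of that split (my own illustration, not the article's; the pseudo-assembly in the comments is made up, not any real mnemonic set):

    /* Sketch only: how "a[i] += x" tends to land on each style of machine. */
    void bump(int *a, int i, int x)
    {
        a[i] += x;
        /* On a load/store RISC (ARM and friends) memory is only touched by
         * explicit loads and stores; the arithmetic stays in registers:
         *     load  rT, a[i]
         *     add   rT, rT, x
         *     store rT, a[i]
         * A classic CISC can fold the whole thing into one read-modify-write
         * instruction straight on memory (x86 has add-to-memory forms). */
    }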
CISC normally used the idea of "microcoding", where a "short" instruction is actually the start *address* of a short program (held in on-chip memory) run by a simpler internal engine whose instruction width is *much* wider. The Alto workstation's main memory (there's a nice report available about it) was 16 bits wide, but its *microcode* was 32 bits and partly writeable (if you were, err, "bold" enough to do so). The Transputer *deliberately* split instructions into bytes so that (in principle) a 32-bit Transputer got a four-instruction look-ahead buffer for free.
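A crude software analogy of the idea (mine, purely for illustration; the opcodes and control-word values are invented): the "short" opcode is nothing more than an index into a table of start addresses in a much wider control store, which a simple sequencer then steps through:

    #include <stdint.h>
    #include <stdio.h>

    #define UEND 0x80000000u                  /* marks the last micro-op of a routine */

    /* The wide control store: 32-bit control words for a toy machine. */
    static const uint32_t urom[] = {
        /* 0: macro "LOAD" */ 0x00000011u, 0x00000012u | UEND,
        /* 2: macro "ADD"  */ 0x00000021u, 0x00000022u, 0x00000023u | UEND,
    };
    static const unsigned entry[] = { 0, 2 }; /* macro opcode -> micro start address */

    static void run_macro_op(unsigned opcode)
    {
        unsigned upc = entry[opcode];         /* "jump" into the micro-ROM            */
        for (;;) {
            uint32_t uop = urom[upc++];
            printf("control word %08x\n", (unsigned)uop); /* would drive the datapath */
            if (uop & UEND)                   /* routine done, fetch next macro insn  */
                break;
        }
    }

    int main(void) { run_macro_op(1); return 0; }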
The Z80 was also a microcoded design.
A RISC goal was *direct* interpretation: instruction bits *directly* routing the hardware within the CPU. This was difficult (BTW the 6502 did this and it was laid out by *hand*; the delay to the 16-bit version in the Apple IIGS demonstrated what a monumental PITA this is without software tools). IIRC most went with some direct routing, with other functions controlled by signals derived from feeding the instruction bits into a Programmable Logic Array. VLSI supplied tools to help take logic equations and do the layout automatically.
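As a software caricature of direct decode (again mine, with a format invented for the example, not ARM's real encoding): the fields of a fixed-width instruction word are just masked and shifted straight out, and in hardware those bits go more or less directly to the register-file selects and ALU control lines:

    #include <stdint.h>

    /* Invented 32-bit format for illustration:
     *   bits 26-31  opcode / ALU function
     *   bits 21-25  destination register
     *   bits 16-20  source register 1
     *   bits 11-15  source register 2                                 */
    struct decoded { unsigned op, rd, rs1, rs2; };

    static struct decoded decode(uint32_t insn)
    {
        struct decoded d;
        d.op  = (insn >> 26) & 0x3fu;   /* these fields route (almost)   */
        d.rd  = (insn >> 21) & 0x1fu;   /* directly to the datapath; no  */
        d.rs1 = (insn >> 16) & 0x1fu;   /* microprogram sits in between  */
        d.rs2 = (insn >> 11) & 0x1fu;
        return d;
    }

The awkward cases that can't be routed that directly are what end up as terms in the PLA.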
Another goal was an architecture that compiler *writers* could get the best out of, so developers would want to migrate *onto* the architecture and could do so easily. IIRC *all* the core MS-DOS programs that were successes were assembler coded, things like Lotus 1-2-3 (remember that?). They thought developer costs were going to rise, so better to write clean code in an HLL and have the compiler do the heavy lifting, the mad fools.
*All* processor architectures *evolve* over time. Original RISCs have sprouted FPU support, special data-type support etc. Intel has *internally* been re-architected to run *common* instructions *much* faster, making optimisation rules of thumb obsolete on new versions (yes, in some cases your code runs *slower*). At its core it is still the random-logic-replacement device developed for traffic light control.
The hits you take reading and writing to/from the outside world are the reason *all* RISCs have big register sets (SPARC anyone?).
RISC CPUs stressed "orthogonality", where *all* instructions handle *all* data types (typically 8, 16 and 32 bits; possibly 4 as well if BCD is accepted). No tricky opcodes to benefit just one type that *might* be used sometime (Decimal Adjust, anyone?). Likewise *one* instruction format, i.e. *all* one word long, not a mix of 8, 16, 24 or 32 bits (or more) at random.
Manufacturers bang on about how many more transistors they can stuff on a chip, but if you've doubled the transistor count without *halving* the power per transistor (or *better*), your total power consumption is only going one way.
IIRC Dick Pountain's article on the ARM 1 (can't be a**sed to dig it out) said it was something like 2 micrometres and 25k transistors and about the size of the die for a 6502 (as reported up the thread).