Ghost of DEC Alpha is why Windows is rubbish at file compression

Microsoft's made an interesting confession: Windows file compression is rubbish because the operating system once supported Digital Equipment Corporation's (DEC's) Alpha CPU. Alpha was a 64-bit RISC architecture that DEC developed as the successor to its VAX platform. DEC spent much of the late 1990s touting Alpha as a step …

  1. FF22

    Obvious bull

    " Which is a fine way to match compression to a machine's capabilities, but a lousy way to make data portable because if a system only has access to algorithm Y, data created on an algorithm-X-using machine won't be readable. Which could make it impossible to move drives between machines."

    Right. Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS, and you couldn't have possibly used an extra bit or byte in the volume descriptor or in the directory entries of files to signal which particular method was used to compress them.
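
    (Purely to illustrate the commenter's point, not NTFS's actual on-disk layout: a minimal C sketch of what "an extra byte in the directory entry" buys you. The format names, fields and placeholder decompressors are hypothetical.)

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum compression_format {
    COMP_NONE   = 0,
    COMP_ALGO_X = 1,   /* format favoured by architecture X */
    COMP_ALGO_Y = 2    /* format favoured by architecture Y */
};

struct dir_entry {
    char     name[256];
    uint64_t size;
    uint8_t  comp_format;   /* the "extra byte" recording how the file was compressed */
};

/* Placeholder decompressors: real ones would implement algorithms X and Y.
 * Every OS build would ship both, so either format stays readable anywhere. */
static size_t decompress_x(const void *in, size_t n, void *out, size_t cap)
{ size_t c = n < cap ? n : cap; memcpy(out, in, c); return c; }

static size_t decompress_y(const void *in, size_t n, void *out, size_t cap)
{ size_t c = n < cap ? n : cap; memcpy(out, in, c); return c; }

/* Read path: dispatch on the per-file tag instead of on the local CPU. */
static size_t read_compressed(const struct dir_entry *e,
                              const void *in, size_t in_len,
                              void *out, size_t out_cap)
{
    switch (e->comp_format) {
    case COMP_ALGO_X: return decompress_x(in, in_len, out, out_cap);
    case COMP_ALGO_Y: return decompress_y(in, in_len, out, out_cap);
    default:          return 0;   /* stored uncompressed or unknown format */
    }
}
```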

    1. Ole Juul

      Re: Obvious bull

      And that effort, writes Microsoftie Raymond Chen, is why Windows file compression remains feeble.

      Perhaps he should have added "for given values of why".

    2. Kernel

      Re: Obvious bull

      "Right. Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS"

      No, wrong - because nowhere has it been stated that only two versions of the algorithm would be required. It should be quite obvious to most people that X and Y are examples to keep things simple, and that in reality each CPU architecture would require its own algorithm - and at the time there were more than two architectures in play.

      1. Adam 1

        Re: Obvious bull

        Let's not confuse algorithm and file format. The language used seems very loose to me. The algorithms are simply the methodology taken to transform one byte stream to another. It stands to reason that different architectures will be better at some algorithms than others because of the various sizes of caches and buses involved. Some lend themselves to larger dictionaries and better parallelism than others. There's no reason other than priorities as to why they haven't switched to something more suited to x86 in newer versions.

    3. Ken Hagan Gold badge

      Re: Obvious bull

      "Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS"

      Right, and if MS had delivered disc compression that meant that discs written on one system would totally suck on performance when plugged into another system, no-one on these forums would have written long rants on how this epitomised MS's cluelessness about "portability".

    4. richardcox13

      Re: Obvious bull

      > Right. Because you couldn't have possibly included (de)compression

      This is covered, but it assumes you know that compressed files are updatable (this is another restriction on the compression algorithm: you need to be able to change parts of the file without re-writing and compressing the whole thing).

      So one scenario is a file created on an x86 box and then updated on an Alpha box. The Alpha system has to be able to compress in the same way, while still meeting the performance criteria.

      1. Charlie Clark Silver badge

        Re: Obvious bull

        "So one scenario is a file created on an x86 box and then updated on an Alpha box. The Alpha system has to be able to compress in the same way, while still meeting the performance criteria."

        Why would that matter? If the OS is doing the compression then it will present the compressed file to any external programs as uncompressed. Another box accessing the file over the network would see it as uncompressed (network compression technology is different). And this still doesn't really back up the fairly flimsy assertion that Alphas weren't up to the job. I certainly don't remember this being an issue when the architectures were being compared back in the day. It's certainly not a RISC/CISC issue, at least not if the algorithm is correctly implemented. But I seem to remember that many of the performance "improvements" (display, networking, printing) in NT 4 were done especially for x86 chips which suck at context-switching. Coincidentally NT 4 was seen as MS abandoning any pretence of writing a single OS for many different architectures.

        I'm not a whizz at compression but I think the common approach now is to use a container approach as opposed to trying to manage compression on individual files in place.

    5. Mage Silver badge

      Re: Obvious bull

      Yes, given that there was other CPU-specific code and a HAL.

      Actually, too, the original NT was for 32-bit Alpha. The 64-bit NT 4.0 was for Alpha 64 only and came out later, after the regular Alpha 32 NT 4.0. I never saw a 64-bit Alpha build of Win2K; though it may have existed, it's not in my MSDN collection.

      1. joeldillon

        Re: Obvious bull

        Well, uh. The Alpha (like the Itanium) was a pure 64 bit chip. There was no such thing as a 32 bit Alpha. A 32 bit build of NT for the Alpha doing the equivalent of the x32 ABI, maybe...

      2. Nate Amsden

        Re: Obvious bull

        Maybe I am wrong but I recall the original NT being for... i98x CPUs or something like that (not x86 and not Alpha). x86, Alpha, MIPS and PPC were added later.

        I used NT 3.51 and 4 on my desktop(x86) for a few years before switching to linux in 1998.

        The article doesn't seem to mention who actually uses NTFS compression. I've only seen it used in cases of emergency where you need some quick disk space. I seem to recall patch rollbacks are stored compressed too (I always tell Explorer to show compressed files in a different color).

        Even back when I had NT4 servers 15 years ago, I never used compression on them. Saw some spam recently for Diskeeper; brought back some memories.

        1. patrickstar

          Re: Obvious bull

          i960, code-name "N-10"

          Hence Windows NT as in 'N-Ten' (or rather NT OS/2 as in 'OS/2 for N-10' but that's another story...).

    6. Rob Moir

      Re: Obvious bull

      Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS, and you couldn't have possibly used an extra bit or byte in the volume descriptor or in the directory entries of files to signal which particular method was used to compress them.

      And get more complaints about "Windows bloat"? Especially when there's more than one CPU instruction set involved. Windows NT supported x86, PowerPC and Alpha around the NT 3.51/NT4 days IIRC, and developments of that system subsequently went on to Itanium, x86-64 and ARM (Windows RT).

    7. Shaha Alam

      Re: Obvious bull

      Hard disks were a lot smaller back then, so some sacrifices had to be made for the sake of practicality.

      There were already complaints about bloat in Windows.

    8. Planty Bronze badge
      WTF?

      Re: Obvious bull

      I wonder what Microsoft's excuse is for their abysmal Explorer zip support?

      I mean why does Windows Explorer take 30 minutes to unzip a file that 7Zip can manage in 30 seconds??

      Perhaps they have some lame excuse for that prepared?

      1. smot

        Re: Obvious bull

        "I mean why does Windows Explorer take 30 minutes to unzip a file that 7Zip can manage in 30 seconds??"

        And why does it unzip first into a temp folder then copy into the destination? What's wrong with putting it straight into the target? Saves space and time and avoids yet more crud in the temp folder.

    9. TheVogon

      Re: Obvious bull

      File system compression usually isn't the way to go these days, as disk is relatively cheap and it can have a high overhead.

      At the SME / enterprise level, where storage savings can stack up, thin provisioning and maybe deduplication are often more what you need...

      1. Eddy Ito

        Re: Obvious bull

        I think that's a point of confusion here. Chen was talking about "file system compression" but the article repeatedly says "file compression" which is a very different beast. As written, it makes it sound as if LZMA may only work on some platforms which is silly.

      2. Robert Carnegie Silver badge

        Re: What's cheap

        Spinning rust is cheap, network bandwidth may not be, SSD certainly isn't cheap.

        One recent Windows clever idea is to supply the entire operating system pre-compressed. Much space saved.

        Your monthly patches, however, aren't compressed. So the disk fills up with operating system files anyway.

        Maybe they will get around that by reinstalling the entire operating system from time to time, but calling it an update. Or maybe they already have.

        1. Ian 55

          Re: What's cheap

          "One recent Windows clever idea is to supply the entire operating system pre-compressed. Much space saved."

          You mean like Linux 'live CDs' for well over a decade?

          1. Anonymous Coward
            Anonymous Coward

            Re: entire operating system pre-compressed. Much space saved

            And in 1999, you could get a compressed OS, drivers, GUI, and browser. On a 1.44MB floppy.

            http://toastytech.com/guis/qnxdemo.html

            Try telling that to the young people of today [etc].

      3. AndrueC Silver badge
        Boffin

        Re: Obvious bull

        "File system compression usually isn't the way to go these days, as disk is relatively cheap and it can have a high overhead."

        Ah, but in some scenarios file compression can improve performance. If you are disk bound rather than CPU bound you can trade spare CPU cycles for fewer I/O operations. I do that with my source code because Visual Studio is usually held up by the disk more than the CPU. And the NTFS compression algorithm is actually byte-based, not nibble-based. It's almost ideally suited to source code.
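
        (For anyone who wants to do what AndrueC describes programmatically rather than via Explorer's "Compress contents" tickbox: the sketch below uses the documented FSCTL_SET_COMPRESSION control code to turn NTFS compression on for a single file. The path is a made-up example and error handling is kept minimal.)

```c
/* Minimal Win32 sketch: enable NTFS compression on one file. Build with a
 * Windows compiler; the file path is an example only. */
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileW(L"C:\\src\\example.cpp",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "open failed: %lu\n", GetLastError());
        return 1;
    }

    USHORT format = COMPRESSION_FORMAT_DEFAULT;   /* the volume default (LZNT1 in practice) */
    DWORD  bytes  = 0;
    BOOL ok = DeviceIoControl(h, FSCTL_SET_COMPRESSION,
                              &format, sizeof(format),
                              NULL, 0, &bytes, NULL);
    printf("compression %s\n", ok ? "enabled" : "not enabled");
    CloseHandle(h);
    return ok ? 0 : 1;
}
```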

        1. Anonymous Coward
          Anonymous Coward

          "file compression can improve performance"

          But then you would do it at the application level, because it knows the internal file format and can use the best compression strategy - e.g. a database engine which compresses its pages.

          A generic compression strategy probably would not help much except for sequential access - like reading a text file for display, and rewriting it fully for any change. If you need more random and granular access, you need to move the compression into the application.

    10. Anonymous Coward
      Anonymous Coward

      Re: Obvious bull

      Right. Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS, and you couldn't have possibly used an extra bit or byte in the volume descriptor or in the directory entries of files to signal which particular method was used to compress them.

      Yes, that must be why I can't use Zip or Rar files on a different PC than the one that created them.

      1. Anonymous Coward
        Anonymous Coward

        Re: Obvious bull

        Read the first article of that series (there's a link in the Chen post). On-the-fly file/filesystem compression needs to work differently from an "archive" format like zip. You have time constraints, and you need to be able to read/write blocks of the file without the need to decompress and then re-compress it fully. Try random access to a zip (or rar) compressed file...
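
        (A rough sketch of that difference, assuming zlib is available and linking with -lz: compress the data as independent fixed-size chunks and any one chunk can be read back, or rewritten, on its own - something a single solid zip/rar stream can't offer. NTFS does something similar with 16-cluster compression units; the chunk size and in-memory layout below are invented for illustration.)

```c
/* Sketch only: per-chunk compression for random read/write access. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define CHUNK   4096    /* uncompressed chunk size (illustrative) */
#define NCHUNKS 4

int main(void)
{
    static unsigned char plain[NCHUNKS][CHUNK];
    unsigned char *packed[NCHUNKS];
    uLongf packed_len[NCHUNKS];

    /* Fill the "file" with some compressible data. */
    for (int i = 0; i < NCHUNKS; i++)
        memset(plain[i], 'a' + i, CHUNK);

    /* Compress every chunk independently and record its compressed size. */
    for (int i = 0; i < NCHUNKS; i++) {
        uLong cap = compressBound(CHUNK);
        packed[i] = malloc(cap);
        packed_len[i] = cap;
        compress(packed[i], &packed_len[i], plain[i], CHUNK);
    }

    /* Random read: decompress only chunk 2, nothing else is touched. */
    unsigned char out[CHUNK];
    uLongf out_len = CHUNK;
    uncompress(out, &out_len, packed[2], packed_len[2]);
    printf("chunk 2: %lu -> %lu bytes, first byte '%c'\n",
           packed_len[2], out_len, out[0]);

    /* Random update: change chunk 2 and recompress just that chunk. */
    memset(plain[2], 'z', CHUNK);
    free(packed[2]);
    packed_len[2] = compressBound(CHUNK);
    packed[2] = malloc(packed_len[2]);
    compress(packed[2], &packed_len[2], plain[2], CHUNK);

    for (int i = 0; i < NCHUNKS; i++)
        free(packed[i]);
    return 0;
}
```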

      2. toughluck

        Re: Obvious bull

        Yes, that must be why I can't use Zip or Rar files on a different PC than the one that created them.

        Stop using floppy disks.

    11. patrickstar

      Re: Obvious bull

      Funny how none of the assume-everything-MS-does-is-crap crowd has actually read the article. Where it is clearly explained that, yes, it supports multiple formats, but also that certain performance requirements had to be met by all of them on all architectures.

      The reason has more to do with bounds on total time spent in the kernel when reading/writing data than with perceived performance. This is not a design decision you can reasonably comment on without having a full view of the actual, intended, and possible uses of NT on various architectures in various configurations at that time - which you don't, and MS presumably did.

      1. toughluck

        Re: Obvious bull

        @patrickstar:

        Funny how none of the assume-everything-MS-does-is-crap crowd has actually read the article.(...)

        This is not a design decision you can reasonably comment on without having a full view of the actual, intended, and possible uses of NT on various architectures in various configurations at that time - which you don't, and MS presumably did.

        Well, assume everything MS does is crap and you arrive at the conclusion that they also didn't have that full view.

  2. Anonymous Coward
    Anonymous Coward

    The biggest problem with the alpha chip was yield

    I have worked with production engineers who worked on the Alpha chip. A 35% yield on a good day is not cost-effective.

    1. Arthur the cat Silver badge

      Re: The biggest problem with the alpha chip was yield

      As I recall, the first Alpha chips also had a bottleneck in the memory access pathways. Once the data and code had been loaded into cache it went blindingly fast (for the time), but filling the caches or writing dirty caches back ran like an arthritic three legged donkey.

      1. Anonymous Coward
        Anonymous Coward

        Re: bottleneck in the memory access pathways.

        "the first Alpha chips also had a bottleneck in the memory access pathways. Once the data and code had been loaded into cache it went blindingly fast (for the time),"

        Citation welcome. See e.g. McCalpin's STREAM memory performance benchmark(s) from the era.

        Bear in mind also that in general, early Alpha system designs (with the exception of the 21066/21068, see below) could be based on a 64-bit-wide path to memory, or a 128-bit one; 128-bit was generally faster. The presence or absence of ECC in the main memory design could also affect memory performance.

        You *may* be thinking of the 21066/21068 chips. These were almost what would now be called a "system on chip" - a 21064 first-generation Alpha core (as you say, blindingly fast for its time), and pretty much everything else needed for a PC of the era ("northbridge","southbridge", junk IO), all on one passively-coolable 166MHz or 233MHz chip. Just add DRAM. Even included on-chip VGA. In 1994. I'll say that again: in 1994.

        Unfortunately it had a seriously bandwidth-constrained DRAM interface, which was a shame.

        The 21066/21068 were used in a couple of models of DEC VMEbus boards and the DEC Multia ultra-small Windows NT desktop, which was later sold as the "universal desktop box", because it could run NT (supported), OpenVMS (worked but unsupported), or Linux (customer chooses supported or not). The 21066/21068 weren't used in any close-to-mainstream Alpha systems, not least because of the performance issues.

        Alternatively, someone may be (mis?)remembering that early Alpha chips also didn't directly support byte write instructions and the associated memory interface logic, which meant that modifying one byte in a word was a read/modify/rewrite operation. I wonder if this is confusing the picture here.

        The compilers knew how to hide this byte-size operation, but in code that used a lot of byte writes the impact was sometimes visible, which was why hardware support was added quite quickly to the next generation of Alpha designs (and DEC's own compilers were changed to make the hardware support accessible).
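
        (A rough C rendering, for illustration only, of what "a read/modify/rewrite operation" means on a machine with no byte-store instruction: the compiler has to load the containing quadword, merge the byte in, and write the whole word back. Little-endian layout assumed.)

```c
/* Roughly what a compiler had to emit for a single byte store on early
 * Alpha-style hardware: load, merge, store, instead of one byte store. */
#include <stdint.h>
#include <stdio.h>

static void store_byte(uint64_t *mem, size_t byte_index, uint8_t value)
{
    uint64_t *word  = mem + (byte_index / 8);            /* containing quadword */
    unsigned  shift = (unsigned)(byte_index % 8) * 8;
    uint64_t  mask  = (uint64_t)0xff << shift;

    *word = (*word & ~mask) | ((uint64_t)value << shift); /* merge and rewrite */
}

int main(void)
{
    uint64_t buf[2] = {0, 0};
    store_byte(buf, 3, 0xAB);                             /* i.e. "p[3] = 0xAB" */
    printf("%016llx\n", (unsigned long long)buf[0]);
    return 0;
}
```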

        As for Mr Chen's original Alpha-related comments: (a) it's hearsay (b) it's somewhat short of logic (at least re hardware constraints), and even shorter on hardware facts.

        References include:

        Alpha Architecture Handbook (generic architecture handbook, freely downloadable); see e.g.

        https://www.cs.arizona.edu/projects/alto/Doc/local/alphahb2.pdf

        Alpha Architecture Reference Manuals (2nd edition or later, if you want the byte-oriented stuff)

        Digital Technical Journal article on the 21066:

        http://www.hpl.hp.com/hpjournal/dtj/vol6num1/vol6num1art5.pdf

    2. cageordie

      Re: The biggest problem with the alpha chip was yield

      Take a look at the early days of any technology and yield is poor. Right now I work on a next-generation Flash architecture; almost nothing about it works. It's incredibly poor compared to current Flash. But next year it will be in the stores. That's just how development goes at the cutting edge, and that's what the Alpha once was.

      1. toughluck

        Re: The biggest problem with the alpha chip was yield

        @cageordie: It depends. These days everyone sort of expects a new node to just work and to get 20%+ yields even from the first wafers. And 35% isn't bad as far as yields go.

        Plus, even cutting edge process technology doesn't always mean starting with nothing, although it depends on the specific design.

        Compare AMD's 4770 and Evergreen series to Nvidia's Fermi.

        AMD first did a midrange chip to learn and understand TSMC's 40 nm process node and got good yields before going for larger chips.

        Nvidia decided to go for a large chip as their first 40 nm design and got lousy yield; basically all of their GF100 chips effectively yielded 0% because they needed to fuse off portions of each manufactured chip. Starting with a large chip meant it was harder for them to understand the process, and the problem repeated when working on the smaller GF104/106/108 chips. It still didn't bring them down at all.

      2. Anonymous Coward
        Anonymous Coward

        Re: The biggest problem with the alpha chip was yield

        "That's just how development goes at the cutting edge, and that's what the Alpha once was."

        Correct. As I mentioned earlier, the 21066/21068 "Low Cost Alpha" chips had disappointing memory interface performance. One of the reasons for that was packaging. Chip and board designers hadn't got round to surface mount chips with high pin density (this was in the early Pentium era) and to keep costs down the package had to be small, which kept the number of pins unrealistically low.

        Today, if someone were to try a similar trick with SMD packaging, there'd be no problem. Heck, every smartphone for years has used SMD packaging on its SoC. But back then it wasn't an option.

  3. martinusher Silver badge

    This explanation explains everything

    There's only one way to describe this explanation about software design -- the programmers are crap, they haven't a clue how to layer software design. But then we more or less guessed this might be the case.

    Incidentally, the x86 isn't a particularly efficient architecture; it's just very well developed, so it can 'brute force' its way to performance. A wide machine word is also meaningless because, unlike more efficient architectures, x86 systems are fundamentally 8 bit (16 if you want to be generous) and so support misaligned / cross-boundary code and data accesses.

    1. Brewster's Angle Grinder Silver badge

      Re: This explanation explains everything

      These days, cross boundary access is a performance boost: it means you can pack data structures tighter so the data is more likely to be in the cache.
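
      (A small sketch of that trade-off; exact sizes are compiler- and target-dependent, and the figures in the comments assume a typical x86-64 ABI.)

```c
/* Packing drops the padding, so more records fit per cache line, at the
 * price of a misaligned double in the packed version. */
#include <stdio.h>

struct padded {              /* natural alignment: usually 16 bytes */
    char   tag;
    double value;
};

#pragma pack(push, 1)
struct packed {              /* no padding: 9 bytes, 'value' misaligned */
    char   tag;
    double value;
};
#pragma pack(pop)

int main(void)
{
    printf("padded: %zu bytes, packed: %zu bytes\n",
           sizeof(struct padded), sizeof(struct packed));
    return 0;
}
```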

    2. patrickstar

      Re: This explanation explains everything

      There is only one explanation for your comment - you didn't read the actual, linked, article before posting it.

      Also that you have no experience with the design or implementation of the NT (Windows) kernel, subsystems, drivers or filesystem, as they are all very well layered in precisely the way you just claimed the authors were unable to do.

      1. AndrueC Silver badge
        Boffin

        Re: This explanation explains everything

        At first sight maybe almost too well layered. When I first read Inside Windows NT many years ago I was amazed how modular things were and how object oriented the kernel is. In fact as a young software developer in the early 00s I kind of thought that was why NT performance was a bit poor. It felt to me like too much overhead and baggage for an OS.

        A common discussion around that time was whether C++ sacrificed performance compared to C because of its OOP features. And here was an operating system that used encapsulation, had methods and even ACLs for internal data structures.

        Quite an eye opener for someone used to working with MSDOS :)

  4. Nicko

    Thanks for the memories...

    Still got an AXP 433 NT4 workstation in my workshop loft somewhere - in its day, it was an absolute beast. ISTR that MS had a whole building on DEC's campus (or was it the other way round), just to make sure that the Windows builds could be processed quickly. Forget not that Dave Cutler, one of the (lead) progenitors of NT and of OpenVMS (or just plain "VMS" [previously "Starlet"] as it was at the start), was heavily involved in the Prism project, one of several that eventually led to the development of the AXP. He also drove the NT port to AXP and, later, to AMD architectures.

    He got around a bit. The only downside was the truly abysmal DEC C compiler, which he co-authored. I had to use that for a while and it was a dog - Cutler admitted none of them had written a compiler before, and it showed.

    Anyone want a lightly-used AXP-433 - was working when last used in about 1999...

    1. Anonymous Coward
      Anonymous Coward

      Re: Thanks for the memories...

      We had a dual-boot box (VMS and NT) and I still have it (wrapped in a nice plastic bag). Having lost the password to the latter, and having no media, I overwrote it with NetBSD. One day, I will reactivate my Hobbyist licence for OpenVMS...

  5. Anonymous Coward
    Anonymous Coward

    So why not create a new v2 compression scheme?

    How often are you going to remove an NTFS drive from a Windows machine and try to install it in an older Windows machine? Push out patches for all supported versions of Windows (7+, Server 2008+) to understand the v2 compression, provide an option to force usage of v1 if you want to be sure you can remove the drive and put it in an outdated Windows machine and problem solved.

    If Microsoft thought this way with everything they'd still be defaulting to the original 8.3 FAT filesystem...

    1. Adam 1

      Re: So why not create a new v2 compression scheme?

      When you burn a CD you get to choose whether to support multiple sessions on the disc to allow subsequent changes or whether to burn as a single finalised session for compatibility.

      Very good compression algorithms with ultra-low CPU overhead exist. The only reason I can see for wanting to avoid it would be for more efficient deduping.

    2. Ken Hagan Gold badge

      Re: So why not create a new v2 compression scheme?

      The "v2" question is addressed in the blog and the answer is quite simple: who gives a flying fuck about disc compression these days? You queue up the I/O and wait for DMA to deliver the data and then re-schedule the thread. Meanwhile, there's 101 other things the CPU can be doing.

      Back in the 80s and 90s it probably meant something because: (i) there weren't 1001 other threads in the waiting queue, and (ii) the disc access probably wasn't a simple case of "send a few bytes to the controller and wait". Both factors meant that the CPU was probably kicking its heels whilst the I/O happened, so reading less data and burning the wasted cycles on decompression was a win.

      These days, file-system compression is just making work for yourself (the compression) so that you can force yourself to do other work (the decompression) later.

      1. Roger Varley

        Re: So why not create a new v2 compression scheme?

        and not to forget the eye watering cost of hard drives in those days as well

        1. Wensleydale Cheese

          Re: So why not create a new v2 compression scheme?

          "and not to forget the eye watering cost of hard drives in those days as well"

          Let's not also forget that the other main argument for disk compression back in the 90s was that less physical I/O was required to access compressed files.

          This was a pretty powerful argument for those of us who weren't particularly short of disk space.

          A by-product on NTFS was reduced fragmentation.

      2. Adam 52 Silver badge

        Re: So why not create a new v2 compression scheme?

        "You queue up the I/O and wait for DMA to deliver the data and then re-schedule the thread. Meanwhile, there's 101 other things the CPU can be doing"

        For an interactive system "wait" means poor user experience.

        Processor performance has improved much faster than disc IO. A modern system spends huge amounts of time hammering the disc - that's why we all upgrade to SSDs whenever we get the chance.

        1. Anonymous Coward
          Anonymous Coward

          For an interactive system "wait" means poor user experience.

          Yes, when the main thread blocks waiting for I/O to complete. That's why async I/O, queues, completion ports, and the like are welcome - you tell the OS "read/write this data, and notify me when you've done it; meanwhile I can still interact with the user, for a good experience".
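
          (A minimal Win32 sketch of that pattern using overlapped I/O. Real interactive code would use an I/O completion port or a completion callback rather than the final blocking wait, and the path is a made-up example.)

```c
/* Start the read, keep working, pick up the result later. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileW(L"C:\\temp\\big.dat", GENERIC_READ,
                           FILE_SHARE_READ, NULL, OPEN_EXISTING,
                           FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    static char buf[64 * 1024];
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);

    BOOL started = ReadFile(h, buf, sizeof(buf), NULL, &ov);
    if (!started && GetLastError() != ERROR_IO_PENDING) return 1;

    /* ... the thread stays free here while the disk does the work ... */

    DWORD got = 0;
    GetOverlappedResult(h, &ov, &got, TRUE);   /* TRUE = wait for completion */
    printf("read %lu bytes\n", got);

    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}
```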

      3. Loud Speaker

        Re: So why not create a new v2 compression scheme?

        Both factors meant that the CPU was probably kicking its heels whilst the I/O happened, so reading less data and burning the wasted cycles on decompression was a win.

        This was, of course, true of Windows up to at least XP.

        It was almost certainly not the case with VMS (I used it, but was not familiar with the code). However, Unix was around from before 1978 (when I first met it) and most definitely could do proper DMA transfers and multi-threading of tasks where the hardware permitted. I know: I wrote tape and disk drivers for it (in the 80's).

        And I am bloody sure that Alpha assembler could do bit twiddling with little more pain than x86. I have written assembler for both. C compilers might have been less good at it.

        The Alpha architecture allows you to write and load microcode routines at run time - so you could implement the bit twiddling instructions yourself and load them when your program loaded. Great for implementing database joins, etc. Of course, you have to know what you are doing to write microcode. This might be the real problem.

        1. Wensleydale Cheese

          Re: So why not create a new v2 compression scheme?

          "And I am bloody sure that Alpha assembler could do bit twiddling with little more pain that x86. I have written assembler for both. C compilers might have been less good at it."

          IIRC with the move from VAX to Alpha, the Macro assembler became a compiler rather than a simple assembler.

          Data alignment was also important on Alpha: MSDN article "Alignment Issues on Alpha".

      4. Brewster's Angle Grinder Silver badge

        Re: So why not create a new v2 compression scheme?

        "...the disc access probably wasn't a simple case of "send a few bytes to the controller and wait"..."

        Ah, yes, the joys of "Programmed IO" (PIO) -- "rep insb", "rep insw", "rep outsb" and "rep outsw"; your single threaded processor would be tied up shunting sectors through a port, byte by byte.
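
        (For the curious, an illustration of what that looked like, written as GCC-style inline assembly: the CPU itself marches every word of a 512-byte sector in from the legacy ATA data port. It needs ring-0 / I/O privilege, so treat it as a sketch of the mechanism rather than something to run from user space.)

```c
/* Programmed I/O as described above: no DMA, no free cycles - the CPU
 * personally shovels all 256 words of a sector through port 0x1F0. */
static void pio_read_sector(void *dest)
{
    unsigned short port  = 0x1F0;   /* primary ATA data register */
    unsigned int   words = 256;     /* 256 x 16 bits = one 512-byte sector */

    __asm__ volatile ("rep insw"
                      : "+D"(dest), "+c"(words)   /* EDI = buffer, ECX = count */
                      : "d"(port)                 /* DX  = I/O port            */
                      : "memory");
}
```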

    3. Mage Silver badge

      Re: So why not create a new v2 compression scheme?

      Indeed. NTFS itself already has different versions. I think NT 3.51 can't read an XP / Win 2K format. I forget when the major change was.
