Windows is now built on Git, but Microsoft has found some bottlenecks

Microsoft has adopted Git to manage the vast collection of code that is Windows' source, and has shared performance issues it's had to fix along the way. The state-of-the-nation report for what Microsoft calls the “largest Git repo on the planet” follows on from its launch of the “fat Git repo” handler, the Git Virtual File …


  1. Anonymous Coward

    GVFS sounds super dumb

    Help me out here, what's the point? It sounds like Git's antiparticle.

    1. Voland's right hand Silver badge

      Re: GVFS sounds super dumb

      It smells of ClearCase.

      Frankly, I do not see the point.

      1. stephanh

        Re: GVFS sounds super dumb

        There are tons of companies which store source to all their products, and presumably their tax returns and porn stash, in a single giant repo. Perforce and ClearCase actually encourage such a way of working.

        Now those companies may want to use "git", but of course not to the extent that they would change their way of working and split up their repo a bit. So now they can buy "Microsoft git" which presumably has some token integration with actual git, but for the 30gb repo support you have to use Visual Studio and not the normal git client.

        I suppose "embrace and extend" is still a thing at MS.

        1. 9Rune5

          Re: GVFS sounds super dumb

          "but for the 30gb repo support you have to use Visual Studio and not the normal git client."

          First of all, you can check out the gvfs source code yourself on GitHub.

          Secondly, VS 2017 simply uses git.exe to do git related tasks. VS 2015 took a different approach, putting all the git functionality inside a dll, which was probably convenient at the time but ate quite a lot of memory (the VS team's biggest sin is ignoring 64-bit support for over a decade now).

          AFAICT gvfs is simply a layer under git that allows the developers to avoid pulling in the entire repository. Few developers are likely to touch the entire code base, yet the build servers probably need the whole thing.

          OTOH, according to the github page, the latest version of Windows 10 *is* a requirement. So some OS support seems to be needed for this to work. I have no idea if this can be ported to other operating systems.

          Is it more feasible to force the build servers into pulling thousands of repositories at build time? It would surprise me if the answer is 'yes'.

          1. Jamie Jones Silver badge
            Joke

            Re: GVFS sounds super dumb

            I don't know what's wrong with:

            cp file file.old

            ...

            mv file.old file.older; cp file file.old

            ...

            mv file.older file.older-still; mv file.old file.older; cp file file.old

            etc

            1. Bronek Kozicki

              Re: GVFS sounds super dumb

              I think someone at Microsoft missed "sparse checkout" feature in git

              1. gnasher729 Silver badge

                Re: GVFS sounds super dumb

                "I think someone at Microsoft missed "sparse checkout" feature in git"

                They missed the "sparse clone" feature. I must have missed it as well.
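The distinction in this exchange can be sketched in Python. This is a toy model rather than git's implementation: `sparse_checkout` and the paths are made-up names, and the filter is a simple directory-prefix match. It illustrates that a sparse checkout trims the working tree, while an ordinary clone has still downloaded the objects for every path.

```python
def sparse_checkout(all_paths, wanted_dirs):
    """Return only the paths that live under one of the wanted directories."""
    prefixes = tuple(d.rstrip("/") + "/" for d in wanted_dirs)
    return [p for p in all_paths if p.startswith(prefixes)]


# Hypothetical repository contents, for illustration only.
cloned_objects = [
    "shell/explorer.c",
    "shell/taskbar.c",
    "kernel/sched.c",
    "drivers/usb/hub.c",
]

working_tree = sparse_checkout(cloned_objects, ["shell"])
print(working_tree)         # files materialised on disk
print(len(cloned_objects))  # objects the clone still had to transfer
```

The working tree shrinks to the two `shell/` files, but the count of transferred objects does not change, which is the gap GVFS is meant to close.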

            2. oldcoder

              Re: GVFS sounds super dumb

              Space for one. Each cp duplicates.

              No auditing for another. git tracks changes, who made them when, and why.

              Very important when you have 100s or thousands or 10s of thousands of updates going on.
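To make the space point concrete, here is a minimal Python sketch of content-addressed storage, the idea git uses for file contents. `ToyObjectStore` is a made-up class for illustration and not part of git. Only the hashing formula (SHA-1 over a `blob <size>\0` header followed by the content) mirrors what `git hash-object` computes for a blob.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory store: identical content is kept only once."""

    def __init__(self):
        self.objects = {}  # object id -> content

    def add(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self.objects.setdefault(oid, content)
        return oid


store = ToyObjectStore()
first = store.add(b"hello\n")
second = store.add(b"hello\n")        # same bytes, e.g. an unchanged file in a later commit
third = store.add(b"hello, world\n")  # a genuinely new version
print(len(store.objects))             # distinct contents stored, not one copy per "cp"
```

Unlike the `cp file file.old` scheme, an unchanged file costs nothing extra here, because both additions resolve to the same object id.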

              1. gnasher729 Silver badge

                Re: GVFS sounds super dumb

                Now oldcoder, your comment is really, really dumb.

                git is running. All the time. All the change tracking is happening. If you ever need the information, it will be downloaded when you need it. When you don't need the information, it's not downloaded. It's still all there in git.

                You have 100 teams working on different things. They all use one git repository. Everything any team member looks at is always there - only the things they don't look at are not downloaded.

            3. hplasm
              Holmes

              Re: GVFS sounds super dumb

              "I don't know what's wrong with:

              cp file file.old"

              Remember what you're dealing with...

              C:\>cp file file.old

              'cp' is not recognized as an internal or external command,

              operable program or batch file.

              1. This post has been deleted by its author

              2. h4rm0ny

                Re: GVFS sounds super dumb

                >>'cp' is not recognized as an internal or external command,

                I don't know when the last time you used Windows was, but "cp" works. Open up Powershell and try it.

                1. Orv Silver badge

                  Re: GVFS sounds super dumb

                  Powershell is the bee's knees. I was late to come around to it, but I'll never use cmd.exe again.

                  1. david 12 Silver badge

                    Re: GVFS sounds super dumb

                    >Powershell is the bee's knees. I was late to come around to it, but I'll never use cmd.exe again.<

                    And look how many keystrokes I can save by typing 'cp' instead of some verbose COBOL crap like 'copy', intended to make scripting 'easy', so that 'you don't need to be a dev to do scripting'.

                    As if making readable scripts ever worked. That's the problem with readable scripts: it makes people think that anyone can do it.

        2. This post has been deleted by its author

          1. Orv Silver badge

            Re: GVFS sounds super dumb

            To be honest, I still don't understand the fascination with Git. The only thing it has going for it (that I can see) is it's free.

            Cult of St. Torvalds, I think.

            My impression of git is that it feels like something a programmer whipped up in a week or so to scratch an immediate itch, without any thought to user-friendliness or scaling. Which is of course exactly how it originated. He needed to get off BitKeeper ASAP when their license terms became onerous, so he threw something together.

            1. Bronek Kozicki

              Re: GVFS sounds super dumb

              "My impression of git is that it feels like something a programmer whipped up in a week or so to scratch an immediate itch, without any thought to user-friendliness or scaling"

              It scales better than SVN and the design is pretty neat - if you bother to understand it. Which takes some effort, as it is indeed quite unconventional (think two-dimensional hierarchy, where one dimension is the files and the other is the commit history). However, you make a good point that it was indeed whipped up in a hurry, hence the upvote.

              1. Lee D Silver badge

                Re: GVFS sounds super dumb

                Give the guy his due.

                He wanted to continue using Bitkeeper. Lots of people in/around Linux used it and paid for it (even if they didn't always have to).

                Then the owner of the company that makes Bitkeeper decided to be a twat because someone of Samba fame started to reverse-engineer its proprietary formats so they could integrate with it.

                He pulled the rug, the software was made unavailable.

                So Linus knocked up an alternative in a few days, that pretty much sent Bitkeeper scrambling and now even Microsoft use it, and Bitkeeper is nowhere to be heard of. Since the very early days, it's been almost entirely other people - including Microsoft - developing git, but you have to admire the way that was done.

                "Okay, you won't play ball any more, despite it being nothing to do with us kernel developers at all? Okay, I'll write an alternative that's more focused on our process, better for us, and does things yours can't. Oh, look, there it is, done. Bye!".

                There aren't many people who can re-write an independent implementation of a large commercial product overnight, that ultimately leads to nobody even touching the other software any more, and Microsoft basing product lines and their entire development process on it.

                1. JLV

                  Re: GVFS sounds super dumb

                  Here's an interview with Linus about it:

                  https://www.linux.com/blog/10-years-git-interview-git-creator-linus-torvalds

                  I like git, mostly. I like that it tries to parallel the file system in its use. I like that it makes sense on the command line, doesn't _need_ a daemon and can just be moved by file copies. I am sure that some other version controls do some things better. But it's free, pretty good at what it does and allows you a lot of growth if you want to become expert at it. Subversion never clicked with me, Sourcesafe sucked and Clearcase makes me wonder how its creators feel about creating such a loathed piece of software. So best, in my limited experience, by a long shot.

        3. gnasher729 Silver badge

          Re: GVFS sounds super dumb

          You are being daft here. There is no "Microsoft git". There is git, with all the git commands that you know, using a virtual file system on the client. I'm right now making a living using a git repository of 100 MB. These guys have a repository of 300 GB. Without that virtual file system, git can't handle it. I'll congratulate Microsoft for using a very smart approach to a difficult problem.

      2. Anonymous Coward

        Re: GVFS sounds super dumb

        I too thought of ClearCase when I read this, with rather mixed memories. I do remember working on a large team where ClearMake really came into its own though, pulling in libraries on the fly that others had compiled. I wonder whatever happened to it... but not enough to google and find out.

    2. Michael B.

      Re: GVFS sounds super dumb

      According to the linked article a standard git checkout would take 3 hours and git clone would take more than half a day with their size of codebase. That is what they are trying to get around with GVFS.

    3. Ken Hagan Gold badge

      Re: GVFS sounds super dumb

      I can think of several possible points:

      1) If your git repository is 300GB (perhaps because you have several decades of spaghetti dependencies in there) then you don't want to pull it all in at once. The usual DVCS approach of "grab the repo and party on dude" doesn't scale. (Yes I've heard of re-factoring and technical debt. Apparently, despite re-writing Windows from the ground up with every major release, MS haven't.)

      2) If your toolchain doesn't support git, you need to make it look more like a normal part of the file system, because everyone supports "normal files". So MS have written a filesystem driver that does that. (According to the blog, they intend to ditch this approach in the longer term, in favour of building git support into NTFS. What ... the ... fsck! Can you spell "retrograde"?)

      3) Having done 1 and 2, your next problem is that you don't have all the files locally and still need wire access to the originals, so some kind of proxy might be nice.

      I can see that purists might reckon that all this is solving the wrong problem, but if the Right solution is to quickly re-factor 300GB of source code then I can also see that MS might be forgiven for pursuing this approach. When you are up to your nose in shit, opening your mouth to call for help isn't necessarily the thing you do first.

      1. Lee D Silver badge

        Re: GVFS sounds super dumb

        Embrace.

        Extend.

        Extinguish.

        Welcome to step 2.

        Or are we not supposed to dredge that up with "new" Microsoft that's releasing SQL Server for Linux, Visual Studio for Linux, etc.?

        It's almost like they want to grab those "developers, developers, developers"...

        1. Jonathan 27

          Re: GVFS sounds super dumb

          SQL Server for Linux is essentially running on a compatibility layer (like Wine, but not Wine) and Visual Studio for Linux is Xamarin Studio renamed. Microsoft isn't even really attempting to change; they're just pretending they are. Using Git is just the best option for them at the moment; they've used a lot of source control systems in the past.

      2. Anonymous Coward

        Re: GVFS sounds super dumb

        ".....despite re-writing Windows from the ground up with every major release, MS haven't."

        <citation please>

        I've never heard this from Microsoft. That would be like saying Linus writes Linux from the ground up with every major release. It's just utterly stupid and incorrect.

      3. Anonymous Coward

        Re: GVFS sounds super dumb

        > perhaps because you have several decades of spaghetti dependencies in there

        300GB of spaghetti dependencies? There be dragons in there. Spaghetti dragons?

        Great, now I'm hungry...

    4. DrXym

      Re: GVFS sounds super dumb

      The Achilles heel for Git is that you must pull ALL the repository in order to use any of the repository. Various ways exist to work around this issue - shallow clones, submodules, subtrees, repo etc. but nothing is very good.

      I suppose the idea for GVFS is that when you do a clone of Windows, you don't transfer 300GB of crap to your machine before you even start. Instead you "clone" and the filesystem looks like the files were fetched but the fs only fetches a file's contents on first read. So if you're working on one DLL with 100 files you don't need to download the gazillion other files in the codebase.
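The fetch-on-first-read behaviour described above can be sketched in Python. This is a toy model of the idea only, not how GVFS is implemented: `LazyWorkingTree`, the `fetch` callback and the file paths are all assumptions made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional


class LazyWorkingTree:
    """Toy model: every path is listed up front, contents are fetched on first read."""

    def __init__(self, paths: Iterable[str], fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        # None marks a placeholder whose contents have not been downloaded yet.
        self._contents: Dict[str, Optional[bytes]] = {p: None for p in paths}
        self.downloads = 0

    def listdir(self) -> List[str]:
        """Listing needs no downloads: the names are already known."""
        return sorted(self._contents)

    def read(self, path: str) -> bytes:
        cached = self._contents[path]
        if cached is None:
            cached = self._fetch(path)  # first read: go to the server
            self._contents[path] = cached
            self.downloads += 1
        return cached


# Hypothetical server-side contents standing in for the remote repository.
remote = {
    "shell/explorer.c": b"int main(void) { return 0; }\n",
    "kernel/sched.c": b"/* scheduler */\n",
    "drivers/usb/hub.c": b"/* usb hub */\n",
}

tree = LazyWorkingTree(remote.keys(), remote.__getitem__)
tree.read("shell/explorer.c")
tree.read("shell/explorer.c")  # second read is served from the local copy
print(tree.listdir())
print(tree.downloads)
```

All three files show up in the listing, yet only the one that was actually opened has been transferred.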

      Clearcase (contender for the worst source control system ever invented) did this too with a thing called a dynamic view. The difference in Clearcase's case was the dynamic view could change while you were using it if someone else committed files to the same view. Enjoy trying to debug problems when header and sources keep changing underneath you.

      At least GVFS would behave like Git in that what you see isn't going to change unless you pull / fetch / merge. I'd like to see how MS intend to open this up outside of themselves though.

      1. stephanh

        Re: GVFS sounds super dumb

        FWIW Clearcase lets you put a "timestamp" in your configspec so you are isolated from other people's check-ins. But yeah, otherwise not a big fan of the complexity of Clearcase. It can work with 300GB repos, though.

      2. macjules

        Re: GVFS sounds super dumb

        Well, I suppose that you could just clone from a shallow depth, such as git clone --depth=10, or from the last stable release.
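As a rough illustration of what a depth limit does to history, here is a Python sketch that walks parent links from the branch tip and stops after a fixed number of generations. It is a simplified model, not git's own algorithm: `shallow_history` and the commit names are hypothetical, and the history is assumed to be a plain dictionary of parent links.

```python
def shallow_history(parents, tip, depth):
    """Return the commits within `depth` generations of `tip` (the tip counts as 1)."""
    kept = []
    seen = {tip}
    frontier = [tip]
    for _ in range(depth):
        next_frontier = []
        for commit in frontier:
            kept.append(commit)
            for parent in parents.get(commit, []):
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return kept


# Hypothetical linear history: c1 <- c2 <- ... <- c1000
parents = {f"c{i}": [f"c{i - 1}"] for i in range(2, 1001)}
parents["c1"] = []

recent = shallow_history(parents, "c1000", depth=10)
print(len(recent), recent[0], recent[-1])
```

Note that this only trims how far back the history goes. Every file in the kept commits is still downloaded, so it helps less when the tree itself is huge.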

      3. JLV
        Trollface

        Re: GVFS sounds super dumb

        >Clearcase (contender for the worst source control system ever

        Contender? You're being unduly generous and magnanimous. More like so far ahead that no one else is in the same game.

        Add to it quite possibly the worst GUI ever inflicted on users. And the crappiest and flakiest backend Windows services.

        1. Mikko

          Re: GVFS sounds super dumb

          Clearcase the worst? Pfeh, I see you never tried Visual SourceSafe. Combine the primitiveness of RCS with the complexity of Clearcase - or perhaps it was just Microsoft's MFC-era designers' capacity to overcomplicate things by exposing the wrong things to the user - and you are close.

          Full disclaimer: I have only ever encountered VSS briefly because, well... see point above. Clearcase might have caused more problems to the world by luring a team in until it is too late, but we are talking about the worst source control system, not the most evil.

        2. DuncanL

          Re: GVFS sounds super dumb

          @JLV - You've evidently never been forced to work with Rational Synergy...

  2. John Smith 19 Gold badge
    Coat

    So to paraphrase "MS Using Fat Gits for development."

    Yes I can believe that.

    2000 devs. 300GB of code.

    It must be good.

    Never mind the bugs, feel the bigness.

    1. getHandle

      Re: So to paraphrase "MS Using Fat Gits for development."

      "feel the biglyness" - ftfy

      1. John Smith 19 Gold badge
        Headmaster

        ""feel the biglyness"

        I stand corrected.

      2. nijam Silver badge

        Re: So to paraphrase "MS Using Fat Gits for development."

        "feel the buglyness" - ftfy

  3. localzuk Silver badge

    Monolithic

    Eugh. A single 300GB codebase. That just seems inefficient in so many ways.

    1. Anonymous Coward

      Re: Monolithic

      No, it's a 300GB repository, not a 300GB code base - which includes all the branches, and large teams like that working on Windows usually make extensive use of branches, unlike most smaller projects, which mostly work on a single one and maybe use branches only to mark releases and some maintenance.

      1. stephanh

        Re: Monolithic

        That's actually a bit unclear, if the total repo is 300gb or a single branch. But note that git pulls one branch at a time so the relevant number for scalability is the size of a single branch.

        To put into perspective how ridiculously large this is: the source code for the entirety of Debian is about 270GB. And that contains a vast suite of applications: everything from EDA tools to several office suits to multiple browsers to compilers to FPS games. A total of 28 thousand different packages. Windows is big but not that big.

        Given this, it is almost a certainty that the 300GB is not just source code. Perhaps it contains the entire build chain. Perhaps they are storing build artefacts in the repo.

        1. John Smith 19 Gold badge
          Thumb Up

          "Perhaps they are storing build artefacts in the repo."

          I guess it depends if you only do "source code" control or "whole version" control.

          It seems likely that they hold everything in there so you can track the code, the compiler settings, the resources and of course the test results.

          As others have noted there will likely be different branches for "Home" "Small Business" "Enterprise" editions as well

          OTOH I'm not so sure that includes Office, Dynamics or the languages.

        2. Anonymous Coward

          Re: Monolithic

          But Debian doesn't handle, for example, the whole Linux kernel repository, and all its commits/branches, I guess it just pulls some of it. The same is true for other projects, when they are hosted elsewhere and not directly by Debian.

          I have some open source projects inside my VCS for libraries and applications where I need to build the latest versions not directly supported by Debian - but I just pull the stable releases, not the whole commit history.

          Inside the Windows repository there are probably all the versions of Windows they need to support (which may stretch down to XP, if it's still on paid support), the upcoming ones, plus the SDKs and related development and build tools.

          Two different businesses, working in a different way.

          1. stephanh

            Re: Monolithic

            Debian redistributes all the source code of all upstream packages. So you can download the entire source of any Debian package using apt. The 270 GB number is the size of all that source code. It is not just the size of Debian patches and build scripts.

        3. Wensleydale Cheese
          Joke

          Typo of the week?

          "the source code for the entirety of Debian is about 270GB. And that contains a vast suite of applications: everything from EDA tools to several office suits to multiple browsers to compilers to FPS games."

          I'd love to be able to modify the behaviour of certain office suits.

          Where do I get the sources?

        4. Anonymous Coward

          Re: stephanh: Monolithic

          > But note that git pulls one branch at a time ...

          Hmmm, not exactly. When doing a "git clone" (by default) it will grab the entire remote repository (all branches) and set that up locally.

          "One branch" is only what gets pulled after that when you're updating things (eg "git pull"). And that's only if you've not set up further tracking between your local branches and remote ones (eg: git checkout -b somebranch --track).

          So, "one branch at a time" is kind of yes and no, but mostly not really. ;)
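The "all branches at clone time" point can be pictured with a small Python model. It is a deliberate simplification and not git's code: `clone` here is a hypothetical function, the branch names and commit ids are invented, and it only models the default behaviour where every remote branch becomes a remote-tracking ref while a single local branch is created.

```python
def clone(remote_branches, default_branch):
    """Toy model of a default clone: copy every branch tip as a remote-tracking
    ref, but create only one local branch for the default branch."""
    remote_tracking = {f"origin/{name}": tip for name, tip in remote_branches.items()}
    local = {default_branch: remote_branches[default_branch]}
    return remote_tracking, local


# Hypothetical remote repository: branch name -> tip commit id.
remote = {"master": "a1", "release": "b2", "feature/x": "c3"}

tracking, local = clone(remote, "master")
print(sorted(tracking))  # every remote branch is known locally
print(sorted(local))     # but only one local branch exists to work on
```

So the download cost at clone time scales with everything reachable from all branches, even though day-to-day work touches one of them.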

    2. DrXym

      Re: Monolithic

      To add to other comments, I would not be surprised if a very large chunk of that is binary blobs, images, audio etc.

    3. Anonymous Coward

      Re: Monolithic

      You were thinking one per file?

  4. Anonymous Coward

    SourceSafe

    What, you mean Visual SourceSafe can't handle this? </joke>

    1. Anonymous Coward

      @AC

      Yeah, I was about to comment on that myself.

      I think it's a sad display if you're selling items and then don't use them for your own setup. I mean: doesn't that tell us something about the items you're trying to sell us? I'm always very keen on that myself.

      Back in the iPaq days the CEO of Compaq would give speeches and all, and what was that one small detail which managed to catch my eye? He didn't use an iPaq, no way: he often used pen and paper to jot down notes. Errr, ok.... So it wasn't that revolutionary product which everyone could use after all, eh?

      Microsoft, back in the day (1990 - 2000), relied on Unix (Sendmail) to handle all their e-mail, because Exchange just couldn't handle it. Rumor even has it that they tried to implement it a couple of times but that Exchange completely crashed because it simply couldn't handle the load. Now, in all honesty, we need to keep in mind that Exchange was more than an MTA alone, so my example is a little bit flawed. But even so...

      And there are tons of examples. When a company tries to sell you a product and it then turns out that they're not using it themselves, I think something isn't quite right with the product ;)

      1. This post has been deleted by its author
