Windows is now built on Git, but Microsoft has found some bottlenecks

Microsoft has adopted Git to manage the vast collection of code that is Windows' source, and has shared performance issues it's had to fix along the way. The state-of-the-nation report for what Microsoft calls the “largest Git repo on the planet” follows on from its launch of the “fat Git repo” handler, the Git Virtual File …

  1. Anonymous Coward

    GVFS sounds super dumb

    Help me out here, what's the point? It sounds like Git's antiparticle.

    1. Voland's right hand Silver badge

      Re: GVFS sounds super dumb

      It smells of ClearCase.

      Frankly, I do not see the point.

      1. stephanh

        Re: GVFS sounds super dumb

        There are tons of companies that store the source for all their products, and presumably their tax returns and porn stash, in a single giant repo. Perforce and ClearCase actually encourage such a way of working.

        Now those companies may want to use "git", but of course not to the extent that they would change their way of working and split up their repo a bit. So now they can buy "Microsoft git", which presumably has some token integration with actual git, but for the 300GB repo support you have to use Visual Studio and not the normal git client.

        I suppose "embrace and extend" is still a thing at MS.

        1. 9Rune5

          Re: GVFS sounds super dumb

          "but for the 30gb repo support you have to use Visual Studio and not the normal git client."

          First of all, you can check out the GVFS source code yourself on GitHub.

          Secondly, VS 2017 simply uses git.exe to do git-related tasks. VS 2015 took a different approach, putting all the git functionality inside a DLL, which was probably convenient at the time but ate quite a lot of memory (the VS team's biggest sin is ignoring 64-bit support for over a decade now).

          AFAICT gvfs is simply a layer under git that allows the developers to avoid pulling in the entire repository. Few developers are likely to touch the entire code base, yet the build servers probably need the whole thing.

          OTOH, according to the github page, the latest version of Windows 10 *is* a requirement. So some OS support seems to be needed for this to work. I have no idea if this can be ported to other operating systems.

          Is it more feasible to force the build servers into pulling thousands of repositories at build time? It would surprise me if the answer is 'yes'.

          1. Jamie Jones Silver badge
            Joke

            Re: GVFS sounds super dumb

            I don't know what's wrong with:

            cp file file.old

            ...

            mv file.old file.older; cp file file.old

            ...

            mv file.older file.older-still; mv file.old file.older; cp file file.old

            etc

            1. Bronek Kozicki

              Re: GVFS sounds super dumb

              I think someone at Microsoft missed the "sparse checkout" feature in git

              1. gnasher729 Silver badge

                Re: GVFS sounds super dumb

                "I think someone at Microsoft missed "sparse checkout" feature in git"

                They missed the "sparse clone" feature. I must have missed it as well.
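
                For the record, stock git's sparse checkout of the time looked something like this (a minimal sketch, with a made-up URL) - and note the clone still downloads every object, which is exactly the missing "sparse clone" bit:

                git clone --no-checkout https://example.com/huge.git   # still fetches the full history

                cd huge; git config core.sparseCheckout true

                echo "my/component/" >> .git/info/sparse-checkout   # limit the working tree to one directory

                git checkout master   # only my/component/ appears on disk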

            2. oldcoder

              Re: GVFS sounds super dumb

              Space for one. Each cp duplicates.

              No auditing for another. git tracks changes, who made them when, and why.

              Very important when you have hundreds, thousands, or tens of thousands of updates going on.

              1. gnasher729 Silver badge

                Re: GVFS sounds super dumb

                Now oldcoder, your comment is really, really dumb.

                git is running. All the time. All the change tracking is happening. If you ever need the information, it will be downloaded when you need it. When you don't need the information, it's not downloaded. It's still all there in git.

                You have 100 teams working on different things. They all use one git repository. Everything any team member looks at is always there - only the things they don't look at are not downloaded.

            3. hplasm
              Holmes

              Re: GVFS sounds super dumb

              "I don't know what's wrong with:

              cp file file.old"

              Remember what you're dealing with...

              C:\>cp file file.old

              'cp' is not recognized as an internal or external command,

              operable program or batch file.

              1. This post has been deleted by its author

              2. h4rm0ny

                Re: GVFS sounds super dumb

                >>'cp' is not recognized as an internal or external command,

                I don't know when you last used Windows, but "cp" works. Open up PowerShell and try it.
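
                (And you can see why: cp ships as a built-in alias for Copy-Item.)

                PS C:\> Get-Alias cp   # cp -> Copy-Item

                PS C:\> cp file file.old   # works in PowerShell; cmd.exe still won't know it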

                1. Orv Silver badge

                  Re: GVFS sounds super dumb

                  Powershell is the bee's knees. I was late to come around to it, but I'll never use cmd.exe again.

                  1. david 12 Silver badge

                    Re: GVFS sounds super dumb

                    >Powershell is the bee's knees. I was late to come around to it, but I'll never use cmd.exe again.<

                    And look how many keystrokes I can save by typing 'cp' instead of some verbose COBOL crap like 'copy', intended to make scripting 'easy', so that 'you don't need to be a dev to do scripting'.

                    As if making readable scripts ever worked. That's the problem with readable scripts: it makes people think that anyone can do it.

        2. This post has been deleted by its author

          1. Orv Silver badge

            Re: GVFS sounds super dumb

            To be honest, I still don't understand the fascination with Git. The only thing it has going for it (that I can see) is it's free.

            Cult of St. Torvalds, I think.

            My impression of git is that it feels like something a programmer whipped up in a week or so to scratch an immediate itch, without any thought to user-friendliness or scaling. Which is of course exactly how it originated. He needed to get off BitKeeper ASAP when their license terms became onerous, so he threw something together.

            1. Bronek Kozicki

              Re: GVFS sounds super dumb

              "My impression of git is that it feels like something a programmer whipped up in a week or so to scratch an immediate itch, without any thought to user-friendliness or scaling"

              It scales better than SVN and the design is pretty neat - if you bother to understand it. Which takes some effort, as it is indeed quite unconventional (think two-dimensional hierarchy, where one dimension is files and the other is commit history). However you make a good point that it was indeed whipped up in a hurry, hence upvote.

              1. Lee D Silver badge

                Re: GVFS sounds super dumb

                Give the guy his due.

                He wanted to continue using Bitkeeper. Lots of people in/around Linux used it and paid for it (even if they didn't always have to).

                Then the owner of the company that made Bitkeeper decided to be a twat because someone of Samba fame started to reverse-engineer its proprietary formats so they could integrate with it.

                He pulled the rug, the software was made unavailable.

                So Linus knocked up an alternative in a few days that pretty much sent Bitkeeper scrambling; now even Microsoft use it, and Bitkeeper is nowhere to be heard of. Since the very early days, it's been almost entirely other people - including Microsoft - developing git, but you have to admire the way that was done.

                "Okay, you won't play ball any more, despite it being nothing to do with us kernel developers at all? Okay, I'll write an alternative that's more focused on our process, better for us, and does things yours can't. Oh, look, there it is, done. Bye!".

                There aren't many people who can re-write an independent implementation of a large commercial product overnight, that ultimately leads to nobody even touching the other software any more, and Microsoft basing product lines and their entire development process on it.

                1. JLV

                  Re: GVFS sounds super dumb

                  Here's an interview with Linus about it:

                  https://www.linux.com/blog/10-years-git-interview-git-creator-linus-torvalds

                  I like git, mostly. I like that it tries to parallel the file system in its use. I like that it makes sense on the command line, doesn't _need_ a daemon and can just be moved by file copies. I am sure that some other version control systems do some things better. But it's free, pretty good at what it does and allows you a lot of growth if you want to become expert at it. Subversion never clicked with me, Sourcesafe sucked and Clearcase makes me wonder how its creators feel about creating such a loathed piece of software. So it's the best, in my limited experience, by a long shot.
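
                  The "no daemon, just file copies" bit is easy to demonstrate (a sketch; the paths are made up):

                  cp -r myproject myproject-backup   # a repo is just a directory; the copy carries its full history

                  git clone /srv/repos/myproject   # a local clone straight off the filesystem, no server needed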

        3. gnasher729 Silver badge

          Re: GVFS sounds super dumb

          You are being daft here. There is no "Microsoft git". There is git, with all the git commands that you know, using a virtual file system on the client. I'm right now making a living using a git repository of 100 MB. These guys have a repository of 300 GB. Without that virtual file system, git can't handle it. I'll congratulate Microsoft for using a very smart approach to a difficult problem.

      2. Anonymous Coward

        Re: GVFS sounds super dumb

        I too thought of ClearCase when I read this, with rather mixed memories. I do remember working on a large team where ClearMake really came into its own though, pulling in libraries on the fly that others had compiled. I wonder whatever happened to it... but not enough to google and find out.

    2. Michael B.

      Re: GVFS sounds super dumb

      According to the linked article a standard git checkout would take 3 hours and git clone would take more than half a day with their size of codebase. That is what they are trying to get around with GVFS.

    3. Ken Hagan Gold badge

      Re: GVFS sounds super dumb

      I can think of several possible points:

      1) If your git repository is 300GB (perhaps because you have several decades of spaghetti dependencies in there) then you don't want to pull it all in at once. The usual DVCS approach of "grab the repo and party on dude" doesn't scale. (Yes I've heard of re-factoring and technical debt. Apparently, despite re-writing Windows from the ground up with every major release, MS haven't.)

      2) If your toolchain doesn't support git, you need to make it look more like a normal part of the file system, because everyone supports "normal files". So MS have written a filesystem driver that does that. (According to the blog, they intend to ditch this approach in the longer term, in favour of building git support into NTFS. What ... the ... fsck! Can you spell "retrograde"?)

      3) Having done 1 and 2, your next problem is that you don't have all the files locally and still need wire access to the originals, so some kind of proxy might be nice.

      I can see that purists might reckon that all this is solving the wrong problem, but if the Right solution is to quickly re-factor 300GB of source code then I can also see that MS might be forgiven for pursuing this approach. When you are up to your nose in shit, opening your mouth to call for help isn't necessarily the thing you do first.

      1. Lee D Silver badge

        Re: GVFS sounds super dumb

        Embrace.

        Extend.

        Extinguish.

        Welcome to step 2.

        Or are we not supposed to dredge that up with "new" Microsoft that's releasing SQL Server for Linux, Visual Studio for Linux, etc.?

        It's almost like they want to grab those "developers, developers, developers"...

        1. Jonathan 27

          Re: GVFS sounds super dumb

          SQL Server for Linux is essentially running on a compatibility layer (like Wine, but not Wine) and Visual Studio for Linux is Xamarin Studio renamed. Microsoft isn't even really attempting to change, they're just pretending they are. Using Git is just the best option for them at the moment; they've used a lot of source control systems in the past.

      2. Anonymous Coward

        Re: GVFS sounds super dumb

        ".....despite re-writing Windows from the ground up with every major release, MS haven't."

        <citation please>

        I've never heard this from Microsoft. That would be like saying Linus writes Linux from the ground up with every major release. It's just utterly stupid and incorrect.

      3. Anonymous Coward

        Re: GVFS sounds super dumb

        > perhaps because you have several decades of spaghetti dependencies in there

        300GB of spaghetti dependencies? There be dragons in there. Spaghetti dragons?

        Great, now I'm hungry...

    4. DrXym

      Re: GVFS sounds super dumb

      The Achilles heel of Git is that you must pull ALL of the repository in order to use any of it. Various ways exist to work around this issue - shallow clones, submodules, subtrees, repo etc. - but nothing is very good.

      I suppose the idea for GVFS is that when you do a clone of Windows, you don't transfer 300GB of crap to your machine before you even start. Instead you "clone" and the filesystem looks like the files were fetched but the fs only fetches a file's contents on first read. So if you're working on one DLL with 100 files you don't need to download the gazillion other files in the codebase.

      Clearcase (contender for the worst source control system ever invented) did this too with a thing called a dynamic view. The difference in Clearcase's case was that the dynamic view could change while you were using it if someone else committed files to the same view. Enjoy trying to debug problems when headers and sources keep changing underneath you.

      At least GVFS would behave like Git in that what you see isn't going to change unless you pull / fetch / merge. I'd like to see how MS intend to open this up outside of themselves though.

      1. stephanh

        Re: GVFS sounds super dumb

        FWIW Clearcase lets you put a "timestamp" in your configspec so you are isolated from other people's check-ins. But yeah, otherwise not a big fan of the complexity of Clearcase. It can work with 300GB repos, though.

      2. macjules

        Re: GVFS sounds super dumb

        Well, I suppose that you could just clone from a shallow depth, such as git clone --depth=10, or from the last stable release.
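
        Something like this (a sketch, with a made-up URL), and the history can always be backfilled later:

        git clone --depth=10 --single-branch https://example.com/big.git   # last 10 commits, one branch only

        git -C big fetch --unshallow   # changed your mind? fetch the full history after all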

      3. JLV
        Trollface

        Re: GVFS sounds super dumb

        >Clearcase (contender for the worst source control system ever

        Contender? You're being unduly generous and magnanimous. More like so far ahead that no one else is in the same game.

        Add to it quite possibly the worst GUI ever inflicted on users. And the crappiest and flakiest backend Windows services.

        1. Mikko

          Re: GVFS sounds super dumb

          Clearcase the worst? Pfeh, I see you never tried Visual SourceSafe. Combine the primitiveness of RCS with the complexity of Clearcase - or perhaps it was just Microsoft's MFC-era designers' capacity to overcomplicate things by exposing the wrong things to the user - and you are close.

          Full disclosure: I have only ever encountered VSS briefly because, well... see point above. Clearcase might have caused more problems to the world by luring a team in until it is too late, but we are talking about the worst source control system, not the most evil.

        2. DuncanL

          Re: GVFS sounds super dumb

          @JLV - You've evidently never been forced to work with Rational Synergy...

  2. John Smith 19 Gold badge
    Coat

    So to paraphrase "MS Using Fat Gits for development."

    Yes I can believe that.

    2000 devs. 300GB of code.

    It must be good.

    Never mind the bugs, feel the bigness.

    1. getHandle

      Re: So to paraphrase "MS Using Fat Gits for development."

      "feel the biglyness" - ftfy

      1. John Smith 19 Gold badge
        Headmaster

        ""feel the biglyness"

        I stand corrected.

      2. nijam Silver badge

        Re: So to paraphrase "MS Using Fat Gits for development."

        "feel the buglyness" - ftfy

  3. localzuk Silver badge

    Monolithic

    Eugh. A single 300GB codebase. That just seems inefficient in so many ways.

    1. Anonymous Coward

      Re: Monolithic

      No, it's a 300GB repository, not a 300GB code base - the repository includes all the branches, and large teams like the one working on Windows usually make extensive use of branches, unlike most smaller projects, which mostly work on a single branch and maybe use branches only to mark releases and some maintenance.

      1. stephanh

        Re: Monolithic

        It's actually a bit unclear whether the 300GB is the total repo or a single branch. But note that git pulls one branch at a time, so the relevant number for scalability is the size of a single branch.

        To put into perspective how ridiculously large this is: the source code for the entirety of Debian is about 270GB. And that contains a vast suite of applications: everything from EDA tools to several office suits to multiple browsers to compilers to FPS games. A total of 28 thousand different packages. Windows is big, but not that big.

        Given this, it is almost a certainty that the 300GB is not just source code. Perhaps it contains the entire build chain. Perhaps they are storing build artefacts in the repo.

        1. John Smith 19 Gold badge
          Thumb Up

          "Perhaps they are storing build artefacts in the repo."

          I guess it depends if you only do "source code" control or "whole version" control.

          It seems likely that they hold everything in there so you can track the code, the compiler settings, the resources and of course the test results.

          As others have noted, there will likely be different branches for "Home", "Small Business" and "Enterprise" editions as well.

          OTOH I'm not so sure that includes Office, Dynamics or the languages.

        2. Anonymous Coward

          Re: Monolithic

          But Debian doesn't handle, for example, the whole Linux kernel repository and all its commits/branches; I guess it just pulls some of it. The same is true for other projects, when they are hosted elsewhere and not directly by Debian.

          I have some open source projects inside my VCS, for libraries and applications where I need to build the latest versions not directly supported by Debian - but I just pull the stable releases, not the whole commit history.

          Inside the Windows repository there are probably all the versions of Windows they need to support (which may stretch down to XP, if it's still on paid support), the upcoming ones, plus the SDKs and related development and build tools.

          Two different businesses, working in a different way.

          1. stephanh

            Re: Monolithic

            Debian redistributes all the source code of all upstream packages. So you can download the entire source of any Debian package using apt. The 270 GB number is the size of all that source code. It is not just the size of Debian patches and build scripts.

        3. Wensleydale Cheese
          Joke

          Typo of the week?

          "the source code for the entirety of Debian is about 270GB. And that contains a vast suite of applications: everything from EDA tools to several office suits to multiple browsers to compilers to FPS games."

          I'd love to be able to modify the behaviour of certain office suits.

          Where do I get the sources?

        4. Anonymous Coward

          Re: stephanh: Monolithic

          > But note that git pulls one branch at a time ...

          Hmmm, not exactly. When doing a "git clone" (by default) it will grab the entire remote repository (all branches) and set that up locally.

          "One branch" is only what gets pulled after that when you're updating thing (eg "git pull"). And that's only if you've not set up further tracking between your local branches and remote ones (eg: git checkout -b somebranch --track).

          So, "one branch at a time" is kind of yes and no, but mostly not really. ;)

    2. DrXym

      Re: Monolithic

      To add to other comments, I would not be surprised if a very large chunk of that is binary blobs, images, audio etc.

    3. Anonymous Coward

      Re: Monolithic

      You were thinking one per file?

  4. Anonymous Coward

    SourceSafe

    What, you mean Visual SourceSafe can't handle this? </joke>

    1. Anonymous Coward

      @AC

      Yeah, I was about to comment on that myself.

      I think it's a sad display if you're selling items and then don't use them for your own setup. I mean: doesn't that tell us something about the items you're trying to sell us? I'm always very keen on that myself.

      Back in the iPaq days the CEO of Compaq would give speeches and all, and what was that one small detail which managed to catch my eye? He didn't use an iPaq, no way: he often used pen and paper to jot down notes. Errr, ok.... So it wasn't that revolutionary product which everyone could use after all, eh?

      Microsoft, back in the days (1990 - 2000), relied on Unix (Sendmail) to handle all their e-mail, because Exchange just couldn't handle it. Rumor even has it that they had tried to implement it a couple of times, but Exchange completely crashed because it simply couldn't handle the load. Now, in all honesty, we need to keep in mind that Exchange was more than an MTA alone, so my example is a little bit flawed. But even so...

      And there are tons of examples. When a company tries to sell you a product and it then turns out that they're not using it themselves, I think something isn't quite right with the product ;)

      1. This post has been deleted by its author

        1. Roland6 Silver badge

          Re: @AC

          Re: so it's more than likely they're actually using Team Foundation Server which happens to support Git as one of the underlying VCSs.

          The absence of any mention of TFS gives rise to concern: is this the first indication that, firstly, MS will be discontinuing TFS support for non-Git VCSs, and secondly, that TFS will become even more of an MS shim over Git?

      2. John Smith 19 Gold badge
        Unhappy

        "Microsoft, back in the days (1990 - 2000), relied on Unix (Sendmail) "

        IIRC they also used to run an AS400 for their warehouse management, back when they were still monopolizing shelves with (mostly) empty boxes.

        Of course, now they've had 17 years to integrate the 2 software packages that make up MS Dynamics, I'm sure it's up to the job.

        Probably.

      3. oldcoder

        Re: @AC

        Crashed Exchange servers were very common.

        Especially when it ran out of storage. For some reason it never could seem to send a reject message when it had insufficient space for the message - and crashed instead.

        Took out an entire organization's mail system for about a week with that one - 5 redundant servers, all crashed by the same message, and it only required about 15 minutes to do.

        Got accused of attacking their servers... until it was pointed out that the message crashing them came from their own staff, sent to our server to forward to theirs, and that the message was requested by one of the managers in their organization. (It happened to be an 8MB photograph of the staff.)

        1. Anonymous Coward

          Re: @AC

          Anybody who runs out of disk space on a server because they're not monitoring it deserves whatever may happen. When you have virtual memory backed by disks, transactions needing to write to log files first, etc., what you don't want is to find yourself with full disks.

          1. Orv Silver badge

            Re: @AC

            Some Exchange versions had the charming feature of being able to run out of storage space even if the disk WASN'T full. For example, the Small Business Edition of Exchange 4.0 was limited to 16 GB. If you hit that cap, the server simply keeled over, and you had to use external tools to remove enough messages to get the database down to size.

      4. Orv Silver badge

        Re: @AC

        Microsoft, back in the days (1990 - 2000), relied on Unix (Sendmail) to handle all their e-mail.

        It used to be good practice to put a separate MTA between Exchange and the outside world, because it really wasn't designed to cope with the wild world of the Internet. Exchange 4.0, for example, would only reject mail for invalid users *after* completing the SMTP session, making it a huge source of backscatter spam. I used to run an Exim server that would query the exchange server and then reject mail to invalid accounts before the SMTP session ended. It was also a handy place to do spam filtering.

    2. Anonymous Coward

      Re: SourceSafe

      Visual SourceSafe was discontinued many years ago. Anyway, like the article says, it was never much used within Microsoft; SourceDepot was used instead. Anyway, in the late 1990s, VSS was still better than nothing - I saw many teams working without a VCS at all... and its GUI was a good way to get people learning version control, instead of just using clumsy command lines, which usually led people to think version control was difficult, merging very dangerous, etc. etc.

      1. Anonymous Coward

        Re: SourceSafe

        Visual Sourcesafe got re-branded and hidden inside TFS. It's called TFSVCS..... It's still as broken and bad as ever. It's slightly better at not corrupting itself, but it's got all the limitations it's always had...

        If you are using TFS and didn't pick the Git backend, you are essentially running Sourcesafe with a new UI....

        1. Anonymous Coward

          Re: SourceSafe

          VSS was essentially a file-based solution that worked through shares - and that was one of its main weaknesses. It was also quite unusable and unreliable over slow connections for that reason.

          It required quite careful maintenance, avoiding large repositories, and a quite reliable network to minimize issues.

          TFS is database-based, and works through HTTP. I wouldn't be surprised if they re-cycled part of the VSS code, though.

          Not that CVS was an astounding piece of code back then either, and it had its own issues. CVSNT was slightly better, but it was essentially a one-man product and delivered its share of trouble as well.

          More expensive solutions like Perforce were better, but far less available, especially in small companies and small teams.

          Anyway, back in those days even VSS or CVS (SVN would only become available in 2004) were far better than *no VCS at all* - I saw more than one team just making copies of files to some server shares. The worst situation I encountered was when the shares were on the department manager's PC, on a single disk, with no redundancy at all....

      2. iron Silver badge

        Re: SourceSafe

        > Visual SourceSafe has been discontinued many years ago.

        Are you sure of that? There's this little product called Team Foundation Server that is really VSS under the hood. Give me Bazaar, Subversion or if necessary Git over it any day.

        1. Guido Brunetti
          Megaphone

          Re: SourceSafe

          SourceSafe had the same issue as Access: not being a real server, it relied on the network file system to handle multi-user concurrency - bad idea.

          TFS on the other hand uses a proper SQL Server database to store things. It has absolutely nothing to do with VSS anymore, and in fact Git is now the preferred source control system in TFS. All in all, TFS is probably the best and fastest-developing ALM tool around today.

          1. Anonymous Coward

            Re: SourceSafe

            TFS needs to be the fastest-developing; it's years behind other offerings like the Atlassian suite of products.

            Issue tracking in TFS 2017 is still horrible, and the TFS web UI for pull requests is clunky, ugly and unintuitive. The whole lot feels like a quickly lashed-up product where the glue is falling apart.

            A well-set-up JIRA/Bitbucket/Bamboo/Confluence suite, whilst it might be a little less integrated (although they do integrate pretty well considering they are all stand-alone), will still vastly outperform and outspec TFS, cost less and (in our experience) have vastly superior uptime and far better user feedback.

        2. d3vy

          Re: SourceSafe

          "Are you sure of that? There's this little product called Team Foundation Server that is really VSS under the hood. Give me Bazaar, Subversion or if necessary Git over it any day."

          VSS was discontinued; the last released version was 2005.

          Saying that TFS is really VSS under the hood is like saying a Proton is a Lotus under the hood because some small parts are common.

          For a start, TFS is not just source control; secondly, users can choose a source control provider for TFS (Git is one of the options).

          That aside, TFS does include TFSVCS, which probably does share some small parts with VSS, but to say it's the same is plain wrong.

      3. Roo
        Windows

        Re: SourceSafe

        "Anyway, in late 1990s, VSS was still better than nothing"

        My mileage varied. Back in '98 a PHB decided to tidy up our VSS repo by deleting the source to EOL products... Sadly he didn't know that deleting a file in VSS meant that the file, its entire history and all previous revisions were also deleted. CVS was (and still is) better than that, and it costs nothing.

    3. JimmyPage Silver badge
      Unhappy

      Re: SourceSafe

      Possibly the *worst* source control package ever.

      One thing which strikes me - as someone whose job has descended into asking awkward questions of the marketing brigade ...

      Where is the "AI" fairy dust for source control ?

      Even going back 15 years, I was looking for source control systems that understood the semantics of the source they were shepherding, and were able to think not in file terms, but in module, procedure and function terms.

      We need a new icon ... "the future has failed us"

      1. Jeremy Lloyd

        Re: SourceSafe

        Plastic SCM's Semantic Merge: http://semanticmerge.com/

    4. Tom 7

      Re: SourceSafe

      Their methodology worries me - I 'touch' thousands of files but I don't push them back unless I've contributed something I actually think is needed.

      1. This post has been deleted by its author

  5. Anonymous Coward

    I was under the impression Windows was built on sand.

    1. Joseph Haig

      "built on sand"

      Well that's where the silicon comes from so you could say that the whole industry is built on sand.

      1. Anonymous Coward

        Re: "built on sand"

        >Well that's where the silicon comes from so you could say that the whole industry is built on sand.

        The rest of the industry uses building sand; Microsoft thought they could get away with using cheaper sinking sand.

        1. Anonymous Coward

          Re: "built on sand"

          > Microsoft thought they could get away with using cheaper sinking sand.

          If you're moving fast enough then even sand is like concrete.

          In Microsoft's case, hopefully the "moving fast enough" is of the terminal impact variety.

  6. Anonymous Coward

    Useful I guess if your repo is 300gb

    But it does sound like they have some serious structure issues. Have they not heard of git submodules? https://git-scm.com/book/en/v2/Git-Tools-Submodules
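
    For the unfamiliar, submodules pin separately cloned child repos inside a parent one; a sketch (URL and path made up):

    git submodule add https://example.com/kernel.git src/kernel   # record a child repo at a fixed commit

    git submodule update --init --recursive   # after cloning the parent, fetch the pinned children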

    For the real world, standard git is pretty fast: a 3m LOC project, 40 developers, 4 years' worth of branch history, pure source, no binaries - a clone from fresh is about 30 seconds, and a branch switch is instant. This is luxury compared to our previous SCM, IBM Synergy, where a clone would take about 3 hours (same network!!!), and a reconfigure - a branch switch, which would usually fail - about 20 minutes (when it failed, you deleted everything and spent 3 hours checking out a fresh copy).

    Git with an enterprise wrapper to provide user control, pull request workflows and structure is light years ahead of other offerings. It's also very clear Microsoft are dumping TFSVC (which is essentially Sourcesafe) as the backend and moving towards Git as TFS's favored backend.

    1. gnasher729 Silver badge

      Re: Useful I guess if your repo is 300gb

      "But it does sound like they have some serious structure issues. Have they not heard of git submodules? https://git-scm.com/book/en/v2/Git-Tools-Submodules"

      You see, submodules are a hack that is needed because git cannot handle 300 GB repositories. (Actually, it can handle them just fine, but on my work machine I cannot even clone a 300 GB repository, and it would saturate the network for ages if I bought a bigger machine).

      What Microsoft has done is just a clever trick to use the whole complete git, without having to do stupid things like submodules that you don't actually want. Why would you have thousands of developers learn about submodules, getting them right, when they can just work with the whole thing?

      1. Anonymous Coward

        Re: Useful I guess if your repo is 300gb

        If your repo is 300GB, you are doing something wrong, storing binaries and deltas of binaries or something else silly.

          The Linux repository is less than 200MB; that's all source code and all branch history.

        1. Anonymous Coward

          Re: Useful I guess if your repo is 300gb

          This must be some other Linux from some other dimension, as the current kernel branch is 184MB compressed, 730MB uncompressed - not including all branch history, and not including all the rest of the OS and all the applications and services and drivers that come with the OS.

        2. Sandtitz Silver badge

          Re: Useful I guess if your repo is 300gb @AC

          "Linux repository is less than 200MB, that's all sourcecode and all branch history."

          Latest kernel 4.11.3 takes about 650MB when unpacked. There are several kernel branches supported concurrently so that 650MB is still way off.

          I'm sure just the Windows kernel itself would be in the same ballpark. That 300GB also contains the user interface, web browsers, all sorts of applications that come with a Windows install and so forth. Maybe even bitmaps too. Does it also contain every supported Windows version (XP, 7, 8.1, 10) and the different branches for each one? The article didn't say.

          Perhaps a more proportional comparison would be to check the default installation of Ubuntu (or Mint or some other popular distro intended for the general populace) and count the total size of the repos for all those programs, libraries and such.

    2. Orv Silver badge

      Re: Useful I guess if your repo is 300gb

      For the real world, standard git is pretty fast: a 3m LOC project, 40 developers, 4 years' worth of branch history, pure source, no binaries - a clone from fresh is about 30 seconds, and a branch switch is instant.

      The fast switch works because you have to download the entire repository first, so it's really just shuffling files around on local disk. That's pretty brilliant for ~600 MB of kernel source code. It's less brilliant when that means downloading and storing 300 GB on each and every workstation.

      My experience with 'git clone' on a project of any substantial size is that it's best to start it and then go do something else for a while.

      1. Anonymous Coward

        Re: Useful I guess if your repo is 300gb

        All the branches are of course deltas.

        Our codebase of 1 branch is about 300MB; the history for ALL the deltas for all the branches is about 70MB extra. That 70MB overhead allows me to sit on my boat and work, branch, merge, work totally offline, switch branches to my heart's content and basically at the end of the day push my work.

        Cheers. Feel sorry for the suckers on TFSVCS in their hot cube farm... They have to deal with TFS's crap model, and need a permanent connection to work!

  7. Missing Semicolon Silver badge
    Happy

    Re-inventing the wheel

    "“O(modified)” – instead of the number of files read, key commands are proportional to the number of files a user has current, uncommitted edits on"

    Well, that's what Perforce does. The server keeps track of what you have and what you change. It thus only transfers new/changed files on sync, and only uploads changed files on checkin.

    And guess what "Source Depot" is? MS bought a source license for P4 years ago, and SD is the result.

    So, basically, having gone all agile-y and adopted Git, they are retrofitting it with all the grown-up features that P4/SD had.
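
    In P4 terms, that server-tracked model is roughly this (a sketch; the file name is made up):

    p4 sync   # the server knows what you already have, so it sends only new/changed files

    p4 edit src/foo.c   # open a file for edit; the server records it

    p4 submit   # uploads only the files you opened - i.e. O(modified)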

    1. oldcoder

      Re: Re-inventing the wheel

      More likely refitted with the grown-up features that git has.

  8. Simon Harris
    Joke

    Microsoft has the biggest Git on the planet...

    I thought this article was going to be about Steve Ballmer.

  9. roytrubshaw
    Coat

    What will Gnome do now...

    ... that GVFS has been hijacked by Microsoft?

    I now have visions of tiny gnomes being plagued by troll-like gits as they go about their busy lives just trying to make ends meet...

    I'll get my coat, the one with the Gnome V2 installation

    1. Ken Hagan Gold badge

      Re: What will Gnome do now...

      Since Microsoft are open-sourcing all this, it won't be long before someone ports this new filesystem to Linux. Then, since the trend seems to be to always prefer to run filesystems in user-space, someone will create a gvfs-gvfs package.

      1. Anonymous Coward

        Re: What will Gnome do now...

        > Since Microsoft are open-sourcing all this, it won't be long before someone ports this new filesystem to Linux.

        Please, someone add a hard dependency on systemd to it!

  10. Frumious Bandersnatch

    premature optimisation

    Is the root of all evil

  11. Kristian Walsh Silver badge

    300 Gb isn't everything...

    Some divisions of Microsoft are not part of this project. Based on a comment in an earlier posting about this migration, the Bing group does use git, but doesn't keep its code in this super-repo. Apparently, they use sub-repos in their setup, and again apparently it has been a major pain in the metaphorical ballsack to keep everything in sync.

    This is the big problem with sub-repos; you end up having to manually keep things in sync, whereas a "one big repo" solution makes it a lot harder for a dev to commit something to the head of their component that won't build against the head of every other component.
