Over the past 3 months, we have largely completed the rollout of Git/GVFS to the Windows team at Microsoft.
As a refresher, the Windows code base is approximately 3.5M files and, when checked in to a Git repo, results in a repo of about 300GB. Further, the Windows team is about 4,000 engineers and the engineering system produces 1,760 daily “lab builds” across 440 branches in addition to thousands of pull request validation builds. All 3 of the dimensions (file count, repo size and activity), independently, provide daunting scaling challenges and taken together they make it unbelievably challenging to create a great experience. Before the move to Git, in Source Depot, it was spread across 40+ depots and we had a tool to manage operations that spanned them.
As of my writing 3 months ago, we had all the code in one Git repo, a few hundred engineers using it and a small fraction (<10%) of the daily build load. Since then, we have rolled out in waves across the engineering team.
K.I.S.S. … (keep it simple stupid!)
this is one beast that needs a serious diet…
Agreed; I feel lucky to be a Linux user.
However, in general the git fs is a good approach. In Git you have the repo database and a working-dir checkout. With a git fs you save the space for the working-dir checkout. Furthermore, you could mount multiple branches or revisions at no additional cost. So definitely a nice thing to have…
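To make that concrete, here is a minimal sketch of the idea (not how GVFS itself is implemented): plain Git can already serve a file as it exists at any branch or revision straight from the repo database, without materializing a working-dir checkout. The repo path, branch names and file path below are made-up examples.

```python
# Rough illustration only: read a file at an arbitrary revision directly from
# the Git object database, with no working-directory checkout needed.
# Repo path, branch names and file path are hypothetical.
import subprocess

def read_at_revision(repo_path: str, revision: str, file_path: str) -> bytes:
    """Return the contents of file_path as stored at the given revision."""
    result = subprocess.run(
        ["git", "-C", repo_path, "show", f"{revision}:{file_path}"],
        check=True,
        capture_output=True,
    )
    return result.stdout

# Compare the same file on two branches without keeping two checkouts on disk.
old = read_at_revision(".", "release/1.0", "src/main.c")
new = read_at_revision(".", "master", "src/main.c")
print("identical" if old == new else "different")
```

A git fs just takes this one step further and exposes such on-demand reads through the filesystem, so every branch or revision looks like an ordinary directory tree.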
Linux would also be a huge project if it already included a full desktop system instead of just the kernel and cared about compatibility with existing compiled software.
AFAIK that is the total size of the codebase stretching back 20+ years. Also remember that Windows includes much more than, say, Linux (which is mostly just the kernel), and that this includes a lot of variants, e.g. for Xbox etc.
But I agree that with so large a project (projects?) there are more chances for bugs to hide.
Should the tree be pruned of no-longer-used code once in a while to keep it manageable?
IMHO no, having a comprehensive development history is worth a lot.
…no Visual SourceSafe?
MS needs to dump 95% of the developers and rewrite 90% of the code.
Can you also specify which 5% of developers can stay and which 10% of the code is perfect?
Of course not, you just came to troll with a completely useless comment
(good luck rewriting 90% of 300 GB with 5% of the developers!)
The code should never have reached 30GB – let alone 300GB. Code should have been constantly refined instead of adding layer upon layer of crap produced by underpaid outsourced code monkeys.
Carmakers routinely replace every major component in a model over a 20-year period. Nothing gets added until it is thoroughly tested, and anything defective gets improved or replaced. A carmaker will spend millions of dollars to save a couple of kilograms of weight or improve fuel efficiency by 1%.
And how often do carmakers completely rewrite their software?
—————
And how has the churn associated with ALSA -> PulseAudio, X11 -> Wayland, SysV Init -> systemd etc gone over with Linux developers and admins?
When someone detects they are cheating
Did you have access to the Windows source code in order to strongly assert it is a lot of crap?
Did you receive some complaints about some Windows coder being underpaid?
IMHO, they did a very good job implementing all of that backwards compatibility. They could not be so irresponsible as to leave legacy code broken and not running just because some jerks think the newer stuff is always better. They did an amazing job keeping their basic APIs (Win32) and ABIs completely stable.
Raymond Chen explains a lot of decisions taken into the OS to support backwards compatibility in his blog:
https://blogs.msdn.microsoft.com/oldnewthing/
ebasconp,
I used to say this as well, but in the past decade backwards compatibility has become worse. I’ve seen a lot more breakages, even in corporate environments where critical business apps stop working. Backwards compatibility is no longer something Windows users can brag about, IMHO. Or if they do, it’s less true than it used to be.
The NT series hardware requirements have increased by almost three orders of magnitude in 30 years. That is a sure sign of massive bloat and crappy code.
It is the code including history!!
Just to get an idea: our RTOS repo already needs 1.1 GiB (unpacked), whereas the current branch is only 170 MiB. And that’s with only 15 years of history!
So 300GB is not really much.
Just as I said, you were just trolling and cannot specify which 5% of developers can stay.
“The code” is clearly not 300 GB of active code. Compiling 300 GB of active code into the 4 GB that is a current Windows install.wim would be an amazing feat! That 300 GB is all the history, and it also includes things like test files, which can be far larger than the actual code. To give you an idea, I probably will not write more than 1 MB of actual code in my whole life, just like most writers will not write more than 1 MB of actual book in their whole life… but the PDF with our company logo is probably close to that 1 MB already.
Your car comparison is just silly and even false. Just as we do different things with computers compared to 30 years ago (4K full-screen video on the internet vs. postage-stamp local videos, and HTML/CSS/JavaScript vs. RTF), the same is unfortunately true for cars: their weights have gone up and fuel efficiency has gone down: http://www.nytimes.com/2004/05/05/business/average-us-car-is-tippin… And that wasn’t so much related to technical features but very much linked to oil/gas prices and comfort!
I’d say 1 MB isn’t that much. For example, my last C++ project, representing less than a year of coding time, is at 701 KB. That is just source code, with no media and excluding third-party libraries.
Well, dumping an arbitrary number of developers and rewriting an arbitrary amount of code CERTAINLY won’t introduce any new bugs.
You must be an MBA.
How the fuck can a project blow out so much that it literally requires 1000x the hardware resources, probably 100x the developers and still be arguably less stable than it was 30 years ago?
In any other engineering discipline you would eventually realise you have a disaster, cancel the project and start with a clean sheet.
https://en.wikipedia.org/wiki/Windows_NT#Hardware_requirements
As you can see, you can blame a lot on Vista, but there is no “1000x increase” over the last 30 years, and in the last 10 years there has basically been no increase in hardware requirements at all, even though hardware DID actually get faster. Conclusion: no absolute bloat, less relative bloat, happier users!
Also, comparing NT 3.1 to “10” is pretty much like comparing an arrow to a machine gun. All that extra code and resource usage didn’t just go to waste, and a “10” machine now is a whole lot cheaper than an NT “3.1” machine used to be.
The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google’s entire 18-year existence.
The repository contains 86TB of data, including approximately two billion lines of code in nine million unique source files.
https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billi…
Not in one repository.
The linked article says otherwise. It’s even in the URL.
That’s not using Git though.
Why do you put it in one repo?
That’s not how you should use git.
There are arguments for having all of your software components in one repo.
Facebook does it: https://goo.gl/yMT9ZY
I do not necessarily agree with this approach but it’s actually no more right or wrong than using multiple repos.
It’s not the smart thing to do; it seems even more silly because Git allows for submodules and they clearly already had things split up.
https://www.youtube.com/watch?v=4XpnKHJAok8&t=43m11s
Lennie,
That’s a very insightful link. I hadn’t seen it before, but in summary Torvalds thinks putting everything in one repository is “not very smart”.
I’d point out that it’s not actually an unreasonable thing to do in principle, but that git itself doesn’t scale the same way as other systems do. Torvalds sort of acknowledges this as well.
Not needing such a tool anymore would probably be a good enough reason for this change
This is basically a very long revision history of the equivalent of: “Linux kernel + Xorg + Gnome/KDE + systemd + PulseAudio + OpenSSL + libpng + WebKit + GTK/Qt + many others”.
300GB isn’t too far from what you may expect from the sum of many projects under one company’s development infrastructure. If they are keeping all the history since Windows 2000, it is likely around 120GB in compressed revision history and 180GB in fully pulled files.
Basically 90% of Windows development is done by Gits?
I already knew that.