If you have followed Windows 10 at all during the last few years, you know that the Windows Subsystem for Linux, or WSL for short, is the hot topic among developers. You can finally run your Linux tooling on Windows as a first class citizen, which means you no longer have to learn PowerShell or, god forbid, suffer through the ancient CMD.EXE console.
Unfortunately, not everything is as rosy as it sounds. I now have to do development on Windows for Windows as part of my new role within Azure… and the fact that WSL continues to be separate from the native Windows environment shows. Even though I was quite hopeful, I cannot use WSL as my daily driver because I need to interact with “native” Windows tooling.
I believe things needn’t be this way, but with the recent push for WSL 2, I think that the potential of an alternate world is now gone. But what do I mean with this? For that, we must first understand the differences between WSL 1 and WSL 2 and how the push for WSL 2 may shut some interesting paths.
I was only vaguely aware of the fact WSL 2 switched to using a virtual machine instead of being an NT subsystem as WSL 1 was. There’s arguments to be made for and against either approach, but the NT subsystem approach just feels nice, more holistic to me – even if it is way more work to keep it in step with Linux.
Link seems broken at time of posting.
Anyway – the WSL1 approach seems cleverer and more native to me – the VM solution is necessarily limited in platforms it can work on, can’t coexist with other virtualization solutions, doesn’t allow them to as deeply integrate with Windows (instead relying on hacks to pass stuff from the VM Linux world to the real Windows world)… it’s more work but it’s a better solution overall.
I lamented the death of Interix too.
Re-implement a bloated kernel on top of another, implement both the existing bugs of said kernel plus gazillion of your own, and don’t even talk about the bottomless pit of undiscovered security holes. To be honest, the product manager who authorized WSL1 sounds like a complete idiot and it is no wonder it’s being replaced in a hurry.
In no way WSL1 sounds like the “better solution overall”, instead it’s a pipedream that might work if Microsoft dedicated half of its Windows team on it, but of course that is not viable.
Don’t forget that NT is Posix, which means that a big part of that API was already implemented.
Also, the WSL1 approach was the original idea in the NT design, more than two decades ago, so this made sense.
I suspect Docker was the dealbreaker for the original approach, they couldn’t figure out a way to add support for that. If people just want to lift and shift their old Unix or Linux server apps to Windows, they can use Cygwin or MKS Toolkit, (etc) as others here have pointed out.
Bottom line is MS made the right call. I don’t know that WSL 1 was a mistake because MS probably learned a ton about the differences between the Linux and Windows kernels particularly with regards to system call semantics, which will be very useful if they’ve managed to capture and store that knowledge.
I always figured Docker (for Azure) was exactly why WSL1 continued to exist past the death of the android-on-windows-mobile efforts. Running docker and other Linux based services on windows core hosts in Azure is quite a thing to be able to boast about.
With Hyper-V increasingly being the cure-all to future Windows development though (there’s a bunch of Windows Server features MS has deprecated with “move it to Hyper-V VMs” as the replacement) I’m not really surprised that WSL2 went virtual-machine. Much less “hard” work to do for a “close enough” result.
Except that _Linux_ is not POSIX, and in a number of places differs significantly in semantics from how Windows implements POSIX. Yes, worrying about things like threads having their own PIDs sounds stupid, but there are applications that care.
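To make the "threads having their own PIDs" point concrete, here is a minimal Python sketch (the helper name `thread_ids` is made up for illustration). On Linux each thread is a kernel task with its own ID drawn from the same number space as process IDs, and the main thread's ID equals the PID. Other platforms report thread IDs differently, so the printed values are platform-dependent.

```python
import os
import threading


def thread_ids():
    """Return (pid, calling thread's kernel ID, a worker thread's kernel ID)."""
    result = {}

    def worker():
        # Kernel-assigned ID of this thread (the TID on Linux).
        result["worker"] = threading.get_native_id()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return os.getpid(), threading.get_native_id(), result["worker"]


if __name__ == "__main__":
    pid, main_tid, worker_tid = thread_ids()
    print(f"pid={pid} main thread id={main_tid} worker thread id={worker_tid}")
```

A program that assumes one ID per process, or that walks /proc expecting only processes, will see different results depending on which model the host implements.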
In a sense, when you run Win32 apps on NT family OSes you’re kinda running a reimplemented Win9x “kernel” on top of the NT kernel too (obviously more involved than that but it’s analogous)
NT was built to layer different interfaces between userspace and the real kernel, and WSL1 more closely followed that structure than WSL2 (and Interix followed it even more closely).
Running it as a VM just makes it… well, another virtualization system. sure there’s some nice integration work to hide the fact that it’s a VM but you could do that regardless.
Despite the performance impact, the VM redesign is much better from a security perspective. I was not looking forward to the next decade of WSL-related syscall vulnerabilities. An entire new subsystem would have doubled the attack surface and we have not even achieved “high confidence” for the NT layer yet.
WSL2 will be faster than WSL1 for many development related workloads like cloning large git repositories or building projects with thousands of files because WSL1 inherits NT’s file handling semantics instead of Linux’s. I’ve benchmarked those tasks (same project, same repository, same distro) on the same laptop on WSL1 and Linux in Virtualbox, and Virtualbox was faster.
“which means you no longer have to learn PowerShell or, god forbid, suffer through the ancient CMD.EXE console.”
I know it’s far from perfect, but why do so many of these folks fail to acknowledge that Cygwin has been a thing for what seems like forever?
“but why do so many of these folks fail to acknowledge that Cygwin has been a thing for what seems like forever?”
The reality is you miss a bit of history.
http://www.colinux.org/
This died because they could not get what they need signed by Microsoft. Notice after all the time using Cygwin people went to using a Linux kernel running cooperative with windows. The same compatibility problems that hit Cygwin hit WSL1 and caused colinux in the first place.
https://www.apharmony.com/software-sagacity/2014/06/linux-vs-windows-file-permissions/
Notice this above is before the 2016 release of WSL1. The misalignment of file system logic well and truly made programs ported with Cygwin fail at times. This is only the tip of a very big iceberg.
Anyone who had studied the history of failures with Cygwin vs coLinux would have seen that WSL1 was doomed to failure because there are just way too many differences. Please note NTFS and the new ReFS cannot in fact store and process Linux-style permissions exactly right. Wine and Samba on the Linux side also run into the problem that you cannot store Windows permissions exactly right on Linux, but the incompatibility is not as bad. Linux has the more restrictive permission model; it’s simpler to reduce restrictions than it is to add restrictions on a file system. Adding restrictions equals lots more overhead.
Even so, Wine is getting things like case-insensitive support added to Linux native file systems to avoid the massive overhead of fixing that.
Horrible as it sounds, Microsoft really has two choices: either go the coLinux/WSL2 route, or start majorly altering how file systems and other parts of Windows work and hope not to break anything in the process, so that WSL1 and Cygwin can in fact work right all the time.
I use them all: cygwin, wsl1, wsl2, powershell, docker, etc. None of them by itself is good enough. Cygwin doesn’t have a large number of services I want to run locally and isn’t always up to date, but I do have to say, when nothing else is working cygwin usually does. It’s really hard to screw up a cygwin install.
For wsl2, it might be perfect eventually, but I keep running into unexplained network issues. I can get it to work fine for a while before it all crashes and the network stops working. It might be one of the many security suites I have running overreacting and then just not telling me it’s doing something. I don’t know. It’s frustrating.
Agree with the general concept that no one solution is perfect, in general I find trying to restrict myself to a single platform just inflicts excessive pain, I’d be a failure as a masochist!
I’ve never understood the platform apparatchiks, happily making their life a misery so they can save on a simple install!
cpcf,
With time, I find that I’m becoming the opposite actually. Not because I believe in the supreme platform, but because my attention is spread so thin that it’s difficult to be a master of all. There’s almost enough platforms, frameworks, databases, etc that I could pick up a new one for every single project but then I’d be an amateur at everything, haha.
There’s pros and cons to everything, but it’s important to me anyways to seek out paths that protect our long term digital independence and freedoms. For better or worse this excludes some platforms right off the bat.
For me WSL was close to becoming my gateway drug into Linux. Not focusing on easy graphical application support with sound disenfranchised some coming from the Windows world. Can’t really say Linux is any better. VMWare removed Unity support so Windows apps can’t act like native apps. WINE continues to be hit or miss.
https://docs.oracle.com/en/virtualization/virtualbox/6.0/user/seamlesswindows.html
VMWare might have dropped Unity mode but virtualbox has not dropped their seamless mode.
Oh, wow! I haven’t checked out Virtualbox in quite awhile. Thank you for the suggestion. Looks like it has some form of 2d and 3d acceleration nowadays. My other beef with virtual machine software is badly synced streaming video. I’ll give this a shot.
Long story short, a person hired to do development on a Microsoft property would rather use Unix system tools to manage a Windows machine. At least one of the five things the person listed isn’t currently possible even with WSL1. The author appears unaware of Cygwin and is dismissive of any modern shells for Windows.
That is so true! All the similar articles stem from ignorance about the possibilities of windows OS and usually come with someone with only knowledge of one OS.
For example, a modern CMD replacement is the yori (https://github.com/malxau/yori) with excellent bat script compatibility.
Powershell has unique pipe capabilities with its advantages and disadvantages. But for the goal that it was created (manage windows system), it is close to perfect.
And of course, the modern shell replacement, the Windows Terminal (https://www.microsoft.com/en-us/p/windows-terminal/9n0dx20hk701?activetab=pivot:overviewtab), which already is close to parity with the Linux/iTerm2 terminals. But then, we already had ConEmu (https://conemu.github.io/).
The thing is, you have been able to run Linux in a VM under Windows for quite a while. WSL2 doesn’t offer anything substantial in this regard. It’s even counter-productive because it requires Hyper-V and consequently you cannot run another hypervisor (VirtualBox’s support for Hyper-V notwithstanding – it sucked big time last I tried it). Instead of WSL2 MS should have specified some virtio-like solution to implement the required functionality and let you run WSL2 with any hypervisor. For me WSL2 is a complete no-go.
WSL1 was novel and worked quite well. It is hampered by Windows’s woefully inadequate file IO performance, but so was Cygwin. The latter was my go-to solution for making Windows bearable but I find myself using WSL1 quite a bit more these days because it does what I need Cygwin for and it does it better. It needs a better terminal window and once I was able to set up the mingw terminal as a terminal for WSL but I’m unable to find the instructions on how that was done. But select-to-copy-and-middle-click-to-paste make life so much easier.
BTW, what happened to coLinux?
Yes you can. VMWare, QEMU and VirtualBox all take advantage of new Windows Virtualization Platform features that allow other hypervisors to run alongside Hyper-V. Haven’t tried it with VirtualBox, but VMWare and QEMU run really well. You lose the ability to run nested virtual machines, but beyond that it works as expected.
I wrote already that VirtualBox did not work well with Windows Virtualization Platform last time I tried it. And you don’t elaborate on why I should settle for Hyper-V to start with. I don’t want to use Hyper-V, ok? I have a soft spot for something called Open Source. Feel free to not care about it the way I do.
You’re not *really* running VMware/Virtualbox when you use the Virtualization Platform stuff though. The Virtualization Platform is basically HyperV with API hooks. Running VMware Workstation on the WVP essentially means you’re running HyperV with a VMware management UI. There’s a bit more in there to load in vendor-specific disk images and config files, but ultimately it’s Hyper-V doing the virtualization, not VMware/VirtualBox.
(I’m ignoring QEMU because that’s pretty much what QEMU does anyway, at least on Linux historically)
Running VMware/VirtualBox on WVP means you’re stuck with the limitations and bugs of Hyper-V, combined with any issues resulting from the vendor trying to map their implementations to Hyper-V’s.
It’s an okay compatibility hack, but it’s a step backwards overall.
> WSL2 doesn’t offer anything substantial in this regard
I’m not a sysadmin nor a developer, just a user, so I’m missing something – or a lot – for sure here.
But I’ve been accustomed to the Linux command line for a couple of decades, and in my perspective WSL gave me what Cygwin and VMs never did: an *easy* and painless way to use the Linux terminal software I’m used to *in Windows*, and with this I mean, I can right-click in Explorer and “open a WSL shell here” and use the fish shell and vifm and find etc. to manage my files, move them, rename them, edit them, convert PDFs, grep text files, and use my fish and bash scripts etc.
“and use the fish shell and vifm and find etc. to manage my files, move them, rename them, edit them, convert PDFs, grep text files, and use my fish and bash scripts etc.”
You can do that with Cygwin. I don’t know of a solution for right-click to “open shell here” but you can navigate the file hierarchy by hand. That’s the primary reason Cygwin is the second thing I install on a fresh Windows (the first is Firefox which I then use for all downloads)
WSL1 does not require constantly tracking the linux kernel ABI (which is a moving target by design)… You track the userland ABI which doesn’t change very much.
WSL2 is quite a regression, even ipv6 networking is totally missing but worked fine in WSL1. If you’re just going to run a linux vm you’ve been able to do so with vmware for 20 years now.
‘Missed opportunities’
What, like malware being able to more easily be cross-platform? I mean, it’s not like there’s an actual issue with Windows malware actually working on Linux systems because of Wine…
All that aside, the simple fact is that WSL 1 could not support what many people actually wanted to do with it. Lack of Docker and/or QEMU support is a major strike against it given modern development workflows, and there are plenty of edge cases where the approach taken by WSL 2 (which, I might add, is actually a proven approach, Docker Desktop has been doing the exact same thing for years now and it works great) actually makes things work correctly more efficiently than would ever be possible with WSL 1 because of differences in semantics of how the system works at a very low level (yes, I know, ‘NT is POSIX compliant’, except that Linux is not and both differ, usually differently from each other and sometimes significantly, from most other POSIX systems in many places where behavior is left up to the implementation).
Could a better approach have been taken? Probably. Were it up to me I most likely would have gone for porting UML to Windows instead of going for a VM (that would get you reasonably secure encapsulation and a full kernel ABI without requiring virtualization or incurring the overhead of copying data back and forth to and from a VM). But the approach WSL 2 takes still provides what people actually wanted to use WSL for.
Of course, they’re also missing a lot of other opportunities regardless of the implementation. For example, WSL 1 and WSL 2 are both essentially useless for data recovery purposes right now because they provide no way to access storage devices or partitions without cloning them elsewhere first.
“Docker Desktop has been doing the exact same thing for years now and it works great) actually makes things work correctly more efficiently than would ever be possible with WSL 1 because of differences in semantics of how the system works at a very low level”
I use Cygwin and WSL to work with Docker for Windows. There’s absolutely no reason to go to WSL2 for that.
For a while I used WSL1 quite a bit as my default terminal in Windows to do IT/admin stuff. It works pretty well, except for being very slow on disk-intensive stuff (oddly, Ansible was slow enough that I often thought it was broken or the target host was down). Otherwise, having a full package manager and the ability to run regular Linux binaries (like openshift stuff) was really nice. This put it ahead of Cygwin and Mobaxterm which had lots of tools but not some of the odd-ball stuff I wanted to run. Plus, I could use WSL+VcXsrv+terminator to get a nice but slightly buggy approximation of my Linux box. It’s a lot better with WSLterm and tmux if you don’t want to go the X route.
WSL felt kinda like Microsoft testing something for containers. The stuff that didn’t work inside it (nmap, VMs, containers, etc.) is pretty similar to what doesn’t work inside a Linux container. I could have seen them working this as a possible solution for running Linux containers on Windows server.
I’m not sure I’d bother with WSL2 if it’s just a VM+hacks to share things. At that point, I’d probably just use a VM I spin up myself and make do with scp/rsync/vscode’s remote mode for files, and SSH into the VM for terminal
Some node.js based tools like Meteor were slower by a factor of like 14! That disk IO problem is no joke in WSL1. It’s only better in WSL2 in that they made getting access to the VM disk area a little easier.
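For anyone who wants to compare the environments themselves, the kind of workload that suffers here (lots of small files, as in a node_modules tree or a git checkout) can be approximated with a short Python script. This is only a rough sketch: the function name `time_small_files`, the file count, and the file size are arbitrary choices for illustration, not a standard benchmark. Point it at a directory on each file system you want to compare (for example a Linux-side directory versus a Windows-mounted one) and compare the numbers on the same machine.

```python
import os
import tempfile
import time


def time_small_files(directory, count=2000, size=64):
    """Create, stat, read back, and delete `count` small files; return seconds elapsed."""
    payload = b"x" * size
    start = time.perf_counter()
    # Work inside a throwaway subdirectory that is removed afterwards.
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        paths = [os.path.join(workdir, f"f{i}.txt") for i in range(count)]
        for p in paths:
            with open(p, "wb") as fh:
                fh.write(payload)
        for p in paths:
            os.stat(p)
        for p in paths:
            with open(p, "rb") as fh:
                fh.read()
        for p in paths:
            os.remove(p)
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_small_files(tempfile.gettempdir())
    print(f"{elapsed:.3f} s for 2000 small files")
```

Almost all of the time goes into per-file open/stat/close/delete overhead rather than data transfer, which is the part where the file system layers differ the most.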
CaptainN-,
I’ve heard this a lot of places, but can someone explain exactly why it occurs? Is it because POSIX semantics requires windows to flush sectors when it otherwise would not? Of course there’s the notorious mismatch between UNIX permissions and windows ACLs, but I’m not sure why that would affect performance so much. In my opinion windows has the superior model. Linux technically supports ACLs but when I tried to use them exclusively in my linux distro some software broke because they hardcoded security checks using the inferior permission bits. This created software bugs even when I had the correct permissions set using ACL. It seems we’re unlikely to ever get rid of legacy POSIX designs. Some POSIX apis are even bad on linux too.
The microsoft FAQ says WSL2 is faster using a virtual disk, but why the heck would I want to do that? I’d prefer cygwin because I obviously don’t require a virtual disk to use it. To me a virtual disk seems to defeat a large purpose of using WSL in the first place. I’d really want to work alongside windows processes and access the same files. And the FAQ says if you do need to access the same files that WSL2’s performance is even worse than WSL1, but that makes much more sense because file requests have to get proxied through the VM boundary.
Anyways, I’m curious about the exact reason for WSL1’s inefficiencies. Why was it that microsoft couldn’t fix it?
ACLs under Linux, just like the standard POSIX permissions, are DAC (discretionary access control). So ACLs are technically not superior to the standard POSIX permissions. There is a trap: POSIX ACLs cannot do some things the standard POSIX permission bits can. Please note NTFS ACLs cannot do this either.
The SUID and SGID bits are features that cannot be expressed by POSIX ACLs or NTFS ACLs, only by the standard raw POSIX permissions. Most likely it was not hardcoded security checks but rather the removal of privilege raising/change user. SUID means run as the user that owns the binary, in the old POSIX permissions. SGID means run with the group on the binary, in the old POSIX permissions. POSIX ACLs do not support run-as. NTFS ACLs do not have a run-as either. So technically both POSIX ACLs and NTFS ACLs are missing a feature.
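For readers who have not looked at these bits directly: they live in the same mode word as the ordinary rwx permissions, above the nine permission bits. A small Python sketch using the standard `stat` module shows how to decode them; the helper name `special_bits` and the sample mode values are made up for illustration.

```python
import stat


def special_bits(mode):
    """Return the names of the setuid/setgid/sticky bits set in an st_mode value."""
    names = []
    if mode & stat.S_ISUID:
        names.append("setuid")  # run with the file owner's user ID
    if mode & stat.S_ISGID:
        names.append("setgid")  # run with the file's group ID
    if mode & stat.S_ISVTX:
        names.append("sticky")  # on directories: only owners may delete entries
    return names


if __name__ == "__main__":
    # Synthetic modes: a setuid executable, a setgid executable, a sticky directory.
    for mode in (stat.S_IFREG | 0o4755, stat.S_IFREG | 0o2755, stat.S_IFDIR | 0o1777):
        print(stat.filemode(mode), special_bits(mode))
```

The same function works on a real file by passing `os.stat(path).st_mode`.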
Removing and reducing the risk of SUID bits gets you into setcap on Linux; this is again another set of features NTFS does not have the means to store effectively. There is no replacement for the SGID bit.
Most of the programs you find checking base POSIX permissions and throwing an error when they are not set in fact need either SGID or SUID to work; of course setting user and group in the ACL as it was is not going to work.
Alfman really I would like to know what applications broke if it was not this. The reality is any properly written posix program is not meant to check permissions themselves. Think when you turn on mandatory access control(MAC) from selinux/apparmor/….. this on Linux overrides all the DAC system yes the old posix permissions and posix ACLs can say one thing and if the MAC system says something different its the MAC permissions that will be obeyed.
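The practical consequence of "don't check permissions yourself" is to attempt the operation and handle the failure, since only the kernel sees the combined verdict of mode bits, ACLs, and any MAC policy. A minimal Python sketch of that pattern follows; the function name `read_text_or_reason` is hypothetical.

```python
import os
import tempfile


def read_text_or_reason(path):
    """Try to read `path`; return (text, None) on success or (None, reason) on failure."""
    try:
        # No inspection of mode bits or ACLs beforehand: just ask the kernel.
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read(), None
    except PermissionError as exc:
        return None, f"access denied: {exc.strerror}"
    except FileNotFoundError:
        return None, "no such file"


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write("hello")
    print(read_text_or_reason(path))
    os.remove(path)
    print(read_text_or_reason(path))
```

A pre-check based on the permission bits can disagree with the real answer in both directions, which is exactly what happens once ACLs or SELinux/AppArmor are involved.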
oiaohm,
Typical linux distros still rely on the old posix permissions that I personally find inferior. Obviously the linux kernel supports ACL like windows, but it has low adoption among unix distros and developers.
I don’t think there’s a technical reason these bits couldn’t be added as an ACL, it’s just a matter of implementation.
I remember several programs breaking due to hard coded checks in userspace, this was probably a decade ago so I’m fuzzy about the details now. I recall openssh was one of them.
https://stackoverflow.com/questions/9270734/ssh-permissions-are-too-open-error
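The sort of userspace check being described looks roughly like the following Python sketch. It is an illustration of the pattern, not OpenSSH's actual code; the function name `key_permissions_too_open` is made up, and the assumption is that the tool rejects a private key file when any group or other bit is set.

```python
import stat


def key_permissions_too_open(mode):
    """True if the group or other classes have any access bits set."""
    # Only the classic mode bits are consulted; ACL entries are never looked at.
    return (stat.S_IMODE(mode) & 0o077) != 0


if __name__ == "__main__":
    print(key_permissions_too_open(stat.S_IFREG | 0o600))  # owner-only
    print(key_permissions_too_open(stat.S_IFREG | 0o644))  # world-readable
```

On Linux, a file that carries extended POSIX ACL entries reports the ACL mask in place of the group bits, which is one way an ACL-based setup can trip a check like this even when the effective access is what the administrator intended.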
There’s no technical reason ACLs couldn’t implement SGID as well. Personally I’d rather see the posix bits be deprecated and use the more flexible ACLs consistently. I know this isn’t going to happen and a lot of unix folks don’t like the idea of changing it. Everyone has their own opinions.
They’re technically valid “posix programs”, but I agree it’s a bad practice.
SElinux and apparmor are a whole other thing. I find they break a lot of software in places that the software don’t expect to break. I prefer using file system permissions.
“I don’t think there’s a technical reason these bits couldn’t be added as an ACL, it’s just a matter of implementation”
There is a technical and a historic implementation reason why they are not. SUID must only be done once. An ACL is designed for a list of different settings. Keeping the old POSIX permissions around for SUID avoided having to create a mandatory first entry in the ACL system for SUID. The SGID override also must only be done once.
By the way, the rule that SUID/SGID must only be done once comes from a historic POSIX ACL implementation. One system in history implemented, in POSIX ACLs, different SUID/SGID values based on the user running the binary. That is a pure nightmare for support, for reasons like “It worked for X user but then totally failed for Y”, and now you are the person on the support desk getting that. I should have been more precise: you don’t want the POSIX ACL implementation of SUID/SGID, it’s insane, so the choice was made to leave this with the old POSIX bits. Basically these were early prototypes that led to MAC, and MAC is horrible at times but more sane than this.
There is a old saying if it is not broken don’t fix it. Sometimes more flexible does not equal better.
—ssh “permissions are too open”—
I really would hope it was not this one ACL being DAC are additive not subtraction. So a pure ACL system you would want default posix permissions set to all 0000 if they are in fact disabled. For ssh to come up “permissions are too open” the default posix permissions were not disabled. Then you also have to think having to process the ACL list to check if the file is secured correctly could take some time.
–They’re technically valid “posix programs”, but I agree it’s a bad practice.–
No they are not technically valid. Technically checking permissions and presuming you have the correct value by posix standard is invalid. So its not only bad practice its not posix standard conforming.
Of course I can understand why sshd might break this rule any program breaking the rule better have a solid justification as it is break of standard.
–Typical linux distros still rely on the old posix permissions that I personally find inferior.–
I can see this point of view. For a lot of core application stuff the old posix permissions are really enough.
There are some advantages you miss. Since POSIX permissions are not ACLs, if someone screws up the POSIX ACL settings you can remount the drive with ACL permission processing disabled and dig your way out using the POSIX permissions. If you have not noticed, you can also mount a lot of drives under Linux with a POSIX permission override.
Some ways splitting the core OS and what the user will want to setup can be a really good thing.
oiaohm,
Obviously it was done this way historically. However they could have implemented all the needed semantics under ACLs that would have been more flexible, so that’s not a con. Of course POSIX took the route of adding ACL later on, but things would have been better in terms of tooling and consistency had we started with ACLs up front. Now that the legacy POSIX cruft is here, it will remain for the long haul.
For backup & nas administrators it kind of is broken & frustrating though. I encounter multiplatform issues frequently enough. I was collaborating with a coworker using git and every time he committed I would lose the posix bits because he was on windows and I was on linux. This is a regular problem with file archives too.
Granted you can place blame wherever you want, you could blame windows for not having the posix bits, but as someone who prefers the windows way ideally for me unix would be the one to change. I know it won’t change, but that’s my opinion.
I don’t really see it as a problem. Either the ACL permissions are right, in which case there’s no reason to disable them, or they’re wrong, in which case you correct them and there’s still no reason to disable them. Am I missing something? As a windows administrator I’ve needed to recursively replace ACLs on a branch of the file system, but that’s not particularly hard to do. I just granted/denied permission as necessary and that was that.
When I moved to linux this was an area of major frustration since the tooling wasn’t on par with windows. Even right now I’m trying to set ACLs under dolphin and it’s failing silently, I don’t know why.
Anyways my gripes are kind of tangential to the topic. I was really more curious why WSL1 file operations were so difficult for microsoft to implement efficiently? I’d really like someone to explain why WSL1 file operations are slow because that’s something I don’t understand.
Microsoft did implement SUID/SGID support in NTFS and the filesystem stack, it was exposed under the old Services for Unix/Interix system as of SFU3 (and there was an option at install to enable/disable)
I’m not entirely clear how the underlying implementation worked (for example, was it a new bit on the special OWNER pseudo-user ACL, or was it a new extended attribute/alternate stream?) but it was implemented and used.
I can’t find documentation for this, but from memory alone SFU also implemented directory SUID/SGID/Sticky semantics, though I may be wrong on that.
Microsoft still has an NFS server service that stores and reports those permissions too, and it used to do so in a manner compatible with the SFU/Interix subsystem, so it’s likely the implementation still exists and is in partial use.
Windows has very different locking and file handle semantics. You can’t delete an in-use file for example (whereas in Linux you just unlink it and the using process keeps accessing the orphaned file). Emulating Linux/Unix style file semantics utilizing the underlying Windows semantics is slow.
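The delete-while-open difference is easy to see with a few lines of Python. This is a minimal sketch with a made-up function name, `read_after_unlink`. On Linux the name disappears immediately while the open handle keeps working. On Windows, where Python opens the file without delete sharing, the delete is expected to be refused, which the code reports instead of crashing.

```python
import os
import tempfile


def read_after_unlink():
    """Delete a file while it is open, then try to keep reading from the open handle."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w+") as fh:
            fh.write("still here")
            fh.flush()
            try:
                os.unlink(path)  # remove the directory entry while the file is open
            except PermissionError:
                return "delete refused while file is open"
            fh.seek(0)
            # The name is gone, but the data stays reachable through the handle.
            return f"name exists={os.path.exists(path)}, data={fh.read()!r}"
    finally:
        # Clean up if the delete was refused above.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    print(read_after_unlink())
```

A compatibility layer that has to provide the first behavior on top of a system that natively gives the second needs extra bookkeeping on every open, rename, and delete, which is where a good part of the overhead comes from.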
I have used cygwin for about 100 years (it feels like that, at least since NT 3.5). And it serves best to have both worlds, *nix and Windows, on one platform. _But_ I do install it in “C:\” and not “C:\cygwin” even though the installer does not recommend it.
The great advantage: All my cygwin paths are like my Windows paths.
Of course, I also use one single file system in Windows and use junctions to map in other drives and folders.
Only drawback: Speed. I have some projects that build 4 times quicker in a Linux VM (VirtualBox) than on Windows with cygwin. But, I can easily switch. So WSL1 or WSL2? No.