Microsoft’s PowerToys for Windows 11 and Windows 10 has been updated with a new feature called ‘File Locksmith’. So what exactly is File Locksmith? In technical terms, it is a Windows shell extension that lets you check which files are in use and by which processes.
Until now, there was no easy, built-in way to find out which particular process is using a file on Windows. While Task Manager lets you kill processes, it cannot tell you what’s using your files or blocking a file operation. In fact, File Explorer will block your attempts to delete a file or folder that is in use by a process or app.
I lost count of how many times Windows would just stubbornly refuse to delete a file or directory because it was in use by some process, while not telling me which damn process we’re dealing with. Isn’t it absolutely bananas that it’s 2022 and you have to download some shell extension to get this basic functionality?
SysInternals Process Explorer has had this option for over a decade if not two.
Process Explorer makes it possible to find these, but not in the context of the thing being deleted; you have to open Process Explorer, search for the file, then kill the process.
OMG, so complicated.
In reality it’s a few mouse clicks and a tiny bit of typing.
What’s more important is that it requires no installation and has no chance of breaking stuff.
I used process explorer for years to handle this task until I discovered Unlocker and never looked back.
I’ve been using “Unlocker” for years. It lets you just nuke the file handle, kill the process, or even schedule deletion on restart for pesky file locks. Nice to see this getting some first-party attention.
I’m still surprised that, in 2022, they don’t have any apparent migration path to the UNIXy OS approach of “if the file is in use, then ‘deleting’ it is really enqueueing it for deletion when it ceases to be in use”, which falls naturally out of hardlinks and file descriptors both being strong references to inodes.
Isn’t that solution about 50 years old now?
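For reference, here is a minimal sketch of those semantics, assuming Linux and a throwaway file name chosen just for illustration: the name disappears immediately, but the data survives until the last descriptor is closed.

/* Sketch only: unlink-while-open semantics on Linux. "scratch.tmp" is an illustrative name. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("scratch.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "still here\n", 11);

    /* "Delete" the file: the directory entry vanishes at once, but the
     * inode survives because fd still holds a strong reference to it. */
    unlink("scratch.tmp");

    char buf[32] = {0};
    lseek(fd, 0, SEEK_SET);
    read(fd, buf, sizeof(buf) - 1);
    printf("read after unlink: %s", buf);  /* still prints "still here" */

    close(fd);  /* only now do the data blocks actually get freed */
    return 0;
}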
While that is generally better, it is still a trade-off.
The main issue is that it allows race conditions on file names: two processes might end up reading two different files through the same path (hence possible security issues if code is not careful about this).
Sudo for example had this:
https://www.sudo.ws/security/advisories/path_race/
Of course it can happen on Windows, too. But there the files themselves are locked (shared or exclusive, depending on read/write modes). On the Linux side, this requires a separate lock file or a network lock service (nfslock, Samba hacks, etc.).
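To illustrate the separate-lock point: Linux locking is opt-in and advisory, e.g. via flock(), and only cooperating processes honor it. A minimal sketch (the file name is a placeholder):

/* Sketch of Linux advisory locking with flock(); a process that never
 * calls flock() can still write the file regardless. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(void) {
    int fd = open("shared.dat", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }

    /* Exclusive advisory lock; a second cooperating process making the
     * same call blocks here until we release the lock. */
    if (flock(fd, LOCK_EX) != 0) { perror("flock"); return 1; }

    /* ... read/modify/write safely among cooperating processes ... */

    flock(fd, LOCK_UN);
    close(fd);
    return 0;
}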
Bottom line: there is no single right solution, only different design concessions.
sukru,
I agree, there’s definitely a tradeoff. I honestly don’t mind the windows approach of locking files, it makes sense. However IMHO it is a pretty dumb limitation that the UI doesn’t 1) clearly show what’s holding a lock and 2) let you kill it.
I do prefer being able to move/rename open files (i.e. LibreOffice files, PDF files, log files, etc.) as Linux allows. This greatly simplifies log rotation. The inability to rename files has bitten me on Windows many times. As for deletion, to be honest the Linux approach of inodes lingering after deletion is somewhat less intuitive: one might reasonably expect the resources of a deleted file to be freed immediately, but that’s not the case on Linux.
On a similar note I’d add that Linux has some pretty major issues with open files locking mount points. You may not experience these under normal conditions, but say you’re using a network file system and need to unmount it; Linux can be a royal pain about this because there’s no reliable way to force an unmount operation. There are race conditions in the kernel that can lock up the machine.
I encountered this the other week. I was running a VM and mounted a network share from the VM onto the host. While it might seem silly, I had a reason to do it, but I forgot to unmount the share. When I shut down the host, the VM shut down as expected, but that caused the host’s umount operation to fail, and consequently the host locked up during the shutdown process. Even the onscreen timeout countdown failed… I was able to open another console, but it didn’t help because umount would just freeze. I had to force a reboot to clear the kernel condition. Bah.
Alfman,
I agree, Windows could have done better on file handle ownership. I think this was discussed here before, but essentially Win32 file handles are not always unique, and can be shared among processes: https://learn.microsoft.com/en-us/windows/win32/fileio/file-handles
Any child or duplicated process might inherit them (fun side note: someone copied Cygwin’s fork() code as a generic Win32 helper: https://github.com/kaniini/win32-fork/blob/master/fork.c).
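As a rough sketch of that inheritance mechanism (the file name and child command are placeholders): a handle created as inheritable and passed through CreateProcess with bInheritHandles=TRUE stays open in the child, so the file remains “in use” even after the parent closes its own copy.

/* Sketch: implicit Win32 handle inheritance. Paths and commands are placeholders. */
#include <windows.h>

int main(void) {
    SECURITY_ATTRIBUTES sa = { sizeof(sa), NULL, TRUE };  /* bInheritHandle = TRUE */
    HANDLE h = CreateFileW(L"C:\\temp\\example.log", GENERIC_WRITE,
                           FILE_SHARE_READ, &sa, OPEN_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    WCHAR cmd[] = L"notepad.exe";  /* placeholder child */

    /* bInheritHandles = TRUE: every inheritable handle in this process,
     * including h, is duplicated into the child. */
    if (CreateProcessW(NULL, cmd, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi)) {
        CloseHandle(h);  /* the parent's copy is gone, but the child still
                            holds the file open, so it stays "in use" */
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    return 0;
}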
Anyway, that makes the actual owner hard to find (there can be a bunch of different owners). Add in things like cross-process interactions (OLE, shell extensions, …) and this gets even more interesting (though I am not sure how exactly those are handled).
If we had a time machine, they could have fixed child processes. Duplicated ones (fork()) would not be possible, but at least it would narrow things down a lot.
And, on the other side: yes, Linux has implicit file locks, but only across mount points. lsof used to be very helpful for me in identifying straggler jobs holding onto those, but recently it stopped helping (I think it has something to do with kernel namespaces, possibly Docker and/or systemd, but I’m not sure).
sukru,
The data structures may not be efficient (i.e. it may require a scan of all processes), but in principle it should always be possible to find the relationship between open files and processes in a kernel driver that has access to kernel structures. I’m not sure of OLE’s details either, but I don’t think it matters, since a process’s file table either contains the requested file or not. The bigger dilemma might be how to appropriately force a file to be closed while minimizing side effects.
1) Should the processes be killed?
2) Should the file be closed on behalf of the application such that the application receives invalid handle errors?
3) Should the open file be turned into a tombstone?
Say we’re talking about Outlook or Windows Explorer: killing the process seems bad, but not killing it and exposing it to zombie file handles could expose even deeper issues. No matter what, you may end up with undesired application behavior. So I think killing the processes is probably appropriate.
I don’t like fork semantics and prefer explicitly spawning children. Fork is very clever but it’s highly dependent on over-provisioned memory and the OOM killer, which have real cons. Even with copy-on-write the inefficiency of forking large processes can become problematic.
Yes, lsof is useful. Also “fuser -k -m mountpoint” to kill processes that are keeping a mountpoint locked. It can be useful for automation scripts, but these utilities are still affected by the kernel bugs responsible for lockups and there are times when nothing can be done short of a reboot.
I mentioned network file systems earlier, but it can also happen with bad media. As an admin you may want to force an unmount, but instead the Linux driver locks up, which is obnoxious! I’ll use “umount -l” to perform a “lazy unmount”, which at least hides the bad mount point, although it doesn’t fully clear it from the kernel.
Alfman,
Except the kernel makes these available and has done so for years, which is how PowerToys is able to do it without a driver. See its implementation at https://github.com/microsoft/PowerToys/blob/main/src/modules/FileLocksmith/FileLocksmithLibInterop/NtdllExtensions.cpp .
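For anyone who wants a fully documented route rather than the ntdll handle walk in that code, the Restart Manager API answers the same “who has this file open?” question. A rough sketch (the path is a placeholder; link against Rstrtmgr.lib):

/* Sketch: Restart Manager lists the processes using a file. This is not
 * what the PowerToys code above does; it is the documented alternative. */
#include <windows.h>
#include <restartmanager.h>
#include <stdio.h>

int main(void) {
    DWORD session;
    WCHAR key[CCH_RM_SESSION_KEY + 1] = { 0 };
    if (RmStartSession(&session, 0, key) != ERROR_SUCCESS) return 1;

    PCWSTR files[] = { L"C:\\temp\\locked.dat" };  /* placeholder path */
    RmRegisterResources(session, 1, files, 0, NULL, 0, NULL);

    UINT needed = 0, count = 16;
    RM_PROCESS_INFO info[16];
    DWORD reasons;
    if (RmGetList(session, &needed, &count, info, &reasons) == ERROR_SUCCESS) {
        for (UINT i = 0; i < count; i++)
            wprintf(L"pid %lu: %s\n", info[i].Process.dwProcessId, info[i].strAppName);
    }

    RmEndSession(session);
    return 0;
}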
For the fork() discussion, I think it’s a bit of a red herring. Without fork(), there’s still going to be a need to support things like piping data from one process to another, or redirecting output to a file. Currently that works via either implicit handle inheritance (which doesn’t depend on fork()), or explicit handle inheritance in CreateProcess, or launching a child process as suspended and duplicating handles into it. If the goal were to eliminate any possibility of two processes having access to the same handle object, I think we’d be limited to the last route, with calls like CreatePipeSpecifyProcesses that returns two handles in two arbitrary processes. But, I’d argue, even if we had that it doesn’t really solve the problem – it means one process is pushing handles into another process, so identifying which process is actually responsible for leaking handles is not clearly defined. (It would indicate which process to kill to release them however.)
malxau,
I’m not sure why you say “except” here. I agree with you and I’ve used those userspace tools as well. My point was that no matter what, it should always be possible to correlate files to the processes that have them open regardless of how the handles were acquired.
I agree. I strongly prefer explicitly passing file handles to the child. IMHO implicitly cloning them into children, as fork does, is extremely bad practice. Nowadays Linux has O_CLOEXEC so that selected descriptors are closed automatically across exec(), but it has to be set explicitly on every open and is very easy to overlook. It should be the default, but unfortunately making it the default would break backwards compatibility. It’s impossible to do the right thing when you’re using 3rd-party libraries that you may not have any control over.
https://github.com/brettwooldridge/NuProcess/issues/13
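To make the window concrete, here is a quick sketch (hypothetical helpers, not taken from the linked issue):

#include <fcntl.h>
#include <unistd.h>

/* Racy: if another thread calls fork() between these two calls, the
 * child inherits fd with FD_CLOEXEC not yet set. */
int open_racy(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd >= 0)
        fcntl(fd, F_SETFD, FD_CLOEXEC);
    return fd;
}

/* Atomic: the close-on-exec flag is set by the kernel at open time,
 * so a concurrent fork()/exec() cannot capture the descriptor. */
int open_safe(const char *path) {
    return open(path, O_RDONLY | O_CLOEXEC);
}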
On top of this, if you are multithreading, then even O_CLOEXEC isn’t enough, since two threads could legitimately be preparing to fork simultaneously and the children can end up inheriting file descriptors meant for the other child.
All software that forks is potentially affected and developers who haven’t thought about this are very likely leaking handles without knowing it.
issues.apache.org/jira/browse/MESOS-8913
github.com/nodejs/node-v0.x-archive/issues/6905
… We could add thousands of examples here.
It’s ugly as heck, but the only generic solution I’ve seen covering all corner cases is to iterate over all possible file handles and explicitly tell the kernel to close all unwanted handles from within the child after a fork.
hg.openjdk.java.net/jdk8/jdk8/jdk/file/d94613ac03d8/src/solaris/native/java/lang/UNIXProcess_md.c#l737
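In C the pattern looks roughly like this sketch (not the JDK code above; keep_fd stands in for whatever descriptor the child actually needs):

#include <sys/types.h>
#include <unistd.h>

/* Sketch: spawn a child and close every descriptor it should not inherit. */
pid_t spawn_clean(char *const argv[], int keep_fd) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: walk the whole descriptor table and close everything
         * except stdin/stdout/stderr and the one fd we mean to pass. */
        long max_fd = sysconf(_SC_OPEN_MAX);
        for (long fd = 3; fd < max_fd; fd++) {
            if (fd != keep_fd)
                close(fd);  /* harmless if fd wasn't open */
        }
        execvp(argv[0], argv);
        _exit(127);  /* exec failed */
    }
    return pid;  /* parent; -1 if the fork itself failed */
}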
On Windows we have a better option, because CreateProcess lets us disable file handle inheritance altogether, and we have other methods of explicitly passing handles, as you mentioned. Accidental leaks are far less likely to happen on Windows.
https://learn.microsoft.com/en-us/windows/win32/api/handleapi/nf-handleapi-duplicatehandle
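For comparison, the explicit Windows route looks roughly like this: mark only the one handle inheritable and whitelist it with PROC_THREAD_ATTRIBUTE_HANDLE_LIST, so nothing else leaks even though bInheritHandles is TRUE. A sketch (the handle and command line are placeholders):

/* Sketch: inherit exactly one handle into the child, nothing else. */
#include <windows.h>

BOOL spawn_with_handle(HANDLE hInherit, WCHAR *cmdline, PROCESS_INFORMATION *pi) {
    SetHandleInformation(hInherit, HANDLE_FLAG_INHERIT, HANDLE_FLAG_INHERIT);

    SIZE_T size = 0;
    InitializeProcThreadAttributeList(NULL, 1, 0, &size);  /* query size */
    LPPROC_THREAD_ATTRIBUTE_LIST attrs =
        (LPPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(GetProcessHeap(), 0, size);
    InitializeProcThreadAttributeList(attrs, 1, 0, &size);

    /* Whitelist: only handles in this list are inherited, even though
     * bInheritHandles below is TRUE. */
    UpdateProcThreadAttribute(attrs, 0, PROC_THREAD_ATTRIBUTE_HANDLE_LIST,
                              &hInherit, sizeof(hInherit), NULL, NULL);

    STARTUPINFOEXW si = { 0 };
    si.StartupInfo.cb = sizeof(si);
    si.lpAttributeList = attrs;

    BOOL ok = CreateProcessW(NULL, cmdline, NULL, NULL, TRUE,
                             EXTENDED_STARTUPINFO_PRESENT, NULL, NULL,
                             &si.StartupInfo, pi);

    DeleteProcThreadAttributeList(attrs);
    HeapFree(GetProcessHeap(), 0, attrs);
    return ok;
}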
Two things.
First, Windows does not generically prevent operations because a file is “in use.” Windows lets whatever opens the file specify the intended behavior via FILE_SHARE_DELETE. Developers often don’t specify this when they should, including in popular cross-platform abstraction layers such as the C runtime, which leads people to believe Windows has a rigid behavior. The cross-platform abstraction layer thing drives me crazy, because the abstraction layers are (typically unwittingly) the things causing Windows to not behave like other platforms.
As for the migration path, see FILE_DISPOSITION_POSIX_SEMANTICS. This is used widely inside the OS today. It provides the semantics of queuing a file for deletion when it ceases to be in use, but the FILE_SHARE_DELETE requirement is still present (if an application explicitly locks the file by not indicating delete compatibility). That part is really the elephant in the room.
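Roughly what that looks like from the opener’s side, as a sketch (FileDispositionInfoEx needs Windows 10 and a recent SDK, and the path is a placeholder):

/* Sketch: open a file so others may delete/rename it, then request
 * POSIX-style delete-on-last-close. */
#include <windows.h>

int main(void) {
    HANDLE h = CreateFileW(L"C:\\temp\\example.dat",
                           GENERIC_READ | GENERIC_WRITE | DELETE,
                           /* cooperate with other openers, including delete: */
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    /* POSIX semantics: the name disappears immediately, the data goes
     * away when the last handle is closed. */
    FILE_DISPOSITION_INFO_EX info = { 0 };
    info.Flags = FILE_DISPOSITION_FLAG_DELETE | FILE_DISPOSITION_FLAG_POSIX_SEMANTICS;
    SetFileInformationByHandle(h, FileDispositionInfoEx, &info, sizeof(info));
    /* On older systems this call fails; the classic FILE_DISPOSITION_INFO
     * path still works but keeps the name visible until the last close. */

    CloseHandle(h);
    return 0;
}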
malxau,
You are correct. And it is not surprising that the Windows kernel would offer more choices than its Linux counterpart.
During the WSL 1.0 era, they added a new persona to their microkernel for answering Linux syscalls. So, at least for a while, the Windows kernel API was basically a strict superset of the Linux one.
(Now they prefer to use virtual machines, for performance reasons.)
sukru,
Not necessarily; there may be instances where WSL1 just emulated syscalls that aren’t in the kernel, the way Cygwin does.
I’ve seen benchmarks, but I’m curious about what exactly is behind the performance gaps. I’ve asked before, yet nobody could give an answer. I wonder if it’s because the WSL2 VM is acting like a ram disk whereas Windows may be committing more to disk earlier? Does anyone know if there have been studies to get to the bottom of this?
For me personally, running Linux applications on a virtual disk in WSL2 loses a principal advantage of having one common namespace for all software. Of course WSL2 supports access to the host FS via the 9P protocol or CIFS shares, but both perform worse than WSL1 did.
https://vxlabs.com/2019/12/06/wsl2-io-measurements/
https://neilbryan.ca/posts/win10-wsl2-performance/
Well, I don’t think the WSL 1.0 layer had every single syscall, but yeah. The capabilities of the two kernels aren’t that different.
This gets into a bit of semantics. WSL 1.0 was a kernel driver that implemented a Linux compatible syscall interface. It’s true that the driver was more than a translation layer in parts, but being in kernel mode, it was natively implementing the syscall interface in a way that Cygwin … didn’t. I mean, WSL 1.0 would use a drop-in binary glibc. On some level it comes down to how you’d describe a kernel driver providing kernel facilities that’s not part of the core kernel binary.
It’s the file system stack and namespace operations, yes. The full answer is multi-faceted, but if you compare file system namespace operations on Windows and Linux, Linux is substantially faster. WSL 2.0 gives the Linux VM a block device, leaving it to own its own file system namespace, avoiding using the Windows one. This isn’t really about committing to disk any differently though.
There are a lot of reasons that Windows is slower. One big reason is that it has an extensible filter stack where many drivers can plug in to observe/intercept file system operations. It ships with around a dozen of those, including built-in real-time antivirus. So traversing the Windows file system includes real-time AV (and other things), where the Linux stack does not. This gets particularly expensive for operations that can be cached, because on Linux there’s a syscall to a nearby data structure, while on Windows an IRP is passed between multiple drivers before returning a few bytes from RAM, so the percentage overhead is high.
WSL 1.0 also had some challenges around supporting Linux file system features on NTFS (which didn’t always natively have them). That led to using EAs to store extra metadata, but it also means more updates. Or, if you access the drive directly, you just lose those features and get a sub-POSIX file system feature set.
malxau,
Do we know that for a fact or is that just an educated guess?
The difference between WSL1 native ext4 (700MB/s) and WSL2 native ext4 (917MB/s) is hard to explain if the IO patterns are truly the same. The abstraction should be capable of handling several gigabytes per second; even if the WSL1 layer added another 100% CPU overhead, I’d still expect disk IO to be the primary bottleneck rather than the CPU. So this leads me to believe that one of the following is true:
1) WSL1 CPU overhead is absolutely horrendous for some reason.
2) the WSL1 implementation is committing data differently (i.e. less caching, less buffering, more frequent commits, different cluster size/collation/read-ahead, etc.) than WSL2 does with the virtual disk.
Maybe I could conduct some tests to figure out what the bottleneck is, but I was hoping someone may have done it already.
Possibly, but it doesn’t explain why the NTFS performance loss would be greater for WSL1 programs than for Windows-native ones. Also, things are still slow when AV filters are disabled, so that doesn’t seem like the root cause here. It would be possible to test, though: run the benchmarks using extremely slow media. If a large discrepancy still exists, it proves that the bottleneck is in disk IO access patterns. But if the discrepancy disappears, it proves that WSL1 is experiencing some kernel bottleneck.
Yes, that seems probable. I suspect WSL1 has to keep its own file descriptor table, because Linux uses file descriptors for things Windows covers with different primitives (sockets, epoll, timerfd, eventfd, etc.). But even with a little indirection I’d still expect the file IO operations to be an order of magnitude faster than the underlying disk, so it’s surprising to me that there’s such a noticeable performance loss (unless it can be explained by different disk access patterns).
Alfman,
That much I know for a fact. (I’d generally avoid saying things I don’t know or clearly label when I’m speculating.) I wasn’t part of the discussions to move to WSL2 though, so it’s possible that there are other performance issues, and there are definitely compatibility issues, that influenced that decision.
I don’t understand this part. WSL1 is a shim layer to the Windows kernel, so there was no ext4 driver to use. This feeds back to the “compatibility” point above – it turns out there’s a big difference between running Linux programs and having good operability with a Linux system or environment. One big example was not having FUSE support in WSL1.
Note that my point is really focused on namespace operations too, and these numbers suggest something IO heavy. Things like “git status”, “git checkout”, “git clone”, or even unzip were held up as bad examples. These need to visit many files and do a small amount of (typically cached) IO on each. Git for Windows has evolved to work around this, with a usermode file system cache and leveraging Windows facilities for change notification. Running Linux git on a Windows file system means having all of the overhead without the mitigations.
To give a concrete example of how this evolved, Linux has a very frequently used stat() system call which takes a path name and returns information about a file. NT expects callers to open a file, query information, then close the file. This type of mismatch manifests as overhead just because the expectations of one ecosystem are not the semantics of the other. That in turn led to NtQueryInformationByName to try to reduce this gap. Unfortunately since filters typically used file open and close and string parsing to identify which files to operate on, they had to filter this operation too, which still leads to much more overhead compared to Linux.
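As a sketch of that mismatch (the path is a placeholder): what Linux answers with a single stat() call becomes an open/query/close round trip through the whole filter stack on Windows.

/* Sketch: emulating a one-shot stat() with the NT-style open/query/close pattern. */
#include <windows.h>
#include <stdio.h>

int main(void) {
    /* FILE_FLAG_BACKUP_SEMANTICS lets this also open directories. */
    HANDLE h = CreateFileW(L"C:\\temp\\example.dat", FILE_READ_ATTRIBUTES,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    BY_HANDLE_FILE_INFORMATION info;
    if (GetFileInformationByHandle(h, &info)) {
        unsigned long long size =
            ((unsigned long long)info.nFileSizeHigh << 32) | info.nFileSizeLow;
        printf("size: %llu bytes, links: %lu\n", size, info.nNumberOfLinks);
    }

    CloseHandle(h);  /* three trips through the driver stack for one "stat" */
    return 0;
}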
That’s definitely going to happen because the caches are owned by different kernels, although the results weren’t as dramatic as namespace heavy operations.
On that matter, it is actually not much different from how NT provides the Win32 API. The microkernel (through NTDLL) provides different subsystems, including Windows (kernel32) and Linux (WSL); in the past this also included OS/2 and POSIX (Interix). Maybe DOS was this way too (NTVDM?).
So, essentially, Linux apps became native under NT (but not Win32).
I think the Windows and Linux kernels have very different I/O and VMM semantics, and that might have caused most of the discrepancies. And of course there is some translation one way or another, which might not have been written in the most optimal way possible.
At the end of the day, WSL was implementing the Linux binary ABI on top of NTDLL, and had to translate fork() and other design choices into multiple unrelated calls:
https://learn.microsoft.com/en-us/archive/blogs/wsl/windows-subsystem-for-linux-overview
And to be fair, yes, some things were not supported (like Docker).
So it was an uphill battle to keep WSL both compatible and fast while Linux itself was moving forward. Naturally, they decided to go with a clean slate and a VM.
malxau,
Yes, I understand your points about namespace operations. The overhead of file metadata operations is affected more by in-memory operations than by raw disk speed. It would be neat to do a file metadata benchmark as well. However, it’s the file IO performance numbers I’m curious about, because those didn’t seem to add up to me.
I ran some benchmarks on my system using the same benchmark parameters as posted earlier…
fio --name=random-write --ioengine=windowsaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
Note I didn’t do multiple runs and there could be run-to-run deviations, however this was just to get a general idea of what the performance looks like between WSL1 & 2.
The performance cost of AV security is significant. When that overhead is disabled, though, there doesn’t seem to be any performance advantage for WSL2 + ext4 on my computer; in fact, WSL1 + NTFS seems to be a bit faster under this benchmark.
https://postimg.cc/n9cjgH97
The overhead of the 9P protocol that WSL2 uses to access the host’s NTFS file system is very significant. It is so slow that even an NTFS ramdisk accessed over 9P is slower than WSL2 mounting ext4 on a physical disk.
However, these tests suggest that the drvfs + NTFS combination used by WSL1 is easily capable of 2+ GB/s and should not be a major throughput bottleneck unless you have very fast storage. And on my system this combination is a bit faster than WSL2 + ext4 unless I enable the Windows AV. I think this answers all my questions from earlier.
Unlocker stopped working after Windows 7.
In Microsoft’s defense, most people today don’t know what a file or a process is.
Nice one.
And the trend is towards ignorance, with the percentage of users aware of these concepts falling.
I replace in-use system files all the time: you just rename the file in use, then replace it with a new one. It’s easy enough to figure out.