KDE’s Usability & Productivity initiative is now almost two years old, and I’ve been blogging weekly progress for a year and a half. So I thought it would be a good time to take stock of the situation: how far we’ve come, and what’s left to do. Let’s dive right in!
This initiative has been a lot of fun to follow. If you always update to the latest KDE release – for instance by using KDE Neon – you’ll see the weekly fixes and polishes highlighted in the blog posts appear on your machine very quickly.
In Windows, if your system becomes unresponsive or really slow, you can press Ctrl + Shift + Esc and close the problematic program.
In Linux nowadays you cannot even open a terminal when the system starts to run out of memory! You simply have to follow the REISUB procedure and force a restart. As long as the desktop does not have a way to kill unresponsive tasks, it is destined to fail again and again, no matter what the KDE team does.
Linux does have a way of freeing memory when running out. You can always switch to another console and kill processes from there, but the OOM killer should be doing it for you. Besides, you should have swap, so the computer just gets slow.
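For example, a minimal rescue sequence looks something like this (a sketch assuming virtual consoles are enabled; the PID is obviously a placeholder):

```
# Switch to a text console with Ctrl+Alt+F3 and log in, then:
ps aux --sort=-%mem | head -n 10   # list the ten biggest memory consumers
kill -TERM <pid>                   # ask the offender to exit cleanly
kill -KILL <pid>                   # force it if SIGTERM is ignored
```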
The biggest problem Linux has is that it doesn’t recover well from having used swap: it just leaves stuff in the swap instead of proactively recovering it, so responsiveness stays bad after a swap-heavy situation.
> Linux does have a way of freeing memory when running out. You can always shift to another console and kill processes from that, but the OOM killer should be doing it for you.
That’s the weird thing about Linux, it should work as you describe, but it doesn’t. On my system with 8GB of RAM and no swap, when I run out of RAM the system basically just grinds to a halt. The mouse starts jerking, and the disk I/O goes crazy, and it usually doesn’t even have enough RAM to switch to a virtual console. This is a known issue:
– https://unix.stackexchange.com/questions/373312/oom-killer-doesnt-work-properly-leads-to-a-frozen-os
– https://bbs.archlinux.org/viewtopic.php?id=233843
– https://askubuntu.com/questions/432809/why-is-kswapd0-running-on-a-computer-with-no-swap/432827#432827
– https://unix.stackexchange.com/questions/24625/how-to-completely-disable-swap/24646#24646
So I tried adding a few GB of swap, but that only delays the problem. In addition to the fact that it doesn’t clean up unused swap data, as you mentioned, it also still fails to activate the OOM killer when RAM + swap are close to full.
It’s not just about swap – thrashing also occurs when the kernel drops mmapped pages containing the executable code of other programs. Say you have a task consuming all available memory but not (yet) triggering the OOM killer. Then you try to open a terminal window to kill or suspend the offending task – this forces the WM, X server, terminal application, shell, ps, etc. to re-read their program code from disk, all while the kernel keeps purging it again. In fact, when you switch off swap completely, this effect becomes even more pronounced, because the only thing the kernel can do is drop caches, and it does so late and aggressively.
This is not a new issue, but I have found it more severe than in the past. I am not sure whether something has changed in the kernel, or whether it is simply more common now to have heavy multi-threaded tasks consuming a lot of memory, and interactive sessions that run more user-mode code than before.
The Linux virtual memory manager doesn’t offer much to help in this situation. One possibility is to increase vm.min_free_kbytes, but I’m not sure that really gives more memory to caches. Another option is to limit the maximum amount of memory a user process can allocate with cgroups, but that requires careful configuration (UI programs have to be given extra memory) and it is now more complicated because systemd has effectively hijacked cgroups for its own purposes. There’s a rough sketch of both after the links below.
https://askubuntu.com/questions/41778/computer-freezing-on-almost-full-ram-possibly-disk-cache-problem
https://www.gen.cam.ac.uk/local/it/projects/ubuntu-cgroups-and-trying-to-stop-users-making-a-system-unusable
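For the record, this is roughly what experimenting with those two knobs looks like (a sketch only – the 512MB watermark and the slice name are arbitrary examples, not tested recommendations):

```
# Raise the amount of memory the kernel tries to keep free (value in KB)
sudo sysctl vm.min_free_kbytes=524288

# Cap one user's session through the cgroup hierarchy that systemd manages
# (slice name varies by distro/systemd version; MemoryMax needs cgroup v2)
sudo systemctl set-property user-1000.slice MemoryMax=6G
```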
ndrw,
You’re exactly right, and I agree this is probably what rahim123 is experiencing. The binary code of an executable behaves a lot like swap even if there is no swap. When a request is made for more memory, the system will take that away from cache and therefore begin evicting static executable code from RAM. With little or no swap, the system has no choice but to evict frequently accessed code, because the rarely accessed dynamic data cannot be paged out. The result is lots of thrashing. So, somewhat ironically, adding swap would increase performance by giving the system a choice of what to evict, instead of forcing out the much-needed caches.
You can tune the OOM killer to do its evil deeds sooner. Personally, though, I don’t even bother messing with marginal RAM conditions any more; I’ve long since gone the route of over-provisioning memory. It’s cheap enough, and it eliminates all these troubles.
Right, thanks for the good explanation. I accept the fact that miracles can’t be achieved when trying to use more RAM than the system has, and obviously the user needs to either change his application habits or add more RAM. But I still think that an OS should handle bad situations gracefully without *EVER* losing control of the kernel and other core OS components. And in this sense Linux doesn’t handle that condition well at all compared to other OSes. Even Android appears to have patches or different configurations for its Linux kernel branch, because it is famous for immediately killing userspace apps when RAM is at a premium, and by virtue of that the core system is rather stable. That’s basically the behaviour that I would expect on a desktop Linux system as well. It’s the lesser of two evils; preemptively kill a single userspace task or else lose the entire system and all other tasks.
>On my system with 8GB of RAM and no swap, when I run out of RAM the system basically just grinds to a halt. The mouse starts jerking, and the disk I/O goes crazy, and it usually doesn’t even have enough RAM to switch to a virtual console.
Yeah, that’s a pretty accurate description of what happens on my system with 8GB and a 2GB swap partition, rahim123.
It seems like it has happened less with Fedora 29/30 than it did with Debian 9.x, but that may be due to changes in my habits or something. Either way, when it does happen it’s annoying as heck.
emojim,
You don’t describe what you are running, so it’s not really possible for me to say whether I think 8GB is enough, but I’ll say that 2GB of swap is unusually small – how did you arrive at that? 8GB of swap would let the system use much more memory as binary cache. If you’re using LVM that’s easy to change; otherwise you could try swapping to a file, as sketched below.
Tools like “vmstat 1” and “iotop” may help you identify realtime swap & disk bottlenecks.
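If you want to try the file route, the usual recipe is (a sketch assuming ext4 or similar; btrfs and encrypted setups need extra steps):

```
sudo fallocate -l 8G /swapfile   # reserve the space (use dd if fallocate fails)
sudo chmod 600 /swapfile         # swap files must not be world-readable
sudo mkswap /swapfile            # format it as swap
sudo swapon /swapfile            # enable it immediately
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots
```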
Same here, 8GB of RAM and 2GB of swap. The problem sometimes occurs with a combination of too many tabs in Firefox and Chrome + Win10 running in VirtualBox. I just increased my swapfile size to 8GB per Alfman’s recommendation; we’ll see if that helps.
rahim123,
It can go both ways – I’ve had terrible swapping performance under Windows too. It really depends on what you are throwing at it; in my case, Visual Studio can really knock out Windows if you don’t have tons of RAM for it.
Android is using zram, which is available in mainline Linux/Ubuntu too.
https://en.wikipedia.org/wiki/Zram
https://www.solaris-cookbook.eu/linux/ubuntu-zram-memory-compression-compressed-swap-ram/
I haven’t played with this personally, and I don’t know why it’s not enabled by default, but you could give it a shot.
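If you do, the manual setup is roughly this (a sketch; on Ubuntu the zram-config package automates it, and supported compression algorithms vary by kernel):

```
sudo modprobe zram                                   # load the module (creates /dev/zram0)
echo lz4 | sudo tee /sys/block/zram0/comp_algorithm  # must be set before disksize
echo 4G  | sudo tee /sys/block/zram0/disksize        # size of the compressed device
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0                        # higher priority than disk swap
```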
Interesting. How much memory is allocated to the guest? Do you have swapping enabled in the guest?
I’m not really sure how the dynamics play out between the guest and host using VirtualBox. The worst-case scenario I can think of is that both the host and guest decide to swap out the same pages, causing totally pointless I/O operations. Especially if vm.swappiness triggers a swap, the likelihood of both the guest and host swapping the same pages seems pretty high under the right (i.e. bad) conditions. I don’t know what (if anything) VirtualBox does to prevent this.
I know that with Linux QEMU/KVM you can use a balloon driver to allow the guest to return unused memory to the host. I looked this feature up for VirtualBox, and it has a balloon driver, but unfortunately it will not return memory to the host. So logically, the host is forced to swap that memory or keep it resident even if it’s not in use by the guest.
https://www.virtualbox.org/manual/ch04.html#guestadd-balloon
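If you want to experiment with it anyway, it’s driven per-VM from the host (a sketch from my reading of the manual; the VM name is a placeholder):

```
# Inflate the balloon to 1024 MB: the guest additions pin that much guest
# memory so the guest OS stops using it
VBoxManage controlvm "MyGuestVM" guestmemoryballoon 1024
```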
Before you change anything else, I am curious if increasing swap size has helped.
@Alfman
> Interesting. How much memory is allocated to the guest? Do you have swapping enabled in the guest?
Only 2GB of RAM for the guest, no swap. It appears to happen when one of the browser tabs rapidly increases its memory use, or when I open a document in LibreOffice or a PDF viewer without thinking; that’s when I essentially lose control of the entire system.
rahim123,
For the record, I 100% agree with you that the OS should handle OOM conditions by killing user tasks (hopefully the offending ones) rather than evicting everything else from RAM. I run my systems without swap; I have more than enough RAM, so saving several GB of space is not worth the wear and tear on my SSDs. Adding swap is a bit of a workaround – it doesn’t really solve the problem, but it makes it a bit less sudden, so if you react quickly you may be able to kill the task manually.
Adding RAM shouldn’t be a requirement either. I can easily saturate any amount of RAM if I try, and I would like these situations NOT to require a hard reboot. That’s like throwing a kernel panic due to a userspace error – simply unacceptable for any operating system.
The solution would be to reserve a certain amount of memory for caches, and perhaps to prioritize some user processes. vm.min_free_kbytes doesn’t really help, or doesn’t help enough – I still get the freezes. Perhaps cgroup would work, but it is far from trivial to configure correctly – this would be a good task for distributors to look into. So far, the only reliable way of avoiding hard reboots I have is forcing the OOM killer manually with Alt+PrnScn+f, but that has to be enabled ahead of time.
> Adding RAM shouldn’t be a requirement either. I can easily saturate any amount of RAM if I try, and I would like these situations NOT to require a hard reboot. That’s like throwing a kernel panic due to a userspace error – simply unacceptable for any operating system.
^ Right on.
> The solution would be to reserve a certain amount of memory for caches,
Just today I ran across a sysctl setting that claimed to be specifically for that purpose:
– https://superuser.com/a/1166629
Although the description here seems different:
– https://www.kernel.org/doc/Documentation/sysctl/vm.txt
At any rate, I tried setting it to 64M, 128M, and 256M, up from its default of 8MB, but it made no difference in my testing with `stress-ng`; the system still became unresponsive.
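For reference, the kind of test I mean is something like this (parameters are illustrative – be warned, on an affected system this really will freeze the desktop):

```
# Four workers that together dirty ~95% of RAM and hold it for 60 seconds
stress-ng --vm 4 --vm-bytes 95% --vm-keep --timeout 60s
```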
One of the most useful workarounds that I found was in one of the excellent links that you posted earlier:
– https://askubuntu.com/questions/41778/computer-freezing-on-almost-full-ram-possibly-disk-cache-problem#comment45713_41789
After enabling the SysRq functions on my system, I found that Alt+SysRq+F consistently and quickly forced an OOM reap of the highest memory process, allowing me to recover control of the system, including the desktop environment.
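Enabling it is a one-liner (kernel.sysrq is a bitmask; 1 enables everything, which may be too permissive for shared machines, so check the kernel docs if you want only a subset):

```
sudo sysctl kernel.sysrq=1                                     # enable now
echo 'kernel.sysrq=1' | sudo tee /etc/sysctl.d/90-sysrq.conf   # persist across reboots
# Afterwards, Alt+SysRq+F forces an OOM kill of the biggest memory hog
```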
Another helpful suggestion I found today was about using zram compression:
https://www.reddit.com/r/linux/comments/aqd9mh/memory_management_more_effective_on_windows_than/egldem9
On my system with 7.5GB of real RAM, after starting the zramswap.service it automatically configured 4 compressed swap RAM devices with a total of 7.5GB of additional swap space, and they automatically get a higher priority than my traditional swap file. This allowed me to throw some huge memory-hog processes at the system, and it did indeed remain responsive enough to at least kill them with Ctrl+C or the application’s close button, as I would expect.
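You can check what got set up with standard tools (output shape varies by distro):

```
swapon --show   # each swap device/file with its size, usage and priority
zramctl         # zram devices, compression algorithm and compressed size
```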
There’s also the “earlyoom” tool that an Arch user wrote, which forces the OOM killer if the total available RAM drops below 10%, and they say it’s very effective. But as you mentioned, any serious OS should do this at a kernel level, and I don’t like the hacky feeling of delegating such important functionality to a userland script.
ndrw,
If you had “more than enough of RAM”, these problems should not come into play. Death by swapping suggests not having enough RAM, and I’m guessing that you do not have enough to cover all allocations plus memory-mapped files/cache. On my computers I try to follow a rough rule of thumb: 50% of memory used by applications/OS and 50% for mapping/caching, so I don’t have to face the swap penalty. Sometimes you can get by with less cache, but RAM is something I don’t recommend skimping on.
If your complaints are about Linux running with insufficient RAM and you don’t want to upgrade RAM for whatever reason (e.g. Apple laptops that don’t let you upgrade it), then using swap is strongly advised. If you refuse to use swap (because you don’t want to increase write wear on your SSD), then you’ll end up with terrible performance, since you’ll have no space left for memory mapping/caching – which fits the symptoms you are describing.
I personally recommend avoiding this scenario in the first place, but I respect your choices and so here are three alternatives that you can look into:
https://github.com/rfjakob/earlyoom
https://github.com/hakavlad/nohang
https://github.com/facebookincubator/oomd (requires a 4.20 kernel).
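For earlyoom in particular, the threshold is configurable (flags as I understand them – check `earlyoom --help` for your version):

```
# SIGTERM the process with the highest oom_score once available RAM
# and free swap both drop below 10%
earlyoom -m 10 -s 10
```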
rahim123,
Geez, looks like everything I suggested you had already posted about yesterday, but our comments were hidden because WordPress flagged them for “moderation”. This is extremely jarring to the flow of discussions… I hate WordPress sometimes; the heuristics suck. Anyway, at least you have a good grasp of the subject!
The OOM killer is such a blunt instrument, especially for processes that were following the rules and using resources that the OS had already granted. I oppose having a kernel that kills except as a last resort. Instead of making the kernel responsible for killing sooner, I’m inclined to go the other way and make the kernel cooperate more with userspace to resolve the condition. It’s a shame Linux doesn’t have a standard mechanism whereby the kernel signals all processes to voluntarily release resources. This could trigger garbage collection, unload resources in hidden browser tabs, tell VM guests to release memory, and defer jobs/forking/allocation until the system has more resources. Alas, while such developments could be technically feasible, software developers are a stubborn bunch, so I’m skeptical of our ability to make progress.
I posted some opinions on OOM and overcommitting in the past…
https://www.osnews.com/story/27410/linux-312-released/
Alfman,
I know you mean well, but such advice is super annoying. Every time someone mentions low-memory conditions, others come to derail the discussion towards adding swap or RAM. Again, I have enough memory (64GB) for my system (which uses <8GB) and two threads of my computing tasks. If I wanted to run 16 such threads (the max for my CPU) I would have needed a different platform, but then how about running 64 threads? Swap is useless in my scenario, although it does have the positive side effect of making thrashing more gradual, so sometimes it is possible to rescue the system if I react soon enough.
What I would really like to have is an option like “vm.min_cache_kbytes” to at least keep the most recently executed program code in RAM. Decent distribution settings giving extra memory to UI-related processes would be nice as well. In Windows I can at least start the task manager and kill offending processes. In Linux there is SysRq-f, but it is disabled by default, available from the system console only, and it uses heuristics to find the offending tasks, which occasionally kills other programs.
Thank you for the information on earlyoom, nohang and oomd – I didn’t know these tools, and it didn’t occur to me to look for userspace solutions. They may fix OOM issues in a multi-user setup we have at work (a typical config with remote X11, NIS, NFS).
The easiest way to avoid the OOM killer is to set vm.overcommit_(ratio|kbytes) to something sensible, and vm.overcommit_memory=2.
https://www.kernel.org/doc/html/latest/vm/overcommit-accounting.html
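Concretely, that looks something like this (values are examples; with overcommit_memory=2 the commit limit becomes swap + overcommit_ratio% of physical RAM):

```
sudo sysctl vm.overcommit_memory=2   # strict accounting: refuse allocations past the limit
sudo sysctl vm.overcommit_ratio=95   # commit limit = swap + 95% of physical RAM
```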
ndrw,
Why do you feel my advice to rahim123 is super annoying? I asked what I thought were relevant questions, offered explanations, gave some suggestions and shared my experience. It might not map to your scenario, which we can discuss too. I don’t want to annoy you (although I’m sure it happens from time to time, haha), but I do feel it would be a mistake to refrain from sharing what I think on that basis. So what would you have me do differently?
Anyway, you have 64GB total RAM, 8GB used by the OS, and the remainder divided by 16 threads of your own computing tasks? I assume you really did mean threads rather than processes, and since threads share memory with each other, it’s more about the memory used by the process overall. You didn’t mention how much memory your process uses, but naturally performance will deteriorate as less and less RAM is available for caching. As it reaches 55GB, the system would only have 1GB for caching. What should Linux do? Is your complaint that Linux hasn’t killed your process by the time it consumes ~55GB, causing the system to crawl? If that’s not your complaint, then what do you think Linux should do about your process?
Have you tried making your process nice/ionice?
Could you setup a watchdog to kill/regulate your process?
I don’t know what your process does, but are there ways to optimize/compress the memory that you use?
Let me know more about the process if you’d like to discuss it in more detail.
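To illustrate the nice/ionice idea (a sketch; my_batch_job is a stand-in for your workload):

```
# Lowest CPU priority plus the "idle" I/O class, so interactive tasks win contention
nice -n 19 ionice -c 3 ./my_batch_job
```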
You’re welcome. You are not alone with those problems and I agree things could be better especially in terms of the OOM-killer. Hopefully one of these tools can help. You could even roll your own tool to do whatever you wanted and mlock/prioritize it so that it’s always available.
Alfman,
I was referring to your answer to me (osnews forum threading), and only the part about adding RAM and swap. I really appreciate your other comments to me and others. Nothing wrong with adding RAM or swap either – both are valid solutions – it’s just that every discussion on virtual memory issues starts, and often ends, with that. To people seeking advice it sounds like “reboot or reinstall”.
My tasks (processes, not threads, sorry) consume up to 25GB each, depending on data. I can reliably run two of them in parallel on my system. What I meant was: if I had more RAM, I’d simply run more tasks (I’m memory-limited, not CPU-limited) and would still run into the same issue. The task is a data-processing Python script using multiprocessing.Pool() to parallelize operations; it’s far from optimal, but rewriting it would take a lot more time than running it in fewer processes. Anyway, that was just an example – I have run into this exact same issue many times in other scenarios, sometimes losing data from interactive sessions, etc.
I’ll try Ekorn’s advice on the overcommit strategy, but I’m worried that it may interfere with the normal functioning of the system.
As for the expected behaviour: yes, I’d like the OOM killer to be triggered before all caches are evicted from memory. The chance of success (killing the right task) is higher if the system is not yet deadlocked. As strange as it sounds, there is no way to reserve a certain amount of memory for caches, short of setting limits for each type of task with cgroups.
I’ll try nice/ionice – I haven’t done it yet because I assumed the problem originates in memory management, not in CPU or I/O scheduling. But perhaps it will help.
Ulimit/cgroup both do the job, but only cgroup can be configured to assign more memory to certain user processes (WM, panel, task manager, terminal emulator, shell) than to other processes of this or other users. At least in theory, because systemd uses cgroups for its own purposes and categorizes tasks in a different way.
https://www.gen.cam.ac.uk/local/it/projects/ubuntu-cgroups-and-trying-to-stop-users-making-a-system-unusable
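One way to work with systemd instead of against it is to wrap the big job in a transient scope (a sketch; it needs a reasonably recent systemd, MemoryMax needs cgroup v2, and my_script.py is a placeholder):

```
# The job gets its own cgroup capped at 25G; the kernel then OOM-kills
# within that scope instead of taking down the whole session
systemd-run --user --scope -p MemoryMax=25G python3 my_script.py
```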
Last question: is there a forum where users can ask kernel developers questions like these? LKML should be one such place, but it is focused on development and I don’t feel qualified to discuss implementations.
ndrw,
I’ve also used Python on a project, only to learn that it has a lot of scalability and stability issues. I was surprised and disappointed. I understand that replacing it may not be viable for you, although personally I would be inclined to re-engineer anything that requires the OOM killer to function normally. It’s none of my business, but I’m curious: how are your processes useful if you expect Linux to kill them when they get too big?
Could you move some of the state information into a database that manages memory better?
To be honest, I think not using swap can interfere with the normal functioning of the system. Not because it technically must, but because the kernel developers made poor assumptions about swap being there. Linux relies on swap to overcommit and on swappiness to keep sufficient cache. I agree with the suggestion to dial back overcommitting (IMHO the OS has no business approving requests that it can’t back), but unfortunately common Unix primitives like forking rely on overcommitting to run efficiently, so it might not be viable. If your large processes are forking/launching children, that could be a problem.
Regarding the cache, Linux doesn’t provide the levers to do exactly what you want in the kernel, as you already know. The developers assumed you’d have swap to free up cache. However, I do believe the userspace tools can accomplish it.
If the OOM killer is killing the wrong process, you can give hints to Linux using “oom_adj” (or “oom_score_adj” on newer kernels).
I’d still expect thrashing, but theoretically it could help prioritize the tasks that are most important to you.
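A quick sketch of how that looks (the process names are placeholders; values run from -1000 = never kill to +1000 = kill first):

```
# Make the batch job the preferred OOM victim...
echo 1000 | sudo tee /proc/$(pidof my_batch_job)/oom_score_adj
# ...and make the display server an unattractive target
echo -500 | sudo tee /proc/$(pidof Xorg)/oom_score_adj
```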
I’ve never participated in the Linux kernel mailing list, only sometimes in distro- and project-related forums.
I’ve gotten useful help from places like https://bugs.launchpad.net, YMMV.
ndrw: “I’m worried that it may interfere with the normal functioning of the system.”
The only problem I’ve ever encountered running those (with sensible settings) is that some programs run under Wine didn’t work (programs that allocate loads of memory “just in case they need it” and don’t handle malloc giving them a negative answer). I’ve been using overcommit=2 and overcommit_ratio=95 (malloc says OK until you reach 95% of physical mem + swap) on my workstations and servers for years. YMMV, of course.
What is good about Linux compared to Windows is that Linux is able to keep the system responsive under low-memory conditions for longer than Windows can. Which means that in Windows people will notice a problem and kill some tasks. On Linux it’s more like a cliff: by the time you notice it, it’s probably too late.
Sounds to me like this could be easily solved by a desktop-focused distribution: the OOM killer and/or the Linux scheduler can be tweaked to give certain processes priority – for example, to prevent the OOM killer from reaping them.
As far as I know, there is no Linux scheduler tweak (other than setting up cgroup limits) that would prevent evicting all executable code from RAM. Any suggestions?
ndrw,
Just throwing some ideas out there…
ulimit – a poor man’s cgroup, but it might work in a pinch (quick sketch after the links below).
LXC Linux containers (technically uses cgroups under the hood)
https://linuxcontainers.org/lxc/introduction/
KVM – maybe overkill, but if you want isolation, this will get you there, haha
I’m seeing these options:
https://stackoverflow.com/a/578255
https://gist.github.com/JPvRiel/bcc5b20aac0c9cce6eefa6b88c125e03
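For the ulimit route, something like this in the launching shell (a sketch; -v takes kilobytes and applies to everything started from that shell, and my_batch_job is a placeholder):

```
ulimit -v 27262976   # cap virtual address space at ~26 GB (26*1024*1024 KB)
./my_batch_job       # allocations past the cap now fail instead of thrashing the box
```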
The other problem is: this tends to happen in KDE much more than in any other desktop environment I know, because it has unbelievably high memory consumption. In MATE, or even in GNOME, you don’t get to the point where all the system’s RAM is consumed by your DE. That’s why this isn’t actually off-topic here, sadly.
KDE is strange. On one hand, I like how it works; it is fast and responsive and has nice apps. On the other hand, the UI is a mess – the control center in particular. Plasmoids are a nice idea, but if you just want to work they are pretty much useless (at least for me); the start menu switching tabs on mouse movement is annoying (maybe it is possible to change that); multi-monitor configuration handling is subpar; and it does not follow the XDG and freedesktop specs in some cases, or follows them in some strange way (user secret storage, MIME handling).
There might be other issues as well, but these are the ones that annoyed me the most.
Since yours seems to be the only on-topic comment, let me reply
Yeah, but they’re working on it and it’s getting better with each release.
Everything in the Plasma desktop is a plasmoid – even panels, the systray and the “start” menu – so not all of them are useless to you.
Yes, you can change it: just right-click on it and there’s a configure option. Honestly, I don’t know why “switch by mouse hover” is the default; I also think it’s annoying.
BTW, if you first “Unlock graphical elements” (sorry if that’s not the right phrase, my desktop is not in English) and then right-click on the K menu again, there’s a “Show alternatives” option where you can switch to a different menu style.
This is also true with some other plasmoids, like the clock.
I haven’t had issues with multi-monitor myself, but my use case is very simple: I just sometimes connect my laptop to my TV via HDMI to watch a video/movie/series on a bigger screen. Using an Intel GPU might help my use case (open source, upstream drivers and all that).
But yes, I see a lot of reports about issues with different multi-monitor setups from different users.
I really don’t know what you mean here, sorry.
Of course, and we all have our own. I love Plasma, been using it for ages, but there are still some paper cuts that bug me. Nothing fatal or I’d be using another DE by now
Cheers!
One thing that really surprised me in the original comment was the claim that KDE has “fast apps”. I don’t know any other UNIX desktop that is more hoggish, more resource-hungry and, as a consequence, slower than KDE. I am in no way principally against it. I have been trying it on and off for many years now, waiting for the moment when KDE becomes stable, polished and frugal enough to be used as my daily driver, but I always give up after a few days. I can’t afford having to wait for hours for my mail to be updated in my daily work. Why does KMail take literally hours for stuff that Thunderbird completes in seconds? Why do I get notified every few minutes that “akonadi_imap_resource_xxx” has crashed and needs to be restarted? Or that Baloo has crashed? I know that KDE offers more functionality than MATE and better configurability than GNOME, but at its current stage I would never dare to install it on the computers of friends and colleagues and tell them to work with it. I am well aware that the consistency and stability of MATE come at a price, but in my daily work that’s a price I am willing to pay, when it means that I am actually able to get things done rather than fighting with the system.
In theory this all sounds great. But when I try KDE/Plasma from time to time, I always give up again after a few days and revert to either GNOME or MATE, mainly for two reasons:
– It is unbelievably hoggish, even compared to GNOME, and much more so compared to MATE. When I log into the desktop, it takes minutes until it is fully built up. And while it is building up, I get all kinds of error messages and popups about KAlarm not having a writable calendar (why do I need KAlarm when I have KOrganizer?), then I have to enter my GPG key because, for some reason, my wallet is encrypted with it, and then other things pop up which I thought I had long purged from my session (I have emptied everything in .config that looks like autostart or session data).
– It is quite unstable compared to the other two platforms; I constantly see crashes, many of them related to Akonadi or Baloo.
I am not a programmer, so I can only guess that Akonadi is to blame for most of the problems, e.g. with KMail. Compared to Thunderbird, it is unbelievably slow; it takes hours to check my mail, whereas TB completes the job in seconds. Often it doesn’t show the newest mail for hours after it has long become visible in Thunderbird. I have a vague recollection that it wasn’t like this before the move to Akonadi.
But another issue that I continue to have with KDE after many years of testing it (I started around 2001) is that I still find it visually unappealing compared to GNOME. It seems that the themes/styles used for GNOME have been worked out with much more attention to detail, and also to actual usability. In one of the styles that ship with Plasma, I found that KMail suddenly showed black text on a brown background. Totally unusable. I have yet to find a style where KMail’s display of folder summaries and of individual emails is as clear and friendly to the eye as Thunderbird’s is out of the box. So really, the geek in me loves KDE, but the part of me which has to do productive work hates it.