After 3 months, Linus has released Linux 2.6.23. This version includes the new and shiny CFS process scheduler, a simpler read-ahead mechanism, the lguest ‘Linux-on-Linux’ paravirtualization hypervisor, Xen guest support, KVM SMP guest support, variable process argument length, SLUB as the default slab allocator, SELinux protection against exploiting NULL dereferences using mmap, XFS and ext4 improvements, PPP over L2TP support, the ‘lumpy’ reclaim algorithm, a userspace driver framework, the O_CLOEXEC file descriptor flag, splice improvements, a new fallocate() syscall, lock statistics, support for multiqueue network devices, various new drivers, and many other minor features and fixes.
Just looking at the list of new features and changes this seems to be quite a significant update.
Nice one.
For those of us who like to play with the kernel, it would be sweet to have a tool to make this kind of selection. I know we don’t compile the kernel every day (or every month), but it would be great to have an automatic method for it. It would save hard disk space and compile time and, much more important, generate a lean kernel (even if almost everything goes in as a module).
“make oldconfig” works fine for me…
Yes, it does, as long as you don’t change your computer’s boards, or if you only play with the kernel on one machine. I was not talking about those cases.
I’m not sure if I understand. You want a tool that would examine a system and produce the minimal kernel config that would cover all the system needs?
First of all, it’s a chicken and egg problem. I’m not sure if you can discover certain system capabilities if the kernel doesn’t already offer support for that capability. So you’d need a “full” featured kernel to produce the leaner one.
Second, some hardware is pluggable. USB printers, for instance. There’s no way for a diagnostics tool to realise you need USB printer support unless you have one plugged in and turned on at examination time and if, again, you don’t already have support for that.
So I’m afraid that in the end the human is needed to project everything that would be needed in a kernel.
I agree there has to be a full-blown kernel the first time the system is running. Thereafter, a simple parse of the lsmod output
lsmod:
——-
Module Size Used by
vmnet 38416 13
vmblock 15520 3
vmmon 929636 0
nvidia 6211568 24
i2c_core 21632 1 nvidia
snd_intel8x0 29852 1
snd_bt87x 14792 0
snd_ac97_codec 89632 1 snd_intel8x0
snd_pcm 63620 3 snd_intel8x0,snd_bt87x,snd_ac97_codec
snd_timer 20100 1 snd_pcm
snd 37060 7 snd_intel8x0,snd_bt87x,snd_ac97_codec,snd_pcm,snd_timer
snd_page_alloc 11272 3 snd_intel8x0,snd_bt87x,snd_pcm
ac97_bus 6016 1 snd_ac97_codec
———-
should be enough to build a kernel with a leaner config.
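Something along these lines would be a starting point for the “parse lsmod” idea – a minimal sketch, assuming a stock modular distro kernel is already running and module-init-tools is installed (out-of-tree modules like nvidia or vmnet obviously won’t map back to in-tree options):
for mod in $(lsmod | awk 'NR>1 {print $1}'); do
    modinfo -F filename "$mod"
done | sed 's|.*/kernel/||' | sort -u    # in-tree driver paths hint at which CONFIG_* options you need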
“So I’m afraid that in the end the human is needed to project everything that would be needed in a kernel.”
Yeah, a human to code.
And who’s going to load just the modules that are actually needed by the machine? Modules get loaded automatically when a program attempts to access the relevant /dev devices. Some of them need configuration (module parameters and device-module aliasing) to work. Who’s going to do that? Theoretically you could design software that probes all the /dev entries known to mankind and still not obtain 100% accurate results.
And in order to do that, you’d need to do what distros already do (put together a full kernel), except instead of leaving it there and using what you need, you attempt a complicated, error-prone hardware investigation, with the end result of deleting the full kernel you went to the trouble of producing and using a smaller subset of it.
And for what? You’re losing perspective. Just to obtain a more lightweight kernel? Why? What distros do today makes much more sense: provide the full kernel and make software that looks at the hardware and loads the appropriate drivers.
I honestly don’t see what a lightweight kernel would accomplish. The only time kernel size matters is on small boot devices that simply don’t have the space, and there are techniques to work around that.
This is not how things work on Linux, at least not since udev came into use; it was partially true in the devfs era. Today almost all computers and devices come in one PnP flavor or another and generate the appropriate event announcing their existence (and, luckily, the kernel will recognize it). Your assertion about checking each /dev/* entry is flawed these days. See the udev FAQ (a small excerpt follows):
“Q: Oh come on, pretty please. It can’t be that hard to do.
A: Such a functionality isn’t needed on a properly configured system. All devices present on the system should generate hotplug events, loading the appropriate driver, and udev will notice and create the appropriate device node. If you don’t want to keep all drivers for your hardware in memory, then use something else to manage your modules (scripts, modules.conf, etc.) This is not a task for udev.
Q: But I love that feature of devfs, please?
A: The devfs approach caused a lot of spurious modprobe attempts as programs probed to see if devices were present or not. Every probe attempt created a process to run modprobe, almost all of which were spurious.”
In other words, YES, it would be possible to streamline your kernel in a somewhat saner way today (think about all the options you must turn off, and also what you should not touch, because, as you know, there are interdependencies between some modules), without having to fight kernel build failures.
Is the effort worth it in the face of all the hassle? For some of us, YES.
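For reference, the “modules.conf”-style management the udev FAQ alludes to looks roughly like this – an illustrative /etc/modprobe.conf fragment, with aliases guessed from hardware mentioned elsewhere in this thread:
alias eth0 sky2                  # onboard Marvell Ethernet
alias snd-card-0 snd-intel8x0    # onboard ALSA sound card
options snd-intel8x0 index=0     # example of a module parameter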
In essence, just build the kernel with everything non-essential to boot as modules, and your kernel will be as small as required, since only the stuff you use will be loaded…
And with this in mind, there are two choices (see the sketch after the list):
1(for the person not caring about disk space): build ALL shit as modules, since we know only required stuff will be loaded
2(for people caring about disk space): build only modules you actually want
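For option 1, the stock kernel build targets already do most of the work; a rough sketch, assuming a vanilla 2.6.x source tree:
make allmodconfig                              # enable everything that can be built as a module
make && make modules_install && make install   # only modules that actually get loaded take up RAM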
My wish was not to cut human interaction out of kernel configuration entirely, just to make the amount of information we humans need to process a little saner when setting up a basically new machine.
Also, realize that when you set up a new computer, the kernel that comes with your distro already has almost everything plus the kitchen sink compiled in.
So, basically what I was thinking about is this:
– get a new or updated computer;
– install the distro kernel;
– run a program that analyzes your hardware and cuts the things you are not going to use (it could have a configurable guess level), generating a valid minimal (or almost minimal) .config;
– run make menuconfig (or even oldconfig) to do the final tweaks (a rough sketch follows).
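A rough sketch of that workflow in shell terms – note that “leanconfig-tool” is purely hypothetical, standing in for the analysis step being wished for:
cp /boot/config-$(uname -r) .config   # start from the distro's everything-enabled config
leanconfig-tool --trim .config        # hypothetical hardware-analysis / pruning step
make oldconfig                        # or menuconfig, for the final manual tweaks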
Realize that, as the kernel keeps getting more and more drivers and settings, the time needed to tweak it gets bigger and bigger.
Also, such a tool can help when you want a lean kernel for a particular machine you are not going to touch anymore (or almost).
Note that this is the paravirt_ops implementation of Xen and not the one that supports all of the great features. The paravirt_ops version is actually a good bit stripped down compared to the one you can get in kernels downloaded from xensource.com.
Is the CFS process scheduler going to make Linux smoother for desktop use? Occasionally I notice my Totem media player stutters when I have massive hard disk I/O. I tried a similar workload on Windows XP and Windows Media Player didn’t stutter. I felt like my trusty Linux kernel had let me down. I would love to have the option to optimize the kernel for non-server use when I am not using Apache and MySQL in the background to test applications.
CFS is designed to prevent exactly this kind of thing by making sure nothing hogs the CPU. Fair scheduling is not a new concept – FreeBSD famously uses it, as did Con Kolivas’ SD/RSDL schedulers. I believe Windows prioritises certain tasks to avoid such stutters rather than being fair.
Same as the old kernel did. But on modern hardware such automatic priority adjustment isn’t needed, and it can lead to stalls…
CFS might even help now. And there are improvements to read-ahead and the I/O scheduler, both of which I hope will pay off someday.
Anyway, this is imho a nice kernel release.
Linux has always had these scheduler problems. CPU scheduling has probably gotten better, but when a program does a lot of I/O, the rest of the system is almost dead. I really hope this has improved now too.
The only time I have managed to get any kind of stutter here is when I get a checksum failure on a torrent, and then only if the stuff I’m playing is stored on the same drive the torrent is being saved to.
I have run kernel compiles and more without noticing anything, and that’s on the old scheduler.
Windows uses multimedia timers for music playback, which allow for higher-priority scheduling. By default on Linux, non-root user processes don’t get the capabilities needed to change scheduling priorities, and this is what causes the stuttering. The caps aren’t given because they could be abused to the detriment of other users.
Personally I am waiting for Linux developers to come up with the idea of dividing CPU usage among users (with per-user priorities too) and within the per-user allocated time, scheduling per-process.
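On the capabilities point above: a sketch of how that priority can be granted by hand today, assuming chrt from util-linux and a PAM new enough to understand rtprio limits (the priority values are arbitrary):
sudo chrt --rr 10 amarok    # start the player under soft-realtime SCHED_RR
# or let members of the audio group request it themselves, via /etc/security/limits.conf:
#   @audio  -  rtprio  10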
I found that (at least for me) on Ubuntu it was kjournald being niced to -5 that caused stutter on I/O; changing it to 0 fixed things and didn’t seem to set anything else on fire.
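What that fix boils down to, roughly (looking the PIDs up with pgrep is my assumption about how it was done; there is one kjournald per ext3 filesystem):
renice 0 -p $(pgrep kjournald)    # reset kjournald back to the default priority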
I sit and watch one of my processor cores do nothing while I’m beatin’ the hell out of the other one. I compile KDE 4, same result. The kernel is multicore-aware, but how many apps actually leverage OpenMP?
Concurrency? What’s the point of all these processors if you aren’t seeing them lit up?
If there are any tricks in GCC that turn on both cores to build Inkscape, Scribus, OpenOffice, etc., I’d like to know.
I want my system utilized.
I’m surprised you’d say that. Linux seems to go out of its way to spread load across processors, even if that means moving a single threaded process back and forth between CPUs.
For example, on my machine here’s some sample top output while listening to mp3s and loading a web page under compiz fusion:
top – 22:27:59 up 1 day, 23:40, 11 users, load average: 0.29, 0.22, 0.23
Tasks: 135 total, 3 running, 132 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.9%us, 0.4%sy, 0.0%ni, 97.6%id, 0.3%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu1 : 1.3%us, 0.6%sy, 0.0%ni, 98.0%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 1.0%us, 0.4%sy, 0.0%ni, 98.1%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.9%us, 0.5%sy, 0.0%ni, 98.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Notice how no CPU thread (I have 2 processors, each hyperthreaded) is at 100% idle.
You have 135 processes running – not just an mp3 player and a browser.
That said, switching cpu all the time is a BAD thing.
top shows all tasks.
Even Windows has multiple tasks running in the background all the time.
“That said, switching cpu all the time is a BAD thing.”
man taskset
mini:~ matzon$ man taskset
No manual entry for taskset
That said, I don’t think Mr & Ms Doe will be setting processor affinity anytime soon!
My point was that the OS should try to keep a thread on the same cpu as much as possible.
“My point was that the OS should try to keep a thread on the same cpu as much as possible.”
It does try to keep a process running on the same CPU for as long as possible. But it does that for _all_ processes.
CPU affinity is always being recalculated, and when you see a process jumping from one CPU to the next, remember that between those jumps there were thousands of context switches where that process was kept on the same CPU. It just happened that, at that time, the decision was made to boost the affinity of another process, because that made more sense.
Besides, the kernel is aware of Hyperthreading/dual-core issues, and knows that the penalty to move a process between cores in the same physical CPU is _much_ less expensive than moving it across physical CPUs, because cores share caches (if not L2, at least an L3 cache).
Now, if you want the best performance from your system, moving processes across CPUs is the best option. If, on the other hand, you want the best performance from a particular process — or a predictable response time/realtime — then the best option is to keep a process on the same CPU without thinking how that hits other processes, and you can use taskset for that.
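A couple of taskset examples for anyone who wants to try it (the PID and the ./encoder command are made up):
taskset -pc 0 1234          # pin existing PID 1234 to CPU 0
taskset -c 2,3 ./encoder    # launch a new process restricted to CPUs 2 and 3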
It doesn’t do it all the time – thread rebalancing is done something like every half second iirc.
It does force the newly moved thread to use a cold cache, but on the other hand, having one cpu doing nothing while another one is oversubscribed would probably be a much bigger waste.
I think it’s safe to take for granted that people are already aware of the processes relating to X11, network (et al.) daemons, and so on and so forth. Plus you can guarantee that the mp3 player and web browser are using more than one task each.
I agree completely; I’m surprised more people didn’t pick up on that. It’s one thing to have a not-so-intensive process moved out of the way for a process that will max out the CPU… but I’ve seen cases where there’s just one really intensive process, and Linux is moving it back and forth between processors! It makes no sense: not only does it take unnecessary time to move all of the register data over, it totally kills the CPU cache! These days CPUs are more and more reliant on cache; just a few years ago CPUs had 256, maybe 512 kilobytes of cache, and now they have at least 2 MB. Hopefully this new CFS process scheduler will reduce this strange behavior.
There isn’t a big reason to make the compiler itself use multiple cores, because it has to process so many files that it’s very simple to just run multiple instances of gcc, one per file. make -j3 will attempt to keep 3 cores busy, but YMMV – some makefiles break because they don’t expect this.
KDE 4 breaks, Inkscape doesn’t. Ghostscript 8.60 breaks, and more. It’s a crapshoot.
I’m running Debian Sid Linux 2.6.22-2-amd64 x86_64
Typically if you’re doing a compile, you don’t want the compiler split between CPUs (OpenMP). You need to just run two different instances of the compiler on different compilation units. I think the option is “make -j2” where 2 is the number of CPUs you have. I could be wrong, but look up “parallel compilation” in the cmake docs. (Yes, despite my recent posting history, I was a Gentoo ricer at one point ).
The usual rule is actually (n + 1) jobs, and with a Pentium D 940 I do make -j3, yet all I get are 3 instances using the first core.
Either GCC 4.x.x doesn’t work as billed or I’m missing something else.
It’s not gcc that deals with parallel building there, it’s make. And it does so by simply launching several processes at the same time; then it’s up to the kernel to spread them across the CPUs.
If it doesn’t work, I don’t know – perhaps double check your kernel’s SMP build settings?
When compiling, use “make -jN” where N is the number of cores you have. I believe SCons also takes the “-j” switch.
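Concretely, something like the following, deriving N from /proc/cpuinfo (adjust to taste; some people use N+1):
make -j$(grep -c ^processor /proc/cpuinfo)
scons -j$(grep -c ^processor /proc/cpuinfo)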
… And how many new regressions, I wonder?
Use a vendor kernel.
>… And how many new regressions, I wonder?
There have been big changes in this kernel (scheduler replacement, rewrite of the x86 asm setup code in C, etc.), so yes, regressions are more likely to happen, but *nobody* is forcing you to upgrade…
Use Linux 1.2.13. It works great.
BIOS flag for threading was disabled.
Works as billed now.
Well, I think CFS needs more time. It’s a really young scheduler but feels very good already. There are _a lot_ of patches pending for the 2.6.24 release, so lots of improvements are coming up.
Yes, SATA support is improved!
Native queueing, hotplug on Promise controllers – wonderful.
Did Ingo merge the concepts from the RFC?
I guess you refer to Roman Zippel’s scheduler (proof-of-concept, mostly).
Ingo took a couple of ideas from it and wrote his own patches (since Roman didn’t submit any patch himself). But those changes are queued for 2.6.24.
He prefers KFC.
Can’t wait to try it out sometime. I wonder which distros will have this kernel version in the near future?
CFS is available in opensuse 10.3 right now.
That’s kind of funny, since that’s what I have installed x)
I guess I should go and seriously stress-test my system then
RE: CFS… cool
“CFS is available in opensuse 10.3 right now.”
Hey wait a minute… OpenSUSE 10.3 has 2.6.22 and that kernel does not have CFS!
Yes, it does, through a patch, as noted in the article at:
“http://people.redhat.com/mingo/cfs-scheduler/“
Oh, okay then.
It does not matter:
2.6.23 is the first kernel that has CFS included, but the patches have been around since 2.6.21.
So SUSE simply ships 2.6.22 + CFS v20.x.
I hope, though, that SUSE shipped the latest CFS, as the CFS 20.x versions caused serious problems with wireless + WPA2; playing a CD with Amarok also messed up the sound.
These seem to be fixed in the CFS provided with 2.6.23.
Some issues are still not resolved, though (e.g. Amarok + CD playback + touchpad = slow pointer; the same combination with a standard mouse works).
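One quick way to check whether a running kernel actually carries CFS, assuming it was built with CONFIG_SCHED_DEBUG (the CFS patches added /proc/sched_debug):
uname -r
test -e /proc/sched_debug && echo "CFS scheduler present"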
In YaST, go to System -> System Settings and enable it if it isn’t already.
CFS is available in FC6, F7 (last kernel update).
F8 will have CFS.
Edit : some s/FC/F/
Fedora 8 (out in a few weeks) will have 2.6.23.
Mandriva 2008 has 2.6.22, but with CFS backported.
Do not forget to try Gentoo or Arch
With every release, I scour the changelogs for fixes to sky2 (the driver for my onboard Ethernet – flaky to the point that it stops working). In this changelog I see:
commit f350339cbd0e8ed7751f98f0ef60cb3a0d410eda
Author: Stephen Hemminger <[email protected]>
Date: Tue Aug 21 11:10:22 2007 -0700
sky2: don’t clear phy power bits
There are special PHY settings available on Yukon EC-U chip that
should not get cleared. This should solve mysterious errors on some
motherboards (like Gigabyte DS-3).
That’s my mobo; hopefully the thing works peachy now. The wireless situation is bad enough (though my Intel 2200BG works great); having a flaky wired network connection was really annoying. Here comes another reboot (kernel upgrade).
I find that the Linux kernel has matured to such a level that it meets my everyday desktop and server needs.
The changes in this kernel are proof that these updates aren’t exactly groundbreaking; most of the problems have been figured out.
I do look forward to a more focused desktop experience (such as Con’s scheduler) and a greater focus on the development of applications and desktops.
With the release of KDE 4 around the corner, it’s an exciting time for the applications world on Linux.
Seeing as there are more users than hackers now, anyone care to explain what these features mean for users?
“a userspace driver framework”
Now, I don’t know anything about this, or where to find information on its development, but does this seem like big news to anyone else? Weren’t userspace drivers one of the contentions of the “microkernel” crowd in the debate between Tanenbaum (of Minix) and Linus?
If “a userspace driver framework” is what it sounds like, then it could push non-GPL software out of the kernel, bring some stability to the kernel, and even do the Minix thing where faulty drivers are just restarted when they crash (the “self-healing OS” or whatever). Or is this not that kind of framework?
Here’s a link to the description:
http://kernelnewbies.org/Linux_2_6_23#head-487ea010dd33e20f8b369dd2…
And here’s more info:
http://lwn.net/Articles/232575/
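From those links: the framework is UIO, and from userspace a UIO device looks roughly like this (the device number is an assumption; a tiny in-kernel stub still registers the device, and the userspace driver mmap()s and read()s /dev/uioX):
cat /sys/class/uio/uio0/name     # which UIO driver owns the device
ls /sys/class/uio/uio0/maps/     # memory regions exported for mmap()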
reading it now
When Ingo Molnar released the CFS patches for earlier kernel versions, I was bold enough to write to the openSUSE kernel mailing list and ask if there was a chance the patches would be merged. The answer was negative. It was explained to me that the distribution was already in feature freeze at the time and that the SUSE kernel maintainers didn’t trust CFS yet due to some alleged performance regressions.
So, did something change between my mail and the release, or are people here suffering from wishful thinking?
@Erunno,
A lot has changed in CFS. Ingo and crew have worked very diligently to improve CFS to the point where it has very few, if any, regressions. Once you try the new CFS code, it is very unlikely you will want to go back to whatever you were using. It is not perfect (yet), but Ingo and crew are working *DANG* hard to make it as close to perfect as possible.
Happy Computing
That’s probably why one of the SUSE kernel maintainers said “Let it cook some more” to me, so I do not doubt that CFS has improved in the meantime. I was just curious whether CFS was merged into the 10.3 kernel, as some people here claim.