It’s a compelling story and on the surface makes a lot of sense. Carefully curated software patches applied to a known Linux kernel, frozen at a specific release, would obviously seem to be preferable to the random walk of an upstream open source Linux project. But is it true? Is there data to support this?
After a lot of hard work and data analysis by my CIQ kernel engineering colleagues Ronnie Sahlberg and Jonathan Maple, we finally have an answer to this question. It’s no. The data shows that “frozen” vendor Linux kernels, created by branching off a release point and then using a team of engineers to select specific patches to back-port to that branch, are buggier than the upstream “stable” Linux kernel created by Greg Kroah-Hartman.
Jeremy Allison at CIQ
I mean, it kind of makes sense. The full whitepaper is available, too.
I always seek to use & support the latest Linux mainline kernel (possibly with some patches as needed). On x86 this goes very well! On ARM, though… it’s a nightmare, and so many of us end up stuck with poorly supported branches (including Android, random embedded devices, just ARM devices in general).
Actually, there is a stronger reason for companies to stick with LTS versions of Linux distributions, which usually ship such frozen kernels. Many banking institutions, for instance, use HSMs – dedicated hardware that requires kernel drivers. A frozen kernel guarantees that a given driver keeps working without recompiling it (which would mean reaching out to the vendor, who would in turn recompile, test scrupulously, and go through the certification process for that kernel).
ondrasek,
I’m not familiar with those devices, but kernel instability is a notorious, long-standing problem with Linux, and you’re right that it definitely creates support issues of its own. It creates a lot of developer churn that burns us out, and my opinion is that unstable ABIs are more harmful than beneficial. The Linux ABI/API breakages usually aren’t very complex to fix, but it becomes a repetitive, tedious chore, and if you don’t keep up you end up incompatible with mainline, which sucks. While I don’t program for Google’s kernel, I am glad they are insisting on more stable kernels. It occurs to me now that this is a bit antithetical to the article.
As long as you include bug fixes along with security fixes in your data, this is meaningless. There are FAR more bug fixes that are not security related, and some of them include security flaws.
Nice try, hack vendor “whitepaper”.
This was discussed when the kernel folks became their own CVE authority, and they basically said: every bug is a potential CVE. The rationale is that no one person can think of all the situations a specific bug might come into play in, so if you want to maintain an old version and want it to be secure, you had better assess ALL bugfix patches, because hidden among them might be something not identified as a security issue by the authors/reporters at the time of fixing.
I did not know that; fair enough. OK, I retract the hack comment. But this seems like extremely conventional wisdom. I still believe in the argument for simplicity and monitoring to prevent security issues, over mindless updates.
I’m a person who uses Arch and likes it, but perhaps not for much longer, due to the documented introduction of vulnerabilities that were not introduced in release-based distros.
Found it: https://youtu.be/HeeoTE9jLjM?t=1239 (talk by GKH), just so you know I’m not making this up.
The methodology here seems very flawed. It’s counting the number of known bugs (not necessarily security vulnerabilities) that are not fixed in an older kernel. It does not (and cannot reliably) count the number of new bugs introduced into newer kernels. Those bugs can only be counted once they have been discovered and fixed. At any point in time, a new kernel will have a small set of known problems, which grows over time as unknown things become known things. (A rough sketch of that style of counting follows at the end of this comment.)
From the point of view of a system operator, it’s not even sufficient to compare the number of bugs in a vendor kernel to the number in the current kernel. The relevant comparison is the number of bugs the operator would have encountered by upgrading through each intermediate kernel along the way.
People’s lived experience is that they have fewer problems by taking a point-in-time snapshot and fixing serious issues. That’s because, logically speaking, the number of bugs introduced has to approximately equal the number fixed. Counting the fixes (which are known) without counting the introductions (which aren’t known yet) amounts to a numerical game intended to gaslight people about their actual experience.
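For context, the style of counting being described above can be approximated with a small script. This is only an illustrative sketch, not the whitepaper’s actual tooling: the repository path and branch/tag names are hypothetical placeholders, and matching commits by subject line is a rough heuristic.

```python
# Illustrative sketch (not the whitepaper's actual tooling): count upstream
# commits carrying a "Fixes:" tag whose subject line never appears in a
# frozen vendor branch. Repo path and branch/tag names are hypothetical.
import subprocess

def subjects(repo, rev_range, grep=None):
    """Return the set of commit subject lines in a git revision range."""
    cmd = ["git", "-C", repo, "log", "--no-merges", "--format=%s", rev_range]
    if grep:
        cmd += ["--grep", grep]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return {line.strip() for line in out.stdout.splitlines() if line.strip()}

def count_missing_fixes(repo, base, upstream_branch, vendor_branch):
    # Upstream commits since the shared base that declare themselves as fixes.
    upstream_fixes = subjects(repo, f"{base}..{upstream_branch}", grep="^Fixes:")
    # Everything the vendor branch has applied since the same base.
    vendor_commits = subjects(repo, f"{base}..{vendor_branch}")
    # Fixes present upstream with no same-subject commit in the vendor branch.
    # Backports with reworded subjects get (wrongly) counted as missing here.
    return len(upstream_fixes - vendor_commits)

if __name__ == "__main__":
    missing = count_missing_fixes(".", "v5.14", "linux-5.14.y", "vendor/5.14-frozen")
    print(f"Upstream 'Fixes:' commits with no matching subject in the vendor branch: {missing}")
```

Note that a count like this only sees fixes that are already known; it says nothing about bugs that exist in the newer kernel but haven’t been discovered yet, which is exactly the objection above.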
You explained my take on it very well.
The post was supposed to be directed at malxau.
I would expect a little bit more critical thinking prior to publishing a shameless marketing plug from the Rocky company, whose value proposition amounts to: “we want to be Red Hat without contributing to the ecosystem, bashing the people who write the code you rely on, but not too much, because without Red Hat most of the code we use would not exist, yet we want to make money out of it, so we need to invent something to look different”.
Also, I agree with malxau: the analysis is flawed in many ways. For a start, if that is a “paper” (lol), it should be noted that one of the three authors recently moved from RH to CIQ, which should have been disclosed since it poses a serious conflict of interest [0][1].
The mentioned AOSP kernels are distributed with a long list of out-of-tree functionality, and despite being built on a long-term release, they reach end of life after only 5 years at most; RHEL lasts 10 years.
The paper also doesn’t say that if you analyze the commits it counts, the math doesn’t add up, because not all of the “missing fixes” are really missing. The simple reason is that most of them are not needed: the RH kernel is not just that upstream kernel release, but a heavily patched, improved, and stabilized one. (A sketch of why naive matching over-counts this way follows the links below.)
[0] https://www.linkedin.com/in/ronnie-sahlberg-b737b8295/
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2094785#c3
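As a footnote to the point about backports: matching by commit hash or subject line will miscount ordinary backports as “missing”, because they land under a different hash and often a reworded subject. A content-based comparison using git patch-id gets closer, though it still cannot recognize fixes that were heavily reworked for the vendor tree. Again, this is only an illustrative sketch with hypothetical repo path and branch names, not an analysis of the paper’s data.

```python
# Illustrative sketch: compare branches by patch content (git patch-id) rather
# than by commit hash or subject line, so ordinary backports are not miscounted
# as "missing". Heavily reworked backports still won't match. Repo path and
# branch/tag names are hypothetical.
import subprocess

def patch_ids(repo, rev_range):
    """Map stable patch IDs to commit hashes for non-merge commits in a range."""
    log = subprocess.run(
        ["git", "-C", repo, "log", "-p", "--no-merges", rev_range],
        capture_output=True, text=True, check=True,
    )
    ids = subprocess.run(
        ["git", "-C", repo, "patch-id", "--stable"],
        input=log.stdout, capture_output=True, text=True, check=True,
    )
    # git patch-id prints "<patch-id> <commit-id>" for each patch it reads.
    return dict(line.split() for line in ids.stdout.splitlines())

if __name__ == "__main__":
    upstream = patch_ids(".", "v5.14..linux-5.14.y")      # upstream stable fixes
    vendor = patch_ids(".", "v5.14..vendor/5.14-frozen")  # frozen vendor branch
    truly_missing = set(upstream) - set(vendor)
    print(f"{len(truly_missing)} upstream patches with no content-equivalent commit in the vendor branch")
```

Even this is only a rough filter: it doesn’t settle whether the remaining fixes are actually needed in a heavily modified vendor kernel, which is the point made above.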