Cook details what he means in his Google Security Blog post, “Linux Kernel Security Done Right.” Cook wrote, “the Linux kernel runs well: when driving down the highway, you’re not sprayed in the face with oil and gasoline, and you quickly get where you want to go. However, in the face of failure, the car may end up on fire, flying off a cliff.”
This is true. With great power comes great responsibility. You can do almost anything with Linux, but you can also completely ruin your Linux system with a single command. And those are only the ultra-powerful commands, which you should use only with the greatest caution.
Cook is referring to the other, far less visible security problems buried deep in Linux. As Cook said, while Linux enables us to do amazing things, “What’s still missing, though, is sufficient focus to make sure that Linux fails well too. There’s a strong link between code robustness and security: making it harder for any bugs to manifest makes it harder for security flaws to manifest. But that’s not the end of the story. When flaws do manifest, it’s important to handle them effectively.”
That isn’t easy, as Cook points out. Linux is written in C, which means “it will continue to have a long tail of associated problems. Linux must be designed to take proactive steps to defend itself from its own risks. Cars have seat belts not because we want to crash, but because it is guaranteed to happen sometimes.”
While some of Linux will, moving forward, be written in the far safer Rust, C will remain the foundation of Linux for at least another generation. That means, Cook continued, “though everyone wants a safe kernel running on their computer, phone, car, or interplanetary helicopter, not everyone is in a position to do something about it. Upstream kernel developers can fix bugs but have no control over what downstream vendors incorporate into their products. End-users choose their products but don’t usually have control over what bugs are fixed or what kernel is used. Ultimately, vendors are responsible for keeping their product’s kernels safe.”
This is difficult. Cook observed, “the stable kernel releases (“bug fixes only”) each contain close to 100 new fixes per week. Faced with this high rate of change, a vendor can choose to ignore all the fixes, pick out only ‘important’ fixes, or face the daunting task of taking everything.”
Believe it or not, many vendors, especially in the Internet of Things (IoT), choose not to fix anything. Sure, they could do it. Several years ago, Linus Torvalds, Linux’s creator, pointed out that “in theory, open-source [IoT devices] can be patched. In practice, vendors get in the way.”
Cook remarked, with malware here, botnets there, and state attackers everywhere, vendors certainly should protect their devices, but, all too often, they don’t. “Unfortunately, this is the very common stance of vendors who see their devices as just a physical product instead of a hybrid product/service that must be regularly updated.”
Linux distributors, however, aren’t as neglectful. They tend to “cherry-pick only the ‘important’ fixes. But what constitutes ‘important’ or even relevant? Just determining whether to implement a fix takes developer time.”
It hasn’t helped any that Linus Torvalds has sometimes made light of security issues. For example, in 2017, Torvalds dismissed some security developers as “f-cking morons.” He didn’t mean to put all security developers in the same basket, but his colorful language set the tone for too many Linux developers. So it was that David A. Wheeler, The Linux Foundation’s director of open-source supply chain security, said in the Report on the 2020 FOSS Contributor Survey that “it is clear from the 2020 findings that we need to take steps to improve security without overburdening contributors.”
In Linux distributor circles, Cook continued, “The prevailing wisdom has been to choose vulnerabilities to fix based on the Mitre Common Vulnerabilities and Exposures (CVE) list.” But this is based on the faulty assumption that “all important flaws (and therefore fixes) would have an associated CVE.” They don’t. Therefore,
In short, if you rely on cherry-picking CVEs, you’re “all but guaranteed to miss important vulnerabilities that others are actively fixing, which is almost worse than doing nothing since it creates the illusion that security updates are being appropriately handled.”
Cook continued:
How can software vendors possibly do that? Cook considers it a painful, but in the end, “simple resource allocation problem, and is more easily accomplished than might be imagined: downstream redundancy can be moved into greater upstream collaboration.”
Performing continuous kernel updates (major or stable) understandably faces enormous resistance within an organization due to fear of regressions – will the update break the product? The answer is usually that a vendor doesn’t know, or that the update frequency is shorter than their time needed for testing. But the problem with updating is not that the kernel might cause regressions; it’s that vendors don’t have sufficient test coverage and automation to know the answer. Testing must take priority over individual fixes.
What does that mean? Cook explained:
He makes an excellent point. I know dozens of developers who spend their days porting changes from the stable kernel into their distribution-specific kernels. It’s useful work, but Cook’s right; much of it consists of duplicating efforts.
In addition, Cook suggests that “Beyond just squashing bugs after the fact, more focus on upstream code review will help stem the tide of their introduction in the first place, with benefits extending beyond just the immediate bugs caught. Capable code review bandwidth is a limited resource. Without enough people dedicated to upstream code review and subsystem maintenance tasks, the entire kernel development process bottlenecks.”
This is a known problem. One major reason the University of Minnesota’s security games with the Linux kernel developers annoyed the programmers so much was that they wasted the developers’ time. And, as Greg Kroah-Hartman, the Linux kernel maintainer for the stable branch, tartly observed, “Linux kernel developers do not like being experimented on; we have enough real work to do.”
Amen. The Linux kernel maintainers must oversee hundreds, even thousands, of code updates a week. As Cook remarked, “long-term Linux robustness depends on developers, but especially on effective kernel maintainers. … Maintainers are built not only from their depth of knowledge of a subsystem’s technology but also from their experience with the mentorship of other developers and code reviews. Training new reviewers must become the norm, motivated by making the upstream review part of the job. Today’s reviewers become tomorrow’s maintainers. If each major kernel subsystem gained four more dedicated maintainers, we could double productivity.”
Besides simply adding more reviewers and maintainers, Cook also thinks, “improving Linux’s development workflow is critical to expanding everyone’s ability to contribute. Linux’s ‘email only’ workflow is showing its age. Still, the upstream development of more automated patch tracking, continuous integration, fuzzing, coverage, and testing will make the development process significantly more efficient.”
And, as DevOps continuous integration and delivery (CI/CD) users know, shifting testing into the early stages of development is much more efficient. Cook observed, “it’s more effective to test during development. When tests are performed against unreleased kernel versions (e.g. linux-next) and reported upstream, developers get immediate feedback about bugs. Fixes can be developed before a flaw is ever actually released; it’s always easier to fix a bug earlier than later.”
But, there’s still more to be done. Cook believes we “need to proactively eliminate entire classes of flaws, so developers cannot introduce these types of bugs ever again. Why fix the same kind of security vulnerability 10 times a year when we can stop it from ever appearing again?”
This is already being done in the Linux kernel. For example, “Over the last few years, various fragile language features and kernel APIs have been eliminated or replaced (e.g. VLAs, switch fallthrough, addr_limit). However, there is still plenty more work to be done. One of the most time-consuming aspects has been the refactoring involved in making these usually invasive and context-sensitive changes across Linux’s 25 million lines of code.”
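To make two of the items Cook lists more concrete, here is a minimal userspace sketch in C, not kernel code. It assumes a hypothetical fixed bound (`MAX_LABEL`) in place of a variable-length array and defines its own `fallthrough` macro modeled loosely on the kernel's; the function names `copy_label` and `permission_bits` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed upper bound; replaces a stack VLA such as `char buf[len];`. */
#define MAX_LABEL 16

/* Illustrative stand-in for the kernel's `fallthrough` pseudo-keyword. */
#if defined(__GNUC__) && __GNUC__ >= 7
#define fallthrough __attribute__((__fallthrough__))
#else
#define fallthrough do {} while (0)
#endif

/*
 * Copy a label through a fixed-size scratch buffer.
 * With a VLA, stack usage would depend on caller-controlled input;
 * here the bound is known at compile time and oversize input is rejected.
 * Returns the label length, or -1 if it does not fit.
 */
int copy_label(char *dst, size_t dst_size, const char *src)
{
    char buf[MAX_LABEL];
    size_t len = strlen(src);

    assert(dst != NULL);
    if (len >= sizeof(buf) || len >= dst_size)
        return -1;
    memcpy(buf, src, len + 1);   /* includes the terminating NUL */
    memcpy(dst, buf, len + 1);
    return (int)len;
}

/*
 * Higher levels deliberately inherit the bits of lower levels.
 * Each intentional fall-through is marked, so a compiler run with
 * -Wimplicit-fallthrough can flag any unmarked (accidental) one.
 */
int permission_bits(int level)
{
    int bits = 0;

    switch (level) {
    case 2:
        bits |= 4;
        fallthrough;
    case 1:
        bits |= 2;
        fallthrough;
    case 0:
        bits |= 1;
        break;
    default:
        break;
    }
    return bits;
}
```

The point of both changes is the one Cook makes: once the fragile construct is gone or explicitly annotated, the compiler can reject new instances of the same bug class instead of relying on reviewers to spot each one.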
It’s not just the code that needs cleaning of inherent security problems. Cook wants “the compiler and toolchain … to grow more defensive features (e.g. variable zeroing, CFI, sanitizers). With the toolchain technically ‘outside’ the kernel, its development effort is often inappropriately overlooked and underinvested. Code safety burdens need to be shifted as much as possible to the toolchain, freeing humans to work in other areas. On the most progressive front, we must make sure Linux can be written in memory-safe languages like Rust.”
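As a rough illustration of what “variable zeroing” guards against, the following userspace C sketch zeroes a structure by hand before filling it in. The `struct reply` type and `build_reply` function are hypothetical, not kernel APIs. Toolchain options such as Clang's `-ftrivial-auto-var-init=zero` aim to provide similar protection for local variables automatically, which is the kind of burden-shifting Cook describes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reply structure; not a real kernel type. */
struct reply {
    int status;
    char name[12];
    long cookie;   /* never set by build_reply() */
};

/*
 * Fill in a reply for the caller.
 * Without the memset, `cookie`, the tail of `name`, and any padding
 * would keep whatever bytes were already in memory, which is how
 * stale data can leak to whoever reads the structure next.
 */
void build_reply(struct reply *out, int status, const char *name)
{
    assert(out != NULL);

    /* Zero the whole object first, including unset fields and padding. */
    memset(out, 0, sizeof(*out));

    out->status = status;
    /* Copy at most 11 bytes; the final byte stays NUL from the memset. */
    strncpy(out->name, name, sizeof(out->name) - 1);
}
```

Doing this by hand works only when every developer remembers it every time; having the compiler zero variables by default removes that dependency.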
So, what can you do to help this process? Cook proclaimed you shouldn’t wait for another minute:
Specifically, Cook concluded, “Based on our most conservative estimates, the Linux kernel and its toolchains are currently underinvested by at least 100 engineers, so it’s up to everyone to bring their developer talent together upstream. This is the only solution that will ensure a balance of security at reasonable long-term cost.”
So, are you ready for the challenge? I hope so. Linux is far too important across all of technology for us not to do our best to protect it and harden its security.