GSoC 2018 Reports: Kernel Undefined Behavior Sanitizer, Part 1
Prepared by Harry Pantazis (IRC:luserx0, Mail:luserx0 AT gmail DOT com) as part of GSoC 2018.
For GSoC '18, I'm working on the Kernel Undefined Behavior Sanitizer (KUBSAN) project for the integration of Undefined Behavior regression testing on the amd64 kernel. This article summarizes what has been done up to this point (Phase 1 Evaluation), future goals and a brief introduction to Undefined Behavior.
So, first things first, let's get started. The mailing list project presentation
What is Undefined Behavior?
For Turing-complete languages we cannot reliably decide offline whether a program has the potential to execute an error; we just have to run it and see. DUH!
Undefined Behavior in C is basically what the ANSI standard leaves unexplained. Code containing Undefined Behavior is ANSI C compatible. It follows all the rules explained in the standard and causes real trouble. In programming terms, it involves all the possible functionalities C code can run. It's whatever the compiler doesn't moan about, but when run it causes run-time bugs, hard to locate.
The C FAQ defines "Undefined Behavior" like this:
Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
A brief explanation of what is classifed as UB and some real case scenarios
A great blog post explaining more than mere mortals might need
The important and scary thing to realize is that just about *any* optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will and a significant part of their reason for existing is to expose secondary optimizations like the ones above.
Solution: Make a UB Sanitizer
What we can do to find undefined behavior errors in our code, is creating a Sanitizer. Hopefully both CLang and GCC have taken care of such "dream" tools, covering the majority of undefined behavior cases in a very nice manner. They allow us to simply parse the -fsanitize=undefined option when we build our code and the compiler "spits out" simple warnings for us to see. The CLang supported flags (same as GCC's but they don't have such extensive explanation docs).
Adding ATF Tests for Userland UBSan
This was my first deliverable for the integration of KUBSan. The concept was to include tests causing simple C programs to portray Undefined Behavior, such as overflows, erroneous shifting and out of bounds accessing of arrays (VLAs actually). The ATF framework is not a real "sweetheart" to learn, so it took me more than expected to complete this preliminary step to the project. The good news was that I had enough time to understand Undefined Behavior to a suave depth and make my extensive research for ideas (and not only).
The initial commit of the tests cleaned up and submitted by my mentor Kamil Rytarowski.
Addition of Example Kernel Module Panic_String
Next on our roadmap was the understanding of NetBSD's loadable kernel modules. For this, I created a kernel module parsing a string from a device named /dev/panic and calling the kernel panic(9) with it as argument, after syncing the system. This took a long time, but in the process I had the priviledge of reading FreeBSD Device Drivers: A Guide for the Intrepid, which unfortunatelyfor our foundation is the only book in close resemblance to our kernel module infrastructure.
The panic_string module commit revised, corrected and uploaded by Kamil.
Compiling the kernel with -fsanitize=undefined
Compiled the kernel with the aforementioned option to catch UB bugs. We got one. Only one! Which was reported to the tech-kern mailing list in this Thread.
Adding the option to compile the Kernel with KUBSan
At last what was our last deliverable for GSoC's first evaluation, was getting the amd64 kernel to boot with the KUBSan option enabled. This was a trick. We only needed the appropriate dummy functions, so we could use them as symbols in the linking process of a kernel build. At first I created KUBSan as a loadable kernel module, but the chaotic structure of our codebase was to much for me. This means that I searched for 4 whole days a way to link the exported symbols to the kernel build and was unsuccessful :(. But everything happens for a reason, because that one failure ignited me to search for all the available UBSan implementations and I was able to locate the initial support of the KUBSan functionality for: Linux, Chromium and FreeBSD. Which in turn, made me realise that the module was not necessary, since I could include the KUBSan functiuonality to our /sys infrastructure. Which I did and which was successful and which allowed me to boot and run a fully KUBSan-ed kernel.
It hasn't been uploaded to upstream yet, but you can have a look at my local (and totally messy) fork.
Summary and Future Goals
This first month of GSoC has been a great experience. Last year I participated again with project trying to "revamp" support for Scheme R7RS in the Eclipse IDE (we later tried to create a Kawa-Scheme Language Server-LSP, but that's a sad story) and my overall experience was really bad (I had to quit mid-July). This year we are doing things in much friendlier, cooperative and result-producing manner. I'm really happy about that.
A brief summary is that: the Kernel booted with KUBSan and I'm in knowledge of all the tools needed to extent that functionality. That's of ye need to know up to this point.
- Future goals include:
- Making a full implementation of KUBSan, with an edge on surpassing other existing implementations,
- Clear up any license issues,
- Finish the amd64 implementation and switch focus to the i386,
- Spead the NetBSD hype
At last I would like to deliver a huge thanks to my mentors Kamil and Christos for their advices and help with the project, but mostly for their incredible behavior towards the problems I went through this past month. Much love :)
- Further Reading:
- Linux Kernel UBSan docs
- A nice in-depth article on UBSan for C/C++
- My Personal Blogsite