Linux Kernel Basics

Fundamental difference between kernelspace and userspace: the CPU privilege level at a given time

What does CPU privilege enable?

Exceptions review

We’ve previously demonstrated how attempting to execute an invalid instruction, or a privileged instruction while in user mode, causes a CPU exception. At the hardware level, the CPU immediately switches its privilege mode to kernel mode and jumps to the corresponding kernel function, installed at boot, that handles the exception. In this case, the handler prints an error message to the kernel ring buffer and kills our program.

To give a few more examples: software conditions such as dividing by zero or, where alignment checking is enabled, accessing an unaligned address trigger CPU exceptions. Hardware can, of course, also interrupt CPU execution by changing the voltage on CPU pins, although these are interrupts rather than exceptions. Finally, attempting to access a pointer (virtual memory address) for which the memory management unit (MMU) has no corresponding physical address triggers a page fault exception. The kernel may resolve it by setting up an appropriate mapping (e.g. for memory that was lazily allocated or swapped to disk), or otherwise send the program the SIGSEGV signal, better known as a “Segmentation fault”.

Userspace Demo

Here’s a short AT&T-style x86 assembly file we can use to generate a binary that will attempt to execute a privileged instruction:

.global _start		# declare _start with external linkage so the linker can see it
_start:			# the true entry point of an x86 executable (no C runtime)
	rdmsr		# execute the RDMSR instruction; at CPL 3 this faults

Build the object file rdmsr.o from rdmsr.src with:

as -o rdmsr.o rdmsr.src

Create the linked executable binary rdmsr from rdmsr.o with:

ld -o rdmsr rdmsr.o

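Invoking this binary with ./rdmsr should trigger a general protection fault (#GP), because RDMSR may only be executed at privilege level 0. The kernel responds by sending the process SIGSEGV.

More information on the #UD Invalid Opcode exception, which an undefined instruction such as UD2 raises instead of #GP.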

Kernelspace Demo

With a small kernel module, we can get Linux to run the same instruction in kernelspace:

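#include <linux/module.h>
#include <linux/init.h>
#include <linux/types.h>

MODULE_LICENSE("GPL");

/* IA32_TIME_STAMP_COUNTER: an architectural MSR present on x86-64 CPUs */
#define DEMO_MSR 0x10U

static int __init priv_demo_init(void)
{
	/* arbitrary poison values, overwritten by rdmsr */
	u32 result_lower_32 = 0xAF, result_upper_32 = 0xBF;

	pr_info("EDX:EAX := MSR[ECX], ECX=0x%x\n", DEMO_MSR);
	/* rdmsr reads MSR[ECX] into EDX (upper 32 bits) and EAX (lower 32 bits) */
	asm volatile("rdmsr"
		     : "=a" (result_lower_32), "=d" (result_upper_32)
		     : "c" (DEMO_MSR));
	pr_info("rdmsr: EDX=0x%x, EAX=0x%x\n",
		result_upper_32, result_lower_32);
	return 0;
}

static void __exit priv_demo_exit(void)
{
	pr_info("rdmsr exiting\n");
}

module_init(priv_demo_init);
module_exit(priv_demo_exit);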

We can build this with the same Makefile shown on the E2 page.

Fully Automated demo

We created a fully automated demo of privileged and unprivileged instruction execution. To acquire and run it, enter your VM, run git clone https://kdlp.underground.software/cgit/priv_rdmsr_demo/, and then run make inside the cloned directory.

Further look at kernelspace vs userspace demo

We took another look at the demo we posted on the L05 page after that class.

That demo can be obtained by running:

git clone https://kdlp.underground.software/cgit/priv_rdmsr_demo

Ensure that you are comfortable with some of the introductory details we discussed in L05.

Recall from L05 that a trap is a type of CPU exception.

We browsed the source of the Linux trap-handling implementation to understand the codepath that runs when a user program executes the “UD2” instruction and the kernel prints a message to the kernel ring buffer (dmesg).

The handler for this exception is defined in arch/x86/kernel/traps.c as exc_invalid_op. Elsewhere, the corresponding row of the IDT is set to the address of its assembly entry stub (asm_exc_invalid_op), so when the exception is generated, exc_invalid_op runs and in turn calls handle_invalid_op, which sends the offending process SIGILL.

If you are interested in the IDT (Interrupt Descriptor Table), you may also be interested in the GDT (Global Descriptor Table).

Linux implements a lot of x86-specific IDT related code in arch/x86/kernel/idt.c.

Intro to kernelspace

To begin, we used parts of the Kernel Modules and Device Drivers slide deck.

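The most important takeaway: kernel code is reentrant. The same kernel function may be executing on several CPUs at once, or be interrupted and entered again before an earlier invocation finishes, so any state it shares must be protected.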