On a Linux system, when the CPU is executing code in a fully privileged mode, we say that the CPU is executing the code in kernelspace.
On a Linux system, when the CPU is executing code at a restricted privilege level, we say that the CPU is executing the code in userspace.
#GP(0)
CPU exception.
We’ve previously demonstrated how attempting to execute an invalid instruction or a privileged instruction while in user mode causes a CPU exception. At the hardware level, the CPU immediately switches its privilege mode to kernel mode and jumps to the corresponding kernel function, installed at boot, to handle the exception. In this case, the handler function prints an error message to the kernel ring buffer and kills our program.
To give a couple more examples:
Software conditions such as dividing by zero or accessing an unaligned address trigger CPU exceptions. Hardware can of course also interrupt CPU execution by changing the voltage on CPU pins. Finally, attempting to access a pointer (virtual memory address) for which the memory management unit (MMU) does not have a corresponding physical address triggers a page fault exception, which the kernel may resolve by setting up an appropriate mapping (e.g. for memory that was lazily allocated or swapped to disk), or otherwise by sending the program the SIGSEGV signal, commonly reported as a “Segmentation Fault”.
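The page fault case can be observed from userspace. The following is a minimal sketch for Linux, not part of the course demo: the helper name read_faults is hypothetical, and it assumes that address 0 is left unmapped, which is the usual Linux default. It installs a temporary SIGSEGV handler, attempts a read, and reports whether the kernel delivered the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Where to resume if the read below faults. */
static sigjmp_buf recover_point;

/* SIGSEGV handler: abandon the faulting read and jump back. */
static void on_segv(int signo)
{
    (void)signo;
    siglongjmp(recover_point, 1);
}

/*
 * Hypothetical helper: returns 1 if reading *addr raised SIGSEGV,
 * 0 if the read succeeded, and -1 if the handler could not be installed.
 */
int read_faults(const volatile int *addr)
{
    struct sigaction sa, old;
    int faulted;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    /* Passing 1 saves the signal mask so siglongjmp restores it. */
    if (sigsetjmp(recover_point, 1) == 0) {
        int value = *addr;   /* may page fault */
        (void)value;
        faulted = 0;
    } else {
        faulted = 1;         /* we arrived here from the handler */
    }

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    return faulted;
}
```

Passing the address of an ordinary variable should return 0. Passing a null pointer is expected to return 1 on a typical Linux system, because the kernel cannot resolve the fault and sends SIGSEGV instead.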
Here’s a short AT&T-style x86 assembly file we can use to generate a binary that will attempt to execute a privileged instruction:
.global _start  # declare the _start symbol with external linkage so the linker can see it
_start:         # the true entry point for an x86 executable program
    rdmsr       # execute the privileged RDMSR instruction
Build the object file rdmsr.o from rdmsr.src with:
as -o rdmsr.o rdmsr.src
Create the linked executable binary rdmsr from rdmsr.o with:
ld -o rdmsr rdmsr.o
Invoking this binary with ./rdmsr should trigger a general protection fault.
More information on the #UD Invalid Opcode exception.
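The same two exceptions can also be provoked from C. This is a sketch under stated assumptions, not the course demo: it targets x86 Linux with GCC or Clang inline assembly, the names fatal_signal_of, try_nop, try_rdmsr and try_ud2 are hypothetical, and MSR number 0x10 is an arbitrary choice. Each attempt runs in a forked child so the parent can report which signal killed it. On Linux, a general protection fault raised in user mode is delivered as SIGSEGV and an invalid opcode as SIGILL.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Control case: nothing privileged happens here. */
static void try_nop(void)
{
}

#if defined(__x86_64__) || defined(__i386__)
/* Privileged instruction: raises #GP when run in user mode. */
static void try_rdmsr(void)
{
    unsigned int lo, hi;

    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(0x10u));
    (void)lo;
    (void)hi;
}

/* UD2 is defined to raise the #UD invalid opcode exception. */
static void try_ud2(void)
{
    __asm__ volatile("ud2");
}
#endif

/*
 * Run fn in a child process. Returns the signal that terminated the
 * child, 0 if it exited normally, or -1 if fork/waitpid failed.
 */
int fatal_signal_of(void (*fn)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);   /* only reached if fn did not fault */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On an x86 machine, fatal_signal_of(try_rdmsr) is expected to return SIGSEGV and fatal_signal_of(try_ud2) to return SIGILL, while the control case returns 0. The kernel log (dmesg) may also show a message for each fault, depending on system configuration.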
With a small kernel module, we can get Linux to run the same instruction in kernelspace:
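#include <linux/module.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");

static int __init priv_demo_init(void)
{
	/* arbitrary poison values, overwritten by a successful rdmsr */
	int result_lower_32 = -0xAF, result_upper_32 = -0xBF;

	pr_info("EDX:EAX := MSR[ECX];\n");
	/*
	 * rdmsr reads the MSR selected by ECX into EDX:EAX, so pin the
	 * operands to those registers. MSR 0x10 (IA32_TIME_STAMP_COUNTER)
	 * is an example choice, not taken from the original listing.
	 */
	asm volatile("rdmsr"
		     : "=a" (result_lower_32), "=d" (result_upper_32)
		     : "c" (0x10));
	pr_info("rdmsr: EDX=0x%x, EAX=0x%x\n",
		result_upper_32, result_lower_32);
	return 0;
}

static void __exit priv_demo_exit(void)
{
	pr_info("rdmsr exiting\n");
}

module_init(priv_demo_init);
module_exit(priv_demo_exit);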
We can build this with the same Makefile as shown here on the E2 page.
We created a fully automated demo of privileged and unprivileged instruction execution.
To acquire and run this demo, enter your VM, run git clone https://kdlp.underground.software/cgit/priv_rdmsr_demo/ and then run make inside the cloned directory.
We took another look at the demo we posted on L05 after class.
That demo can be found here and obtained by running:
git clone https://kdlp.underground.software/cgit/priv_rdmsr_demo
Ensure that you are comfortable with some of the introductory details we discussed in L05.
Recall from L05 that a trap is a type of CPU exception.
We browsed the source for the Linux implementation of trap handling to understand the codepath that executes when a user program executes the “UD2” instruction and the kernel prints a message to the kernel ring buffer (dmesg).
The handler for this exception is defined in arch/x86/kernel/traps.c as exc_invalid_op.
Elsewhere, the corresponding row of the IDT is set to point at this handler's entry code, so when the exception is generated, exc_invalid_op runs and in turn calls handle_invalid_op.
If you are interested in the IDT, then you may also be interested in the GDT.
Linux implements a lot of x86-specific IDT-related code in arch/x86/kernel/idt.c.
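The idea of a table that maps exception vectors to handlers can be modeled in a few lines of userspace C. This is a toy illustration only: real IDT entries are gate descriptors rather than plain function pointers, and every name below (toy_idt, toy_lookup, toy_raise and the toy_ handlers) is hypothetical. The vector numbers 0, 6, 13 and 14 are the architectural x86 vectors for #DE, #UD, #GP and #PF.

```c
#include <stddef.h>

#define NUM_VECTORS 256

/* A toy handler returns a description of what it handled. */
typedef const char *(*handler_fn)(void);

static const char *toy_divide_error(void)       { return "divide error (#DE)"; }
static const char *toy_invalid_op(void)         { return "invalid opcode (#UD)"; }
static const char *toy_general_protection(void) { return "general protection (#GP)"; }
static const char *toy_page_fault(void)         { return "page fault (#PF)"; }

/* One slot per vector; unset slots stay NULL. */
static handler_fn toy_idt[NUM_VECTORS];

/* Fill in the rows once, loosely analogous to setup at boot. */
void toy_idt_setup(void)
{
    toy_idt[0]  = toy_divide_error;
    toy_idt[6]  = toy_invalid_op;
    toy_idt[13] = toy_general_protection;
    toy_idt[14] = toy_page_fault;
}

/* Look up the handler installed for a vector, or NULL if none. */
handler_fn toy_lookup(unsigned int vector)
{
    return vector < NUM_VECTORS ? toy_idt[vector] : NULL;
}

/* "Raise" an exception: index the table and call whatever is there. */
const char *toy_raise(unsigned int vector)
{
    handler_fn h = toy_lookup(vector);

    return h ? h() : "unhandled";
}
```

After toy_idt_setup(), calling toy_raise(6) dispatches to toy_invalid_op, in the same spirit as the CPU indexing the IDT with vector 6 when UD2 executes.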
To begin, we used parts of the Kernel Modules and Device Drivers slide deck.
The slides are a little bit out of sync with how we have re-arranged the course and we have not yet reached device driver development.
The last three slides were the most relevant; however, students may be interested in taking a look at the rest of the deck.
The kernel uses a small, fixed-size stack, compared to the larger, extendable stack used by userspace programs.
The C library, itself being userspace code, is not available in kernelspace. Instead, many of its functions, but importantly not all of them, are reimplemented within the kernel.
For example, the IEEE 754 floating point types that we all know and love from userspace C programming are off-limits in ordinary kernel code.
The reason is that when the CPU switches between kernelspace and userspace, it has to save and restore its execution state to remember where it left off, and saving and restoring the floating point registers on every switch is considered to be too much overhead.
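When fractional values are needed without floating point, one common workaround is fixed-point arithmetic on integers. The sketch below is illustrative only and is not taken from the kernel: the type fixed_t, the choice of 8 fractional bits, and all function names are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical format: unsigned values with 8 fractional bits. */
#define FP_SHIFT 8
#define FP_ONE   (1u << FP_SHIFT)   /* the value 1.0 */

typedef uint32_t fixed_t;

/* Convert a whole number to fixed point. */
fixed_t fixed_from_int(uint32_t n)
{
    return n << FP_SHIFT;
}

/* Drop the fractional bits (rounds toward zero). */
uint32_t fixed_to_int(fixed_t x)
{
    return x >> FP_SHIFT;
}

/* Multiply in 64 bits, then rescale back to 8 fractional bits. */
fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((uint64_t)a * b) >> FP_SHIFT);
}

/* Pre-scale the dividend so the quotient keeps 8 fractional bits. */
fixed_t fixed_div(fixed_t a, fixed_t b)
{
    return (fixed_t)(((uint64_t)a << FP_SHIFT) / b);
}
```

With this format, 1.5 is stored as 384 and 2.5 as 640, so fixed_mul(384, 640) yields 960, which represents 3.75. No floating point register is touched at any point.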
The kernel uses a different range of the address space than userspace. On x86_64 systems, the virtual address space is generally split in half, with userspace in the lower half and the kernel in the upper half.
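A userspace program can check which half of the address space its own pointers fall in. This small sketch assumes a 64-bit Linux system with the conventional layout in which userspace addresses occupy the lower half. The helper name in_lower_half is hypothetical.

```c
#include <stdint.h>

/*
 * Returns nonzero if p lies in the lower half of the address space,
 * i.e. the most significant bit of the address is clear.
 */
int in_lower_half(const void *p)
{
    return (uintptr_t)p <= UINTPTR_MAX / 2;
}
```

Under that assumption, the address of any local or global variable in a userspace program should satisfy in_lower_half, while an address with the top bit set would belong to the half reserved for the kernel.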
Definition: A computer program is considered reentrant if and only if multiple concurrent executions of the same program always run correctly.
Further information can be found on the reentrancy Wikipedia page.
Assume that any line of code in the kernel can be running at any time with any number of concurrent executions of the same code.
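A classic source of non-reentrant code is hidden static state. The sketch below contrasts two hypothetical helpers, label_shared and label_into, that format the same string. It uses two back-to-back calls as a deterministic stand-in for concurrent executions: with real concurrency the shared buffer could also be corrupted mid-write.

```c
#include <stdio.h>

/*
 * NOT reentrant: every call writes into the same static buffer, so a
 * second execution overwrites the result the first one is still using.
 */
const char *label_shared(int id)
{
    static char buf[32];

    snprintf(buf, sizeof(buf), "task-%d", id);
    return buf;
}

/*
 * Reentrant: the caller supplies the storage, so each execution works
 * on its own buffer and no state is shared between them.
 */
const char *label_into(char *buf, size_t len, int id)
{
    snprintf(buf, len, "task-%d", id);
    return buf;
}
```

If one caller holds the pointer returned by label_shared(1) and another then calls label_shared(2), both pointers refer to the same buffer and the first caller now sees "task-2". With label_into, each caller passes its own buffer and the results stay independent.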