% $Id: hostsupport.tex,v 1.4 2009-03-02 15:22:20 potyra Exp $
% vim:tabstop=8:shiftwidth=8:textwidth=78
%
% Copyright (C) 2002-2009 FAUmachine Team <info@faumachine.org>.
% This program is free software. You can redistribute it and/or modify it
% under the terms of the GNU General Public License, either version 2 of
% the License, or (at your option) any later version. See COPYING.
\documentclass[a4paper,10pt]{article}
\parindent0pt
\title{Extensions to the hosting Linux kernel to provide better Support for
FAUmachine}
\author{Hans-J\"org H\"oxer, Volkmar Sieh
\\{\tt Hans-Joerg.Hoexer@informatik.uni-erlangen.de}
\\{\tt Volkmar.Sieh@informatik.uni-erlangen.de}
\\{\tt info@faumachine.org}}
\begin{document}
\maketitle
\section{\label{section0}Introduction}
When the Linux kernel runs as a user space process, it cannot access hardware
directly, so the hardware has to be virtualized. The current implementation of
FAUmachine shows that the system call interface provided by Linux is
sufficient to accomplish this. It is even possible to virtualize mechanisms of
the CPU and MMU such as different privileged execution modes (user mode vs.
superuser mode) and memory protection (user space vs. kernel space). Doing
this in user space, however, is difficult and slow. It is therefore desirable
to extend the hosting Linux kernel with a few additional fast mechanisms.
This report gives an overview of these extensions.
\section{\label{section1}Extended {\tt mmap} System Call}
Pages mapped into the address space of a process get an additional attribute
that specifies a protection level. Similarly, processes get an additional
attribute that specifies the privilege level at which the process is currently
running. Level 0 is the most privileged level: a process running at level 0
can access pages with any protection level greater than or equal to 0, that
is, all of its pages. In general, a process running at privilege level $n$
can access exactly those pages whose protection level is greater than or
equal to $n$.

A process can change its privilege level using an extended signal mechanism
described in section \ref{section2}. Thus it is possible to distinguish
between different execution modes with different access rights to the virtual
address space of a process.

The {\tt mmap} system call is extended with a {\tt level} argument which
specifies the protection level of pages managed with {\tt mmap}.

The default protection level for pages and the default privilege level for a
process are both 0. Thus a process has full access to its virtual address
space and behaves just like a conventional user process does today.
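
The access rule above can be written down as a small user space model. This
is only a sketch under stated assumptions: the types {\tt page\_model} and
{\tt process\_model} and the function {\tt may\_access} are hypothetical
names chosen for illustration and are not part of any existing kernel
interface.

\begin{verbatim}
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; they do not exist in the Linux kernel. */
struct page_model {
    int level;      /* protection level given to the page via mmap */
};

struct process_model {
    int level;      /* privilege level the process currently runs at */
};

/*
 * A page is accessible if its protection level is greater than or
 * equal to the current privilege level of the process.
 */
bool may_access(const struct process_model *proc,
                const struct page_model *page)
{
    assert(proc != NULL && page != NULL);
    return page->level >= proc->level;
}
```
\end{verbatim}

With the default level 0 on both sides the check always succeeds, which
matches the behaviour of a conventional process.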
\section{\label{section2}Extended Signals}
An extended signal mechanism is used to switch between the different
privilege levels of a process. The system calls {\tt sigaltstack},
{\tt sigreturn} and {\tt sigaction} get an additional level argument.

For {\tt sigaltstack} and {\tt sigaction} the level defines the privilege
level at which the signal handler is executed. For {\tt sigreturn} the level
defines the privilege level at which the interrupted process resumes
execution.

On return to a less privileged level $\mathit{newlevel}$ the Linux kernel
unmaps all pages with a protection level less than $\mathit{newlevel}$.
Consequently the page fault handler of the kernel has to compare the current
privilege level of a process with the protection level of a faulting page. If
the faulting page has a protection level less than the current privilege
level of the process, the page is not mapped and the process receives a
{\tt SIGSEGV}. Otherwise the page is mapped and the page fault is resolved.

Again, the default level for {\tt sigaction}, {\tt sigaltstack} and
{\tt sigreturn} is 0, which is equivalent to the current behaviour of these
system calls.
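
The two kernel-side rules described above, unmapping on return to a less
privileged level and the check in the page fault handler, can be sketched as
a user space model. All names ({\tt vpage}, {\tt return\_to\_level},
{\tt handle\_fault}) are hypothetical and only illustrate the logic; a real
implementation would operate on the kernel's page tables.

\begin{verbatim}
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fault_result { FAULT_MAP_PAGE, FAULT_SIGSEGV };

/* Hypothetical model of one page of the process. */
struct vpage {
    int level;      /* protection level of the page */
    bool mapped;    /* whether the page is currently mapped */
};

/*
 * Model of a return to privilege level newlevel: every mapped page
 * with a protection level less than newlevel is unmapped.
 * Returns the number of pages that were unmapped.
 */
size_t return_to_level(int newlevel, struct vpage *pages, size_t n)
{
    size_t unmapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].mapped && pages[i].level < newlevel) {
            pages[i].mapped = false;
            unmapped++;
        }
    }
    return unmapped;
}

/*
 * Model of the page fault check: a page below the current privilege
 * level is refused (SIGSEGV), any other page is mapped.
 */
enum fault_result handle_fault(int privilege_level, struct vpage *page)
{
    assert(page != NULL);
    if (page->level < privilege_level)
        return FAULT_SIGSEGV;
    page->mapped = true;
    return FAULT_MAP_PAGE;
}
```
\end{verbatim}

In this model a process that returns to level 1 loses its level 0 pages, and
a later access to one of them is answered with {\tt FAULT\_SIGSEGV} instead
of a new mapping.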
\section{\label{section3}Exception and Interrupt Trapping}
The extended signal mechanism can also cover exceptions and interrupts: all
interrupts -- including software interrupts like {\tt int 0x80} on
{\tt x86} -- and exceptions occurring in the context of a process with a
privilege level greater than 0 are converted into a signal, and the trap
number is put on the signal stack of a corresponding level 0 signal handler.
This handler can examine the trap number and decide how to proceed.

This makes it easy to redirect system calls without using {\tt ptrace} and to
prevent sandboxed applications from executing certain system calls.
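
As an illustration, the decision taken by such a level 0 handler might look
like the following sketch. It assumes that the handler can also obtain the
system call number of the interrupted process, for example from the saved
registers; the text above only guarantees the trap number. The names
{\tt classify\_trap}, {\tt trap\_action} and the deny list are hypothetical.

\begin{verbatim}
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Software interrupt used for system calls on x86. */
#define TRAP_X86_SYSCALL 0x80

enum trap_action {
    ACTION_REDIRECT_SYSCALL,    /* hand the system call to another handler */
    ACTION_DENY,                /* refuse the system call (sandboxing) */
    ACTION_FORWARD_EXCEPTION    /* any other trap: treat as an exception */
};

/* Linear search in a (hypothetical) list of forbidden system calls. */
static bool is_denied(long syscall_nr, const long *denied, size_t n_denied)
{
    for (size_t i = 0; i < n_denied; i++) {
        if (denied[i] == syscall_nr)
            return true;
    }
    return false;
}

/*
 * Decide how to proceed for a given trap number. The system call
 * number is an assumed extra input, see the lead-in text.
 */
enum trap_action classify_trap(int trap_number, long syscall_nr,
                               const long *denied, size_t n_denied)
{
    assert(denied != NULL || n_denied == 0);
    if (trap_number != TRAP_X86_SYSCALL)
        return ACTION_FORWARD_EXCEPTION;
    if (is_denied(syscall_nr, denied, n_denied))
        return ACTION_DENY;
    return ACTION_REDIRECT_SYSCALL;
}
```
\end{verbatim}

A sandbox would pass its list of forbidden system call numbers as
{\tt denied}; a pure redirector would pass an empty list.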
\section{\label{section4}Implementing FAUmachine using Host Support}
FAUmachine uses three privilege levels. The FAUmachine process executes the
simulator binary {\tt simulator} at level 0. {\tt simulator} provides a level
0 signal handler for all signals that the FAUmachine process may receive. The
pages of {\tt simulator} are mapped with protection level 0.

The user mode Linux kernel {\tt vmlinux} executes at level 1, and physical
memory is mapped with protection level 1 in the kernel area of the address
space of the FAUmachine process.

User mode processes\footnote{These processes are actually threads running
inside the FAUmachine process.} of the guest Linux kernel {\tt vmlinux} are
executed at privilege level 2. The protection level of pages associated with
these processes is raised to level 2. These pages are mapped to the user
space of the guest kernel.

Thus FAUmachine provides three different views of the virtual address space
of the FAUmachine process: The simulator binary running at privilege level 0
has full access to all pages of the FAUmachine process. The guest kernel is
executed at level 1 and has full access to all pages of the FAUmachine
process except the level 0 pages that are associated with {\tt simulator}.
User mode processes of the guest kernel run at level 2 and have access only
to their own level 2 pages.\\
The transition from one privilege level to another is done using signals. The
FAUmachine process is created with {\tt fork} and starts executing the text
of {\tt simulator} at level 0. {\tt simulator} sends itself a signal and
switches to an appropriate signal handler. This handler arranges for the
FAUmachine process to end up inside the {\tt vmlinux} binary with privilege
level 1 after return from the handler. From this point on {\tt simulator}
and all its associated pages are no longer visible.

To switch from kernel mode to user mode, {\tt vmlinux} generates a signal --
for example by executing an {\tt int3} instruction -- and the appropriate
level 0 signal handler of {\tt simulator} is executed. The handler arranges
for the FAUmachine process to return to the user space of the guest with the
privilege level set to 2. Now the user mode kernel is no longer visible, and
any access to the guest kernel space raises an exception.

When a process running in user mode tries to execute a system call -- i.e.
executes an {\tt int 0x80} -- the hosting Linux kernel delivers a signal to
the FAUmachine process, and the corresponding level 0 signal handler of
{\tt simulator} is executed. This handler can now arrange for the FAUmachine
process to resume execution at level 1 inside the guest kernel.\\
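
The three transitions just described can be summarised as a small state
machine. The sketch below is a model only: the enumerators and the function
{\tt resume\_level} are hypothetical names, and each event stands for a
signal that is handled by the level 0 signal handler of {\tt simulator}.

\begin{verbatim}
```c
#include <assert.h>

/* The three privilege levels used by FAUmachine. */
enum level {
    LEVEL_SIMULATOR = 0,
    LEVEL_GUEST_KERNEL = 1,
    LEVEL_GUEST_USER = 2
};

/* Events that enter the level 0 signal handler (model only). */
enum event {
    EV_BOOT_SIGNAL,     /* simulator sends itself a signal at start-up */
    EV_KERNEL_TRAP,     /* vmlinux raises a signal, e.g. with int3 */
    EV_USER_SYSCALL     /* guest user process executes int 0x80 */
};

/*
 * Privilege level at which the process resumes after the level 0
 * handler returns, or -1 if the event is not expected at this level.
 */
int resume_level(enum level current, enum event ev)
{
    assert(current >= LEVEL_SIMULATOR && current <= LEVEL_GUEST_USER);
    switch (ev) {
    case EV_BOOT_SIGNAL:
        return current == LEVEL_SIMULATOR ? LEVEL_GUEST_KERNEL : -1;
    case EV_KERNEL_TRAP:
        return current == LEVEL_GUEST_KERNEL ? LEVEL_GUEST_USER : -1;
    case EV_USER_SYSCALL:
        return current == LEVEL_GUEST_USER ? LEVEL_GUEST_KERNEL : -1;
    }
    return -1;
}
```
\end{verbatim}

Every transition passes through level 0, because only the signal handler of
{\tt simulator} selects the level at which execution resumes.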
A very similar model can be used for Jeff Dike's User Mode Linux: the UM
kernel executes at level 0, and the privilege level is raised to 1 on return
to guest user mode.
\section{\label{section5}Conclusion}
The proposed extensions to the Linux kernel provide a mechanism to divide the
address space of a userland process into separate areas with page granularity
that are protected from each other. Access to these areas depends on the
current privilege level of a userland process. Transition from one privilege
level to another is done using signals.

Thus the abstraction layer that Linux provides for hardware access is extended
in a logical way. Applications like a user mode Linux implementation can now
use the concept of user mode vs. superuser mode in userland.
\end{document}