
11. What to do when UML doesn't work

11.1 Child nnnnn exited with signal 11

This appears just after

VFS: Mounted root (ext2 filesystem) readonly.
Mounted devfs on /dev
There are two known causes for this. The old cause isn't the problem unless you have a seriously old host kernel. The other cause is far more likely these days. I also have another report of this which appears not to be caused by either of the above.

11.2 Segfault in padzero

You run UML under the kernel debugger and this appears in the debugger window:

Program received signal SIGSEGV, Segmentation fault.
0x10035830 in padzero (elf_bss=1073765049)
    at /ext1/usermode/linux/include/asm/arch/string.h:418
418     __asm__ __volatile__(
This is the normal faulting in of init. To avoid having gdb stop every time a page is faulted in, do this in the debugger window:
handle SIGSEGV pass nostop noprint
This isn't necessary after test10 is released since the kernel debugger won't see SIGSEGV any more.

11.3 Out of pty's in getmaster

When UML boots up, it panics like this:

Initializing stdio console driver
Initializing software serial port version 0
Kernel panic: Out of pty's in getmaster
Either your system is out of pseudo-terminals, in which case you need to figure out why and fix it, or you're running devfs with no old-style tty-pty pairs. Make a few, and this panic will go away.

As of test10, this problem doesn't cause a panic. The serial line driver just fails to initialize itself.
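For the devfs case, the old-style pairs have to be created by hand as root. Here is a minimal sketch, assuming the conventional Linux device numbers for old-style ptys (masters are character major 2, slaves are character major 3) and the usual /dev/ptypN and /dev/ttypN names. The function name pty_pair_cmds is made up for illustration, and it only prints the mknod commands so that you can review them before running them as root:

```shell
# Print the mknod commands for the first N old-style pty pairs (N <= 16).
# Assumption: masters are char major 2, slaves are char major 3.
pty_pair_cmds() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        # ptyp0..ptypf: the unit is a single hex digit
        unit=$(printf '%x' "$i")
        echo "mknod /dev/ptyp$unit c 2 $i"
        echo "mknod /dev/ttyp$unit c 3 $i"
        i=$((i + 1))
    done
}

pty_pair_cmds 4
```

If the printed commands look right for your system, run them as root on the host.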

11.4 Can't set up the umn device : "Failed to set slip line discipline"

On newer host kernels, the security on the slip device was tightened so that you need to be root in order to set the slip line discipline on a terminal. On recent versions of UML this isn't a problem since the uml_net helper sets up the slip device.

11.5 Stack overflowed onto current_task page

This panic was introduced in test9 to try to catch a real stack overflow bug. It actually caught a lot of cases which weren't bugs. It's fixed in test10 by the stack being twice as big and there being a guard page between the stack and the task structure. This panic is probably only seen on fairly recent 2.4.0 host kernels. So, a workaround would be to run a 2.2 or a not-too-recent 2.3/2.4 kernel as the host.

11.6 Strange compilation errors when you build from source

As of test11, it is necessary to have "ARCH=um" in the environment or on the make command line for every step in building UML: clean, distclean, or mrproper; config, menuconfig, or xconfig; dep; and linux. If you forget it for any of them, the i386 build seems to contaminate the UML build. If this happens, start from scratch with

make mrproper ARCH=um
and repeat the build process with ARCH=um on all the steps.
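As a rough illustration of what "ARCH=um on all the steps" looks like, the sketch below prints the full sequence using the targets named above. The function name uml_build_cmds is hypothetical, and the choice of menuconfig as the default config target is only an assumption:

```shell
# Print the UML build sequence with ARCH=um attached to every step.
# The targets are the ones named in the text; the first argument picks
# the config target (config, menuconfig, or xconfig).
uml_build_cmds() {
    config_target=${1:-menuconfig}
    for target in mrproper "$config_target" dep linux; do
        echo "make $target ARCH=um"
    done
}

uml_build_cmds menuconfig
```

Putting ARCH=um in the environment once, before running any make command, has the same effect and is harder to forget.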

See Compiling the kernel and modules for more details.

Another cause of strange compilation errors is building UML in /usr/src/linux. If you do this, the first thing you need to do is clean up the mess you made. The /usr/src/linux/include/asm link will now point to /usr/src/linux/include/asm-um. Make it point back to /usr/src/linux/include/asm-i386. Then, move your UML pool someplace else and build it there. Also see section 11.9, where a more specific set of symptoms is described.

11.7 UML hangs on boot after mounting devfs

If you have the debugger running, it will always show copy_mount_options on the stack. This is due to a bogus compiler. You will have a kgcc on your system. Redo the UML build with "CC=kgcc" on the make command line.

This was a UML bug, not a compiler bug, and has since been fixed.

11.8 A variety of panics and hangs with /tmp on a reiserfs filesystem

I saw this on reiserfs 3.5.21 and it seems to be fixed in 3.5.27. Panics preceded by

Detaching pid nnnn
are diagnostic of this problem. This is a reiserfs bug which causes a thread to occasionally read stale data from a mmapped page shared with another thread. The fix is to upgrade the filesystem or to have /tmp be an ext2 filesystem.

11.9 The compile fails with errors about conflicting types for 'open', 'dup', and 'waitpid'

This happens when you build in /usr/src/linux. The UML build makes the include/asm link point to include/asm-um. /usr/include/asm points to /usr/src/linux/include/asm, so when that link gets moved, files which need to include the asm-i386 versions of headers get the incompatible asm-um versions. The fix is to move the include/asm link back to include/asm-i386 and to do UML builds someplace else.
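Moving the link back can be sketched as follows. This assumes the link target is the relative name asm-um (or an absolute path ending in /asm-um) and that your ln supports the -n option. The function name fix_asm_link is made up for this example:

```shell
# Point TREE/include/asm back at asm-i386 if a UML build left it at asm-um.
# Assumption: the link target is "asm-um" or a path ending in "/asm-um".
fix_asm_link() {
    tree=$1
    link="$tree/include/asm"
    case "$(readlink "$link")" in
        asm-um|*/asm-um)
            # -n replaces the link itself instead of descending into it
            ln -sfn asm-i386 "$link"
            echo "relinked $link -> asm-i386"
            ;;
        *)
            echo "$link does not point at asm-um, left alone"
            ;;
    esac
}
```

Run it as root with the tree as its argument, for example fix_asm_link /usr/src/linux, and then do the UML build in a separate directory.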

11.10 UML doesn't work when /tmp is an NFS filesystem

This seems to be a similar situation to the reiserfs problem above. Some versions of NFS seem not to handle mmap correctly, which UML depends on. The workaround is to have /tmp be a non-NFS directory.
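To find out which filesystem /tmp actually lives on, something like the following may help. It is only a sketch: it assumes GNU coreutils stat (the -f -c %T form) on the host, and the function name tmp_fs_verdict and its messages are invented here:

```shell
# Classify a filesystem type name as risky or fine for UML's /tmp.
tmp_fs_verdict() {
    case "$1" in
        nfs*)     echo "risky: NFS (see 11.10)" ;;
        reiserfs) echo "risky on old reiserfs such as 3.5.21 (see 11.8)" ;;
        *)        echo "ok: $1" ;;
    esac
}

# Assumption: GNU stat, which prints the filesystem type name for -f -c %T.
tmp_fs_verdict "$(stat -f -c %T /tmp)"
```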

11.11 UML hangs on boot when compiled with gprof support

If you build UML with gprof support and, early in the boot, it does this

kernel BUG at page_alloc.c:100!
you have a buggy gcc. You can work around the problem by removing UM_FASTCALL from CFLAGS in arch/um/Makefile-i386. This will open up another bug, but that one is fairly hard to reproduce.

11.12 syslogd dies with a SIGTERM on startup

The exact boot error depends on the distribution that you're booting, but Debian produces this:

/etc/rc2.d/S10sysklogd: line 49:    93 Terminated
start-stop-daemon --start --quiet --exec /sbin/syslogd -- $SYSLOGD
This is a syslogd bug. There's a race between a parent process installing a signal handler and its child sending the signal. See this uml-devel post for the details.

11.13 TUN/TAP networking doesn't work on a 2.4 host

There are a couple of problems which were pointed out (http://www.geocrawler.com/lists/3/SourceForge/597/0/) by Tim Robinson.

11.14 You can network to the host but not to other machines on the net

This is because of routing that's automatically set up, but which is wrong for UML. You need to delete the network route and replace it with a host route to the host IP. See the bottom of the networking page for details.

This has been fixed by UML setting up proxy arp differently so that things work with the network route and the host route isn't needed.
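On a UML old enough to still need the manual fix, the route change inside UML would look roughly like the commands printed below. This is a sketch using the net-tools route command. The addresses, the eth0 device name, and the function name uml_route_cmds are all assumptions for illustration, so substitute your own values:

```shell
# Print the commands that swap the automatic network route for a
# host route to the host's IP. Nothing is executed here.
uml_route_cmds() {
    net=$1
    mask=$2
    host_ip=$3
    dev=${4:-eth0}
    echo "route del -net $net netmask $mask dev $dev"
    echo "route add -host $host_ip dev $dev"
}

# Hypothetical addresses: 192.168.0.0/24 network, host at 192.168.0.1.
uml_route_cmds 192.168.0.0 255.255.255.0 192.168.0.1
```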

11.15 I have no root and I want to scream

Thanks to Birgit Wahlich for telling me about this strange one. It turns out that there's a limit of six environment variables on the kernel command line. When that limit is reached or exceeded, argument processing stops, which means that the 'root=' argument that UML usually adds is not seen. So, the filesystem has no idea what the root device is, so it panics.

The fix is to put less stuff on the command line. Glomming all your setup variables into one is probably the best way to go.
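One way to glom the variables together is to pass a single comma-separated variable on the UML command line and split it apart in a boot script inside UML. The sketch below shows the splitting half. The variable layout, the example values, and the function name unpack_setup are all hypothetical:

```shell
# Split one combined variable, e.g.
#   UML_SETUP=ip=192.168.0.2,gw=192.168.0.1,name=uml1
# into one "key value" line per setting.
unpack_setup() {
    old_ifs=$IFS
    IFS=,
    for pair in $1; do
        # key is everything before the first '=', value everything after it
        echo "${pair%%=*} ${pair#*=}"
    done
    IFS=$old_ifs
}

unpack_setup "ip=192.168.0.2,gw=192.168.0.1,name=uml1"
```

This way the command line carries one variable instead of three, which leaves more room under the limit.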

11.16 UML build conflict between ptrace.h and ucontext.h

On some older systems, /usr/include/asm/ptrace.h and /usr/include/sys/ucontext.h define the same names. So, when they're included together, the defines from one completely mess up the parsing of the other, producing errors like:

/usr/include/sys/ucontext.h:47: parse error before `10'
plus a pile of warnings.

This is a libc botch, which has since been fixed, and I don't see any way around it besides upgrading.

11.17 Any other panic, hang, or strange behavior

If you're seeing truly strange behavior, such as hangs or panics that happen in random places, or you try running the debugger to see what's happening and it acts strangely, then it could be a problem in the host kernel. If you're not running a stock Linus or -ac kernel, then try that. An early version of the preemption patch and a 2.4.10 SuSE kernel have caused very strange problems in UML.

Otherwise, let me know about it. Send a message to one of the UML mailing lists, either the developer list or the user list, whichever you prefer. Don't assume that everyone knows about it and that a fix is imminent.

If you want to be super-helpful, read Diagnosing Problems and follow the instructions contained therein.

