File: UserModeLinux-HOWTO-14.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.21">
 <TITLE>User Mode Linux HOWTO : Diagnosing Problems</TITLE>
 <LINK HREF="UserModeLinux-HOWTO-15.html" REL=next>
 <LINK HREF="UserModeLinux-HOWTO-13.html" REL=previous>
 <LINK HREF="UserModeLinux-HOWTO.html#toc14" REL=contents>
</HEAD>
<BODY>
<A HREF="UserModeLinux-HOWTO-15.html">Next</A>
<A HREF="UserModeLinux-HOWTO-13.html">Previous</A>
<A HREF="UserModeLinux-HOWTO.html#toc14">Contents</A>
<HR>
<H2><A NAME="trouble"></A> <A NAME="s14">14.</A> <A HREF="UserModeLinux-HOWTO.html#toc14">Diagnosing Problems</A></H2>

<P> 
If you get UML to crash, hang, or otherwise misbehave, you should
report this on one of the project mailing lists, either the 
developer list - user-mode-linux-devel at lists dot sourceforge dot
net (subscription info) or the user list -
user-mode-linux-user at lists dot sourceforge dot net 
(subscription info).  When you do, it is
likely that I will want more information.  So, it would be helpful to
read the sections below, do whatever applies to your case, and
include the results in your report to the list.</P>
<P> 
For any diagnosis, you're going to need to build a debugging kernel.
The binaries from this site aren't debuggable.  If you haven't done
this before, first read about 
<A HREF="UserModeLinux-HOWTO-2.html#compile">Compiling the kernel and modules</A> and
<A HREF="UserModeLinux-HOWTO-11.html#debugging">Kernel debugging</A>.</P>

<H2><A NAME="ss14.1">14.1</A> <A HREF="UserModeLinux-HOWTO.html#toc14.1">Case 1 : Normal kernel panics</A>
</H2>

<P>The most common case is for a normal thread to panic.  To debug this,
you will need to run UML under the debugger (add 'debug' to the command
line).  An xterm will start up with gdb running inside it.  When gdb
stops in start_kernel, continue it and make UML crash.  Then interrupt
gdb with ^C and run 'bt'.  I'm going to want to see the resulting stack trace. </P>
<P> 
If the panic was a &quot;Kernel mode fault&quot;, then there will be a segv
frame on the stack and I'm going to want some more information.  The
stack might look something like this:
<BLOCKQUOTE><CODE>
<PRE>
(UML gdb)  backtrace
#0  0x1009bf76 in __sigprocmask (how=1, set=0x5f347940, oset=0x0)
    at ../sysdeps/unix/sysv/linux/sigprocmask.c:49
#1  0x10091411 in change_sig (signal=10, on=1) at process.c:218
#2  0x10094785 in timer_handler (sig=26) at time_kern.c:32
#3  0x1009bf38 in __restore ()
    at ../sysdeps/unix/sysv/linux/i386/sigaction.c:125
#4  0x1009534c in segv (address=8, ip=268849158, is_write=2, is_user=0)
    at trap_kern.c:66
#5  0x10095c04 in segv_handler (sig=11) at trap_user.c:285
#6  0x1009bf38 in __restore ()
</PRE>
</CODE></BLOCKQUOTE>

I'm going to want to see the symbol and line information for the value
of ip in the segv frame.  In this case, you would do the following:
<BLOCKQUOTE><CODE>
<PRE>
(UML gdb)  info symbol 268849158
</PRE>
</CODE></BLOCKQUOTE>

and
<BLOCKQUOTE><CODE>
<PRE>
(UML gdb)  info line *268849158
</PRE>
</CODE></BLOCKQUOTE>

The reason for this is that the __restore frame right above the
segv_handler frame hides the frame that actually segfaulted.  So,
I have to get that information from the faulting ip.</P>
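<P>The ip in the segv frame is printed in decimal, while the frame
addresses in the backtrace are hexadecimal, and tools such as
addr2line expect hexadecimal addresses.  As a minimal sketch, assuming
a POSIX shell on the host, the conversion can be done like this
(ip_to_hex is a hypothetical helper name, not part of UML):
<BLOCKQUOTE><CODE>
<PRE>
# Hypothetical helper: print a decimal address in hexadecimal.
ip_to_hex() {
    printf '0x%x\n' "$1"
}

# The ip from the segv frame in the example backtrace.
ip_to_hex 268849158    # prints 0x10065006
</PRE>
</CODE></BLOCKQUOTE>

If binutils is installed and the UML binary is ./linux (both are
assumptions), something like addr2line -f -e ./linux 0x10065006 should
report the same function and line outside of gdb.</P>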

<H2><A NAME="ss14.2">14.2</A> <A HREF="UserModeLinux-HOWTO.html#toc14.2">Case 2 : Tracing thread panics</A>
</H2>

<P>The less common and more painful case is when the tracing thread
panics.  In this case, the kernel debugger will be useless because it
needs a healthy tracing thread in order to work.  The first thing to
do is get a backtrace from the tracing thread.  This is done by
figuring out what its pid is, firing up gdb, and attaching it to that
pid.  You can figure out the tracing thread pid by looking at the
first line of the console output, which will look like this:
<BLOCKQUOTE><CODE>
<PRE>
tracing thread pid = 15851
</PRE>
</CODE></BLOCKQUOTE>

or by running ps on the host and finding the line that looks like
this:
<BLOCKQUOTE><CODE>
<PRE>
jdike 15851 4.5 0.4 132568 1104 pts/0 S 21:34 0:05 ./linux [(tracing thread)]
</PRE>
</CODE></BLOCKQUOTE>

If the panic was 'segfault in signals', then follow the instructions
in case 1 for collecting information about the location of the segfault.</P>
<P> 
If the tracing thread flaked out all by itself, then send that
backtrace in and wait for our crack debugging team to fix the
problem.  </P>
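<P>Picking the tracing thread pid out of the ps output can be
scripted.  This is a rough sketch, assuming ps aux style output in
which the pid is the second column and the tracing thread is tagged as
in the ps line shown above; tracing_pid is a hypothetical helper name:
<BLOCKQUOTE><CODE>
<PRE>
# Hypothetical helper: read ps output on stdin and print the pid
# (second column) of each line tagged "(tracing thread)".
tracing_pid() {
    awk '/\(tracing thread\)/ { print $2 }'
}

# Typical use on the host:
#   ps aux | tracing_pid
# and then attach gdb to the pid it prints, for example:
#   gdb ./linux 15851
</PRE>
</CODE></BLOCKQUOTE>

The binary name ./linux and the pid 15851 are taken from the example
above; substitute your own.</P>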

<H2><A NAME="ss14.3">14.3</A> <A HREF="UserModeLinux-HOWTO.html#toc14.3">Case 3 : Tracing thread panics caused by other threads</A>
</H2>

<P>However, there are cases where the misbehavior of another
thread caused the problem.  The most common panic of this type is:
<BLOCKQUOTE><CODE>
<PRE>
wait_for_stop failed to wait for &lt;pid&gt; to stop with &lt;signal number&gt;
</PRE>
</CODE></BLOCKQUOTE>

In this case, you'll need to get a backtrace from the process
mentioned in the panic.  This is complicated by the fact that the
kernel debugger is defunct and, without some fancy footwork, another
gdb can't attach to that process.  So, this is how the fancy footwork goes:</P>
<P>In a shell, stop the process named in the panic:
<BLOCKQUOTE><CODE>
<PRE>
host% kill -STOP pid
</PRE>
</CODE></BLOCKQUOTE>

Run gdb on the tracing thread as described in case 2 and do:
<BLOCKQUOTE><CODE>
<PRE>
(host gdb)  call detach(pid)
</PRE>
</CODE></BLOCKQUOTE>

If you get a segfault, do it again.  It always works the second
time.</P>
<P>Detach from the tracing thread and attach to that other thread:
<BLOCKQUOTE><CODE>
<PRE>
(host gdb)  detach
</PRE>
</CODE></BLOCKQUOTE>

<BLOCKQUOTE><CODE>
<PRE>
(host gdb)  attach pid
</PRE>
</CODE></BLOCKQUOTE>

If gdb hangs when attaching to that process, go back to a shell and
do:
<BLOCKQUOTE><CODE>
<PRE>
host% kill -CONT pid
</PRE>
</CODE></BLOCKQUOTE>

And then get the backtrace:
<BLOCKQUOTE><CODE>
<PRE>
(host gdb)  backtrace
</PRE>
</CODE></BLOCKQUOTE>
</P>

<H2><A NAME="ss14.4">14.4</A> <A HREF="UserModeLinux-HOWTO.html#toc14.4">Case 4 : Hangs</A>
</H2>

<P>Hangs seem to be fairly rare, but they sometimes happen.  When a hang
happens, we need a backtrace from the offending process.  Run the
kernel debugger as described in case 1 and get a backtrace.  If the
current process is not the idle thread, then send in the backtrace.
You can tell that it's the idle thread if the stack looks like this:
<BLOCKQUOTE><CODE>
<PRE>
#0  0x100b1401 in __libc_nanosleep ()
#1  0x100a2885 in idle_sleep (secs=10) at time.c:122
#2  0x100a546f in do_idle () at process_kern.c:445
#3  0x100a5508 in cpu_idle () at process_kern.c:471
#4  0x100ec18f in start_kernel () at init/main.c:592
#5  0x100a3e10 in start_kernel_proc (unused=0x0) at um_arch.c:71
#6  0x100a383f in signal_tramp (arg=0x100a3dd8) at trap_user.c:50
</PRE>
</CODE></BLOCKQUOTE>

If this is the case, then some other process is at fault, and went to
sleep when it shouldn't have.  Run ps on the host and figure out which
process should not have gone to sleep and stayed asleep.  Then attach
to it with gdb and get a backtrace as described in case 3.</P>






<HR>
<A HREF="UserModeLinux-HOWTO-15.html">Next</A>
<A HREF="UserModeLinux-HOWTO-13.html">Previous</A>
<A HREF="UserModeLinux-HOWTO.html#toc14">Contents</A>
</BODY>
</HTML>