File: rt-introduce-cpu-chill.patch

package info (click to toggle)
linux 4.9.25-1
  • links: PTS, VCS
  • area: main
  • in suites: stretch
  • size: 794,052 kB
  • ctags: 3,033,799
  • sloc: ansic: 14,481,780; asm: 287,385; makefile: 35,234; perl: 27,553; sh: 15,791; python: 13,364; cpp: 6,087; yacc: 4,337; lex: 2,439; awk: 1,212; pascal: 231; lisp: 218; sed: 21
file content (129 lines) | stat: -rw-r--r-- 5,487 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
Subject: rt: Introduce cpu_chill()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 07 Mar 2012 20:51:03 +0100
Origin: https://www.kernel.org/pub/linux/kernel/projects/rt/4.9/older/patches-4.9.20-rt16.tar.xz

Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.

Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
|
|I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
|running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
|runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
|ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
|blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.
|
|The solution is not to convert the cpu_chill() back to a cpu_relax() as that
|will re-create a possible live lock that the cpu_chill() fixed earlier, and may
|also leave this bug open on other softirqs. The fix is to remove the
|dependency on ksoftirqd from cpu_chill(). That is, instead of calling
|msleep() that requires ksoftirqd to wake it up, use the
|hrtimer_nanosleep() code that does the wakeup from hard irq context.
|
||Looks to be the lock of the block softirq. I don't have the core dump
||anymore, but from what I could tell the ksoftirqd was blocked on the
||block softirq lock, where the block softirq handler did a msleep
||(called by the qla2xxx interrupt handler).
||
||Looking at trigger_softirq() in block/blk-softirq.c, it can do a
||smp_callfunction() to another cpu to run the block softirq. If that
||happens to be the cpu where the qla2xx irq handler is doing the block
||softirq and is in a middle of a msleep(), I believe the ksoftirqd will
||try to run the softirq. If it does that, then BOOM, it's deadlocked
||because the ksoftirqd will never run the timer softirq either.
|
||I should have also stated that it was only one lock that was involved.
||But the lock owner was doing a msleep() that requires a wakeup by
||ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
||held by the msleep() caller, then you have your deadlock.
||
||It's best not to have any softirqs going to sleep requiring another
||softirq to wake it up. Note, if we ever require a timer softirq to do a
||cpu_chill() it will most definitely hit this deadlock.

+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
|  #0:  (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/delay.h |    6 ++++++
 kernel/time/hrtimer.c |   19 +++++++++++++++++++
 2 files changed, 25 insertions(+)

--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -52,4 +52,10 @@ static inline void ssleep(unsigned int s
 	msleep(seconds * 1000);
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+extern void cpu_chill(void);
+#else
+# define cpu_chill()	cpu_relax()
+#endif
+
 #endif /* defined(_LINUX_DELAY_H) */
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1768,6 +1768,25 @@ SYSCALL_DEFINE2(nanosleep, struct timesp
 	return hrtimer_nanosleep(&tu, rmtp, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * Sleep for 1 ms in hope whoever holds what we want will let it go.
+ */
+void cpu_chill(void)
+{
+	struct timespec tu = {
+		.tv_nsec = NSEC_PER_MSEC,
+	};
+	unsigned int freeze_flag = current->flags & PF_NOFREEZE;
+
+	current->flags |= PF_NOFREEZE;
+	hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+	if (!freeze_flag)
+		current->flags &= ~PF_NOFREEZE;
+}
+EXPORT_SYMBOL(cpu_chill);
+#endif
+
 /*
  * Functions related to boot-time initialization:
  */