Package: bsd-mailx / 8.1.2-0.20111106cvs-1+deb7u1

24-False-cant-send-email-errors.patch Patch series | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
From: Ivan Zahariev <famzah@icdsoft.com>
Date: Wed, 7 Oct 2009 20:50:11 +0300
Subject: 24 False cant send email errors.

Debian Bug  #550116:

This bug is caused by the patch in "send.c" for the bug report #145379.
Under certain circumstances, a race condition can occur if:
1. The parent fork()'s a process and exec()'s "sendmail" in "send.c".
The child process is born.
2. The child starts, finishes quickly and exits. The parent has not
called wait_child(pid) in "send.c" yet.
3. The parent immediately gets SIGCHLD because the child exited already.
The sigchild() handler in "popen.c" reaps the child via waitpid() and
exits directly because findchild(pid, 1) returned NULL. It returned NULL
because the PID of the child process has not been added to the "child"
structure list at all.
4. The execution of the parent process is resumed in "send.c", and it
now calls wait_child(pid). The function wait_child(pid) returns "-1"
because wait_child(pid) in "popen.c" calls waitpid(pid, ...) again for
the same child PID, which the sigchild() handler already reaped. The
second call to findchild(pid, 1) by wait_child(pid) in "popen.c" returns
NULL too, because as already stated the PID of the child process has not
been added to the "child" structure list. As a result, the false error
message "Can't send mail: sendmail process failed" is given.

This bug happens only rarely, usually when the system is under load and
the parent process lags a bit after the child one. But it does happen.
We send about 15 messages every hour on 36 servers each, and we get 10
false error messages on average for 24 hours (0.08% false error rate).
---
 extern.h |    1 +
 popen.c  |    9 +++++++++
 send.c   |   30 ++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/extern.h b/extern.h
index 919747d..c2642fe 100644
--- a/extern.h
+++ b/extern.h
@@ -126,6 +126,7 @@ void	 fixhead(struct header *, struct name *);
 void	 fmt(char *, struct name *, FILE *, int);
 int	 folders(void *);
 int	 forward(char *, FILE *, char *, int);
+int	 add_child(pid_t);
 void	 free_child(pid_t);
 int	 from(void *);
 off_t	 fsize(FILE *);
diff --git a/popen.c b/popen.c
index d33e1a0..c7b497d 100644
--- a/popen.c
+++ b/popen.c
@@ -396,6 +396,15 @@ wait_child(pid_t pid)
 		return (WEXITSTATUS(wait_status));
 	else
 		return(-1);
+}
+
+/*
+ * Mark that we will wait for this child via SIGCHLD, or waitpid() manually, and we want its exit status stored.
+ */
+int
+add_child(pid_t pid) {
+	if (!findchild(pid, 0/* alloc if not present */))
+		return(-1); // this should never happen
 	else
 		return(0);
 }
diff --git a/send.c b/send.c
index 2e7cf88..4d866ab 100644
--- a/send.c
+++ b/send.c
@@ -349,6 +349,7 @@ mail1(struct header *hp, int printheaders)
 	struct name *to;
 	FILE *mtf;
 	int w;
+	sigset_t parent_nset, parent_oset;
 
 	/*
 	 * Collect user's mail from standard input.
@@ -403,6 +404,15 @@ mail1(struct header *hp, int printheaders)
 	}
 	if ((cp = value("record")) != NULL)
 		(void)savemail(expand(cp), mtf);
+
+	/*
+	 * Block the SIGCHLD signal temporarily, until we fork() a child and add its PID to the struct child "child" list.
+	 * See: Bug#550116
+	 */
+	sigemptyset(&parent_nset);
+	sigaddset(&parent_nset, SIGCHLD);
+	sigprocmask(SIG_BLOCK, &parent_nset, &parent_oset);
+
 	/*
 	 * Fork, set up the temporary mail file as standard
 	 * input for "mail", and exec with the user list we generated
@@ -417,6 +427,12 @@ mail1(struct header *hp, int printheaders)
 	if (pid == 0) {
 		sigset_t nset;
 
+		/*
+		 * Restore the original signal mask which we altered in the parent, just in case.
+		 * See: Bug#550116
+		 */
+		sigprocmask(SIG_SETMASK, &parent_oset, NULL);
+
 		sigemptyset(&nset);
 		sigaddset(&nset, SIGHUP);
 		sigaddset(&nset, SIGINT);
@@ -433,6 +449,20 @@ mail1(struct header *hp, int printheaders)
 		warn("%s", cp);
 		_exit(1);
 	}
+
+	/*
+	 * Add the PID to the struct child "child" list or else the sigchild() handler may reap the child
+	 *   before we manage to reap it in wait_child().
+	 * Then restore the original signal mask.
+	 * See: Bug#550116
+	 */
+	if (add_child(pid)) {
+		fprintf(stderr, "findchild() failed"); // this should never happen
+	}
+	sigprocmask(SIG_SETMASK, &parent_oset, NULL);
+
+
+
 #ifndef DEBIAN
 	if (value("verbose") != NULL)
 		(void)wait_child(pid);
--