File: 0001-modesetting-Add-custom-UDP-Prime-Sync-protocol-for-P.patch

From 6d6e1526b949ab7ac40efceddbca54ef7cfc01a1 Mon Sep 17 00:00:00 2001
From: Mario Kleiner <mario.kleiner.de@gmail.com>
Date: Sat, 15 Oct 2016 10:22:12 +0200
Subject: [PATCH xserver 1/2] modesetting: Add custom UDP Prime-Sync protocol
 for Psychtoolbox.

The new Prime-Sync code contributed to XOrg 1.19 by NVidia's
Alex Goins for implementing properly synchronized and serialized
NVidia dGPU outputSource -> Intel iGPU outputSlave Prime support,
to support Optimus Laptops, works well for preventing tearing or
incomplete rendering.

A current limitation is that it doesn't provide any way to
reliably signal completion of an OpenGL double-buffer swap to
a running fullscreen OpenGL client, e.g., Psychtoolbox.

Therefore the current X-Server 1.19 implementation is not
usable for vision science applications which require precise
visual stimulus onset timing or any kind of reliable
timestamping.

As it is too late for some better solution for the 1.19 cycle,
and use of nouveau is not always possible for (lack of) performance
reasons, we use the following hack to make NVidia Optimus
usable for vision science applications on modern GPUs:

This patch implements a custom UDP protocol between the
modesetting-ddx which drives the slaveOutput iGPU, and
interested clients, i.e., Psychtoolbox. The modesetting-ddx
creates one UDP socket for emission of UDP packets for each
X-Screen. A UDP packet is sent to localhost:10000+(x-screenId),
e.g., localhost:10000 for X-Screen 0, localhost:10001 for
X-Screen 1 etc., whenever a kms-pageflip has been scheduled
or completed on the iGPU. The send operation is non-blocking,
so the X-Server can't get stalled if there isn't any client
listening at the receiving port.
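
For illustration, a client-side receiver for this side channel could
look roughly like the sketch below. It is not part of this patch. The
names prime_packet, prime_open_receiver and prime_poll_packet are
hypothetical, and the packet layout is assumed to mirror the sender's
`struct buf` byte for byte (including compiler padding), which only
holds when client and X-Server are built for the same ABI.

```c
/* Hypothetical client-side receiver; a sketch, not part of the patch. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumed to match the sender's `struct buf` on the same ABI. */
struct prime_packet {
    uint64_t frame;       /* vblank count (msc) */
    uint64_t usec;        /* vblank timestamp (ust) */
    int scrnIndex;        /* X-Screen index the packet refers to */
    unsigned char flags;  /* 0 = flip scheduled, 1 = flip completed */
};

/* Bind a UDP socket to localhost:(10000 + screen).
 * Returns the socket fd, or -1 on failure with errno set. */
static int prime_open_receiver(int screen)
{
    struct sockaddr_in addr;
    int fd;

    assert(screen >= 0 && screen < 256);

    fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((uint16_t) (10000 + screen));
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Non-blocking poll for one packet.
 * Returns 1 if a full-sized packet was stored in *out, 0 if nothing
 * usable is pending, -1 on a real socket error. */
static int prime_poll_packet(int fd, struct prime_packet *out)
{
    struct prime_packet tmp;
    ssize_t n = recv(fd, &tmp, sizeof(tmp), MSG_DONTWAIT);

    if (n == (ssize_t) sizeof(tmp)) {
        *out = tmp;
        return 1;
    }
    if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;
    return 0;  /* no datagram pending, or one of unexpected size */
}
```

A client would call prime_open_receiver() once per X-Screen and then
poll from its swap-completion wait loop. Since the sender never blocks
and UDP gives no delivery guarantee, a client should also be prepared
for packets that never arrive.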

1. "Flip scheduled" packets are sent out after a successful
   call to drmModePageFlip, including the vblank count (msc)
   and timestamp (ust) of the vblank in which the flip was
   scheduled. The expectation is that usually such a flip
   will complete at msc + 1 and ust + 1 video refresh duration.
   This lets Psychtoolbox learn of a flip about one frame
   duration ahead of its likely completion.

2. "Flip completed" packets with msc and ust of completion
   are sent out when a kms pageflip completed, i.e., visual
   stimulus onset after vblank msc at time ust is guaranteed.
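
How a client might turn the two packet types into an onset estimate
can be sketched as follows. This is not part of the patch. The name
prime_estimate_onset and the refresh_usec parameter are hypothetical,
and ust is assumed to be in microseconds, as the `usec` field name in
the patch suggests.

```c
/* Hypothetical client-side helper; a sketch, not part of the patch. */
#include <assert.h>
#include <stdint.h>

enum {
    PRIME_FLIP_SCHEDULED = 0,  /* type 1 packet */
    PRIME_FLIP_COMPLETED = 1   /* type 2 packet */
};

struct prime_onset {
    uint64_t msc;   /* vblank count of (expected) stimulus onset */
    uint64_t ust;   /* timestamp of (expected) stimulus onset */
    int confirmed;  /* 1 = reported by a type 2 packet, 0 = predicted */
};

/* refresh_usec is the duration of one video refresh cycle in
 * microseconds, which the client has to determine by other means. */
static struct prime_onset
prime_estimate_onset(uint64_t msc, uint64_t ust, unsigned char flags,
                     uint64_t refresh_usec)
{
    struct prime_onset onset;

    assert(refresh_usec > 0);

    if (flags == PRIME_FLIP_COMPLETED) {
        /* Flip completed: values are final. */
        onset.msc = msc;
        onset.ust = ust;
        onset.confirmed = 1;
    } else {
        /* Flip scheduled: expect completion one refresh later. */
        onset.msc = msc + 1;
        onset.ust = ust + refresh_usec;
        onset.confirmed = 0;
    }
    return onset;
}
```

For a 60 Hz display (refresh_usec of about 16667), a type 1 packet with
msc = 100 yields a predicted onset at msc = 101, one refresh duration
after the reported ust.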

Ideally Psychtoolbox could just wait for "Flip completed" packets,
but the current implementation of slaveOutput update scheduling
in the modesetting ddx introduces 1 frame extra lag for each
client glXSwapBuffers call. Therefore PTB must not wait for flip
completion, but start rendering the next frame already when
a type 1 "Flip scheduled" packet arrives, so it can submit
glXSwapBuffers calls 1 frame early to compensate for the 1 frame
delay. This achieves full framerate (fps == display Hz)
at the expense that timestamping could be wrong under very high
load scenarios, where the dGPU can't complete the DMA copy of the
new framebuffer from VRAM to the shared dmabuf in system RAM within
1 video refresh cycle, or where some massive kthread scheduling
delay would prevent hw pageflip programming within 1 refresh cycle.
In practice no such glitch was observed during testing.

For highest reliability PTB can instead wait for type 2 packets,
trading performance for reliability.
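
The choice between the two waiting strategies can be sketched as a
small client-side predicate. This is not part of the patch, and the
names prime_wait_mode and prime_packet_ends_wait are hypothetical.

```c
/* Hypothetical client-side policy; a sketch, not part of the patch. */
#include <assert.h>
#include <stdbool.h>

enum prime_wait_mode {
    PRIME_WAIT_SCHEDULED,  /* full framerate, onset is only predicted */
    PRIME_WAIT_COMPLETED   /* confirmed onset, reduced framerate */
};

/* Decide whether a received packet ends the client's wait for its
 * pending swap. flags is 0 for "flip scheduled", 1 for "completed". */
static bool
prime_packet_ends_wait(unsigned char flags, enum prime_wait_mode mode)
{
    bool completed;

    assert(flags <= 1);
    completed = (flags == 1);

    return (mode == PRIME_WAIT_COMPLETED) ? completed : !completed;
}
```

In PRIME_WAIT_SCHEDULED mode a client could still use later type 2
packets to cross-check its predicted timestamps.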

The modesetting ddx creates a new XAtom to signal to PTB or
other clients that it supports this custom protocol, so PTB
et al. can enable their corresponding receiver and timestamping
code.

An important limitation is that the outputSlave / modesetting
driver cannot detect the reason for a requested output update.
It could be an OpenGL bufferswap of an unredirected fullscreen
window, or any kind of visual update on a regular desktop GUI,
or even the visual movement/appearance change of a software cursor.
As such this timestamping/swap completion protocol can only work
somewhat reliably if the client displays an unredirected fullscreen
window covering the whole X-Screen -- luckily the common scenario
for vision science stimulation. It also only works well if there
is one single active output attached to the X-Screen, as the Prime
implementation will update/pageflip each active output individually
and send out separate UDP packets for flip completion. The client
has no way to disambiguate which packet to use for its flip completion
handling. A third limitation seems to be that we can only drive one
X-Screen in a session with NVidia's Optimus + proprietary driver and
GLX module. At least I could not find an xorg.conf which would allow
setting up a multi-x-screen ZaphodHeads configuration or such, so
this is so far only successfully tested on a single display setup,
either Laptop panel only, or external video output only, but not both
at the same time.

This patch was tested against the NVidia 375.20 release driver with final
X-Server 1.19.0 on a Lenovo Z50 Optimus laptop with Intel HD 4400
+ GeForce 840M. Datapixx confirms correct timestamps.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
---
 hw/xfree86/drivers/modesetting/drmmode_display.c | 108 +++++++++++++++++++++++
 1 file changed, 108 insertions(+)

diff --git a/hw/xfree86/drivers/modesetting/drmmode_display.c b/hw/xfree86/drivers/modesetting/drmmode_display.c
index 6e755e9..d0cf665 100644
--- a/hw/xfree86/drivers/modesetting/drmmode_display.c
+++ b/hw/xfree86/drivers/modesetting/drmmode_display.c
@@ -29,6 +29,11 @@
 #include "dix-config.h"
 #endif
 
+/* For Unix UDP socket (msc,ust) side channel: */
+#include <sys/socket.h>
+#include <sys/fcntl.h>
+#include <netinet/ip.h>
+
 #include <errno.h>
 #include <sys/ioctl.h>
 #include <sys/mman.h>
@@ -199,6 +204,95 @@ drmmode_SetSlaveBO(PixmapPtr ppix,
     return TRUE;
 }
 
+static int fd_primestatus[1024] = { 0 };
+static struct buf {
+    uint64_t frame;
+    uint64_t usec;
+    int scrnIndex;
+    unsigned char flags;
+} buf;
+
+static void
+drmmode_InitSharedPixmapFeedback(drmmode_ptr drmmode)
+{
+    static Atom PrimeTimingHack1 = None;
+    int scrnIndex = drmmode->scrn->scrnIndex % 256;
+    struct sockaddr_in addr = { 0 };
+
+    fd_primestatus[scrnIndex] = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
+    if  (-1 == fd_primestatus[scrnIndex]) {
+        xf86DrvMsg(drmmode->scrn->scrnIndex, X_ERROR,
+                    "Failed to create Unix UDP socket for Prime feedback! %s\n",
+                   strerror(errno));
+    } else {
+        memset(&addr, 0, sizeof(addr));
+        addr.sin_family = AF_INET;
+        addr.sin_port = htons(10000 + scrnIndex);
+        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+
+        if(connect(fd_primestatus[scrnIndex], (struct sockaddr *) &addr, sizeof(addr))) {
+            close(fd_primestatus[scrnIndex]);
+            fd_primestatus[scrnIndex] = 0;
+            xf86DrvMsg(drmmode->scrn->scrnIndex, X_ERROR,
+                        "Failed to connect() socket for Prime feedback on localhost:%i! %s\n",
+                       10000 + scrnIndex,strerror(errno));
+        }
+        else {
+            unsigned char sendpriority = IPTOS_LOWDELAY;
+            setsockopt(fd_primestatus[scrnIndex], SOL_IP, IP_TOS, &sendpriority, sizeof(sendpriority));
+
+            fcntl(fd_primestatus[scrnIndex], F_SETFL, O_NONBLOCK);
+
+            xf86DrvMsg(drmmode->scrn->scrnIndex, X_INFO,
+                        "Bound Unix UDP socket for Prime feedback on localhost:%i\n", 10000 + scrnIndex);
+        }
+    }
+
+    /* Create an Atom to signal that this is an enhanced modesetting-ddx with custom UDP
+     * Prime timestamping.
+     */
+    if (PrimeTimingHack1 == None)
+        PrimeTimingHack1 = MakeAtom("PrimeTimingHack1", strlen("PrimeTimingHack1"), TRUE);
+}
+
+static void
+drmmode_FiniSharedPixmapFeedback(drmmode_ptr drmmode)
+{
+    int scrnIndex = drmmode->scrn->scrnIndex % 256;
+    if (fd_primestatus[scrnIndex] > 0) {
+        close(fd_primestatus[scrnIndex]);
+        fd_primestatus[scrnIndex] = 0;
+        xf86DrvMsg(drmmode->scrn->scrnIndex, X_INFO,
+                   "Closed Unix UDP socket for Prime feedback.\n");
+    }
+}
+
+static void
+drmmode_SetSharedPixmapFeedback(int scrnIndex, uint64_t frame, uint64_t usec)
+{
+    scrnIndex = scrnIndex % 256;
+
+    if (fd_primestatus[scrnIndex] <= 0)
+        return;
+
+    buf.frame = frame;
+    buf.usec = usec;
+    buf.scrnIndex = scrnIndex;
+}
+
+static void
+drmmode_SendSharedPixmapFeedback(Bool flipcomplete)
+{
+    if (fd_primestatus[buf.scrnIndex] <= 0)
+        return;
+
+    buf.flags = flipcomplete ? 1 : 0;
+
+    if ((send(fd_primestatus[buf.scrnIndex], &buf, sizeof(buf), MSG_DONTWAIT) == sizeof(buf)) && FALSE)
+        xf86DrvMsg(buf.scrnIndex, X_DEBUG,
+                   "Send for Prime feedback: flipcompletion=%d : msc=%lu : ust=%lu\n", buf.flags, buf.frame, buf.usec);
+}
+
 static Bool
 drmmode_SharedPixmapPresent(PixmapPtr ppix, xf86CrtcPtr crtc,
                             drmmode_ptr drmmode)
@@ -248,7 +342,12 @@ drmmode_SharedPixmapVBlankEventHandler(uint64_t frame, uint64_t usec,
 
     drmmode_crtc_private_ptr drmmode_crtc = args->crtc->driver_private;
 
+    drmmode_SetSharedPixmapFeedback(args->drmmode->scrn->scrnIndex, frame, usec);
+
     if (args->flip) {
+        /* pageflip completed - Send completion packet */
+        drmmode_SendSharedPixmapFeedback(TRUE);
+
         /* frontTarget is being displayed, update crtc to reflect */
         drmmode_crtc->prime_pixmap = args->frontTarget;
         drmmode_crtc->prime_pixmap_back = args->backTarget;
@@ -342,6 +441,10 @@ drmmode_SharedPixmapFlip(PixmapPtr frontTarget, xf86CrtcPtr crtc,
         return FALSE;
     }
 
+    /* pageflip scheduled - Send scheduled packet */
+    if (drmmode_crtc->flipping_active)
+        drmmode_SendSharedPixmapFeedback(FALSE);
+
     return TRUE;
 }
 
@@ -360,6 +463,9 @@ drmmode_InitSharedPixmapFlipping(xf86CrtcPtr crtc, drmmode_ptr drmmode)
         drmmode_SharedPixmapPresent(drmmode_crtc->prime_pixmap_back,
                                     crtc, drmmode);
 
+    if (drmmode_crtc->flipping_active)
+        drmmode_InitSharedPixmapFeedback(drmmode);
+
     return drmmode_crtc->flipping_active;
 }
 
@@ -384,6 +490,8 @@ drmmode_FiniSharedPixmapFlipping(xf86CrtcPtr crtc, drmmode_ptr drmmode)
                           drmmode_crtc->prime_pixmap_back)->flip_seq;
     if (seq)
         ms_drm_abort_seq(crtc->scrn, seq);
+
+    drmmode_FiniSharedPixmapFeedback(drmmode);
 }
 
 static Bool drmmode_set_target_scanout_pixmap(xf86CrtcPtr crtc, PixmapPtr ppix,
-- 
2.7.4