File: extract-html-Don-t-assume-a-NUL-terminated-string-for-the.patch

Package: localsearch 3.8.2-7 (area: main, suite: sid)
From: Xi Ruoyao <xry111@xry111.site>
Date: Mon, 31 Mar 2025 14:03:44 +0800
Subject: extract/html: Don't assume a NUL-terminated string for the SAX
 callback

The HTML parser or libxml2 used to always pass a NUL-terminated string to
the callback, but that is not documented and it no longer happens with
libxml2-2.14.0. Use the length passed to the callback instead of strlen ().

Fixes #391.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
(cherry picked from commit f3245004ecebf1a9829875bc6232fc94dddb2858)
---
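Note (not part of the commit message): the following is a minimal,
self-contained sketch of the pattern this patch adopts, namely trusting the
length argument of a SAX-style characters callback rather than scanning for a
NUL byte. It is an illustration only. `struct text_buf`,
`text_buf_append_len ()` and `on_characters ()` are hypothetical names that
stand in for GString, `g_string_append_len ()` and `parser_characters ()`;
they are not taken from the localsearch sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size accumulator standing in for GString. */
struct text_buf {
	char   data[64];
	size_t used;
};

/* Append at most len bytes from ch, never reading past ch[len - 1].
 * The copy is truncated if the buffer is full.
 * Returns the number of bytes actually copied. */
static size_t
text_buf_append_len (struct text_buf *buf, const char *ch, int len)
{
	size_t room, n;

	assert (len >= 0);

	/* Keep one byte free for our own terminator. */
	room = sizeof (buf->data) - 1 - buf->used;
	n = (size_t) len < room ? (size_t) len : room;

	memcpy (buf->data + buf->used, ch, n);
	buf->used += n;
	buf->data[buf->used] = '\0';

	return n;
}

/* Same shape as a SAX characters callback: user data, chunk, length.
 * The chunk is treated as NOT NUL-terminated, so strlen (ch) is never
 * called on it. */
static void
on_characters (void *data, const char *ch, int len)
{
	text_buf_append_len (data, ch, len);
}
```

Calling `on_characters ()` with a chunk that points into the middle of a
larger buffer copies only the requested bytes, which is the behavior the
patch relies on once the parser stops terminating its chunks.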
 src/tracker-extract/tracker-extract-html.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/src/tracker-extract/tracker-extract-html.c b/src/tracker-extract/tracker-extract-html.c
index d8afef1..583bb44 100644
--- a/src/tracker-extract/tracker-extract-html.c
+++ b/src/tracker-extract/tracker-extract-html.c
@@ -196,20 +196,16 @@ parser_characters (void          *data,
 
 	switch (pd->current) {
 	case READ_TITLE:
-		g_string_append (pd->title, ch);
+		g_string_append_len (pd->title, ch, len);
 		break;
 	case READ_IGNORE:
 		break;
 	default:
 		if (pd->in_body && pd->n_bytes_remaining > 0) {
-			gsize text_len;
-
-			text_len = strlen (ch);
-
 			if (tracker_text_validate_utf8 (ch,
-			                                (pd->n_bytes_remaining < text_len ?
+			                                (pd->n_bytes_remaining < len ?
 			                                 pd->n_bytes_remaining :
-			                                 text_len),
+			                                 len),
 			                                &pd->plain_text,
 			                                NULL)) {
 				/* In the case of HTML, each string arriving this
@@ -219,8 +215,8 @@ parser_characters (void          *data,
 				g_string_append_c (pd->plain_text, ' ');
 			}
 
-			if (pd->n_bytes_remaining > text_len) {
-				pd->n_bytes_remaining -= text_len;
+			if (pd->n_bytes_remaining > len) {
+				pd->n_bytes_remaining -= len;
 			} else {
 				pd->n_bytes_remaining = 0;
 			}