From: Xi Ruoyao <xry111@xry111.site>
Date: Mon, 31 Mar 2025 14:03:44 +0800
Subject: extract/html: Don't assume a NUL-terminated string for the SAX callback
The HTML parser or libxml2 used to always pass a NUL-terminated string to
the callback, but this was never documented and it no longer happens with
libxml2-2.14.0.
Fixes #391.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
(cherry picked from commit f3245004ecebf1a9829875bc6232fc94dddb2858)
---
src/tracker-extract/tracker-extract-html.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
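Note (not part of the commit): a minimal sketch of the length-bounded
pattern this patch adopts, written in plain C without GLib or libxml2.
The type text_buf and the function text_buf_append_len are hypothetical
stand-ins for GString and g_string_append_len(); only the field name
n_bytes_remaining is borrowed from the diff below. The sketch reads ch
strictly through len and never calls strlen() on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size accumulator standing in for GString. */
typedef struct {
	char   data[64];
	size_t used;              /* bytes stored, excluding the trailing NUL */
	size_t n_bytes_remaining; /* byte budget, mirrors pd->n_bytes_remaining */
} text_buf;

/*
 * Append at most len bytes of ch, limited by the remaining budget and by
 * the space left in the buffer. ch is treated as a (pointer, length) pair,
 * so it does not need to be NUL-terminated. Returns the bytes copied.
 */
static size_t
text_buf_append_len (text_buf *buf, const char *ch, size_t len)
{
	size_t room, n;

	assert (buf != NULL && ch != NULL);

	room = sizeof buf->data - 1 - buf->used;
	n = len;
	if (n > buf->n_bytes_remaining)
		n = buf->n_bytes_remaining;
	if (n > room)
		n = room;

	memcpy (buf->data + buf->used, ch, n);
	buf->used += n;
	buf->data[buf->used] = '\0'; /* the copy we own stays terminated */
	buf->n_bytes_remaining -= n;

	return n;
}
```

The room check exists only because the stand-in buffer is fixed-size;
a GString grows as needed, so the patched code has no equivalent.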
diff --git a/src/tracker-extract/tracker-extract-html.c b/src/tracker-extract/tracker-extract-html.c
index d8afef1..583bb44 100644
--- a/src/tracker-extract/tracker-extract-html.c
+++ b/src/tracker-extract/tracker-extract-html.c
@@ -196,20 +196,16 @@ parser_characters (void *data,
switch (pd->current) {
case READ_TITLE:
- g_string_append (pd->title, ch);
+ g_string_append_len (pd->title, ch, len);
break;
case READ_IGNORE:
break;
default:
if (pd->in_body && pd->n_bytes_remaining > 0) {
- gsize text_len;
-
- text_len = strlen (ch);
-
if (tracker_text_validate_utf8 (ch,
- (pd->n_bytes_remaining < text_len ?
+ (pd->n_bytes_remaining < len ?
pd->n_bytes_remaining :
- text_len),
+ len),
&pd->plain_text,
NULL)) {
/* In the case of HTML, each string arriving this
@@ -219,8 +215,8 @@ parser_characters (void *data,
g_string_append_c (pd->plain_text, ' ');
}
- if (pd->n_bytes_remaining > text_len) {
- pd->n_bytes_remaining -= text_len;
+ if (pd->n_bytes_remaining > len) {
+ pd->n_bytes_remaining -= len;
} else {
pd->n_bytes_remaining = 0;
}