Package: ruby-sanitize / 6.0.0-1.1+deb12u1

Forcibly-escape-content-in-unescaped-text-elements-i.patch Patch series | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
From: Ryan Grove <ryan@wonko.com>
Date: Wed, 25 Jan 2023 17:42:24 -0800
Subject: Forcibly escape content in "unescaped text" elements inside math or
 svg namespaces
Origin: https://github.com/rgrove/sanitize/commit/b4ee521df0d0616340c9648444be488381c238b1

This fixes an edge case in which the contents of an "unescaped text"
element (such as `<noembed>` or `<xmp>`) were not properly escaped if
that element was allowlisted and was also inside an allowlisted `<math>`
or `<svg>` element.

The only way to encounter this situation was to ignore multiple warnings
in the readme and create a custom config that allowlisted all the
elements involved, including `<math>` or `<svg>`.

Please let this be a reminder that Sanitize cannot safely sanitize
MathML or SVG content and does not support this use case. The default
configs don't allow MathML or SVG elements, and allowlisting MathML or
SVG elements in a custom config may create a security vulnerability in
your application.

Documentation has been updated to add more warnings and to make the
existing warnings about this more prominent.
---
 HISTORY.md                                 | 25 +++++++
 README.md                                  | 19 ++---
 lib/sanitize/config/default.rb             |  5 ++
 lib/sanitize/transformers/clean_element.rb | 35 +++++++++
 test/test_malicious_html.rb                | 86 ++++++++++++++++++++++
 5 files changed, 161 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 0f1f3fbfb352..0f6efb56fde0 100644
--- a/README.md
+++ b/README.md
@@ -72,10 +72,11 @@ Sanitize can sanitize the following types of input:
 * Standalone CSS stylesheets
 * Standalone CSS properties
 
-However, please note that Sanitize _cannot_ fully sanitize the contents of
-`<math>` or `<svg>` elements, since these elements don't follow the same parsing
-rules as the rest of HTML. If this is something you need, you may want to look
-for another solution.
+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you may create a security vulnerability in your application.
 
 ### HTML Fragments
 
@@ -420,11 +421,11 @@ elements not in this array will be removed.
 ]
 ```
 
-**Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
-elements, since these elements don't follow the same parsing rules as the rest
-of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
-content inside them will be allowed, even if that content would otherwise be
-removed by Sanitize.
+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
 
 #### :parser_options (Hash)
 
diff --git a/lib/sanitize/config/default.rb b/lib/sanitize/config/default.rb
index 4dd1d22d01f7..d22c1e84c326 100644
--- a/lib/sanitize/config/default.rb
+++ b/lib/sanitize/config/default.rb
@@ -54,6 +54,11 @@ class Sanitize
 
       # HTML elements to allow. By default, no elements are allowed (which means
       # that all HTML will be stripped).
+      #
+      # Warning: Sanitize cannot safely sanitize the contents of foreign
+      # elements (elements in the MathML or SVG namespaces). Do not add `math`
+      # or `svg` to this list! If you do, you may create a security
+      # vulnerability in your application.
       :elements => [],
 
       # HTML parsing options to pass to Nokogumbo.
diff --git a/lib/sanitize/transformers/clean_element.rb b/lib/sanitize/transformers/clean_element.rb
index a73994b0f0a8..a2bacd8198f7 100644
--- a/lib/sanitize/transformers/clean_element.rb
+++ b/lib/sanitize/transformers/clean_element.rb
@@ -1,5 +1,6 @@
 # encoding: utf-8
 
+require 'cgi'
 require 'set'
 
 class Sanitize; module Transformers; class CleanElement
@@ -18,6 +19,18 @@ class Sanitize; module Transformers; class CleanElement
   # http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#embedding-custom-non-visible-data-with-the-data-*-attributes
   REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u
 
+  # Elements whose content is treated as unescaped text by HTML parsers.
+  UNESCAPED_TEXT_ELEMENTS = Set.new(%w[
+    iframe
+    noembed
+    noframes
+    noscript
+    plaintext
+    script
+    style
+    xmp
+  ])
+
   # Attributes that need additional escaping on `<a>` elements due to unsafe
   # libxml2 behavior.
   UNSAFE_LIBXML_ATTRS_A = Set.new(%w[
@@ -185,6 +198,28 @@ class Sanitize; module Transformers; class CleanElement
       @add_attributes[name].each {|key, val| node[key] = val }
     end
 
+    # Make a best effort to ensure that text nodes in invalid "unescaped text"
+    # elements that are inside a math or svg namespace are properly escaped so
+    # that they don't get parsed as HTML.
+    #
+    # Sanitize is explicitly documented as not supporting MathML or SVG, but
+    # people sometimes allow `<math>` and `<svg>` elements in their custom
+    # configs without realizing that it's not safe. This workaround makes it
+    # slightly less unsafe, but you still shouldn't allow `<math>` or `<svg>`
+    # because Nokogiri doesn't parse them the same way browsers do and Sanitize
+    # can't guarantee that their contents are safe.
+    unless node.namespace.nil?
+      prefix = node.namespace.prefix
+
+      if (prefix == 'math' || prefix == 'svg') && UNESCAPED_TEXT_ELEMENTS.include?(name)
+        node.children.each do |child|
+          if child.type == Nokogiri::XML::Node::TEXT_NODE
+            child.content = CGI.escapeHTML(child.content)
+          end
+        end
+      end
+    end
+
     # Element-specific special cases.
     case name
 
diff --git a/test/test_malicious_html.rb b/test/test_malicious_html.rb
index d0fc57bbc2ad..377e6deb0fb5 100644
--- a/test/test_malicious_html.rb
+++ b/test/test_malicious_html.rb
@@ -232,4 +232,90 @@ describe 'Malicious HTML' do
       end
     end
   end
+
+  # These tests cover an unsupported and unsafe custom config that allows MathML
+  # and SVG elements, which Sanitize's docs specifically say multiple times in
+  # big prominent warnings that you SHOULD NOT DO because Sanitize doesn't
+  # support MathML or SVG.
+  #
+  # Do not use the custom configs you see in these tests! If you do, you may be
+  # creating XSS vulnerabilities in your application.
+  describe 'foreign content bypass in unsafe custom config that allows MathML or SVG' do
+    unescaped_content_elements = %w[
+      noembed
+      noframes
+      noscript
+      plaintext
+      script
+      xmp
+    ]
+
+    removed_content_elements = %w[
+      iframe
+    ]
+
+    removed_elements = %w[
+      style
+    ]
+
+    before do
+      @s = Sanitize.new(
+        Sanitize::Config.merge(
+          Sanitize::Config::RELAXED,
+          elements: Sanitize::Config::RELAXED[:elements] +
+            unescaped_content_elements +
+            removed_content_elements +
+            %w[math svg]
+        )
+      )
+    end
+
+    unescaped_content_elements.each do |name|
+      it "forcibly escapes text content inside `<#{name}>` in a MathML namespace" do
+        assert_equal(
+          "<math><#{name}>&lt;img src=x onerror=alert(1)&gt;</#{name}></math>",
+          @s.fragment("<math><#{name}>&lt;img src=x onerror=alert(1)&gt;</#{name}>")
+        )
+      end
+
+      it "forcibly escapes text content inside `<#{name}>` in an SVG namespace" do
+        assert_equal(
+          "<svg><#{name}>&lt;img src=x onerror=alert(1)&gt;</#{name}></svg>",
+          @s.fragment("<svg><#{name}>&lt;img src=x onerror=alert(1)&gt;</#{name}>")
+        )
+      end
+    end
+
+    removed_content_elements.each do |name|
+      it "removes text content inside `<#{name}>` in a MathML namespace" do
+        assert_equal(
+          "<math><#{name}></#{name}></math>",
+          @s.fragment("<math><#{name}>&lt;img src=x onerror=alert(1)&gt;</#{name}>")
+        )
+      end
+
+      it "removes text content inside `<#{name}>` in an SVG namespace" do
+        assert_equal(
+          "<svg><#{name}></#{name}></svg>",
+          @s.fragment("<svg><#{name}>&lt;img src=x onerror=alert(1)&gt;</#{name}>")
+        )
+      end
+    end
+
+    removed_elements.each do |name|
+      it "removes `<#{name}>` elements in a MathML namespace" do
+        assert_equal(
+          '<math></math>',
+          @s.fragment("<math><#{name}>&lt;img src=x onerror=alert(1)&gt;</#{name}>")
+        )
+      end
+
+      it "removes `<#{name}>` elements in an SVG namespace" do
+        assert_equal(
+          '<svg></svg>',
+          @s.fragment("<svg><#{name}>&lt;img src=x onerror=alert(1)&gt;</#{name}>")
+        )
+      end
+    end
+  end
 end
-- 
2.39.1