Package: ruby-sanitize / 6.0.0-1.1+deb12u1

Always-remove-noscript-elements.patch Patch series | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
From: Ryan Grove <ryan@wonko.com>
Date: Thu, 26 Jan 2023 13:42:39 -0800
Subject: Always remove `<noscript>` elements
Origin: https://github.com/rgrove/sanitize/commit/ec14265e530dc3fe31ce2ef773594d3a97778d22
Bug-Debian: https://bugs.debian.org/1030047
Bug-Debian-Security: https://security-tracker.debian.org/tracker/CVE-2023-23627

...even if `noscript` is in the allowlist.

A `<noscript>` element's content is parsed differently in browsers
depending on whether or not scripting is enabled. Since Nokogiri doesn't
support scripting, it always parses `<noscript>` elements as if
scripting is disabled. This results in edge cases where it's not
possible to reliably sanitize the contents of a `<noscript>` element
because Nokogiri can't fully replicate the parsing behavior of a
scripting-enabled browser. The safest thing to do is to simply remove
all `<noscript>` elements.

Fixes GHSA-fw3g-2h3j-qmm7
---
 HISTORY.md                                 | 27 ++++++++++++++++++++++
 README.md                                  | 14 +++++++----
 lib/sanitize/transformers/clean_element.rb | 10 ++++++++
 test/test_clean_element.rb                 |  7 ++++++
 test/test_malicious_html.rb                | 20 +++++++++++++++-
 5 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 0f6efb56fde0..5f5e73816f68 100644
--- a/README.md
+++ b/README.md
@@ -12,10 +12,10 @@ properties, @ rules, and URL protocols in elements or attributes containing CSS.
 Any HTML or CSS that you don't explicitly allow will be removed.
 
 Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
-exactly the same way modern browsers do, and [Crass][crass], which parses CSS
-exactly the same way modern browsers do. As long as your allowlist config only
-allows safe markup and CSS, even the most malformed or malicious input will be
-transformed into safe output.
+the same way modern browsers do, and [Crass][crass], which parses CSS the same
+way modern browsers do. As long as your allowlist config only allows safe markup
+and CSS, even the most malformed or malicious input will be transformed into
+safe output.
 
 [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
 [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
@@ -427,6 +427,12 @@ elements not in this array will be removed.
 >
 > By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
 
+> **Note**
+>
+> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
+>
+> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
+
 #### :parser_options (Hash)
 
 [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
diff --git a/lib/sanitize/transformers/clean_element.rb b/lib/sanitize/transformers/clean_element.rb
index a2bacd8198f7..98208a7c15f3 100644
--- a/lib/sanitize/transformers/clean_element.rb
+++ b/lib/sanitize/transformers/clean_element.rb
@@ -252,6 +252,16 @@ class Sanitize; module Transformers; class CleanElement
 
         node['content'] = node['content'].gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8')
       end
+
+    # A `<noscript>` element's content is parsed differently in browsers
+    # depending on whether or not scripting is enabled. Since Nokogiri doesn't
+    # support scripting, it always parses `<noscript>` elements as if scripting
+    # is disabled. This results in edge cases where it's not possible to
+    # reliably sanitize the contents of a `<noscript>` element because Nokogiri
+    # can't fully replicate the parsing behavior of a scripting-enabled browser.
+    # The safest thing to do is to simply remove all `<noscript>` elements.
+    when 'noscript'
+      node.unlink
     end
   end
 
diff --git a/test/test_clean_element.rb b/test/test_clean_element.rb
index 1293cea17ba2..916b0f934251 100644
--- a/test/test_clean_element.rb
+++ b/test/test_clean_element.rb
@@ -541,5 +541,12 @@ describe 'Sanitize::Transformers::CleanElement' do
       )).must_equal "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\"></head><body>Howdy!</body></html>"
     end
 
+    it 'always removes `<noscript>` elements even if `noscript` is in the allowlist' do
+      assert_equal(
+        '',
+        Sanitize.fragment('<noscript>foo</noscript>', elements: ['noscript'])
+      )
+    end
+
   end
 end
diff --git a/test/test_malicious_html.rb b/test/test_malicious_html.rb
index 377e6deb0fb5..7b33f68880d3 100644
--- a/test/test_malicious_html.rb
+++ b/test/test_malicious_html.rb
@@ -244,7 +244,6 @@ describe 'Malicious HTML' do
     unescaped_content_elements = %w[
       noembed
       noframes
-      noscript
       plaintext
       script
       xmp
@@ -255,6 +254,7 @@ describe 'Malicious HTML' do
     ]
 
     removed_elements = %w[
+      noscript
       style
     ]
 
@@ -318,4 +318,22 @@ describe 'Malicious HTML' do
       end
     end
   end
+
+  describe 'sanitization bypass by exploiting scripting-disabled <noscript> behavior' do
+    before do
+      @s = Sanitize.new(
+        Sanitize::Config.merge(
+          Sanitize::Config::RELAXED,
+          elements: Sanitize::Config::RELAXED[:elements] + ['noscript']
+        )
+      )
+    end
+
+    it 'is prevented by removing `<noscript>` elements regardless of the allowlist' do
+      assert_equal(
+        '',
+        @s.fragment(%[<noscript><div id='</noscript>&lt;img src=x onerror=alert(1)&gt; '>])
+      )
+    end
+  end
 end
-- 
2.39.1