From: Ryan Grove <ryan@wonko.com>
Date: Thu, 26 Jan 2023 13:42:39 -0800
Subject: Always remove `<noscript>` elements
Origin: https://github.com/rgrove/sanitize/commit/ec14265e530dc3fe31ce2ef773594d3a97778d22
Bug-Debian: https://bugs.debian.org/1030047
Bug-Debian-Security: https://security-tracker.debian.org/tracker/CVE-2023-23627

...even if `noscript` is in the allowlist.

A `<noscript>` element's content is parsed differently in browsers
depending on whether or not scripting is enabled. Since Nokogiri doesn't
support scripting, it always parses `<noscript>` elements as if
scripting is disabled. This results in edge cases where it's not
possible to reliably sanitize the contents of a `<noscript>` element
because Nokogiri can't fully replicate the parsing behavior of a
scripting-enabled browser. The safest thing to do is to simply remove
all `<noscript>` elements.

Fixes GHSA-fw3g-2h3j-qmm7
---
 HISTORY.md                                 | 27 ++++++++++++++++++++++
 README.md                                  | 14 +++++++----
 lib/sanitize/transformers/clean_element.rb | 10 ++++++++
 test/test_clean_element.rb                 |  7 ++++++
 test/test_malicious_html.rb                | 20 +++++++++++++++-
 5 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 0f6efb56fde0..5f5e73816f68 100644
--- a/README.md
+++ b/README.md
@@ -12,10 +12,10 @@ properties, @ rules, and URL protocols in elements or attributes containing CSS.
 Any HTML or CSS that you don't explicitly allow will be removed.
 
 Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
-exactly the same way modern browsers do, and [Crass][crass], which parses CSS
-exactly the same way modern browsers do. As long as your allowlist config only
-allows safe markup and CSS, even the most malformed or malicious input will be
-transformed into safe output.
+the same way modern browsers do, and [Crass][crass], which parses CSS the same
+way modern browsers do. As long as your allowlist config only allows safe markup
+and CSS, even the most malformed or malicious input will be transformed into
+safe output.
 
 [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
 [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
@@ -427,6 +427,12 @@ elements not in this array will be removed.
 >
 > By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
 
+> **Note**
+>
+> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
+>
+> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
+
 #### :parser_options (Hash)
 
 [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
diff --git a/lib/sanitize/transformers/clean_element.rb b/lib/sanitize/transformers/clean_element.rb
index a2bacd8198f7..98208a7c15f3 100644
--- a/lib/sanitize/transformers/clean_element.rb
+++ b/lib/sanitize/transformers/clean_element.rb
@@ -252,6 +252,16 @@ class Sanitize; module Transformers; class CleanElement
 
         node['content'] = node['content'].gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8')
       end
+
+    # A `<noscript>` element's content is parsed differently in browsers
+    # depending on whether or not scripting is enabled. Since Nokogiri doesn't
+    # support scripting, it always parses `<noscript>` elements as if scripting
+    # is disabled. This results in edge cases where it's not possible to
+    # reliably sanitize the contents of a `<noscript>` element because Nokogiri
+    # can't fully replicate the parsing behavior of a scripting-enabled browser.
+    # The safest thing to do is to simply remove all `<noscript>` elements.
+    when 'noscript'
+      node.unlink
     end
   end
 
diff --git a/test/test_clean_element.rb b/test/test_clean_element.rb
index 1293cea17ba2..916b0f934251 100644
--- a/test/test_clean_element.rb
+++ b/test/test_clean_element.rb
@@ -541,5 +541,12 @@ describe 'Sanitize::Transformers::CleanElement' do
       )).must_equal "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\"></head><body>Howdy!</body></html>"
     end
 
+    it 'always removes `<noscript>` elements even if `noscript` is in the allowlist' do
+      assert_equal(
+        '',
+        Sanitize.fragment('<noscript>foo</noscript>', elements: ['noscript'])
+      )
+    end
+
   end
 end
diff --git a/test/test_malicious_html.rb b/test/test_malicious_html.rb
index 377e6deb0fb5..7b33f68880d3 100644
--- a/test/test_malicious_html.rb
+++ b/test/test_malicious_html.rb
@@ -244,7 +244,6 @@ describe 'Malicious HTML' do
     unescaped_content_elements = %w[
       noembed
       noframes
-      noscript
       plaintext
       script
       xmp
@@ -255,6 +254,7 @@ describe 'Malicious HTML' do
     ]
 
     removed_elements = %w[
+      noscript
       style
     ]
 
@@ -318,4 +318,22 @@ describe 'Malicious HTML' do
       end
     end
   end
+
+  describe 'sanitization bypass by exploiting scripting-disabled <noscript> behavior' do
+    before do
+      @s = Sanitize.new(
+        Sanitize::Config.merge(
+          Sanitize::Config::RELAXED,
+          elements: Sanitize::Config::RELAXED[:elements] + ['noscript']
+        )
+      )
+    end
+
+    it 'is prevented by removing `<noscript>` elements regardless of the allowlist' do
+      assert_equal(
+        '',
+        @s.fragment(%[<noscript><div id='</noscript>&lt;img src=x onerror=alert(1)&gt; '>])
+      )
+    end
+  end
 end
-- 
2.39.1

