From: Ryan Grove Date: Thu, 26 Jan 2023 13:42:39 -0800 Subject: Always remove `

` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser. The safest thing to do is to simply remove all `<noscript>` elements. Fixes GHSA-fw3g-2h3j-qmm7 --- HISTORY.md | 27 ++++++++++++++++++++++ README.md | 14 +++++++---- lib/sanitize/transformers/clean_element.rb | 10 ++++++++ test/test_clean_element.rb | 7 ++++++ test/test_malicious_html.rb | 20 +++++++++++++++- 5 files changed, 73 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 0f6efb56fde0..5f5e73816f68 100644 --- a/README.md +++ b/README.md @@ -12,10 +12,10 @@ properties, @ rules, and URL protocols in elements or attributes containing CSS. Any HTML or CSS that you don't explicitly allow will be removed. Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML -exactly the same way modern browsers do, and [Crass][crass], which parses CSS -exactly the same way modern browsers do. As long as your allowlist config only -allows safe markup and CSS, even the most malformed or malicious input will be -transformed into safe output. +the same way modern browsers do, and [Crass][crass], which parses CSS the same +way modern browsers do. As long as your allowlist config only allows safe markup +and CSS, even the most malformed or malicious input will be transformed into +safe output. [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize) [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests) @@ -427,6 +427,12 @@ elements not in this array will be removed. > > By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application. +> **Note** +> +> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist. +> +> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser. + #### :parser_options (Hash) [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`. diff --git a/lib/sanitize/transformers/clean_element.rb b/lib/sanitize/transformers/clean_element.rb index a2bacd8198f7..98208a7c15f3 100644 --- a/lib/sanitize/transformers/clean_element.rb +++ b/lib/sanitize/transformers/clean_element.rb @@ -252,6 +252,16 @@ class Sanitize; module Transformers; class CleanElement node['content'] = node['content'].gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8') end + + # A `<noscript>` element's content is parsed differently in browsers + # depending on whether or not scripting is enabled. Since Nokogiri doesn't + # support scripting, it always parses `<noscript>` elements as if scripting + # is disabled. This results in edge cases where it's not possible to + # reliably sanitize the contents of a `<noscript>` element because Nokogiri + # can't fully replicate the parsing behavior of a scripting-enabled browser. + # The safest thing to do is to simply remove all `<noscript>` elements. + when 'noscript' + node.unlink end end diff --git a/test/test_clean_element.rb b/test/test_clean_element.rb index 1293cea17ba2..916b0f934251 100644 --- a/test/test_clean_element.rb +++ b/test/test_clean_element.rb @@ -541,5 +541,12 @@ describe 'Sanitize::Transformers::CleanElement' do )).must_equal "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\"></head><body>Howdy!</body></html>" end + it 'always removes `<noscript>` elements even if `noscript` is in the allowlist' do + assert_equal( + '', + Sanitize.fragment('<noscript>foo

', elements: ['noscript']) + ) + end + end end diff --git a/test/test_malicious_html.rb b/test/test_malicious_html.rb index 377e6deb0fb5..7b33f68880d3 100644 --- a/test/test_malicious_html.rb +++ b/test/test_malicious_html.rb @@ -244,7 +244,6 @@ describe 'Malicious HTML' do unescaped_content_elements = %w[ noembed noframes - noscript plaintext script xmp @@ -255,6 +254,7 @@ describe 'Malicious HTML' do ] removed_elements = %w[ + noscript style ] @@ -318,4 +318,22 @@ describe 'Malicious HTML' do end end end + + describe 'sanitization bypass by exploiting scripting-disabled

` elements regardless of the allowlist' do + assert_equal( + '', + @s.fragment(%[<noscript><div id='

<img src=x onerror=alert(1)> '>]) + ) + end + end end -- 2.39.1