File: pdo_mysql_quote_gbk_double_escape.phpt

php8.4 8.4.11-1
--TEST--
PDO_MYSQL: Test quoting of multibyte sequence with GBK vs utf8mb4
--EXTENSIONS--
pdo_mysql
--SKIPIF--
<?php
require_once __DIR__ . '/inc/mysql_pdo_test.inc';
MySQLPDOTest::skip();
?>
--FILE--
<?php
    require_once __DIR__ . '/inc/mysql_pdo_test.inc';

    $link = MySQLPDOTest::factory('PDO', ['charset' => 'GBK']);
    $quoted = $link->quote("\xbf\x27");
    $quoted_without_outer_quotes = substr($quoted, 1, -1);

    /* This should result in 5C BF 5C 27 for GBK, instead of BF 5C 27 as with utf8mb4.
     * To explain why the extra escaping takes place, let's assume we don't do it and see what happens.
     *
     * 1. First iteration, i.e. *from == 0xBF. 0xBF followed by 0x27 isn't a valid GBK multibyte sequence,
     *    so the mb validity check fails.
     *    Without the character length check, we'd check whether we need to escape the current character 0xBF.
     *    The character 0xBF isn't handled in the switch case, so we don't escape it and append 0xBF to the output buffer.
     * 2. Second iteration, i.e. *from == 0x27. This isn't the start of a multibyte sequence either, so we go to the escape logic.
     *    Note that 0x27 is the character ', so we have to escape! We write two bytes to the output:
     *    \ (this is 0x5C) and ' (this is 0x27).
     * 3. The function has finished, so let's look at the output: 0xBF 0x5C 0x27.
     *    Now we have created a problem: 0xBF 0x5C is a valid GBK multibyte sequence!
     *    So we transformed an invalid multibyte sequence into a valid one, potentially corrupting data,
     *    and the trailing 0x27 is then read as an unescaped quote.
     *    The solution is to check, using a less strict test, whether the byte could have been part of
     *    a multibyte sequence, and to escape it if so. */
    var_dump(bin2hex($quoted_without_outer_quotes));

    unset($link);

    // Compare with utf8mb4
    $link = MySQLPDOTest::factory('PDO', ['charset' => 'utf8mb4']);
    $quoted = $link->quote("\xbf\x27");
    $quoted_without_outer_quotes = substr($quoted, 1, -1);
    var_dump(bin2hex($quoted_without_outer_quotes));
?>
--EXPECT--
string(8) "5cbf5c27"
string(6) "bf5c27"