==============
 Miscellaneous
==============

Convert to str
--------------

Any ``CharsetMatch`` object can be converted to a usable ``str``.

 ::

    from charset_normalizer import from_bytes

    my_byte_str = 'Bсеки човек има право на образование.'.encode('cp1251')

    # Keep the return value so that the result can be used afterwards
    result = from_bytes(my_byte_str).best()

    # This should print 'Bсеки човек има право на образование.'
    print(str(result))
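
When no plausible encoding is found, ``best()`` returns ``None``, and ``str(None)`` would silently yield the
text ``'None'``. The following is a minimal sketch of a guarded conversion. The variable names are illustrative only,
and the encoding that gets reported depends on the detection itself.

 ::

    from charset_normalizer import from_bytes

    payload = 'Bсеки човек има право на образование.'.encode('cp1251')

    best_guess = from_bytes(payload).best()

    if best_guess is None:
        # No plausible encoding was found; the payload may not be text at all
        text = None
    else:
        # str() decodes the payload using the detected encoding
        text = str(best_guess)
        print(best_guess.encoding)
        print(text)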


Logging
-------

Prior to version 2.0.11, you may encounter some unexpected log entries in your streams,
something along the lines of:

 ::

    ... | WARNING | override steps (5) and chunk_size (512) as content does not fit (465 byte(s) given) parameters.
    ... | INFO | ascii passed initial chaos probing. Mean measured chaos is 0.000000 %
    ... | INFO | ascii should target any language(s) of ['Latin Based']


This most likely happens because you altered the root logger, that is, the instance returned by
``logging.getLogger()``. The package has its own logging logic, which exists because those logs are useful.
See https://docs.python.org/3/howto/logging.html to learn the basics.

If you want to silence the logs or drastically reduce their number, please upgrade to the latest version
available for ``charset-normalizer`` using your package manager or by running ``pip install charset-normalizer -U``.

The latest version no longer produces any entry above the ``DEBUG`` level.
At the ``DEBUG`` level, only one entry is emitted, and it reports the detection result.

All other log entries are emitted at level ``5``, commonly known as the TRACE level, although the package does
not register that level name globally.
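
To illustrate what emitting entries at level 5 means in practice, here is a small sketch that relies only on the
standard ``logging`` module. The logger name ``trace_demo``, the ``TRACE`` constant and the ``ListHandler`` helper are
hypothetical names chosen for this example; they do not describe the internals of the package. The sketch shows that a
logger set to ``DEBUG`` drops level 5 records, and that lowering the threshold to 5 lets them through.

 ::

    import logging

    # Hypothetical constant: numeric level 5 sits below DEBUG (10)
    TRACE = 5


    class ListHandler(logging.Handler):
        """Collect (level, message) pairs so they can be inspected."""

        def __init__(self):
            super().__init__(level=TRACE)
            self.messages = []

        def emit(self, record):
            self.messages.append((record.levelno, record.getMessage()))


    # Hypothetical logger name, used only for this demonstration
    logger = logging.getLogger("trace_demo")
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)

    # With the threshold at DEBUG, the level 5 record is discarded
    logger.setLevel(logging.DEBUG)
    logger.log(TRACE, "trace-level detail")
    logger.debug("detection result")
    at_debug = list(handler.messages)

    # With the threshold lowered to 5, the same record is kept
    handler.messages.clear()
    logger.setLevel(TRACE)
    logger.log(TRACE, "trace-level detail")
    at_trace = list(handler.messages)

    print(at_debug)
    print(at_trace)

The same threshold rule applies to any logger: entries at level 5 stay invisible unless the level is explicitly
lowered below ``DEBUG``.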


Detect binaries
---------------

This package offers a convenient way to detect files that can be considered 'binaries',
meaning that they are unlikely to be text files.

 ::

    from charset_normalizer import is_binary

    # It accepts a path, raw bytes, or even a file pointer.
    result = is_binary("./my-file.ext")

    # This should print 'True' or 'False'
    print(result)
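
The other two input kinds can be sketched as follows, assuming an in-memory ``io.BytesIO`` buffer is an acceptable
stand-in for a file opened in binary mode. The payloads are made up for illustration, and the verdict for any given
input depends on the detection heuristics.

 ::

    import io

    from charset_normalizer import is_binary

    # Raw bytes: plain ASCII text versus a payload containing NUL bytes
    text_payload = b"Hello, world! This is plain text.\n" * 4
    noisy_payload = b"\x00\x5f\x2f\xff" * 50

    text_is_binary = is_binary(text_payload)
    noisy_is_binary = is_binary(noisy_payload)

    # File pointer: here an in-memory buffer wrapping the same noisy payload
    fp_is_binary = is_binary(io.BytesIO(noisy_payload))

    print(text_is_binary, noisy_is_binary, fp_is_binary)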