Installation
============

This installs a package that can be used from Python (``import charset_normalizer``).

To install for all users on the system, administrator rights (root) may be required.

Using PIP
---------

Charset Normalizer can be installed from pip::

    pip install charset-normalizer

You may retrieve the latest unicodedata backport as follows::

    pip install charset-normalizer[unicode_backport]
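
You can then verify the installation from Python. This is a minimal sanity check, assuming the package exposes a ``__version__`` attribute (recent releases do)::

    import charset_normalizer

    # Prints the installed version string
    print(charset_normalizer.__version__)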

From git via master
-----------------------

You can install from the dev-master branch using git::

    git clone https://github.com/Ousret/charset_normalizer.git
    cd charset_normalizer/
    python setup.py install

Basic Usage
===========

The new way
-----------

You may want to get right to it. ::

    from charset_normalizer import from_bytes, from_path

    # This is going to print out your sequence once properly decoded
    print(
        str(
            from_bytes(
                my_byte_str
            ).best()
        )
    )

    # You could also want the same from a file
    print(
        str(
            from_path(
                './data/sample.1.ar.srt'
            ).best()
        )
    )
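
The match returned by ``best()`` can also be inspected before use. A short sketch, assuming the documented ``CharsetMatch`` attributes such as ``encoding`` and that ``best()`` returns ``None`` when no plausible match is found::

    from charset_normalizer import from_bytes

    results = from_bytes(my_byte_str)
    best_guess = results.best()  # may be None if nothing plausible was found

    if best_guess is None:
        print('unable to determine a suitable encoding')
    else:
        print('most likely encoding:', best_guess.encoding)
        print(str(best_guess))  # the payload decoded with that encoding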

Backward compatibility
----------------------

If you are used to Python chardet, we provide the very same ``detect()`` method as chardet.
This function is mostly backward-compatible with Chardet. The migration should be painless. ::

    from charset_normalizer import detect

    # This will behave exactly the same as python chardet
    result = detect(my_byte_str)

    if result['encoding'] is not None:
        print('got', result['encoding'], 'as detected encoding')

You may upgrade your code with ease: replace ``from chardet import detect`` with ``from charset_normalizer import detect`` (a simple CTRL + R search-and-replace in most editors).
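
The returned mapping mirrors chardet's result dictionary, so downstream code that reads its keys should keep working. A minimal sketch, assuming the usual ``encoding``, ``language`` and ``confidence`` keys::

    from charset_normalizer import detect

    result = detect(my_byte_str)

    # Same shape as chardet's result dictionary
    print(result.get('encoding'))    # a codec name, or None if undetected
    print(result.get('language'))    # best-guess language, possibly an empty string
    print(result.get('confidence'))  # float between 0.0 and 1.0, or None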