===================
 Charset Normalizer
===================

Overview
========

A library that helps you read text from an unknown character encoding.
This project is motivated by chardet; I'm trying to solve the same problem by taking another approach.
All IANA character set names for which the Python core library provides codecs are supported.
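
As a rough illustration of what "codecs provided by the Python core library" covers, the sketch below inspects the standard library's alias table. It only uses ``codecs`` and ``encodings.aliases`` from the standard library; it is not a description of how this package builds its own list of supported encodings, and the alias table also contains a few non-text codecs (for example ``base64_codec``).

```python
import codecs
from encodings.aliases import aliases

# `aliases` maps alternative spellings (keys) to codec module names (values).
codec_names = sorted(set(aliases.values()))
print(len(codec_names), "codec modules known to this interpreter")

# An IANA-style name resolves to the matching Python codec.
canonical = codecs.lookup("windows-1252").name
print(canonical)
```

The exact number of codecs depends on the Python version in use.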

It aims to be as generic as possible.


It is released under the MIT license; see LICENSE for more
details. Be aware that no warranty of any kind is provided with this package.

Copyright (C) 2025 Ahmed TAHRI <tahri(dot)ahmed(at)proton.me>

Introduction
============

This library aims to help you find the encoding that best suits your content.
It **DOES NOT** try to uncover the originating encoding; in fact, this program does not care about it.

By originating encoding, we mean the one that was actually used to encode a text file.

For example ::

    my_byte_str = 'Bonjour, je suis à la recherche d\'une aide sur les étoiles'.encode('cp1252')


We **ARE NOT** looking for ``cp1252`` **BUT FOR** the text ``Bonjour, je suis à la recherche d'une aide sur les étoiles``,
because several encodings decode these bytes to that same text ::

    my_byte_str.decode('cp1252') == my_byte_str.decode('cp1256') == my_byte_str.decode('cp1258') == my_byte_str.decode('iso8859_14')
    # This expression evaluates to True.

Any of these encodings is a valid answer, since each one decodes ``my_byte_str`` to exactly the same text.
This is where this library differs from others: there is no specific probe per encoding table.
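
The equivalence described above can be checked with the standard library alone. The sketch below is only an illustration, not the detection algorithm used by this package: it decodes the sample bytes with a small, arbitrarily chosen list of codecs and groups them by the text they produce.

```python
# The sample text from above, encoded with cp1252.
text = "Bonjour, je suis à la recherche d'une aide sur les étoiles"
my_byte_str = text.encode("cp1252")

# Candidate codecs from the Python standard library; this short list is an
# arbitrary choice for illustration.
candidates = ["cp1252", "cp1256", "cp1258", "iso8859_14", "utf_8", "cp1251"]

# Group the candidates by the text they produce.
decoded_by = {}
for name in candidates:
    try:
        decoded = my_byte_str.decode(name)
    except UnicodeDecodeError:
        continue  # This codec cannot decode the payload at all.
    decoded_by.setdefault(decoded, []).append(name)

# Every codec in this list is an equally valid answer for these bytes.
equivalent = decoded_by.get(text, [])
print(equivalent)
```

Here ``utf_8`` fails to decode the payload, and ``cp1251`` decodes it without error but to different text, so neither appears in ``equivalent``.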

Features
========

- Detect the encoding of a file pointer (fp), ``bytes`` or ``PathLike``.
- Convert any encoded content to Unicode as faithfully as possible.
- Detect the spoken language of the text.
- Ship with a great CLI.
- Detect binary (non-text) content.

Start Guide
-----------

.. toctree::
    :maxdepth: 2

    user/support
    user/getstarted
    user/advanced_search
    user/handling_result
    user/miscellaneous
    user/cli

Community Guide
---------------

.. toctree::
    :maxdepth: 2

    community/speedup
    community/faq
    community/why_migrate
    community/featured

Developer Guide
---------------

.. toctree::
    :maxdepth: 3

    api

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`