.. emoji documentation master file, created by sphinx-quickstart.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
.. py:module:: emoji
:noindex:
.. A setup code block. This code is not shown in the output for other builders,
but executed before the doctests of the group(s) it belongs to.
.. testsetup:: *
import emoji
from pprint import pprint
emoji
=====
Release v\ |version|. (:ref:`Installation <install>`)
emoji supports Python 3.7+. The last version to support Python 2.7 and 3.5 was v2.4.0.
.. contents:: Table of Contents
Usage and Examples
------------------
The main purpose of this package is converting Unicode emoji to emoji names and vice
versa with :func:`emojize` and :func:`demojize`.
The entire set of Emoji codes as defined by the `Unicode consortium <https://unicode.org/emoji/charts/full-emoji-list.html>`__
is supported in addition to a bunch of `aliases <https://www.webfx.com/tools/emoji-cheat-sheet/>`__.
By default, only the official list is enabled, but passing ``language='alias'`` to :func:`emojize`
enables both the full list and the aliases.
.. doctest::
>>> print(emoji.emojize('Python is :thumbs_up:'))
Python is 👍
>>> print(emoji.emojize('Python is :thumbsup:', language='alias'))
Python is 👍
>>> print(emoji.demojize('Python is 👍'))
Python is :thumbs_up:
>>> print(emoji.emojize("Python is fun :red_heart:", variant="text_type"))
Python is fun ❤︎
>>> print(emoji.emojize("Python is fun :red_heart:", variant="emoji_type"))
Python is fun ❤️
..
Languages
^^^^^^^^^
The default language is English (``language='en'``). The following languages are also supported:
Spanish (``'es'``), Portuguese (``'pt'``), Italian (``'it'``), French (``'fr'``), German (``'de'``), Farsi/Persian (``'fa'``)
.. doctest::
>>> print(emoji.emojize('Python es :pulgar_hacia_arriba:', language='es'))
Python es 👍
>>> print(emoji.demojize('Python es 👍', language='es'))
Python es :pulgar_hacia_arriba:
>>> print(emoji.emojize("Python é :polegar_para_cima:", language='pt'))
Python é 👍
>>> print(emoji.demojize("Python é 👍", language='pt'))
Python é :polegar_para_cima:
..
If you want to access the emoji names of a language directly,
you can load the language data and then access it in the :data:`EMOJI_DATA` dict
by using the language code as a key:
.. doctest::
>>> emoji.config.load_language('es')
>>> print(emoji.EMOJI_DATA['👍']['es'])
:pulgar_hacia_arriba:
..
Note: Not all emoji have names in all languages.
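Because a name can be missing for a given language, a lookup with a fallback may be useful. The following is a minimal sketch that does not import the package: ``entry`` is a hand-copied excerpt of one :data:`EMOJI_DATA` entry, and the helper ``name_in`` and the language code ``'xx'`` are hypothetical names used only for illustration.

```python
# Hand-copied excerpt of one EMOJI_DATA entry (thumbs up); the real entry
# has more keys. The 'xx' language code used below is hypothetical.
entry = {
    'en': ':thumbs_up:',
    'es': ':pulgar_hacia_arriba:',
    'pt': ':polegar_para_cima:',
}


def name_in(entry, language, fallback='en'):
    """Return the name for `language`, or the fallback name if it is missing."""
    return entry.get(language, entry[fallback])


print(name_in(entry, 'es'))  # :pulgar_hacia_arriba:
print(name_in(entry, 'xx'))  # :thumbs_up:
```

Falling back to English is only one possible policy; returning ``None`` or raising an error may fit better when a missing name should not go unnoticed.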
Extracting emoji
^^^^^^^^^^^^^^^^
The function :func:`analyze` finds all emoji in a string and yields each emoji
together with its position and the available meta information about it.
:func:`analyze` returns a generator, so you need to iterate over it or
convert the output to a list.
.. doctest::
>>> first_token = next(emoji.analyze('Python is 👍'))
>>> first_token
Token(chars='👍', value=EmojiMatch(👍, 10:11))
>>> emoji_match = first_token.value
>>> emoji_match
EmojiMatch(👍, 10:11)
>>> emoji_match.data
{'en': ':thumbs_up:', 'status': 2, 'E': 0.6, 'alias': [':thumbsup:', ':+1:'], 'variant': True, 'de': ':daumen_hoch:', 'es': ':pulgar_hacia_arriba:', 'fr': ':pouce_vers_le_haut:', 'ja': ':サムズアップ:', 'ko': ':올린_엄지:', 'pt': ':polegar_para_cima:', 'it': ':pollice_in_su:', 'fa': ':پسندیدن:', 'id': ':jempol_ke_atas:', 'zh': ':拇指向上:'}
>>> list(emoji.analyze('A 👩‍🚀 aboard a 🚀'))
[Token(chars='👩\u200d🚀', value=EmojiMatch(👩‍🚀, 2:5)), Token(chars='🚀', value=EmojiMatch(🚀, 15:16))]
>>> list(emoji.analyze('A👩‍🚀B🚀', non_emoji=True))
[Token(chars='A', value='A'), Token(chars='👩\u200d🚀', value=EmojiMatch(👩‍🚀, 1:4)), Token(chars='B', value='B'), Token(chars='🚀', value=EmojiMatch(🚀, 5:6))]
..
The parameter ``join_emoji`` controls whether `non-RGI emoji <#non-rgi-zwj-emoji>`_ are handled as a single token or as multiple emoji:
.. doctest::
>>> list(emoji.analyze('👨‍👩🏿‍👧🏻‍👦🏾', join_emoji=True))
[Token(chars='👨\u200d👩🏿\u200d👧🏻\u200d👦🏾', value=EmojiMatchZWJNonRGI(👨‍👩🏿‍👧🏻‍👦🏾, 0:10))]
>>> list(emoji.analyze('👨‍👩🏿‍👧🏻‍👦🏾', join_emoji=False))
[Token(chars='👨', value=EmojiMatch(👨, 0:1)), Token(chars='👩🏿', value=EmojiMatch(👩🏿, 2:4)), Token(chars='👧🏻', value=EmojiMatch(👧🏻, 5:7)), Token(chars='👦🏾', value=EmojiMatch(👦🏾, 8:10))]
..
The function :func:`emoji_list` finds all emoji in a string and reports their positions.
Keep in mind that an emoji can span multiple characters:
.. doctest::
>>> emoji.emoji_list('Python is 👍')
[{'match_start': 10, 'match_end': 11, 'emoji': '👍'}]
>>> emoji.emoji_list('A 👩‍🚀 aboard a 🚀')
[{'match_start': 2, 'match_end': 5, 'emoji': '👩‍🚀'}, {'match_start': 15, 'match_end': 16, 'emoji': '🚀'}]
..
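The positions reported above can be used as slice indices into the original string. The sketch below does not call the package: the ``matches`` list is copied from the :func:`emoji_list` output shown above, and it assumes that ``match_end`` is exclusive, like the end of a Python slice.

```python
# Built from escapes so the code points are visible:
# 'A ' + WOMAN + ZERO WIDTH JOINER + ROCKET + ' aboard a ' + ROCKET
text = 'A \U0001F469\u200D\U0001F680 aboard a \U0001F680'

# Positions copied from the emoji_list() example above (not computed here).
matches = [
    {'match_start': 2, 'match_end': 5},
    {'match_start': 15, 'match_end': 16},
]

# match_end is assumed to be exclusive, like a Python slice end.
found = [text[m['match_start']:m['match_end']] for m in matches]

# The first emoji spans three code points, the second only one.
print([len(e) for e in found])  # [3, 1]
```

Slicing by position keeps the zero-width joiner inside the first emoji, which a character-by-character loop would split apart.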
To retrieve the distinct set of emoji from a string, use :func:`distinct_emoji_list`:
.. code-block:: python
>>> emoji.distinct_emoji_list('Some emoji: 🌍, 😂, 😃, 😂, 🌍, 🌦️')
['😃', '😂', '🌦️', '🌍']
..
To count the number of emoji in a string, use :func:`emoji_count`:
.. doctest::
>>> emoji.emoji_count('Some emoji: 🌍, 😂, 😃, 😂, 🌍, 🌦️')
6
>>> emoji.emoji_count('Some emoji: 🌍, 😂, 😃, 😂, 🌍, 🌦️', unique=True)
4
..
You can check if a string is a single, valid emoji with :func:`is_emoji`:
.. doctest::
>>> emoji.is_emoji('👍')
True
>>> emoji.is_emoji('👍👍')
False
>>> emoji.is_emoji('test')
False
..
When dealing with emoji, it is generally a bad idea to look at individual characters.
Unicode contains modifier characters, such as variation selectors, which are not emoji themselves
but modify the preceding emoji. You can check whether a string consists only of emoji with :func:`purely_emoji`:
.. doctest::
>>> '\U0001f600\ufe0f'
'😀'
>>> emoji.is_emoji('\U0001f600\ufe0f')
False
>>> emoji.is_emoji('\U0001f600')
True
>>> emoji.is_emoji('\ufe0f')
False
>>> emoji.purely_emoji('\U0001f600\ufe0f')
True
..
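To see why the string above is not a single emoji for :func:`is_emoji`, it can help to inspect its code points. This sketch uses only the standard library module ``unicodedata`` and does not call the package:

```python
import unicodedata

text = '\U0001f600\ufe0f'  # grinning face followed by a variation selector

# One visible emoji, but two code points.
print(len(text))  # 2

names = [unicodedata.name(ch) for ch in text]
print(names)  # ['GRINNING FACE', 'VARIATION SELECTOR-16']
```

The second code point has no glyph of its own; it only requests emoji-style presentation for the character before it.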
To get more information about an emoji, you can look it up in the :data:`EMOJI_DATA` dict:
.. testcode::
pprint(emoji.EMOJI_DATA['🌍'])
..
.. testoutput::
{'E': 0.7,
'alias': [':earth_africa:'],
'de': ':globus_mit_europa_und_afrika:',
'en': ':globe_showing_Europe-Africa:',
'es': ':globo_terráqueo_mostrando_europa_y_áfrica:',
'fr': ':globe_tourné_sur_l’afrique_et_l’europe:',
'it': ':europa_e_africa:',
'pt': ':globo_mostrando_europa_e_áfrica:',
'status': 2,
'variant': True}
..
``'E'`` is the :ref:`Emoji version <Emoji version>`.
``'status'`` is defined in :data:`STATUS`. For example ``2`` corresponds
to ``'fully_qualified'``. More information on the meaning can be found in the
Unicode Standard http://www.unicode.org/reports/tr51/#Emoji_Variation_Selector_Notes
Replacing and removing emoji
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
With :func:`replace_emoji` you can replace, filter, escape or remove emoji in a string:
.. code-block:: python
>>> emoji.replace_emoji('Python is 👍', replace='')
'Python is '
>>> emoji.replace_emoji('Python is 👍', replace='👎')
'Python is 👎'
>>> def unicode_escape(chars, data_dict):
...     return chars.encode('unicode-escape').decode()
>>> emoji.replace_emoji('Python is 👍', replace=unicode_escape)
'Python is \U0001f44d'
>>> def xml_escape(chars, data_dict):
...     return chars.encode('ascii', 'xmlcharrefreplace').decode()
>>> emoji.replace_emoji('Python is 👍', replace=xml_escape)
'Python is &#128077;'
>>> emoji.replace_emoji('Python is 👍', replace=lambda chars, data_dict: chars.encode('ascii', 'namereplace').decode())
'Python is \N{THUMBS UP SIGN}'
>>> emoji.replace_emoji('Python is 👍', replace=lambda chars, data_dict: data_dict['es'])
'Python is :pulgar_hacia_arriba:'
..
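The escape callbacks above are ordinary functions of the form ``(chars, data_dict) -> str``, so they can be tried on their own before being passed to :func:`replace_emoji`. The sketch below calls them directly with an empty dict; the name ``name_escape`` is introduced here only to give the ``namereplace`` lambda a name.

```python
thumbs_up = '\U0001f44d'


def unicode_escape(chars, data_dict):
    # Python-style escape sequence
    return chars.encode('unicode-escape').decode()


def xml_escape(chars, data_dict):
    # Numeric XML/HTML character reference
    return chars.encode('ascii', 'xmlcharrefreplace').decode()


def name_escape(chars, data_dict):
    # Unicode character name
    return chars.encode('ascii', 'namereplace').decode()


# data_dict is unused by these callbacks, so an empty dict is enough here.
print(unicode_escape(thumbs_up, {}))  # \U0001f44d
print(xml_escape(thumbs_up, {}))      # &#128077;
print(name_escape(thumbs_up, {}))     # \N{THUMBS UP SIGN}
```

All three rely on the error handlers of ``str.encode``, so they work for any non-ASCII text, not only for emoji.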
Emoji versions
^^^^^^^^^^^^^^
The parameter ``version`` in :func:`replace_emoji` lets you replace only emoji above
that :ref:`Emoji version <Emoji version>` to prevent incompatibility with older platforms.
For the functions :func:`emojize` and :func:`demojize` the parameter ``version`` will
replace emoji above the specified version with the value of the parameter ``handle_version``.
It defaults to an empty string, but can be set to any string or a function that returns a string.
For example, the ``:croissant:`` 🥐 emoji was added in Emoji 3.0 (Unicode 9.0) in 2016 and
``:T-Rex:`` 🦖 was added later in Emoji 5.0 (Unicode 10.0) in 2017:
.. doctest::
>>> emoji.replace_emoji('A 🦖 is eating a 🥐', replace='[Unsupported emoji]', version=1.0)
'A [Unsupported emoji] is eating a [Unsupported emoji]'
>>> emoji.replace_emoji('A 🦖 is eating a 🥐', replace=lambda chars, data_dict: data_dict['en'], version=3.0)
'A :T-Rex: is eating a 🥐'
>>> emoji.emojize('A :T-Rex: is eating a :croissant:', version=3.0)
'A  is eating a 🥐'
>>> emoji.emojize('A :T-Rex: is eating a :croissant:', version=3.0, handle_version='[Unsupported emoji]')
'A [Unsupported emoji] is eating a 🥐'
>>> emoji.demojize('A 🦖 is eating a 🥐', version=3.0)
'A  is eating a :croissant:'
>>> emoji.replace_emoji('A 🦖 is eating a 🥐', replace='', version=5.0)
'A 🦖 is eating a 🥐'
..
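The examples above suggest that "above" means strictly greater than the given version: with ``version=3.0`` the croissant (Emoji 3.0) is kept and only the T-Rex (Emoji 5.0) is replaced. The following sketch mimics that rule with a hypothetical two-entry table ``VERSIONS`` and a hypothetical helper ``replace_newer``. It illustrates the semantics only and is not how the package stores or looks up its data.

```python
# Hypothetical mini-table of Emoji versions, taken from the text above.
# This is not how the package stores or looks up its data.
VERSIONS = {
    '\U0001F950': 3.0,  # croissant
    '\U0001F996': 5.0,  # T-Rex
}


def replace_newer(text, replace, version):
    """Replace every known emoji whose version is strictly above `version`."""
    out = []
    for ch in text:
        if VERSIONS.get(ch, 0.0) > version:
            out.append(replace)
        else:
            out.append(ch)
    return ''.join(out)


text = 'A \U0001F996 is eating a \U0001F950'
print(replace_newer(text, '[Unsupported emoji]', 3.0))
print(replace_newer(text, '', 5.0) == text)  # True
```

The per-character loop is sufficient here only because both emoji are single code points; emoji made of several code points need a proper tokenizer such as :func:`analyze`.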
You can find the version of an emoji with :func:`version`:
.. doctest::
>>> emoji.version('🥐')
3
>>> emoji.version('🏌️‍♀️')
4
>>> emoji.version('🦖')
5
..
Non-RGI ZWJ emoji
^^^^^^^^^^^^^^^^^
Some emoji contain multiple persons and each person can have an individual skin tone.
Unicode supports `Multi-Person Skin Tones <http://www.unicode.org/reports/tr51/#multiperson_skintones>`__ as of Emoji 11.0.
Skin tones can be added to the nine characters known as `Multi-Person Groupings <https://www.unicode.org/reports/tr51/#MultiPersonGroupingsTable>`__.
Multi-person groups with different skin tones can be represented with Unicode, but are not yet RGI (recommended for general interchange). This means Unicode.org recommends not showing them in emoji keyboards.
However, some browsers and platforms already support some of them:
.. figure:: 1F468-200D-1F469-1F3FF-200D-1F467-1F3FB-200D-1F466-1F3FE.png
:height: 4em
:alt: A family emoji 👨‍👩🏿‍👧🏻‍👦🏾 with four different skin tone values
The emoji 👨‍👩🏿‍👧🏻‍👦🏾 as it appears in Firefox on Windows 11
It consists of eleven Unicode characters: four person emoji and four different skin tones, joined together by three ``\u200d`` **Z**\ ero-\ **W**\ idth **J**\ oiner characters:
#. 👨 ``:man:``
#. 🏽 ``:medium_skin_tone:``
#. ``\u200d``
#. 👩 ``:woman:``
#. 🏿 ``:dark_skin_tone:``
#. ``\u200d``
#. 👧 ``:girl:``
#. 🏻 ``:light_skin_tone:``
#. ``\u200d``
#. 👦 ``:boy:``
#. 🏾 ``:medium-dark_skin_tone:``
On platforms that don't support it, it might appear as separate emoji: 👨🏽👩🏿👧🏻👦🏾
In the module configuration :class:`config` you can control how such emoji are handled.
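The structure of such a sequence can be inspected with the standard library alone. The sketch below builds the eleven code points from the numbered list above and splits them at the joiner; it does not use the package, and the variable names are chosen only for this illustration.

```python
import unicodedata

ZWJ = '\u200d'

# Eleven code points, following the numbered list above.
family = (
    '\U0001F468\U0001F3FD' + ZWJ    # man, medium skin tone
    + '\U0001F469\U0001F3FF' + ZWJ  # woman, dark skin tone
    + '\U0001F467\U0001F3FB' + ZWJ  # girl, light skin tone
    + '\U0001F466\U0001F3FE'        # boy, medium-dark skin tone
)

print(len(family))  # 11

# Splitting on the joiner yields one (person, skin tone) pair per member.
members = family.split(ZWJ)
for person, tone in members:
    print(unicodedata.name(person), '+', unicodedata.name(tone))
```

Splitting at the joiner gives roughly what a platform without support for the combined emoji would render: four separate person emoji, each with its own skin tone.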
Migrating to version 2.0.0
--------------------------
There are two major breaking changes in version 2.0.0:
non-English short codes
^^^^^^^^^^^^^^^^^^^^^^^
The names of emoji in non-English languages have changed, because the data files were updated to
CLDR version 41. See https://cldr.unicode.org/index/downloads.
That means some ``:short-code-emoji:`` with non-English names will no longer work in 2.0.0.
:func:`emojize` will ignore the old codes.
This may be a problem if you have previously stored ``:short-code-emoji:`` with non-English names
for example in a database or if your users have stored them.
Regular expression
^^^^^^^^^^^^^^^^^^
The function ``get_emoji_regexp()`` was removed in 2.0.0. Internally the module no longer uses
a regular expression when scanning for emoji in a string (e.g. in :func:`demojize`).
The regular expression was slow in Python 3 and it failed to correctly find certain combinations
of long emoji (emoji consisting of multiple Unicode codepoints).
If you used the regular expression to remove emoji from strings, you can use :func:`replace_emoji`
as shown in the examples above.
If you want to extract emoji from strings, you can use :func:`emoji_list` as a replacement.
If you want to keep using a regular expression despite its problems, you can create the
expression yourself like this:
.. testcode::
import re
import emoji
def get_emoji_regexp():
    # Sort emoji by length to make sure multi-character emojis are
    # matched first
    emojis = sorted(emoji.EMOJI_DATA, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(u) for u in emojis) + ')'
    return re.compile(pattern)
exp = get_emoji_regexp()
print(exp.sub(repl='[emoji]', string='A 🏌️‍♀️ is eating a 🥐'))
..
Output:
.. testoutput::
A [emoji] is eating a [emoji]
..
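The sorting step matters because a regular expression alternation takes the first alternative that matches, not the longest one. The sketch below shows the effect with a hypothetical three-entry table standing in for ``emoji.EMOJI_DATA``; the helper ``build`` and the constant names are illustrative only.

```python
import re

WOMAN = '\U0001F469'
ROCKET = '\U0001F680'
ASTRONAUT = WOMAN + '\u200d' + ROCKET  # one emoji, three code points


def build(emojis):
    """Compile an alternation that tries the emoji in the given order."""
    return re.compile('(' + '|'.join(re.escape(e) for e in emojis) + ')')


# Hypothetical three-entry table standing in for emoji.EMOJI_DATA.
table = [WOMAN, ROCKET, ASTRONAUT]

unsorted = build(table)
longest_first = build(sorted(table, key=len, reverse=True))

# Without sorting, the astronaut is split into its two visible parts.
print(unsorted.findall(ASTRONAUT) == [WOMAN, ROCKET])   # True
# With the longest alternative first, it is matched as one emoji.
print(longest_first.findall(ASTRONAUT) == [ASTRONAUT])  # True
```

Even with sorting, the pattern only knows the sequences listed in the table, which is one reason the package moved away from a regular expression.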
Common problems
---------------
.. code-block::
UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
..
This warning is raised in Python 2.7 if you pass a ``str`` string instead of a
``unicode`` string.
You should only pass Unicode strings to this module.
See https://python.readthedocs.io/en/v2.7.2/howto/unicode.html#the-unicode-type for more
information on Unicode in Python 2.7.
The API documentation
---------------------
Reference documentation of all functions and properties in the module:
.. toctree::
:titlesonly:
api
+-------------------------------+--------------------------------------------------------------+
| API Reference | |
+===============================+==============================================================+
| **Functions:** | |
+-------------------------------+--------------------------------------------------------------+
| :func:`emojize` | Replace emoji names with Unicode codes |
+-------------------------------+--------------------------------------------------------------+
| :func:`demojize` | Replace Unicode emoji with emoji shortcodes |
+-------------------------------+--------------------------------------------------------------+
| :func:`analyze` | Find Unicode emoji in a string |
+-------------------------------+--------------------------------------------------------------+
| :func:`replace_emoji` | Replace Unicode emoji with a customizable string |
+-------------------------------+--------------------------------------------------------------+
| :func:`emoji_list` | Location of all emoji in a string |
+-------------------------------+--------------------------------------------------------------+
| :func:`distinct_emoji_list` | Distinct list of emojis in the string |
+-------------------------------+--------------------------------------------------------------+
| :func:`emoji_count` | Number of emojis in a string |
+-------------------------------+--------------------------------------------------------------+
| :func:`is_emoji` | Check if a string/character is a single emoji |
+-------------------------------+--------------------------------------------------------------+
| :func:`purely_emoji` | Check if a string contains only emojis |
+-------------------------------+--------------------------------------------------------------+
| :func:`version` | Find Unicode/Emoji version of an emoji |
+-------------------------------+--------------------------------------------------------------+
| **Module variables:** | |
+-------------------------------+--------------------------------------------------------------+
| :data:`EMOJI_DATA` | Dict of all emoji |
+-------------------------------+--------------------------------------------------------------+
| :data:`STATUS` | Dict of Unicode/Emoji status |
+-------------------------------+--------------------------------------------------------------+
| :class:`config` | Module wide configuration |
+-------------------------------+--------------------------------------------------------------+
| **Classes:** | |
+-------------------------------+--------------------------------------------------------------+
| :class:`EmojiMatch` | |
+-------------------------------+--------------------------------------------------------------+
| :class:`EmojiMatchZWJ` | |
+-------------------------------+--------------------------------------------------------------+
| :class:`EmojiMatchZWJNonRGI` | |
+-------------------------------+--------------------------------------------------------------+
| :class:`Token` | |
+-------------------------------+--------------------------------------------------------------+
Links
=====
**Overview of all emoji:**
`https://carpedm20.github.io/emoji/ <https://carpedm20.github.io/emoji/>`__
(auto-generated list of the emoji that are supported by the current version of this package)
**For English:**
`Emoji Cheat Sheet <https://www.webfx.com/tools/emoji-cheat-sheet/>`__
`Official Unicode list <http://www.unicode.org/emoji/charts/full-emoji-list.html>`__
**For Spanish:**
`Unicode list <https://emojiterra.com/es/lista-es/>`__
**For Portuguese:**
`Unicode list <https://emojiterra.com/pt/lista/>`__
**For Italian:**
`Unicode list <https://emojiterra.com/it/lista-it/>`__
**For French:**
`Unicode list <https://emojiterra.com/fr/liste-fr/>`__
**For German:**
`Unicode list <https://emojiterra.com/de/liste/>`__
Indices and tables
==================
.. toctree::
:maxdepth: 2
install
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`