File: text-encoding.rst

package info (click to toggle)

copyq 13.0.0-1

links: PTS, VCS
area: main
in suites: forky, sid
size: 12,964 kB
sloc: cpp: 63,306; sh: 992; xml: 452; python: 293; ruby: 152; makefile: 27; javascript: 25

file content (26 lines) | stat: -rw-r--r-- 1,131 bytes

parent folder | download | duplicates (6)

Text Encoding
=============

This page serves as concept for adding additional CopyQ command line
switch to print and read texts in UTF-8 (i.e. without using system
encoding).

Every time the bytes are read from a command (standard output or
arguments from client) the input is expected to be either just series of
bytes or text in system encoding (possibly Latin1 on Windows). But
texts/strings in CopyQ and in clipboard are UTF-8 formatted (except some
MIME types with specified encoding).

When reading system-encoded text (MIME starts with "text/") CopyQ
re-encodes the data from system encoding to UTF-8. That's not a problem
if the received data is really in system encoding. But if you send data
from Perl with the UTF-8 switch, CopyQ must also know that UTF-8 is used
instead of system encoding.

The same goes for other way. CopyQ sends texts back to client or to a
command in system encoding so it needs to convert these texts from
UTF-8.

As for the re-encoding part, Qt does nice job transforming characters
from UTF-8 but of course for lot of characters in UTF-8 there is no
alternative in Latin1 and other encodings.