File: decodemail.texi

package info (click to toggle)
mailutils 1%3A3.15-4
links: PTS, VCS
area: main
in suites: bookworm
size: 36,672 kB
sloc: ansic: 181,539; sh: 110,509; yacc: 7,458; cpp: 3,834; makefile: 3,152; lex: 1,972; python: 1,617; exp: 1,562; awk: 152; lisp: 132; sed: 31
file content (227 lines) | stat: -rw-r--r-- 8,146 bytes
@c This is part of the GNU Mailutils manual.
@c Copyright (C) 2020--2022 Free Software Foundation, Inc.
@c See file mailutils.texi for copying conditions.
@comment *******************************************************************
@pindex decodemail

The @command{decodemail} utility is a filter program that reads
messages from the input mailbox, decodes ``textual'' parts of each
multipart message from a base64- or quoted-printable encoding to an
8-bit or 7-bit transfer encoding, and stores the processed messages in
the output mailbox. All messages from the input mailbox are stored in
the output, regardless of whether a change was made.

The message parts deemed to be textual are those whose
@samp{Content-Type} header matches a predefined, or user-defined,
mime type pattern. In addition, encoded pieces of the @samp{From:},
@samp{To:}, @samp{Subject:}, etc., headers are decoded.

For example, @command{decodemail} makes this transformation:

@example
Subject: =?utf-8?Q?The=20Baroque=20Enquirer=20|=20July=202020?=
@result{} Subject: The Baroque Enquirer | July 2020
@end example

The built-in list of textual content type patterns is:

@example
text/*
application/*shell
application/shellscript
*/x-csrc
*/x-csource
*/x-diff
*/x-patch
*/x-perl
*/x-php
*/x-python
*/x-sh
@end example

These strings are matched as shell globbing patterns
(@pxref{glob,,,glob(7), glob(7) manual page}).

More patterns can be added to this list using the
@code{mime.text-type} configuration statement.
@xref{mime statement}, for a detailed discussion, and the
configuration section below for a simple example.

When processing old mesages you may encounter @samp{Content-Type}
headers whose value contains only type, but no subtype.  To match
such headers, use the pattern without @samp{/whatever} part.  E.g.
@samp{text/*} matches @samp{text/plain} and @samp{text/html}, but
does not match @samp{text}.  On the other hand, @samp{t*xt} does
not match @samp{text/plain}, but does match @samp{text}.

Optionally, the decoded parts can be converted to another character
set. By default, the character set is not changed.

@menu
* Opt-decodemail::   Invocation of @command{decodemail}.
* Conf-decodemail::  Configuration of @command{decodemail}.
* Using-decodemail:: Purpose and caveats of @command{decodemail}.
@end menu

@node Opt-decodemail
@subsection Invocation of @command{decodemail}.

Usually, the utility is invoked as:

@example
decodemail @var{inbox} @var{outbox}
@end example

@noindent
where @var{inbox} and @var{outbox} are file names or URLs of the input
and output mailboxes, correspondingly.  The input mailbox is opened
read-only and will not be modified in any way.  In particular, the
status of the processed messages will not change.  If the output
mailbox does not exist, it will be created.  If it exists, the
messages will be appended to it, preserving any original messages that
are already in it.  This behavior can be changed using the @option{-t}
(@option{--truncate}) option, described below.

The two mailboxes can be of different types.  For example you can read
input from an imap server and store it in local @samp{maildir} box
using the following command:

@example
decodemail imap://user@@example.com maildir:///var/mail/user
@end example

Both arguments can be omitted.  If @var{outbox} is not supplied, the
resulting mailbox will be printed on the standard output in Unix
@samp{mbox} format.  If @var{inbox} is not supplied, the utility will
open the system inbox for the current user and use it for input.

A consequence of these rules is that there is no simple way to read
the input mailbox from standard input (the input must be seekable).
If you need to do this, the normal procedure would be to save what
would be standard input in a temporary file and then give that file as
@command{decodemail}'s input.

The following command line options modify the @command{decodemail}
behavior:

@table @option
@item -c, --charset=@var{charset}
Convert all textual parts from their original character set to the
specified @var{charset}.

@item -R, --recode
Convert all textual parts from their original character set to the
current character set, as specified by the @env{LC_ALL} or @env{LANG}
environment variable.

@item --no-recode
Do not convert character sets.  This is the default.

@item -t, --truncate
If the output mailbox exists, truncate it before appending new
messages.

@item --no-truncate
Keep the existing messages in the output mailbox intact.  This is the
default.
@end table

Additionally, the @ref{Common Options} are also understood.

@node Conf-decodemail
@subsection Configuration of @command{decodemail}.

The following common configuration statements affect the behavior of
@command{decodemail}:

@multitable @columnfractions 0.3 0.6
@headitem Statement @tab Reference
@item mime          @tab @xref{mime statement}.
@item debug         @tab @xref{Debug Statement}.
@item mailbox       @tab @xref{Mailbox Statement}.
@item locking       @tab @xref{Locking Statement}.
@end multitable

Notably, the @code{mime} statement can be used to extend the list of
types which are decoded. For example, in the file @file{~/.decodemail}
(other locations are possible, @pxref{configuration}), you could have:

@example
# base64/qp decode these mime types also:
mime @{
  text-type "application/x-bibtex";
  text-type "application/x-tex";
@}
@end example

Since the list of textual mime types is open-ended, with new types being
used at any time, we do not attempt to make the built-in list
comprehensive.

@node Using-decodemail
@subsection Purpose and caveats of @command{decodemail}.

The principal use envisioned for this program is to decode messages in
batch, after they are received.

Unfortunately, some mailers prefer to encode messages in their
entirety in base64 (or quoted-printable), even when the content is
entirely human-readable text. This makes straightforward use of
@command{grep} or other standard commands impossible. The idea is for
@command{decodemail} to rectify that, by making the message text
readable again.

Besides personal mail, mailing list archives are another place where
such decoding can be useful, as they are often searched with standard
tools.

It is generally not recommended to run @command{decodemail} within a
mail reader (which should be able to do the decoding itself), or
directly in a terminal (since quite possibly there will be 8-bit
output not in the current character set).

Although the output message from @command{decodemail} should be
entirely equivalent to the input message, apart from the decoding, it
is generally not identical. Because @command{decodemail} parses the
input message and reconstructs it for output, there are usually small
differences:

@itemize
@item In the envelope @samp{From } line, multiple spaces are collapsed
to one.

@item A @samp{Content-Transfer-Encoding:} header may be added where
not previously present, or its value changed from @samp{8bit} to
@samp{7bit}, or vice versa. This may happen both for the message as a
whole, and for a given mime part. @command{decodemail} looks at the
actual content of the text and outputs
@samp{Content-Transfer-Encoding:} accordingly.

@item A trailing space is inserted when a long header line is broken
to occupy several lines (@dfn{header wrapping}).

@example
SomeHeader: 
  someextremelylongvaluethatcannotbebroken
@end example

@item The non-tracing headers may be reordered, notably those that are
mime-related.

@item Any material before the first mime part of a mime multipart
message is lost. By the standards, nothing should appear
there. Typically if it does appear, it is a string such as @samp{This
is a multi-part message in MIME format.}.

@item In mime parts, the charset specifications may no longer be
quoted (if quoting is not necessary). For example,
@samp{charset="utf-8"} becomes @samp{charset=utf-8}.

@item The mime boundary strings will be changed.

@end itemize

If a discrepancy is created which actually affects message parsing or
reading, that's most likely a bug, and please report it. Naturally,
please send an exact input message to reproduce the problem.