=encoding utf8

=head1 NAME

Mail::Message::Body::Encode - organize general message encodings

=head1 SYNOPSIS

 my Mail::Message $msg = ...;
 my $decoded = $msg->decoded;
 my $encoded = $msg->encode(mime_type => 'image/gif',
     transfer_encoding => 'base64');

 my $body = $msg->body;
 my $decoded = $body->decoded;
 my $encoded = $body->encode(transfer_encoding => '7bit');

=head1 DESCRIPTION

Manages the message's body encodings and decodings on request of the
main program.  This package adds functionality to the L<Mail::Message::Body|Mail::Message::Body>
class when the L<decoded()|Mail::Message::Body/"Constructing a body"> or L<encode()|Mail::Message::Body::Encode/"Constructing a body"> method is called.

Four types of encodings are handled (in the right order):

=over 4

=item * eol encoding

Various operating systems have different ideas about how to encode the
line termination.  UNIX uses a LF character, classic Mac OS used a CR, and
Windows uses a CR/LF combination.  Messages which are transported over
the Internet will always use the CRLF separator.

=item * transfer encoding

Messages transmitted over the Internet have to be plain ASCII.  Non-ASCII
characters and binary files (like images and archives) must be encoded
into an ASCII representation during transmission.

The implementation of the required encoders and decoders is found in
the L<Mail::Message::TransferEnc|Mail::Message::TransferEnc> set of packages.  The related
manual page lists the transfer encodings which are supported.

=item * mime-type translation

NOT IMPLEMENTED YET

=item * charset conversion

=back
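
The layering of these encodings can be illustrated with core Perl modules
only.  This is a minimal sketch, not the code used by this module: the
sample text and the order of the steps are assumptions made for the
illustration.

 use strict;
 use warnings;
 use Encode       qw(encode decode);
 use MIME::Base64 qw(encode_base64 decode_base64);

 my $text = "caf\x{e9}\nna\x{ef}ve\n";      # Perl characters, LF line ends

 # Sending: charset conversion, eol encoding, transfer encoding.
 my $octets = encode('UTF-8', $text);      # characters to octets
 (my $crlf = $octets) =~ s/\r?\n/\r\n/g;   # CRLF line ends on the wire
 my $wire   = encode_base64($crlf);        # octets to plain ASCII

 # Receiving: undo the layers in reverse order.
 my $raw = decode_base64($wire);
 $raw =~ s/\r\n/\n/g;
 my $back = decode('UTF-8', $raw);

 print $back eq $text ? "round trip ok\n" : "round trip FAILED\n";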

=head1 METHODS

=head2 Constructing a body

=over 4

=item $obj-E<gt>B<charsetDetect>(%options)

[3.013] This is tricky.  It is hard to detect whether the body originates from the
program, or from an external source.  And what about data from a database:
are those octets or strings?
Please read L<Mail::Message::Body/Autodetection of character-set>.

 -Option  --Default
  external  <false>

=over 2

=item external => BOOLEAN

Only consider externally valid character-sets.  This implies that C<PERL>
is not an acceptable answer.

=back

=item Mail::Message::Body-E<gt>B<charsetDetectAlgorithm>( [CODE|undef|METHOD] )

[3.013] When a body object does not specify its character-set, but that
detail is required, then it gets autodetected.  The default algorithm is
implemented in L<charsetDetect()|Mail::Message::Body::Encode/"Constructing a body">.  You may change this default algorithm,
or pass option C<charset_detect> for each call to L<encode()|Mail::Message::Body::Encode/"Constructing a body">.

When you call this method with an explicit C<undef>, you reset the default.
Without parameter, the current algorithm (CODE or method name) is
returned.
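
A replacement detector could look like the sketch below.  The function
name C<my_charset_detect> and its fallback choice are hypothetical.  It is
also an assumption that the detector is called with the transfer-decoded
body object as first argument, followed by options.  The calls which
install it are shown as comments, so the sketch itself only needs core
modules.

 use strict;
 use warnings;
 use Encode ();

 # Hypothetical detector.  Assumption: the body object comes first.
 sub my_charset_detect {
     my ($body, %options) = @_;
     my $octets = $body->string;

     return 'us-ascii' if $octets !~ /[^\x00-\x7F]/;

     # Octets which decode cleanly as UTF-8 are rarely something else.
     my $copy  = $octets;   # decode() with a CHECK value may modify its input
     my $valid = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
     return $valid ? 'utf-8' : 'iso-8859-1';    # the fallback is a guess
 }

 # Install as default for all bodies:
 #   Mail::Message::Body->charsetDetectAlgorithm(\&my_charset_detect);
 # Reset to the built-in detector:
 #   Mail::Message::Body->charsetDetectAlgorithm(undef);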

=item $obj-E<gt>B<check>()

Check that the content of the body does not include illegal characters.
Which characters are considered illegal depends on the encoding of this body.

A body is returned which is checked.  This may be the body where this
method is called upon, but also a new object, when serious changes had
to be made.  If the check could not be made, because the decoder is not
defined, then C<undef> is returned.

=item $obj-E<gt>B<encode>(%options)

Encode (translate) a L<Mail::Message::Body|Mail::Message::Body> into a different format.
See the DESCRIPTION above.  Options which are not specified will not trigger
conversions.

 -Option           --Default
  charset            PERL
  charset_detect     <built-in>
  mime_type          undef
  result_type        <same as source>
  transfer_encoding  undef

=over 2

=item charset => CHARSET|'PERL'

Only applies when the mime_type is textual.

If the CHARSET is explicitly specified (for instance C<iso-8859-10>), then
the data is interpreted as raw bytes (blob), not as text.  However, in
case of C<PERL>, it is considered to be an internal representation of
characters (either latin1 or Perl's utf8, which is not the same as utf-8;
you should not want to know).

This setting overrules the charset attribute in the mime_type FIELD.

=item charset_detect => CODE

[3.013] When the body does not contain an explicit charset specification,
then the RFC says it is C<us-ascii>.  In reality, this is not true:
it is just an unknown character set.  This often happens when text files
are included as attachments, for instance a footer attachment.

When you want to be smarter than the default charset detector, you can
provide your own function for this parameter.  The function will get
the transfer-decoded version of this body.  You can change the default
globally via L<charsetDetectAlgorithm()|Mail::Message::Body::Encode/"Constructing a body">.

=item mime_type => STRING|FIELD

Convert into the specified mime type, which can be specified as STRING
or FIELD.  The FIELD is a L<Mail::Message::Field|Mail::Message::Field>-object, representing a
C<Content-Type> mime header.  The STRING must be valid content for such
header, and will be converted into a FIELD object.

The FIELD may contain attributes.  Usually, it has a C<charset> attribute,
which explains the CHARSET of the content after content transfer decoding.
The C<charset> option will update/add this attribute.  Otherwise (hopefully
in rare cases) the CHARSET will be auto-detected when the body gets
decoded.

=item result_type => CLASS

The type of body to be created when the body is changed to fulfill the request
on re-coding.  Also the intermediate stages in the translation process (if
needed) will use this type. CLASS must extend L<Mail::Message::Body|Mail::Message::Body>.

=item transfer_encoding => STRING|FIELD

=back
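
A short usage sketch of these options, assuming a body which was created
by the program itself.  The sample text is made up.  The option names are
the ones listed above.

 use strict;
 use warnings;
 use utf8;
 use Mail::Message::Body::Lines;

 # Data created by the program: Perl characters, hence charset PERL.
 my $body = Mail::Message::Body::Lines->new(
     mime_type => 'text/plain',
     charset   => 'PERL',
     data      => [ "Grüße aus Köln\n" ],
 );

 # Prepare for transport: utf-8 octets, wrapped in quoted-printable.
 my $wire = $body->encode(
     charset           => 'utf-8',
     transfer_encoding => 'quoted-printable',
 );
 print $wire->string;

 # Back to a form which is convenient inside the program.
 my $decoded = $wire->decoded;

Options which are left out, like C<mime_type> here, do not trigger a
conversion.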

=item $obj-E<gt>B<encoded>(%options)

Encode the body to a format which is acceptable to transmit or write to
a folder file.  This returns the body where this method was called
upon when everything was already prepared, or a new encoded body
otherwise.  In either case, the body is checked.

 -Option        --Default
  charset_detect  <the default>

=over 2

=item charset_detect => CODE

See L<charsetDetectAlgorithm()|Mail::Message::Body::Encode/"Constructing a body">.

=back

=item $obj-E<gt>B<unify>($body)

Unify the type of the given $body object with the type of the called
body.  C<undef> is returned when unification is impossible.  If the
bodies have the same settings, the $body object is returned unchanged.

Examples:

 my $bodytype = 'Mail::Message::Body::Lines';
 my $html  = $bodytype->new(mime_type=>'text/html', data => []);
 my $plain = $bodytype->new(mime_type=>'text/plain', ...);

 my $unified = $html->unify($plain);
 # $unified is the data of plain translated to html (if possible).

=back

=head2 About the payload

=over 4

=item $obj-E<gt>B<dispositionFilename>( [$directory] )

Various fields are searched for C<filename> and C<name> attributes.  Without
$directory, the name found will be returned unmodified.

When a $directory is given, a filename is composed.  For security reasons,
only the basename of the found name gets used and many potentially
dangerous characters are removed.  If no name was found, or when the found
name is already in use, then a unique name is generated.

Don't forget to read RFC6266 section 4.3 for the security aspects in your
email application.
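
The rules above can be sketched with core modules.  The helper
C<safe_attachment_path> is hypothetical and is not the implementation used
by this module.  Which characters it replaces and how it numbers
duplicates are assumptions for illustration only.

 use strict;
 use warnings;
 use File::Basename qw(basename);
 use File::Spec;

 # Hypothetical helper; not the implementation used by this module.
 sub safe_attachment_path {
     my ($directory, $found) = @_;

     my $name = defined $found ? $found : '';
     $name =~ s{\\}{/}g;                       # Windows separators as slashes
     $name = basename($name) if length $name;  # drop any directory part
     $name =~ s/[^\w.\-]+/_/g;                 # replace risky characters
     $name =~ s/^\.+//;                        # no hidden files, no "." or ".."
     $name = 'attachment' if $name eq '';

     # Add a counter until the name is not in use yet.
     my ($stem, $ext) = $name =~ /^(.+?)(\.[^.]+)?$/;
     $ext = '' unless defined $ext;
     my $path  = File::Spec->catfile($directory, $name);
     my $count = 0;
     $path = File::Spec->catfile($directory, $stem . '-' . ++$count . $ext)
         while -e $path;

     return $path;
 }

 print safe_attachment_path('/tmp', '../../etc/passwd'), "\n";

In an application you would normally call
C<< $body->dispositionFilename($directory) >> instead of writing such a
helper yourself.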

=item $obj-E<gt>B<isBinary>()

Returns true when the un-encoded message is binary data.  This information
is retrieved from knowledge provided by L<MIME::Types|MIME::Types>.

=item $obj-E<gt>B<isText>()

Returns true when the un-encoded message contains printable
text.

=back

=head2 Internals

=over 4

=item $obj-E<gt>B<addTransferEncHandler>( $name, <$class|$object> )

=item Mail::Message::Body-E<gt>B<addTransferEncHandler>( $name, <$class|$object> )

Relate the transfer encoding $name to an $object, or to an object of the
specified $class.  In the latter case, an object of that $class will be
created at the moment that one is needed to do encoding or decoding.

The $class or $object must extend L<Mail::Message::TransferEnc|Mail::Message::TransferEnc>.  It will
replace existing class and object for this $name.

Why aren't you contributing this class to MailBox?

=item $obj-E<gt>B<getTransferEncHandler>($type)

Get the transfer encoder/decoder which is able to handle $type, or return
undef if there is no such handler.
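
A small lookup sketch.  The name C<x-example> and the class in the final
comment are made up.  It is an assumption that C<base64> and
C<quoted-printable> handlers are registered by default.
Mail::Message::Body::Encode is loaded explicitly here, to be sure that
these methods are available.

 use strict;
 use warnings;
 use Mail::Message::Body::Lines;
 use Mail::Message::Body::Encode;

 my $body = Mail::Message::Body::Lines->new(data => [ "hello\n" ]);

 for my $name ('base64', 'quoted-printable', 'x-example') {
     my $handler = $body->getTransferEncHandler($name);
     printf "%-17s %s\n", $name,
         defined $handler ? ref $handler : '(no handler)';
 }

 # Registering your own handler; the class name is hypothetical and
 # the class must extend Mail::Message::TransferEnc:
 #   Mail::Message::Body->addTransferEncHandler(
 #       'x-example', 'My::TransferEnc::Example');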

=back

=head1 DIAGNOSTICS

=over 4

=item Warning: Charset $name is not known

The encoding or decoding of a message body encounters a character set which
is not understood by Perl's Encode module.

=item Warning: No decoder defined for transfer encoding $name.

The data (message body) is encoded in a way which is not currently understood,
therefore no decoding (or recoding) can take place.

=item Warning: No encoder defined for transfer encoding $name.

The data (message body) has been decoded, but the required encoding is
unknown.  The decoded data is returned.

=back

=head1 SEE ALSO

This module is part of Mail-Message distribution version 3.017,
built on April 18, 2025. Website: F<http://perl.overmeer.net/CPAN/>

=head1 LICENSE

Copyrights 2001-2025 by [Mark Overmeer <markov@cpan.org>]. For other contributors see ChangeLog.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
See F<http://dev.perl.org/licenses/>