=encoding utf8
=head1 NAME
Mail::Message::Body::Encode - organize general message encodings
=head1 SYNOPSIS
my Mail::Message $msg = ...;
my $decoded = $msg->decoded;
my $encoded = $msg->encode(mime_type => 'image/gif',
transfer_encoding => 'base64');
my $body = $msg->body;
my $decoded = $body->decoded;
my $encoded = $body->encode(transfer_encoding => '7bit');
=head1 DESCRIPTION
Manages the encoding and decoding of a message's body at the request of the
main program. This package adds functionality to the L<Mail::Message::Body|Mail::Message::Body>
class when the L<decoded()|Mail::Message::Body/"Constructing a body"> or L<encode()|Mail::Message::Body::Encode/"Constructing a body"> method is called.
Four types of encodings are handled (in the right order):
=over 4
=item * eol encoding
Operating systems differ in how they encode the line termination.
UNIX (including modern macOS) uses a LF character, classic Mac OS used a CR,
and Windows uses a CR/LF combination. Messages which are transported over
the Internet will always use the CRLF separator.
=item * transfer encoding
Messages transmitted over the Internet have to be plain ASCII. Non-ASCII
characters and binary files (like images and archives) must be encoded
during transmission to an ASCII representation.
The implementation of the required encoders and decoders is found in
the L<Mail::Message::TransferEnc|Mail::Message::TransferEnc> set of packages. The related
manual page lists the transfer encodings which are supported.
=item * mime-type translation
NOT IMPLEMENTED YET
=item * charset conversion
=back
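The layering described above can be illustrated with core modules only.
This is a minimal sketch, not the code used by this package: the order of
the steps, the choice of C<UTF-8> and C<base64>, and all variable names
are assumptions made for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use MIME::Base64 qw(encode_base64 decode_base64);

# Text as Perl characters (the internal representation).
my $text = "Caf\x{e9} au lait\nSecond line\n";

# charset conversion: characters -> octets in a named character set
my $bytes = encode('UTF-8', $text);

# eol encoding: native LF -> CRLF as used on the wire
(my $wire = $bytes) =~ s/\r?\n/\r\n/g;

# transfer encoding: octets -> 7-bit safe ASCII
my $ascii = encode_base64($wire);

# Decoding runs the same steps in reverse order.
my $back = decode_base64($ascii);
$back =~ s/\r\n/\n/g;
my $roundtrip = decode('UTF-8', $back);

print $roundtrip eq $text ? "round trip ok\n" : "round trip failed\n";
```

Each step only changes one aspect of the data, which is why an option
that is not requested does not need to trigger a conversion.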
=head1 METHODS
=head2 Constructing a body
=over 4
=item $obj-E<gt>B<charsetDetect>(%options)
[3.013] This is tricky. It is hard to detect whether the body originates from the
program or from an external source. And what about data from a database:
are those octets or strings?
Please read L<Mail::Message::Body/Autodetection of character-set>.
 -Option  --Default
  external  <false>
=over 2
=item external => BOOLEAN
Only consider externally valid character-sets. Implicitly, C<PERL> is not
an acceptable answer.
=back
=item Mail::Message::Body-E<gt>B<charsetDetectAlgorithm>( [CODE|undef|METHOD] )
[3.013] When a body object does not specify its character-set, but that
detail is required, then it gets autodetected. The default algorithm is
implemented in L<charsetDetect()|Mail::Message::Body::Encode/"Constructing a body">. You may change this default algorithm,
or pass option C<charset_detect> for each call to L<encode()|Mail::Message::Body::Encode/"Constructing a body">.
When you call this method with an explicit C<undef>, you reset the default.
Without a parameter, the current algorithm (CODE or method name) is
returned.
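A replacement detector could be built around a function like the one
below. This is only a sketch: C<guess_charset> is a hypothetical helper
which works on a string of octets, and the C<cp1252> fallback is an
arbitrary choice for the example. It is not the algorithm implemented in
C<charsetDetect()>.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess a charset name for a string of raw octets.
sub guess_charset {
    my ($octets) = @_;

    # Only 7-bit bytes: nothing to distinguish.
    return 'us-ascii' if $octets !~ /[^\x00-\x7F]/;

    # Strict decode on a copy; FB_CROAK dies on malformed UTF-8.
    my $copy = $octets;
    return 'utf-8'
        if eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };

    # Arbitrary fallback, chosen for this sketch only.
    return 'cp1252';
}

print guess_charset("plain text"), "\n";
print guess_charset("caf\xC3\xA9"), "\n";
print guess_charset("caf\xE9"), "\n";
```

To plug such a helper in, wrap it in a CODE reference. This page only
states that the function receives the transfer-decoded body, so how the
octets are taken from that body and what the function must return
(assumed here: a charset name) should be checked against the
implementation.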
=item $obj-E<gt>B<check>()
Check that the content of the body does not include illegal characters. Which
characters are considered illegal depends on the encoding of this body.
The checked body is returned. This may be the body on which this
method was called, but also a new object when serious changes had
to be made. If the check could not be made because the decoder is not
defined, then C<undef> is returned.
=item $obj-E<gt>B<encode>(%options)
Encode (translate) a L<Mail::Message::Body|Mail::Message::Body> into a different format.
See the DESCRIPTION above. Options which are not specified will not trigger
conversions.
 -Option           --Default
  charset            PERL
  charset_detect     <built-in>
  mime_type          undef
  result_type        <same as source>
  transfer_encoding  undef
=over 2
=item charset => CHARSET|'PERL'
Only applies when the mime_type is textual.
If the CHARSET is explicitly specified (for instance C<iso-8859-10>), then
the data is interpreted as raw bytes (blob), not as text. However, in
case of C<PERL>, it is considered to be an internal representation of
characters (either latin1 or Perl's utf8, which is not the same as utf-8;
you should not need to know which).
This setting overrules the charset attribute in the mime_type FIELD.
=item charset_detect => CODE
[3.013] When the body does not contain an explicit charset specification,
then the RFC says it is C<us-ascii>. In reality, this is not true:
it is just an unknown character set. This often happens when text files
are included as attachments, for instance a footer attachment.
When you want to be smarter than the default charset detector, you can
provide your own function for this parameter. The function will get
the transfer-decoded version of this body. You can change the default
globally via L<charsetDetectAlgorithm()|Mail::Message::Body::Encode/"Constructing a body">.
=item mime_type => STRING|FIELD
Convert into the specified mime type, which can be specified as STRING
or FIELD. The FIELD is a L<Mail::Message::Field|Mail::Message::Field>-object, representing a
C<Content-Type> mime header. The STRING must be valid content for such
header, and will be converted into a FIELD object.
The FIELD may contain attributes. Usually, it has a C<charset> attribute,
which explains the CHARSET of the content after content transfer decoding.
The C<charset> option will update/add this attribute. Otherwise (hopefully
in rare cases) the CHARSET will be auto-detected when the body gets
decoded.
=item result_type => CLASS
The type of body to be created when the body is changed to fulfill the request
on re-coding. Also the intermediate stages in the translation process (if
needed) will use this type. CLASS must extend L<Mail::Message::Body|Mail::Message::Body>.
=item transfer_encoding => STRING|FIELD
=back
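A minimal sketch of a call to C<encode()>, assuming the Mail-Message
distribution is installed. The C<transferEncoding> and C<string>
accessors are taken from L<Mail::Message::Body|Mail::Message::Body>, not
from this page, and the message text is made up for the example.

```perl
use strict;
use warnings;
use Mail::Message::Body::Lines;

my $text = "Hello, world!\n";
my $body = Mail::Message::Body::Lines->new(
    mime_type => 'text/plain',
    data      => [ $text ],
);

# Only the transfer encoding is requested, so no other conversion
# is asked for.
my $encoded = $body->encode(transfer_encoding => 'base64');
print "transfer encoding: ", $encoded->transferEncoding, "\n";

# decoded() undoes the transfer encoding again.
my $decoded = $encoded->decoded;
print "round trip ", ($decoded->string eq $text ? "ok" : "differs"), "\n";
```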
=item $obj-E<gt>B<encoded>(%options)
Encode the body to a format which is acceptable to transmit or write to
a folder file. This returns the body on which this method was called
when everything was already prepared, or a new encoded body
otherwise. In either case, the body is checked.
 -Option        --Default
  charset_detect  <the default>
=over 2
=item charset_detect => CODE
See L<charsetDetectAlgorithm()|Mail::Message::Body::Encode/"Constructing a body">.
=back
=item $obj-E<gt>B<unify>($body)
Unify the type of the given $body object with the type of the called
body. C<undef> is returned when unification is impossible. If the
bodies have the same settings, the $body object is returned unchanged.
Examples:
my $bodytype = 'Mail::Message::Body::Lines';
my $html = $bodytype->new(mime_type=>'text/html', data => []);
my $plain = $bodytype->new(mime_type=>'text/plain', ...);
my $unified = $html->unify($plain);
# $unified is the data of plain translated to html (if possible).
=back
=head2 About the payload
=over 4
=item $obj-E<gt>B<dispositionFilename>( [$directory] )
Various fields are searched for C<filename> and C<name> attributes. Without
$directory, the name found will be returned unmodified.
When a $directory is given, a filename is composed. For security reasons,
only the basename of the found name gets used and many potentially
dangerous characters are removed. If no name was found, or when the found
name is already in use, then a unique name is generated.
Don't forget to read RFC6266 section 4.3 for the security aspects in your
email application.
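The kind of filtering described above can be sketched with core modules.
This is not the implementation used by C<dispositionFilename()>:
C<safe_path>, the set of permitted characters, the C<part> fallback name,
and the numbering scheme are all assumptions made for the example.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: compose a path inside $dir from an untrusted name.
sub safe_path {
    my ($dir, $suggested) = @_;
    my $name = defined $suggested ? $suggested : '';

    $name =~ s{\\}{/}g;          # treat backslashes as separators too
    $name = basename($name);     # drop any directory part
    $name =~ s/[^\w.-]+/_/g;     # keep a conservative character set
    $name =~ s/^\.+//;           # no hidden files, no "." or ".."
    $name = 'part' if $name eq '';

    # Prefix a counter until the name is not in use yet.
    my $path = File::Spec->catfile($dir, $name);
    my $n    = 0;
    $path = File::Spec->catfile($dir, ++$n . "-$name") while -e $path;
    return $path;
}

print safe_path(File::Spec->tmpdir, '../../etc/passwd'), "\n";
```

The existence test and a later open of the file are separate steps, so
an application which must be safe against concurrent writers still needs
an exclusive create when it opens the returned path.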
=item $obj-E<gt>B<isBinary>()
Returns true when the un-encoded message is binary data. This information
is retrieved from knowledge provided by L<MIME::Types|MIME::Types>.
=item $obj-E<gt>B<isText>()
Returns true when the un-encoded message contains printable
text.
=back
=head2 Internals
=over 4
=item $obj-E<gt>B<addTransferEncHandler>( $name, <$class|$object> )
=item Mail::Message::Body-E<gt>B<addTransferEncHandler>( $name, <$class|$object> )
Relate the transfer encoding $name to an $object, or to an object of the
specified $class. In the latter case, an object of that $class will be
created at the moment one is needed for encoding or decoding.
The $class or $object must extend L<Mail::Message::TransferEnc|Mail::Message::TransferEnc>. It
replaces any existing class or object registered for this $name.
Why aren't you contributing this class to MailBox?
=item $obj-E<gt>B<getTransferEncHandler>($type)
Get the transfer encoder/decoder which is able to handle $type, or return
undef if there is no such handler.
=back
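The registration scheme described for C<addTransferEncHandler()> and
C<getTransferEncHandler()>, where a class is only turned into an object
when it is first needed, can be sketched as follows. This is an
illustration of the idea, not the code of this package: C<add_handler>,
C<get_handler>, and C<My::Rot13> are hypothetical names, and a real
handler would extend L<Mail::Message::TransferEnc|Mail::Message::TransferEnc>.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an encoder class.
{
    package My::Rot13;
    sub new { my $class = shift; return bless {}, $class }
    sub encode {
        my ($self, $text) = @_;
        (my $out = $text) =~ tr/A-Za-z/N-ZA-Mn-za-m/;
        return $out;
    }
}

my %handlers;   # encoding name => class name or object

sub add_handler {
    my ($name, $class_or_object) = @_;
    $handlers{$name} = $class_or_object;   # replaces any earlier entry
    return;
}

sub get_handler {
    my ($name) = @_;
    my $handler = $handlers{$name};
    return undef unless defined $handler;

    # A class name becomes an object on first use and is then cached.
    $handler = $handlers{$name} = $handler->new unless ref $handler;
    return $handler;
}

add_handler('x-rot13' => 'My::Rot13');
print get_handler('x-rot13')->encode("Hello"), "\n";
```

Storing the class name and creating the object on demand avoids
constructing encoders for transfer encodings which never occur in the
messages being processed.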
=head1 DIAGNOSTICS
=over 4
=item Warning: Charset $name is not known
The encoding or decoding of a message body encounters a character set which
is not understood by Perl's Encode module.
=item Warning: No decoder defined for transfer encoding $name.
The data (message body) is encoded in a way which is not currently understood,
therefore no decoding (or recoding) can take place.
=item Warning: No encoder defined for transfer encoding $name.
The data (message body) has been decoded, but the required encoding is
unknown. The decoded data is returned.
=back
=head1 SEE ALSO
This module is part of Mail-Message distribution version 3.017,
built on April 18, 2025. Website: F<http://perl.overmeer.net/CPAN/>
=head1 LICENSE
Copyrights 2001-2025 by [Mark Overmeer <markov@cpan.org>]. For other contributors see ChangeLog.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
See F<http://dev.perl.org/licenses/>