package Text::Markup::CommonMark;
use 5.8.1;
use strict;
use warnings;
use CommonMark;
use Text::Markup;
use File::BOM qw(open_bom);
our $VERSION = '0.33';
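
sub import {
    # Replace Text::Markup::Markdown as the registered "markdown" parser. An
    # optional regex argument overrides the default file extension pattern.
    Text::Markup->register( markdown => $_[1] || qr{m(?:d(?:own)?|kdn?|arkdown)} );
}

sub parser {
    my ($file, $encoding, $opts) = @_;

    # Open the file with File::BOM, passing $encoding as the layer argument.
    open_bom my $fh, $file, ":encoding($encoding)";

    # Options arrive as an array reference of key/value pairs; tolerate undef.
    my %params = @{ $opts || [] };

    # The smart and unsafe defaults come first in the parse() arguments so
    # that caller-supplied options override them.
    my $html = CommonMark->parse(
        smart  => 1,
        unsafe => 1,
        %params,
        string => join( '', <$fh>),
    )->render( %params, format => 'html' );

    # Return nothing for empty output; otherwise return UTF-8 encoded bytes.
    return unless $html =~ /\S/;
    utf8::encode($html);
    return $html if $params{raw};
    return qq{<html>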
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
$html
</body>
</html>
};
}
1;
__END__

=head1 Name

Text::Markup::CommonMark - CommonMark Markdown parser for Text::Markup

=head1 Synopsis

    use Text::Markup::CommonMark;
    my $html = Text::Markup->new->parse(file => 'README.md');
    my $raw  = Text::Markup->new->parse(
        file    => 'README.md',
        options => [ raw => 1 ],
    );

=head1 Description

This is the L<CommonMark|https://commonmark.org> parser for L<Text::Markup>.
On load, it replaces the default L<Text::Markup::Markdown> parser for parsing
L<Markdown|https://daringfireball.net/projects/markdown/>. Note that
L<Text::Markup> does not load this module by default; once loaded manually,
however, it becomes the preferred Markdown parser.

Text::Markup::CommonMark reads in the file (relying on a
L<BOM|https://www.unicode.org/unicode/faq/utf_bom.html#BOM>), hands it off to
L<CommonMark> for parsing, and then returns the generated HTML as an
encoded UTF-8 string with an C<http-equiv="Content-Type"> element identifying
the encoding as UTF-8.

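
Because the returned string holds UTF-8 encoded bytes rather than Perl
characters, it should be written to a filehandle without an encoding layer, or
decoded before any character-level processing. The following is a minimal
sketch of that distinction, not part of this module: the literal C<$html>
value is a hypothetical stand-in for parser output, and no Text::Markup calls
are involved.

    use strict;
    use warnings;
    use Encode qw(decode);

    # Hypothetical stand-in for parser output: a character string that
    # contains non-ASCII characters.
    my $html = "<p>caf\x{e9} \x{201c}quoted\x{201d}</p>\n";
    my $characters = length $html;

    # The parser calls utf8::encode before returning, so callers get bytes.
    utf8::encode($html);
    my $bytes = length $html;

    printf "%d characters became %d bytes\n", $characters, $bytes;

    # Write the bytes without an encoding layer to avoid double encoding.
    binmode STDOUT, ':raw';
    print $html;

    # Decode first if further character-level processing is needed.
    my $decoded = decode('UTF-8', $html);
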
It recognizes files with the following extensions as CommonMark Markdown:

=over

=item F<.md>

=item F<.mkd>

=item F<.mkdn>

=item F<.mdown>

=item F<.markdown>

=back

To change the files it recognizes, load this module directly and pass a
regular expression matching the desired extension(s), like so:

    use Text::Markup::CommonMark qr{markd?};

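
As an illustration of which file names such patterns select, here is a small
sketch. It assumes that the pattern is matched against the file extension
following a dot, anchored at the end of the file name; the
C<matches_extension> helper is hypothetical and is not part of
L<Text::Markup>.

    use strict;
    use warnings;

    # Default pattern registered by this module, and the custom one
    # shown above.
    my $default = qr{m(?:d(?:own)?|kdn?|arkdown)};
    my $custom  = qr{markd?};

    # Hypothetical helper: assumes the pattern is applied to the extension
    # after the final dot, anchored at the end of the file name.
    sub matches_extension {
        my ($file, $pattern) = @_;
        return $file =~ /[.]$pattern\z/ ? 1 : 0;
    }

    for my $file (qw(README.md notes.mkdn guide.markdown draft.mark todo.txt)) {
        printf "%-15s default=%d custom=%d\n", $file,
            matches_extension($file, $default),
            matches_extension($file, $custom);
    }
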
Normally this module returns the output wrapped in a minimal HTML document
skeleton. If you would like the raw output without the skeleton, you can pass
the C<raw> option to C<parse>.

In addition, Text::Markup::CommonMark supports all of the CommonMark
L<parse options|CommonMark/parse> and L<render options|CommonMark::Node/render>,
including:

=over

=item C<smart>

When true, convert straight quotes to curly, C<---> to em dashes, and C<--> to
en dashes. Enabled by default.

=item C<sourcepos>

When true, include a C<data-sourcepos> attribute on all block elements.
Disabled by default.

=item C<hardbreaks>

When true, render soft-break elements as hard line breaks. Disabled by default.

=item C<nobreaks>

When true, render soft-break elements as spaces. Disabled by default.

=item C<validate_utf8>

When true, validate UTF-8 in the input before parsing, replacing illegal
sequences with the replacement character C<U+FFFD>. Disabled by default.

=item C<unsafe>

When true, render raw HTML and unsafe links (C<javascript:>, C<vbscript:>,
C<file:>, and C<data:>, except for C<image/png>, C<image/gif>, C<image/jpeg>,
or C<image/webp> mime types). When false, raw HTML is replaced by a
placeholder HTML comment and unsafe links are replaced by empty strings.
Enabled by default.

=back

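
The options above are passed in the same C<options> array reference as C<raw>.
The following sketch assumes that both L<Text::Markup> and L<CommonMark> are
installed; the file name and Markdown content are made up for illustration,
and the exact HTML produced depends on the installed CommonMark library.

    use strict;
    use warnings;
    use File::Temp qw(tempdir);
    use Text::Markup;
    use Text::Markup::CommonMark;

    # Write a small Markdown file to a temporary directory.
    my $dir  = tempdir( CLEANUP => 1 );
    my $file = "$dir/example.md";
    open my $out, '>:encoding(UTF-8)', $file or die "Cannot open $file: $!";
    print {$out} qq{"Quoted" text -- with a dash\nand a soft break.\n};
    close $out or die "Cannot close $file: $!";

    # Options are passed as an array reference of key/value pairs.
    my $html = Text::Markup->new->parse(
        file    => $file,
        options => [
            raw        => 1,  # skip the HTML document skeleton
            smart      => 0,  # keep straight quotes and "--" as typed
            hardbreaks => 1,  # render soft breaks as hard line breaks
            unsafe     => 0,  # replace raw HTML and unsafe links
        ],
    );
    print $html;
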
=head1 Author

David E. Wheeler <david@justatheory.com>

=head1 Copyright and License

Copyright (c) 2011-2024 David E. Wheeler. Some Rights Reserved.

This module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

=cut