File: CommonMark.pm

package info (click to toggle)
libtext-markup-perl 0.33-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 324 kB
  • sloc: perl: 870; python: 178; makefile: 7
file content (146 lines) | stat: -rw-r--r-- 3,692 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
package Text::Markup::CommonMark;

use 5.8.1;
use strict;
use warnings;
use CommonMark;
use Text::Markup;
use File::BOM qw(open_bom);

our $VERSION = '0.33';

sub import {
    # Replace Text::Markup::Markdown.
    Text::Markup->register( markdown => $_[1] || qr{m(?:d(?:own)?|kdn?|arkdown)} );
}

sub parser {
    my ($file, $encoding, $opts) = @_;
    open_bom my $fh, $file, ":encoding($encoding)";
    my %params = @{ $opts };
    my $html = CommonMark->parse(
        smart  => 1,
        unsafe => 1,
        %params,
        string => join( '', <$fh>),
    )->render(  %params, format => 'html' );
    return unless $html =~ /\S/;
    utf8::encode($html);
    return $html if $params{raw};
    return qq{<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
$html
</body>
</html>
};

}

1;
__END__

=head1 Name

Text::Markup::CommonMark - CommonMark Markdown parser for Text::Markup

=head1 Synopsis

  use Text::Markup::CommonMark;
  my $html = Text::Markup->new->parse(file => 'README.md');
  my $raw  = Text::Markup->new->parse(
      file    => 'README.md',
      options => [ raw => 1 ],
  );

=head1 Description

This is the L<CommonMark|https://commonmark.org> parser
for L<Text::Markup>. On load, it replaces the default L<Text::Markup::Markdown>
parser for parsing L<Markdown|https://daringfireball.net/projects/markdown/>.
Note that L<Text::Markup> does not load this module by default, but when
loaded manually will be the preferred Markdown parser.

Text::Markup::CommonMark reads in the file (relying on a
L<BOM|https://www.unicode.org/unicode/faq/utf_bom.html#BOM>), hands it off to
L<CommonMark> for parsing, and then returns the generated HTML as an
encoded UTF-8 string with an C<http-equiv="Content-Type"> element identifying
the encoding as UTF-8.

It recognizes files with the following extensions as CommonMark Markdown:

=over

=item F<.md>

=item F<.mkd>

=item F<.mkdn>

=item F<.mdown>

=item F<.markdown>

=back

To change it the files it recognizes, load this module directly and pass a
regular expression matching the desired extension(s), like so:

  use Text::Markup::CommonMark qr{markd?};

Normally this module returns the output wrapped in a minimal HTML document
skeleton. If you would like the raw output without the skeleton, you can pass
the C<raw> option to C<parse>.

In addition Text::CommonMark supports all of the CommonMark
L<parse options|CommonMark/parse> and L<render options|CommonMark::Node/render>,
including:

=over

=item C<smart>

When true, convert straight quotes to curly, --- to em dashes, -- to en
dashes. Enabled by default.

=item C<sourcepos>

When true, include a data-sourcepos attribute on all block elements. Disabled
by default.

 =item C<hardbreaks>

When true, render soft-break elements as hard line breaks. Disabled by default.

 =item C<nobreaks>

When true, render soft-break elements as spaces. Disabled by default.

 =item C<validate_utf8>

When true, validate UTF-8 in the input before parsing, replacing illegal
sequences with the replacement character C<U+FFFD>. Disabled by default.

 =item C<unsafe>

Render raw HTML and unsafe links (C<javascript:>, C<vbscript:>, C<file:>, and
C<data:>, except for C<image/png>, C<image/gif>, C<image/jpeg>, or
C<image/webp> mime types). Raw HTML is replaced by a placeholder HTML comment.
Unsafe links are replaced by empty strings. Enabled by default.

=back

=head1 Author

David E. Wheeler <david@justatheory.com>

=head1 Copyright and License

Copyright (c) 2011-2024 David E. Wheeler. Some Rights Reserved.

This module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

=cut