.TH HTML2MARKDOWN "1" "July 2015" "html2markdown 2015.6.21" "User Commands"
.SH NAME
html2markdown \- convert a page of HTML into Markdown
.SH SYNOPSIS
.B html2markdown
[options...] [(\fIfilename\fR|\fIurl\fR) [\fIencoding\fR]]
.SH DESCRIPTION
\fBhtml2markdown\fR reads the specified HTML page, downloading it if
necessary, and converts it to text marked up with Markdown.
The source HTML page may be a local file or a remote URL.
If no source is specified, the HTML is read from standard input.
The output is printed to standard output.
.P
If an \fIencoding\fR is specified, it overrides any encoding
information provided by the HTTP server.
When no \fIencoding\fR is specified, \fBpython-feedparser\fR (if
available) is used to determine the source encoding.
If \fBpython-feedparser\fR is not available, or when reading local
files, the encoding is assumed to be UTF-8.
.SH OPTIONS
.TP
.B \-\-no\-wrap\-links
Don't wrap long links.
.TP
.B \-\-ignore\-emphasis
Don't include any formatting for emphasis.
.TP
.B \-\-reference\-links
Use reference style links instead of in\-line links.
.TP
.B \-\-ignore\-links
Don't include any formatting for links.
.TP
.B \-\-protect\-links
Protect links from line breaks by surrounding them with angle brackets.
.TP
.B \-\-ignore\-images
Don't include any formatting for images.
.TP
.B \-\-images\-to\-alt
Discard image data and keep only the alt text.
.TP
.B \-\-images\-with\-size
Write image tags that have height and width attributes as raw HTML to
retain their dimensions.
.TP
.BR \-g ", " \-\-google\-doc
Convert an HTML-exported Google Document.
.TP
.BR \-d ", " \-\-dash\-unordered\-list
Use a dash rather than a star for unordered list items.
.TP
.BR \-e ", " \-\-asterisk\-emphasis
Use an asterisk rather than an underscore for emphasized text.
.TP
\fB\-b\fR \fIBODY_WIDTH\fR, \fB\-\-body\-width\fR=\fIBODY_WIDTH\fR
Number of characters per output line, \fB0\fR for no wrap.
.TP
\fB\-i\fR \fILIST_INDENT\fR, \fB\-\-google\-list\-indent\fR=\fILIST_INDENT\fR
Number of pixels Google indents nested lists.
.TP
.BR \-s ", " \-\-hide\-strikethrough
Hide strike-through text. Only relevant when \fB\-g\fR is also
specified.
.TP
.B \-\-escape\-all
Escape all special characters. The output is less readable, but this
avoids corner-case formatting issues.
.TP
.B \-\-bypass\-tables
Format tables in HTML rather than Markdown syntax.
.TP
.B \-\-ignore\-tables
Ignore table-related tags (table, th, td, tr) while keeping rows.
.TP
.B \-\-single\-line\-break
Use a single line break after a block element rather than two line
breaks.
.B NOTE:
Requires \fB\-\-body\-width\fR=\fB0\fR.
.TP
.B \-\-unicode\-snob
Use Unicode characters throughout the document instead of ASCII
replacements.
.TP
.B \-\-no\-automatic\-links
Do not use automatic links, even where they would be applicable.
.TP
.B \-\-no\-skip\-internal\-links
Do not skip internal links.
.TP
.B \-\-links\-after\-para
Put links after each paragraph instead of at the end of the document.
.TP
.B \-\-mark\-code
Mark program code blocks with \fB[code]\fI...\fB[/code]\fR.
.TP
\fB\-\-default\-image\-alt\fR=\fITEXT\fR
The default alt text for images that have none.
.TP
.B \-\-pad\-tables
Pad the cells to equal column width in tables.
.TP
\fB\-\-decode\-errors\fR=\fIDECODE_ERRORS\fR
What to do in case of decode errors. \fBignore\fR, \fBstrict\fR, and
\fBreplace\fR are acceptable values.
.TP
.B \-\-version
Show program's version number and exit.
.TP
.BR \-h ", " \-\-help
Show a help message and exit.
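.SH EXAMPLES
The values accepted by \fB\-\-decode\-errors\fR have the same names as
Python's standard codec error handlers.
The following Python sketch illustrates how the three handlers differ.
It assumes, without confirmation from this page, that the option value
is handed unchanged to \fBbytes.decode\fR().
The helper name \fBdecode_html\fR and the sample bytes are hypothetical
and are not part of \fBhtml2markdown\fR.
.P
.nf
```python
# Hypothetical helper: decode raw HTML bytes the way --decode-errors is
# assumed to work, by passing the handler name to bytes.decode().
def decode_html(data, encoding="utf-8", errors="strict"):
    return data.decode(encoding, errors)


# "caf" followed by 0xE9, a Latin-1 e-acute that is invalid as UTF-8.
SAMPLE = bytes([0x63, 0x61, 0x66, 0xE9])

if __name__ == "__main__":
    for handler in ("ignore", "replace", "strict"):
        try:
            print(handler, repr(decode_html(SAMPLE, errors=handler)))
        except UnicodeDecodeError as exc:
            # Only "strict" is expected to end up here.
            print(handler, "raised UnicodeDecodeError:", exc.reason)
```
.fi
.P
With this sample, \fBignore\fR drops the invalid byte, \fBreplace\fR
substitutes the Unicode replacement character, and \fBstrict\fR raises
an error instead of producing output.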

.SH AUTHOR
This manual page was written for Debian by Stefano Rivera
<stefanor@debian.org>.