.de d \" begin display
.sp
.in +4
.nf
..
.de e \" end display
.in -4
.fi
.sp
..
.TH "HXEXTRACT" "1" "10 Jul 2011" "7.x" "HTML-XML-utils"
.SH NAME
hxextract \- extract selected elements from an HTML or XML file
.SH SYNOPSIS
.B hxextract
.RB "[\| " \-h
.RB "| " \-? " \|]"
.RB "[\| " \-x " \|]"
.RB "[\| " \-s
.IR text " \|]"
.RB "[\| " \-e
.IR text " \|]"
.RB "[\| " \-b
.IR base " \|]"
.I element-or-class
.RB "[\| " \-c
.IR "configfile" " | "
.IR file\-or\-URL " \|]"
.SH DESCRIPTION
.B hxextract
outputs all elements whose name and/or class match
.IR element-or-class .
.PP
Input must be well-formed, since no HTML heuristics are applied.
.SH OPTIONS
The following options are supported:
.TP 10
.B \-x
Use XML format conventions.
.TP 10
.BI \-s " text"
Insert
.I text
at the start of the output.
.TP 10
.BI \-e " text"
Insert
.I text
at the end of the output.
.TP 10
.BI \-b " base"
Use
.I base
as the URL base.
.TP 10
.BI \-c " configfile"
Read @chapter lines from
.I configfile
(lines must be of the form "@chapter filename") and extract elements from each of those files.
.TP 10
.BR \-h ", " \-?
Print command usage.
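.PP
As a sketch of how
.BR \-c ,
.B \-s
and
.B \-e
might be combined, assume a hypothetical file
.I chapters.conf
(the file names in it are placeholders as well) that contains:
.d
@chapter intro.html
@chapter methods.html
.e
Under that assumption, the following command would extract the H2
elements from both listed files and place the given text before and
after the output:
.d
hxextract \-s "<div>" \-e "</div>" H2 \-c chapters.conf
.e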
.SH OPERANDS
The following operands are supported:
.TP 10
.I element-or-class
The name of an element to extract (e.g., "H2"), or the name of a class
preceded by "." (e.g., ".example") or a combination of both (e.g.,
"H2.example").
.TP 10
.I file\-or\-URL
A file name or a URL. To read from standard input, use "-".
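.PP
For illustration, assuming a hypothetical well-formed file
.I spec.html
in the current directory, the three forms of
.I element-or-class
could be used as follows (one command per form: by element name, by
class, and by both):
.d
hxextract H2 spec.html
hxextract .example spec.html
hxextract H2.example spec.html
.e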
.SH ENVIRONMENT
To use a proxy to retrieve remote files, set the environment variables
.B http_proxy
and
.BR ftp_proxy "."
E.g.,
.B http_proxy="http://localhost:8080/"
.SH BUGS
.LP
Remote files (specified with a URL) are currently only supported for
HTTP. Password-protected files or files that depend on HTTP "cookies"
are not handled. (You can use tools such as
.BR curl (1)
or
.BR wget (1)
to retrieve such files.)
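.PP
A sketch of that workaround, in which the URL and the credentials are
placeholders: retrieve the file with
.BR curl (1)
and pass it to
.B hxextract
on standard input by giving "-" as the file operand:
.d
curl \-s \-u user:password https://example.org/private.html | hxextract H2 \-
.e
The retrieved document still has to be well-formed, as described above.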
.SH "SEE ALSO"
.BR hxselect (1)