$ fq -h html
html: HyperText Markup Language decoder
Options
=======
array=false Decode as nested arrays
attribute_prefix="@" Prefix for attribute keys
seq=false Use seq attribute to preserve element order
Decode examples
===============
# Decode file as html
$ fq -d html . file
# Decode value as html
... | html
# Decode file using html options
$ fq -d html -o array=false -o attribute_prefix="@" -o seq=false . file
# Decode value using html options
... | html({array:false,attribute_prefix:"@",seq:false})
HTML is decoded in HTML5 mode and will always include <html>, <body> and <head> elements.
See the xml format for more examples, including how to preserve element order and how to encode to xml.
There is no to_html function; use to_xml instead.
Element as object
=================
# decode as object is the default
$ echo '<a href="url">text</a>' | fq -d html
{
"html": {
"body": {
"a": {
"#text": "text",
"@href": "url"
}
},
"head": ""
}
}
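As a rough illustration of how the object form can be read downstream, the Python sketch below separates attributes, text, and child elements of one decoded element. It assumes the default attribute_prefix="@" and the "#text" key shown in the example above; split_element is a hypothetical helper name and is not part of fq.

```python
import json

# JSON copied from the object-mode example above.
decoded = json.loads("""
{"html": {"body": {"a": {"#text": "text", "@href": "url"}}, "head": ""}}
""")


def split_element(element, attribute_prefix="@"):
    """Split a decoded element into (attributes, text, children).

    Hypothetical helper: assumes a non-empty attribute prefix and that
    element text is stored under the "#text" key, as in the example.
    """
    attributes, children, text = {}, {}, None
    for key, value in element.items():
        if key == "#text":
            text = value
        elif key.startswith(attribute_prefix):
            # Strip the prefix so "@href" becomes "href".
            attributes[key[len(attribute_prefix):]] = value
        else:
            children[key] = value
    return attributes, text, children


attrs, text, children = split_element(decoded["html"]["body"]["a"])
print(attrs, text, children)
```

If a different attribute_prefix is passed to fq, the same value would need to be passed to the helper.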
Element as array
================
$ echo '<a href="url">text</a>' | fq -d html -o array=true
[
"html",
null,
[
[
"head",
null,
[]
],
[
"body",
null,
[
[
"a",
{
"#text": "text",
"href": "url"
},
[]
]
]
]
]
]
# decode html files to a {file: "title", ...} object
$ fq -n -d html '[inputs | {key: input_filename, value: .html.head.title?}] | from_entries' *.html
# List the href values of all <a> elements in a file
$ fq -r -o array=true -d html '.. | select(.[0] == "a" and .[1].href)?.[1].href' file.html
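The href filter above walks the array form recursively. A minimal Python sketch of the same idea follows, applied to the array-mode output from the earlier example. It assumes every node is a three-item [name, attributes, children] list, as in that output; iter_elements is a hypothetical helper name and is not part of fq.

```python
import json

# JSON equivalent to the array-mode example above.
tree = json.loads("""
["html", null, [
  ["head", null, []],
  ["body", null, [
    ["a", {"#text": "text", "href": "url"}, []]
  ]]
]]
""")


def iter_elements(node):
    """Yield every [name, attributes, children] node, depth first."""
    yield node
    _name, _attributes, children = node
    for child in children:
        yield from iter_elements(child)


# Keep only <a> elements that carry an href attribute;
# attributes is null (None) when an element has none.
hrefs = [
    attrs["href"]
    for name, attrs, _children in iter_elements(tree)
    if name == "a" and attrs and "href" in attrs
]
print(hrefs)
```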