1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
|
************
Faup
************
Description
===========
These new functions allow anybody to parse any variable containing URLs, hostnames, DNS queries and such to extract:
- the *scheme*
- the *credentials* (if present)
- the *tld* (with support for second-level TLDs)
- the *domain* (with and without the tld)
- the *subdomain*
- the *full hostname*
- the *port* (if present)
- the *resource* path (if present)
- the *query string* parameters (if present)
- the *fragment* (if present)
HowTo
-----
The module functions are fairly simple to use, and are divided in 2 classes:
* `faup()` allows to parse the entire URL and return all parts in a complete JSON
* `faup_<field>()` allows to parse the entire URL, but only returns the (potential) value of the requested field as string
Examples
^^^^^^^^
Using the `faup()` function
"""""""""""""""""""""""""""""""
The `faup()` is the simplest function to use, simply provide a value or variable (any type) as the only parameter, and the function returns a json object containing every element of the URL.
*example code:*
.. code-block:: none
set $!url = "https://user:pass@www.rsyslog.com:443/doc/v8-stable/rainerscript/functions/mo-faup.html?param=value#faup";
set $.faup = faup($!url);
*$.faup will contain:*
.. code-block:: none
{
"scheme": "https",
"credential": "user:pass",
"subdomain": "www",
"domain": "rsyslog.com",
"domain_without_tld": "rsyslog",
"host": "www.rsyslog.com",
"tld": "com",
"port": "443",
"resource_path": "\/doc\/v8-stable\/rainerscript\/functions\/mo-ffaup.html",
"query_string": "?param=value",
"fragment": "#faup"
}
.. note::
This is a classic rsyslog variable, and you can access every sub-key with `$.faup!domain`, `$.faup!resource_path`, etc...
Using the `faup_<field>()` functions
""""""""""""""""""""""""""""""""""""""""
Using the field functions is even simpler: for each field returned by the `faup()` function, there exists a corresponding function to get only that one field.
For example, if the goal is to recover the domain without the tld, the example above could be modified as follows:
*example code:*
.. code-block:: none
set $!url = "https://user:pass@www.rsyslog.com:443/doc/v8-stable/rainerscript/functions/mo-faup.html?param=value#faup";
set $.faup = faup_domain_without_tld($!url);
*$.faup will contain:*
.. code-block:: none
rsyslog
.. note::
The returned value is no longer a json object, but a simple string
Requirements
============
This module relies on the `faup <https://github.com/stricaud/faup>`_ library.
The library should be compiled (see link for instructions on how to compile) and installed on build and runtime machines.
.. warning::
Even if faup is statically compiled to rsyslog, the library still needs an additional file to work properly: the mozilla.tlds stored by the libfaup library in /usr/local/share/faup. It permits to properly match second-level TLDs and allow URLs such as www.rsyslog.co.uk to be correctly parsed into \<rsyslog:domain\>.\<co.uk:tld\> and not \<rsyslog:subdomain\>.\<co:domain\>.\<uk:tld\>
Motivations
===========
Those functions are the answer to a growing problem encountered in Rsyslog when using modules to enrich logs : some mechanics (like lookup tables or external module calls) require "strict" URL/hostname formats that are often not formatted correctly, resulting in lookup failures/misses.
This ensures getting stable inputs to provide to lookups/modules to enrich logs.
|