#  architecture.pod: The Torrus internals
#  Copyright (C) 2016  Stanislav Sinyagin
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

# Stanislav Sinyagin <ssinyagin@k-open.com>
#
#

=head1 Torrus Architecture

=head2 Configuration processing

The XML configuration is compiled into the database representation at
the operator's manual request.

The compiled version of the configuration is not a one-to-one
representation of the XML version: all templates are expanded. XML can
be restored from the database with the snapshot utility.

A template defines a piece of configuration which can be used in
multiple places. Templates can be nested.

The configuration consists of multiple XML files. They are processed in
the order specified in the tree configuration. Each new file is
treated as an addition to the existing tree.

The XML configuration compiler validates all the mandatory parameters.





=head2 Data storage

The following types of data stores are used in Torrus:

=over

=item * Git for storing the tree data;

=item * Redis for storing run-time data, sending notifications, and locking;

=back




=head2 Tree configuration

The configuration consists of multiple trees. A tree consists of nodes,
and each node can be of type "leaf" or "subtree". Subtrees contain child
subtrees or leaves, and a leaf does not contain any child elements. Each
node has an arbitrary number of parameters. Some parameters can prohibit
recursion, but most parameters are resolved by traversing the tree
upwards until a value is found.
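
The upward lookup described above can be sketched as follows. This is
an illustration only, not the Torrus implementation: the C<NODES>
table, the parameter names, and the C<NON_RECURSIVE> set are sample
data and assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Sample data only: path -> (parent path or None, parameters set on the node).
NODES: Dict[str, Tuple[Optional[str], Dict[str, str]]] = {
    "/": (None, {"collector-period": "300", "comment": "Top of the tree"}),
    "/Routers/": ("/", {"snmp-community": "public"}),
    "/Routers/rtr1/": ("/Routers/", {}),
    "/Routers/rtr1/ifInOctets": ("/Routers/rtr1/", {"comment": "Input bytes"}),
}

# Assumption for this sketch: parameters listed here prohibit recursion.
NON_RECURSIVE = {"comment"}


def get_param(path: str, name: str) -> Optional[str]:
    """Return the parameter value, walking towards the tree top if needed."""
    current: Optional[str] = path
    while current is not None:
        parent, params = NODES[current]
        if name in params:
            return params[name]
        if name in NON_RECURSIVE:
            # Only the node itself is consulted for such parameters.
            return None
        current = parent
    return None
```

Here C<get_param("/Routers/rtr1/ifInOctets", "snmp-community")> finds the
value two levels up, while a non-recursive parameter is never taken from
a parent.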

Each node has a path within a tree. A subtree path always ends with a
slash, and a leaf path ends with a word character. The top of the tree
is identified by a single slash symbol. Node names in the path may
contain alphanumeric characters, dashes, and underscores.

Each node is identified by a token. A node token is a 40-character SHA-1
checksum calculated from the tree name, followed by a colon, and the
path.
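
A minimal sketch of the path rules and the token calculation, under
stated assumptions: the input to SHA-1 is taken to be the literal string
C<TREE:PATH> encoded as UTF-8, and the regular expression is one
reading of the naming rules above. The function names are hypothetical.

```python
import hashlib
import re

# One reading of the path rules: slash-separated names made of
# alphanumerics, dash and underscore; a leaf ends with a word character.
PATH_RE = re.compile(r"/(?:[A-Za-z0-9_-]+/)*(?:[A-Za-z0-9_-]*[A-Za-z0-9_])?")


def is_subtree(path: str) -> bool:
    """A subtree path always ends with a slash."""
    return path.endswith("/")


def node_token(tree: str, path: str) -> str:
    """Return the 40-character hexadecimal SHA-1 of 'TREE:PATH'."""
    if PATH_RE.fullmatch(path) is None:
        raise ValueError(f"invalid node path: {path!r}")
    # Assumption: tree name, a colon and the path are concatenated as-is.
    return hashlib.sha1(f"{tree}:{path}".encode("utf-8")).hexdigest()
```

The same path in two different trees yields two different tokens,
because the tree name is part of the hashed string.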




=head2 C<ConfigTree> objects

The C<ConfigTree> Perl module provides an API for accessing the
configuration trees, as well as other types of data. Each data element
is referred to by a token, as follows:

Tree node token is a 40-character SHA-1 checksum of a node as described above.

Tokenset name starts with letter I<S>. The rest is an arbitrary sequence of
word characters.

The special token I<SS> is reserved for the list of tokensets. Tokenset
parameters are also inherited from this token's parameters.

View and monitor names must be unique, and must
start with a lower case letter.
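
The naming rules above could be told apart roughly as shown below. This
is a sketch, not the C<ConfigTree> API: the function name and the
returned labels are hypothetical, and node tokens are assumed to use
lower-case hexadecimal digits. The order of the checks matters, because
a node token may itself start with a lower-case letter.

```python
import re

NODE_TOKEN_RE = re.compile(r"[0-9a-f]{40}")
TOKENSET_RE = re.compile(r"S\w+", re.ASCII)
NAME_RE = re.compile(r"[a-z]\S*")


def classify_token(token: str) -> str:
    """Guess the kind of data element that a token refers to."""
    if token == "SS":
        return "tokenset-list"
    if NODE_TOKEN_RE.fullmatch(token):
        return "node"
    if TOKENSET_RE.fullmatch(token):
        return "tokenset"
    if NAME_RE.fullmatch(token):
        return "view-or-monitor"
    return "unknown"
```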




=head2 Git storage

A Torrus instance uses a number of Git branches for each tree
configuration. The XML compiler is the only writer, and consumers
only read from the Git repositories. Both writing and reading are done
directly on local Git repositories, so working directories are not
needed. If the writer and reader are on the same host, they use the same
Git repository. Otherwise, the writer pushes its commits to a remote
repository, and the reader pulls from it. The reader sets an exclusive
lock on the repository for the time of fetching and merging, so that
other readers don't try to pull at the same time.

Each tree has its own set of branches. The C<I<TREE>_configtree> branch
contains a full hierarchy of objects, so all parameters that are defined
in the input XML are retrievable. Typically, the Web UI renderer
consumes this data.

The C<I<TREE>_srcfiles> branch contains XML files that are used by the
XML compiler. While processing the XML files, it adds them into this
branch in order to track the changes in XML sources.

There are also a number of agent branches, one per agent instance
(collectors and monitors are typical agents):
C<I<TREE>_I<DAEMON>_I<INSTANCE>>. Each such branch contains only the
information and parameters needed by the agent, so that the agent
process can start and update its data as fast as possible.

A Git reference C<refs/heads/I<TREE>_agents_ref> is used to indicate the
commit in C<I<TREE>_configtree> branch that corresponds to the current
heads in agent branches. This reference is moved when the agent branches
are updated.

The C<I<TREE>_agent_tokens> branch records which agent branches contain
which tokens. It is needed for deleting tokens from agent branches when
they are deleted from the configtree branch. The branch contains a
two-level 256-way directory hierarchy with one JSON file per token, and
each JSON file contains an array of agent branch names where that
particular token is used.

The C<I<TREE>_configtree> branch has subdirectories as follows. The
directories C<nodes> and C<children> contain JSON files named after the
node tokens, arranged into two-level 256-way tree structure. For
example, the file for token C<b7ba0d88a0b14a4e6c3c61f5446aa619a537098f>
is stored as C<b7/ba/0d88a0b14a4e6c3c61f5446aa619a537098f>.
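
The two-level 256-way layout can be expressed as a short helper. The
function name is hypothetical; the mapping itself follows the example
above.

```python
def token_path(token: str) -> str:
    """Map a 40-character token to its two-level 256-way file path."""
    if len(token) != 40:
        raise ValueError("expected a 40-character token")
    # The first two bytes (four hex digits) select the two directory levels.
    return f"{token[0:2]}/{token[2:4]}/{token[4:]}"
```

Each directory level has at most 256 entries, which keeps Git tree
objects small even for large configurations.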

=over

=item * C<nodes/>: the JSON content defines the node's type, name,
  parent's token, and parameters.

=item * C<children/>: for each subtree node, there is a JSON file
  defining a hash with child tokens as keys and "1" as values.

=item * C<srcrefs>: a JSON hash representing dependencies of nodes on
  XML files. It's a two-level hash: the first key is the source file
  name, the second key is the token of the topmost dependent node, and
  the value is "1".

=item * C<srcglobaldeps>: a JSON hash representing the XML source files
  which define parameter properties, definitions, or templates. The keys
  are file names, and values are "1".

=item * C<srcrev>: A file containing a JSON scalar referring to the
  commit in XML sources.

=item * C<srcincludes>: A file containing a JSON hash with source XML
  files as keys and arrays of included file names as values. The order
  in the array is the same as the order of "include" statements in the
  XML file. The key C<__ROOT__> indicates the XML files where the
  compilation started. Every source XML file is listed here, and those
  which do not include other files, have empty arrays.

=item * C<nodeid/>: two-level 256-way directory structure. Each file is
  named after the SHA-1 digest of a I<nodeid> value. The content of the
  file is a JSON array of the nodeid value and node token.

=item * C<nodeidpx/>: I<nodeid> prefix searching database. Each
  I<nodeid> value is split by standard delimiters (two consecutive
  slashes), and each resulting prefix is used to build a key in this
  hierarchy. Two-level 256-way directory structure is built from SHA-1
  digests of these keys. The directories contain zero-length files
  representing the SHA-1 digests of I<nodeid> values.

=item * C<definitions/>: each file is named after a definition, and the
  content is a JSON scalar holding the definition value.

=item * C<other/>: JSON objects for views, monitors, and actions
  definitions. Files are named after the view, monitor, or action name,
  and the content is a hash with parameters defining each object. The
  following special files are JSON hashes of object names and "1" as
  values of corresponding types: C<__VIEWS__>, C<__MONITORS__>,
  C<__ACTIONS__>.

=item * C<paramprops>: a single JSON file defining parameter properties
  in a two-level hash.

=back
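
The C<nodeidpx/> description can be illustrated with the sketch below.
Several details are assumptions, not confirmed by this document: that a
prefix is the part of the I<nodeid> before each delimiter (without the
delimiter itself), that the key is the prefix string unchanged, and
that the digest directory follows the same C<xx/yy/rest> layout as the
token files. All function names are hypothetical.

```python
import hashlib
from typing import List

DELIMITER = "//"


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def nodeid_prefixes(nodeid: str) -> List[str]:
    """Return every proper prefix that ends before a '//' delimiter."""
    parts = nodeid.split(DELIMITER)
    return [DELIMITER.join(parts[:i]) for i in range(1, len(parts))]


def nodeidpx_paths(nodeid: str) -> List[str]:
    """Return the zero-length file paths that would index this nodeid."""
    leaf = sha1_hex(nodeid)
    paths = []
    for prefix in nodeid_prefixes(nodeid):
        digest = sha1_hex(prefix)
        paths.append(
            f"nodeidpx/{digest[0:2]}/{digest[2:4]}/{digest[4:]}/{leaf}"
        )
    return paths
```

With such a layout, a prefix search amounts to hashing the prefix and
listing one directory.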

The C<srcrefs> structure is mainly required for recursive deletion of
nodes if a corresponding XML file is changed or deleted.

Each Git commit refers to a complete and consistent tree structure. If
the compiler finds an error, it does not create a new commit, and rolls
back to the latest HEAD.

The JSON files within C<nodes> hierarchy are hashes with the following
keys and values:

=over 4

=item * C<is_subtree>: 1 for subtree, 0 for a leaf.

=item * C<parent>: token of the parent node, or empty string if this is
  the top of the tree.

=item * C<path>: the full node name. Subtree names must end with slash,
  and leaf names should end with alphanumeric characters.

=item * C<params>: hash with parameter names and values.

=item * C<vars>: hash with variable values (used in setvar, iftrue and
  iffalse XML statements).

=item * C<src>: optional hash of source XML file names as keys and "1"
  as values. It's only defined on the topmost node that is affected by a
  given XML. If an XML file updates a previously defined node, the
  C<src> content is copied from the nearest parent where it's defined.

=back
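
As an illustration of the keys listed above, a node record could be
assembled and serialized as follows. The helper name and the sample
values are hypothetical, and the optional C<src> key is left out.

```python
import json
from typing import Dict, Optional


def make_node(path: str, parent: str,
              params: Optional[Dict[str, str]] = None,
              variables: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Build a hash with the keys described for files under nodes/."""
    return {
        "is_subtree": 1 if path.endswith("/") else 0,
        "parent": parent,  # empty string at the top of the tree
        "path": path,
        "params": dict(params or {}),
        "vars": dict(variables or {}),
    }


# Sample record for the top of a tree.
top = make_node("/", "", {"comment": "Top of the tree"})
encoded = json.dumps(top, sort_keys=True)
```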


The JSON files within C<other> are hashes with the following keys and
values:

=over 4

=item * C<params>: hash with parameter names and values.

=back


The agent branches contain JSON files named after
the node tokens, arranged into two-level 256-way tree structure. Each
daemon that needs quick access to a subset of leaf nodes (primarily
I<collector>, and also I<monitor>) retrieves the node configurations
from this structure. The instance number in the branch name is a
4-digit lower-case hexadecimal number. The JSON files are hashes
defining all parameter values needed by the daemon. These files are
populated by the XML compiler after the tree is processed.
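
The branch naming scheme can be sketched as below. The function names
are hypothetical, and the tree and daemon names in the usage note are
sample values.

```python
def agent_branch(tree: str, daemon: str, instance: int) -> str:
    """Return TREE_DAEMON_INSTANCE with a 4-digit lower-case hex instance."""
    if not 0 <= instance <= 0xFFFF:
        raise ValueError("instance must fit into four hexadecimal digits")
    return f"{tree}_{daemon}_{instance:04x}"


def tree_branch(tree: str, kind: str) -> str:
    """Return a per-tree branch name such as TREE_configtree."""
    if kind not in ("configtree", "srcfiles", "agent_tokens"):
        raise ValueError(f"unknown branch kind: {kind}")
    return f"{tree}_{kind}"
```

For example, C<agent_branch("main", "collector", 10)> gives
C<main_collector_000a>.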


An optional C<searchdb> branch is used for indexing the node parameters
in order to provide the search in GUI. It consists of the following
directories:

=over 4

=item * C<words/I<TREE>/> contains zero-length files in the following
  hierarchy: C<I<KEYWORD>/I<TOKEN>/I<PARAM>>. If a keyword is matched in
  the subtree or leaf name, the file name is C<__NODENAME__>.

=item * C<wordsglobal/> is the same as above, but for global search. In
  addition, a file called C<__TREENAME__> contains a JSON scalar with
  the tree name where this token is defined.

=item * C<tokens/> is a two-level 256-way hierarchy of directories based
  on token IDs. These directories contain zero-length files named after
  keywords.

=item * C<configtree_ref/> is a directory containing files named after
  the config tree names. Each file is a JSON scalar indicating the
  commit ID in the corresponding configtree branch.

=back
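
The index paths described above could be built as in this sketch. The
function names are hypothetical, and the exact layout under C<tokens/>
(directory per token, file per keyword) is an assumption derived from
the description.

```python
from typing import Optional


def words_path(tree: str, keyword: str, token: str,
               param: Optional[str] = None) -> str:
    """Path of the zero-length file for a keyword match in one tree."""
    # A match in the subtree or leaf name uses the special file name.
    leaf = param if param is not None else "__NODENAME__"
    return f"words/{tree}/{keyword}/{token}/{leaf}"


def tokens_path(token: str, keyword: str) -> str:
    """Path of the reverse entry: keywords known for a given token."""
    return f"tokens/{token[0:2]}/{token[2:4]}/{token[4:]}/{keyword}"
```

The reverse C<tokens/> entries make it possible to find and remove all
keyword files of a token when that token is deleted.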


    

=head2 Redis database

Redis is an in-memory database, supporting key/value hashes and linear
arrays, with periodic saving to disk storage. Torrus keeps all run-time
and dynamic information in Redis.

All Redis keys that are used within a single Torrus installation are
prefixed with a configurable prefix ("torrus:" by default), thus
allowing multiple Torrus installations to use the same Redis
instance. Further in this document, the prefix is omitted for easier
reading.

=over 4

=item * C<gitlock:I<REPOPATH>> -- this key is used as a mutex that
  protects a local Git repository from simultaneous initialization by
  multiple processes. Before accessing the repository in writer mode,
  the writer sets this Redis key to the current UNIX timestamp.

=item * C<writer:I<REPOPATH>> -- this is a hash representing active
  Torrus::ConfigTree::Writer objects. Each key is the process ID, and
  values are the UNIX timestamps when the writer objects were
  created. Entries older than 24 hours are automatically removed. This
  hash aims to prevent Git garbage collector from running while there
  are active compiler processes.

=item * C<githeads> -- this is a hash containing commit IDs written
  by the compiler. The keys are branch names, and the values are the Git
  commit IDs of the corresponding branch heads. The consumer
  process compares this with the current known commit and pulls the
  updates if needed.

=item * C<tsets:I<TREE>> -- hash of tokenset names as keys and "1" as
  values.

=item * C<tset:I<TREE>:I<TSET>> -- hash of tokenset members. Tokens are
  the keys, and the values indicate the origins. Currently known origins
  are "static" and "monitor".

=item * C<tsetparam:I<TREE>:I<TSET>> -- a hash of tokenset parameters.

=item * C<users> -- a hash containing users and groups, as described below.

=item * C<acl> -- a hash containing the access privileges for groups, as
  described below.

=item * C<monitor_alarms:I<TREE>> is a hash that keeps alarm status
  information from previous runs of Monitor, with the keys and values as
  described below.

=item * C<scheduler_stats:I<TREE>> is a hash which stores the runtime
  statistics of Scheduler tasks. Each key is of structure
  C<I<TYPE>:I<TASKNAME>:I<INSTANCE>:I<PERIOD>:I<OFFSET>#I<VARIABLE>>,
  and the value is a number representing the current value of the
  variable. Depending on the variable's purpose, the number is a
  floating point value or an integer.

=item * C<serviceid_params> is a hash containing properties for each
  Service ID (exported collector information, usually stored in an SQL
  database). The keys are Service IDs, and values are JSON hashes
  describing the properties. Known parameters are: C<trees>, C<token>,
  C<dstype>, C<units>.

=item * C<serviceid_tokens> is a hash with tokens as keys and Service ID
  as values.

=item * C<snmp_failures:I<TREE>> -- a hash listing SNMP failures in the
  collector, as described below.

=back
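
The key naming conventions above can be sketched as follows. The helper
names and the sample values are hypothetical, and the parser assumes
that task names contain neither colons nor hash signs.

```python
from typing import NamedTuple

DEFAULT_PREFIX = "torrus:"


def redis_key(*parts: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Join key components with colons and prepend the installation prefix."""
    return prefix + ":".join(parts)


class StatsField(NamedTuple):
    task_type: str
    task_name: str
    instance: str
    period: str
    offset: str
    variable: str


def parse_stats_field(field: str) -> StatsField:
    """Split a scheduler_stats hash key into its components."""
    task, variable = field.rsplit("#", 1)
    task_type, task_name, instance, period, offset = task.split(":")
    return StatsField(task_type, task_name, instance, period, offset, variable)
```

For example, C<redis_key("tset", "main", "Salarms")> gives
C<torrus:tset:main:Salarms> with the default prefix.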

PubSub channels:

=over 4

=item * C<treecommits:I<TREE>> -- the value of every new Git commit in
  C<I<TREE>_configtree> branch is published to this channel.

=back



=head3 C<users> contents

=over 4

=item * C<ua:I<UID>:I<ATTR>> => C<I<VALUE>>

User attributes, such as C<cn> (Common name) or C<userPassword>, are
stored here. For each user, there is a record consisting of the
attribute C<uid>, with the value equal to the user identifier.

=item * C<uA:I<UID>> => C<I<ATTR>,...>

Comma-separated list of attribute names for the given user.

=item * C<gm:I<UID>> => C<I<group>,...>

For each user ID, stores the comma-separated list of groups it belongs to.

=item * C<ga:I<GROUP>:I<ATTR>> => C<I<VALUE>>

Group attributes, such as group description.

=item * C<gA:I<GROUP>> => C<I<ATTR>,...>

Comma-separated list of attribute names for the given group.

=item * C<G:> => C<I<GROUP>,...>

List of all groups.

=back



=head3 C<acl> contents

=over 4

=item * C<I<GROUP>:I<OBJECT>:I<PRIVILEGE>> => C<1>

The entry exists if and only if the group members have this privilege
over the given object. The most common privilege is C<DisplayTree>,
where the object is the tree name.

=back
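
A privilege check over the C<users> and C<acl> hashes could look like
the sketch below. Plain dictionaries stand in for the Redis hashes, and
the function names and sample data are hypothetical.

```python
from typing import Dict, List


def user_groups(users: Dict[str, str], uid: str) -> List[str]:
    """Read the comma-separated group list from the gm:UID field."""
    raw = users.get(f"gm:{uid}", "")
    return [group for group in raw.split(",") if group]


def has_privilege(users: Dict[str, str], acl: Dict[str, str],
                  uid: str, obj: str, privilege: str) -> bool:
    """True if any group of the user has GROUP:OBJECT:PRIVILEGE in acl."""
    return any(
        f"{group}:{obj}:{privilege}" in acl
        for group in user_groups(users, uid)
    )


# Sample content of the two hashes.
USERS = {"gm:jsmith": "staff,noc"}
ACL = {"noc:main:DisplayTree": "1"}
```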



=head3 C<monitor_alarms> contents

=over 4

=item * C<I<MNAME>:I<TOKEN>> =>
  C<I<T_SET>:I<T_EXPIRES>:I<STATUS>:I<T_LAST_CHANGE>
  [:I<ESCALATION>[:I<ESCALATION>...]]>


The key consists of the monitor name and leaf token. In the value,
C<T_SET> is the time when the alarm was raised. If two subsequent runs
of Monitor raise the same alarm, C<T_SET> does not change. C<T_EXPIRES>
is the timestamp until which the entry must be kept after the alarm is
cleared. C<STATUS> is 1 if the alarm is active, and 0 otherwise.
C<T_LAST_CHANGE> is the timestamp of the last status change. The
remaining values are optional escalation times, present if escalation
events were fired.

If C<STATUS> is 1, the record is kept regardless of timestamps. If
C<STATUS> is 0 and the current time is later than C<T_EXPIRES>, the
record is not reliable and may be deleted by Monitor.

=back
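
The value format and the expiry rule can be sketched as follows. The
class and function names are hypothetical, and all fields are assumed
to be integers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlarmState:
    t_set: int
    t_expires: int
    status: int
    t_last_change: int
    escalations: List[int] = field(default_factory=list)


def parse_alarm(value: str) -> AlarmState:
    """Parse T_SET:T_EXPIRES:STATUS:T_LAST_CHANGE[:ESCALATION...]."""
    numbers = [int(item) for item in value.split(":")]
    if len(numbers) < 4:
        raise ValueError("at least four fields are required")
    return AlarmState(numbers[0], numbers[1], numbers[2], numbers[3],
                      numbers[4:])


def may_delete(state: AlarmState, now: int) -> bool:
    """Active alarms are always kept; cleared ones until T_EXPIRES."""
    return state.status == 0 and now > state.t_expires
```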



=head3 C<serviceid_params> contents

=over 4

=item * C<a:> => C<I<SERVICEID>,...>

Lists all known service IDs.

=item * C<t:I<TREE>> => C<I<SERVICEID>,...>

Lists service IDs exported by a given datasource tree.

=item * C<p:I<SERVICEID>:I<PARAM>> => C<I<VALUE>>

Parameter value for a given service ID. Mandatory parameters are:
C<tree>, C<token>, C<dstype>. Optional: C<units>.

=item * C<P:I<SERVICEID>> => C<I<PARAM>,...>

List of parameter names for a service ID.

=back
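
Reading all parameters of one service ID from these fields could look
like the sketch below. A plain dictionary stands in for the Redis hash,
and the function name and the sample data are hypothetical.

```python
from typing import Dict


def serviceid_properties(fields: Dict[str, str], serviceid: str) -> Dict[str, str]:
    """Collect p:SERVICEID:PARAM values for the names listed in P:SERVICEID."""
    raw = fields.get(f"P:{serviceid}", "")
    names = [name for name in raw.split(",") if name]
    return {name: fields[f"p:{serviceid}:{name}"] for name in names}


# Sample content of the hash for one exported service ID.
SAMPLE = {
    "a:": "cust01_in",
    "t:main": "cust01_in",
    "P:cust01_in": "tree,token,dstype",
    "p:cust01_in:tree": "main",
    "p:cust01_in:token": "b7ba0d88a0b14a4e6c3c61f5446aa619a537098f",
    "p:cust01_in:dstype": "COUNTER",
}
```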



=head3 C<snmp_failures> contents

=over 4

=item * C<c:I<counter>> => C<I<number>>

A counter with a name. Known names: C<unreachable>, C<deleted>, C<mib_errors>.

=item * C<h:I<hosthash>> => C<I<failure>:I<timestamp>>

SNMP host failure information. Hosthash is a concatenation of hostname,
UDP port, and SNMP community, separated by "|". Known failures:
C<unreachable>, C<deleted>. Timestamp is a UNIX time of the event.

=item * C<m:I<TOKEN>> => C<I<timestamp>>

MIB failures (I<noSuchObject>, I<noSuchInstance>, and I<endOfMibView>)
for a given host, with the tree path of their occurrence and the UNIX timestamp.

=item * C<M:I<hosthash>> => C<I<number>>

Count of MIB failures per SNMP host.

=back
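
The host hash and the C<h:> value format can be sketched as follows.
The function names and sample values are hypothetical; the sketch
assumes that the failure name contains no colon after which a
timestamp could be confused.

```python
from typing import Tuple


def hosthash(hostname: str, port: int, community: str) -> str:
    """Concatenate hostname, UDP port and SNMP community with '|'."""
    return "|".join((hostname, str(port), community))


def parse_host_failure(value: str) -> Tuple[str, int]:
    """Split a 'failure:timestamp' value into its two parts."""
    failure, timestamp = value.rsplit(":", 1)
    return failure, int(timestamp)
```

For example, C<hosthash("rtr1.example.net", 161, "public")> gives
C<rtr1.example.net|161|public>, which is then used in the C<h:> and
C<M:> field names.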









=head2 Search and indexing service

Searching within trees is implemented in a standalone service,
consisting of two parts:

=over 4

=item 1. the daemon that subscribes to C<treecommits:*> channels and
  updates its database after every commit;

=item 2. a RESTful API service for retrieving the search results.

=back





=head1 Author

Copyright (c) 2016-2017 Stanislav Sinyagin ssinyagin@k-open.com