File: dirfile.5

package info (click to toggle)
libgetdata 0.11.0-17
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 13,144 kB
  • sloc: ansic: 100,814; cpp: 4,843; fortran: 4,548; f90: 2,561; python: 2,406; perl: 2,274; makefile: 1,487; php: 1,465; sh: 86
file content (153 lines) | stat: -rw-r--r-- 6,170 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
.\" dirfile.5.  The dirfile man page.
.\"
.\" Copyright (C) 2005, 2006, 2008, 2009, 2014, 2016 D. V. Wiebe
.\"
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"
.\" This file is part of the GetData project.
.\"
.\" Permission is granted to copy, distribute and/or modify this document
.\" under the terms of the GNU Free Documentation License, Version 1.2 or
.\" any later version published by the Free Software Foundation; with no
.\" Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
.\" Texts.  A copy of the license is included in the `COPYING.DOC' file
.\" as part of this distribution.
.\"
.TH dirfile 5 "19 November 2016" "Standards Version 10" "DATA FORMATS"
.SH NAME
dirfile \(em a filesystem-based database format for time-ordered binary data
.SH DESCRIPTION
The
.I dirfile
database format is designed to provide a fast, simple format for storing and
reading binary time-ordered data.  Dirfiles can be read using the GetData
Library, which provides a reference implementaiton of these Standards.

The dirfile database is centred around one or more time-ordered data streams (a
.IR "time stream" ).
Each time stream is written to the filesystem in a separate file, as binary
data.  The name of these binary files correspond to the time stream's
.IR "field name" .
Dirfiles support binary data fields for signed and unsigned integer types of 8
to 64 bits, as well as single and double precision floating-point real or
complex data types.

Two time streams may have different constant sampling frequencies and mechanisms
exist within the dirfile format to ensure these time streams remain properly
sequenced in time.

To do this, the time streams in the dirfile are subdivided into
.IR frames .
Each frame contains a fixed integer number of samples of each time stream.
Two time streams in the same dirfile may have different numbers of samples
per frame, but the number of samples per frame of any given time stream is
fixed.

When synchronous retrieval of data from more than one time stream is required,
position in the dirfile can be specified in frames, which will ensure
synchronicity.

The binary files are all located in one ore more filesystem directories,
rooted around a central directory, known as the
.IR "dirfile directory" .
The dirfile as a whole may be referred to by its dirfile directory path.

Included in the dirfile along with the time streams is the
.IR "dirfile format specification" ,
which is one or more ASCII text files containing the dirfile database metadata.
The primary file is the file called
.B format
located in the dirfile directory.  This file and any additional files that
it names, fully specify the dirfile's metadata.  For the syntax of these files,
see
.BR dirfile\-format (5).

Version 3 of the Dirfile Standards introduced the
.I "large dirfile"
extension.  This extension added the ability to distribute the dirfile metadata
among multiple files (called
.IR fragments )
in addition to the 
.B format
file, as well as the ability to house portions of the database in
.IR subdirfiles .
These subdirfiles may be fully fledged dirfiles in their own right, but may also
be contained within a larger, parent dirfile.  See
.BR dirfile\-format (5)
for information on specifying these subdirfiles.

In addition to the raw fields on disk, the dirfile format specification may
also specify
.I derived fields
which are calculated by performing simple element-wise operations on one or
more input fields.  Derived fields behave identically to raw fields when read
via GetData.  See
.BR dirfile\-format (5)
for a complete list of derived field types.  Dirfiles may also contain both
numerical and character string constant
.IR "scalar fields" ,
also further outlined in
.BR dirfile\-format (5).

Dirfiles are designed to be written to and read simultaneously. The dirfile
specification dictates that one particular raw field (specified either
explicitly or implicitly by the dirfile metadata) is to be used as the
.IR "reference field" :
all other vector fields are assumed to have at least as many frames as the
reference field has, and the size (in frames) of the reference field is used as
the size of the dirfile as a whole.

Version 6 of the Dirfile Standards added the ability to encode the binary files
on disk.  Each
.I fragment
may have its own encoding scheme.  Most commonly, encodings are used to compress
the data files to same space.  See
.BR dirfile\-encoding (5)
for information on encoding schemes.

.SS Complex Number Storage Format
Version 7 of the Dirfile Standards added support for complex valued data.
Two types of complex valued data are supported by the Dirfile Standards:
.IP \(bu 4
A 64-bit complex number consisting of a IEEE-754 standard 32-bit single
precision floating point real part and a IEEE-754 standard 32-bit single
precision floating point imaginary part, and
.IP \(bu 4
A 128-bit complex number consisting of a IEEE-754 standard 64-bit double
precision floating point real part and a IEEE-754 standard 64-bit double
precision floating point imaginary part.
.PP
No integer-type complex numbers are supported.

Unencoded complex numbers are stored on disk in "Fortran order", that is
with the IEEE-754 real part followed by the IEEE-754 imaginary part.  The
specified endianness of the two components follows that of purely real floating
point numbers.  Endianness does not affect the ordering of the real and
imaginary parts.  This format also conforms to the C99 and C++11 standards.

To aid in using complex valued data, dirfile field codes may contain a
.I representation suffix
which specifies a function to apply to the complex valued data to map it into
purely real data.  See
.BR dirfile\-format (5).

.SH AUTHORS

The Dirfile format was created by C. B. Netterfield
.nh
<netterfield@astro.utoronto.ca>.
.hy 1
It is now maintained by D. V. Wiebe
.nh
<getdata@ketiltrout.net>.
.hy 1

.SH SEE ALSO
.BR dirfile\-encoding (5),
.BR dirfile\-format (5)
.PP
For an introduction to the GetData Library reference implementation, see
.BR gd_open (3), or
.BR gd_getdata (3),
or visit the GetData Project website:
<http://getdata.sourceforge.net/>.