File: dateutils.texi

package info (click to toggle)
dateutils 0.4.3-1
  • links: PTS, VCS
  • area: main
  • in suites: buster
  • size: 4,924 kB
  • sloc: ansic: 22,097; makefile: 1,666; yacc: 198; sh: 168; lex: 108
file content (403 lines) | stat: -rw-r--r-- 15,164 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
\input texinfo  @c -*-texinfo-*-
@setfilename dateutils.info
@comment  node-name,  next,  previous,  up

@ifinfo
@dircategory Miscellaneous
@direntry
* dateutils: (dateutils).	Command line tools to fiddle with dates.
@end direntry

@dircategory Individual utilities
@direntry
* dateadd: (dateutils)dateadd.          Add durations to dates or times.
* dateconv: (dateutils)dateconv.        Convert dates between
                                          calendars or time zones.
* datediff: (dateutils)datediff.        Compute durations between dates
                                          and times.
* dategrep: (dateutils)dategrep.        Find date or time matches in
                                          input stream.
* dateround: (dateutils)dateround.      Round dates or times to
                                          designated values.
* dateseq: (dateutils)dateseq.          Sequences of dates or times.
* datesort: (dateutils)datesort.        Sort chronologically.
* datetest: (dateutils)datetest.        Compare dates or times.
* datezone: (dateutils)datezone.        Convert date/times to
                                          different timezones in bulk.
* strptime: (dateutils)strptime.        Command line version
                                          of the C function.
@end direntry

This manual documents the dateutils package.

Copyright @copyright{} 2011-2016 Sebastian Freundt.

Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with no Front-Cover Texts,
and with no Back-Cover Texts.  A copy of the license is included
in the section entitled "GNU Free Documentation License".
@end ifinfo
@c
@setchapternewpage odd
@settitle dateutils User's Manual
@c
@titlepage
@sp 6
@center @titlefont{dateutils User's Manual}
@sp 4
@sp 1
@sp 1
@center September 2011
@sp 5
@center Sebastian Freundt
@page
@vskip 0pt plus 1filll
Copyright @copyright{} 2011-2016 Sebastian Freundt.

Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with no Front-Cover Texts,
and with no Back-Cover Texts.  A copy of the license is included
in the section entitled "GNU Free Documentation License".
@end titlepage
@page

@node Top
@top dateutils

dateutils are a bunch of tools that revolve around fiddling with dates
and times in the command line with a strong focus on use cases that
arise when dealing with large amounts of financial data.

@menu
* Introduction::        Motivation, background, etc.
* Calendars::           Calendar concepts and format specs
* Format specifiers and units:: How to control input and output
* Localised input and output::  Date/times specific to regions
* Algorithms::          Some notes on the algorithms used
* dateadd::             Add durations to dates or times
* dateconv::            Convert dates between calendars or time zones
* datediff::            Compute durations between dates and times
* dategrep::            Find date or time matches in input stream
* dateround::           Round dates or times to designated values
* dateseq::             Generate sequences of dates or times
* datesort::            Sort the contents of files chronologically
* datetest::            Compare dates or times
* datezone::            Convert date/times to timezones in bulk
* strptime::            Command line version of the C function
@end menu


@node Introduction
@chapter Introduction

The dateutils package is an attempt to provide a comprehensive set of
command line tools that behave intuitively, support a similar variety of
inputs, allow for normalised output and, most importantly, cover nearly
all tasks that arise whilst dealing with huge amounts of dates and
unusual calendars.

In day to day business, the authors have run across a plethora of
approaches to tackle seemingly simple problems.  When the command line
is a necessity most of these involve perl or awk, which suffer from bad
maintainability simply because those tools weren't made to process dates
and times in particular, and for the same reason those approaches are
anything but fast.

And conversely, good maintainability and fast processing is often seen
in integrated systems, such as R or database solutions but they come at
a price: little flexibility.  It's very uncommon to import billions of
dates into a database just to compare if they are greater than or equal
to a given date.  Besides, most of the time those billions of dates come
from external sources and have to be brought into shape before or whilst
importing them into an integrated system.


@node Calendars
@chapter Calendars

All of the dateutils command line apps are built around a library,
libdut.  The textual representation of dates is converted to an
internal one first, then the date operation is performed and the
internal representation in the end is converted back to a textual
one.

While many of the tools that provide date arithmetic or conversions
employ a single standard internal representation (mostly days since a
reference date, or seconds since the epoch), the date-core library
provides a variety of calendars, each suitable for one or more
specific tasks.

Usually it's the tool's task to find out which calendar is the best
for the task at hand, but to use the library directly we give an
overview of the supported calendars here along with some of their
properties.

@section ymd, aka gregorian

This is the `ordinary' calendar, holding a year, a month and the day
of the month.  Years are subdivided into 12 months, months are
subdivided into 28 to 31 days.

The canonical input and output format is @code{%Y-%m-%d} (ISO 8601).
For outputs the format string @code{ymd} can be used as well.

@section ymcw

This is a calendric system used at some security exchanges, and can be
thought of as a mixture of the year-month-day and year-week-day
calendars.  It holds a year, a month, a weekday and the count of the
weekday within the month.  So dates like the third Thursday in March
2011 can be expressed trivially in this system.  Years are subdivided
into months as in the ymd calendar, months consist of at least 4 and up
to 5 weeks and weeks are subdivided into weekdays, Monday, Tuesday,
Wednesday, Thursday, Friday, Saturday and Sunday.

The canonical input and output format is @code{%Y-%m-%c-%w}.  The
week is enumerated by 01 for Monday, 02 for Tuesday, @dots{}, 07 for Sunday,
as mandated by ISO 8601.  For outputs the format string @code{ymcw} can
be used as well.

@section ywd

This calendar is ISO's 8601 year-week-day calendar.  Every year is
divided into 52 weeks.  Every week is divided into 7 days, starting with
Monday.  To make up for ``missing days'' a leap week is introduced every
now and again, i.e. years with leap weeks are 53 weeks long and called
leap years (note: ISO does not use the term leap year or leap week).

The canonical input and output format is @code{%rY-W%V-%u}, so
e.g. 2012-W04-02.  Note that ywd years don't necessarily correspond to
Gregorian/ymd years, hence the @code{r} modifier in the spec
(@code{%rY} is used in calendars to select the ``real'' calendar year).
The format string @code{ywd} can be used as well.

@section daisy

This is one of the most common calendric systems, it counts the days
since a given reference date (we originally went for daysi, short for
days since, but after the 100th spelling mistake, because daisy is so
much more natural, we decided to stick with daisy).  In the Julian
calendar this was the founding of Rome, in many database systems this is
the 1st Jan 1, or other convenient dates like the 1st Jan 1970 or 1st
Jan 1900.

The (internal) dateutils reference day is the 31st Dec 1600, or
equivalently the Naught-th of Jan 1601.  This may look like an
arbitrary date but it has some interesting properties:
- it is a Sunday, so weekday computation is easy as it is a mod 7
  operation only
- the year is 1 mod 4, so the number of leap years between the reference
  year and any other year Y is simply Y div 4.

In earlier versions we used 1917 for the reference year because it's the
first year after 1900 that meets the conditions above.  But then we
relaxed leap year rules to support dates beyond 2100 (2100 isn't a leap
year) so for symmetry reasons we should support years before 1917 in a
similar fashion.

The daisy calendar has no notion of years or months.

The daisy calendar as such has no associated input or output format.
This is because we reserve the freedom to change the reference date in
the future (like we have done in the past).  However, daisy is easily
converted to and from other similar day counting systems so their input
and output format may be used instead.

Also noteworthy on a more informative level, dateutils globally assumes
the Gregorian date switch to have taken place in October 1582.  This is
unlike some other well-known tools (Unix cal for instance) which assume
the reformation to have occurred in September 1752.

@subsection julian, jdn

A close sibling to the daisy system described before is the Julian day
number, with reference date 01 Jan 4713 BC in the proleptic Julian
calendar.

The input and output format string is @code{jdn} or @code{julian}, there
is no format specifier.

@subsection lilian, ldn

Another close sibling to the daisy system is the Lilian day number, its
reference date is the first day of the Gregorian calendar, 15 Oct 1582.

The input and output format string is @code{ldn} or @code{lilian}, there
is no format specifier.

@subsection matlab, mdn

Taking the Lilian day number further (in a proleptic sense) to the
beginning of time, the First of January in year 0, you get the Matlab
day number.

The input and output format string is @code{mdn} or @code{matlab}, there
is no format specifier.

@section bizda

This calendric system is actually a whole tribe of calendars, in its
canonical form this tracks the number of business days after last
month's ultimo.  Where business day means any non-weekend day.  Years
are divided into 12 months, each of which consists of 20 to 23
business days.

The canonical input and output format is @code{%Y-%m-%db}, where
@code{b} is a suffix modifier meaning business days since last month's
ultimo.  Similarly, @code{%Y-%m-%dB} stands for the business days until
this month's ultimo.


@node Format specifiers and units
@chapter Format specifiers and units

The tools provided by dateutils would be pretty meaningless if you could
not control how input is to be read or results are to be printed.
Following the footsteps of proven systems (such as libc) output
formatting is controlled through format specifiers (and flags), as
covered by the next 2 sections.

Nonetheless, some tools (@samp{dadd}, @samp{dseq}) require durations to
be input.  Their syntax is covered in the final section.

@include format.texi
@include format-ddiff.texi
@include units.texi

@node Localised input and output
@chapter Localised input and output

Input and output may need to be refined even beyond what simple format
specifiers can accomplish, for example time zone information might not
be present in the input (or must not be present in the output); or using
named weekdays (@samp{%a}, @samp{%A}) or months (@samp{%b}, @samp{%B})
depending on region and language.

We present dateutils' solution in the next 2 sections.

@include zone.texi
@include locale.texi


@node Algorithms
@chapter Algorithms

Coming from a financial background, shaving milliseconds off of code is
daily core business.  And that in turn is why some of the algorithms
used in dateutils might look different to what other people do.

In this chapter key concepts are briefly introduced.  For implementation
details see the @file{*-core.c} and @file{*-core.h} files.

@section 32-adic year-days

Months, in the Gregorian sense, are roughly 30 days long.  Some are
longer, one is shorter.  It comes in handy though that the average month
length is close to a 2-power, 32.  So we can introduce a 32-adic year as
follows:

@verbatim
year-day        traditional     32-adic m/d
0       19      Jan|00          00|19
1       20      Jan|01          00|20
31      50      Jan|31          01|18

32      51      Feb|01          01|19
59      78      Feb|28          02|14

60      79      Mar|01          02|15
90      109     Mar|31          03|13

91      110     Apr|01          03|14
120     139     Apr|30          04|11

121     140     May|01          04|12
151     170     May|31          05|10

181     200     Jun|30          06|08
212     231     Jul|31          07|07
243     262     Aug|31          08|06
273     292     Sep|30          09|04
304     323     Oct|31          10|03
334     353     Nov|30          11|01
365     384     Dec|31          12|00
@end verbatim

As can be seen, the year has 19 unexplained days at the beginning, and,
more unintuitively, (traditional) months start in one (32-adic) month
and end in another.  But the ending always coincides with the
traditional months.

Another nice thing about this year is that all months end on a
different day; this is nice because you can deduce the month just
by naming its ending (or the beginning respectively, as all months as a
consequence also start on a different day).

The final nice thing lies in the property, that months that end (or
start) on a day <=16, are affected by the leap day introduced in Feb in
the traditional Gregorian calendar.  By affected we mean, that the
months start (and end) one day off in a leap year.

So to cut this story short, the 32-adic year makes computations easier
that need a lot of mapping between the day in the year and the month+day
representation, e.g. weekday computations (the 32-adic year preserves
weekdays as 32 and 7 are co-prime), day differences, etc.

The algorithm to get the day of the year from a 32-adic month+day is:
@verbatim
Input: m <- 32-adic month
       d <- 32-adic day of month
Output: yd -> day of year

yd <- m * 32 + d - 19
@end verbatim

And conversely.  The algorithm to get the 32-adic month+day is trivial.

To convert between 32-adic month+day and Gregorian month+day:
@verbatim
Input: m <- 32-adic month
       d <- 32-adic day of month
Output: m -> 32-adic month
        d -> 32-adic day of month

/* day of month of the beginning and end of m */
bom <- get_beg_of_month(m)
eom <- get_end_of_month(m)

/* get yd as described above */
yd <- get_yd(m, d)

if d <= eom
        d <- yd - (32 * (m - 1) - 19 + bom - 1)
else
        d <- yd - (32 * m - 19 + eom)
        m <- m + 1
@end verbatim


@include dateadd.texi
@include dateconv.texi
@include datediff.texi
@include dategrep.texi
@include dateround.texi
@include dateseq.texi
@include datesort.texi
@include datetest.texi
@include datezone.texi
@include strptime.texi

@summarycontents
@contents
@bye


@c Remember to delete these lines before creating the info file.
@iftex
@bindingoffset = 0.5in
@parindent = 0pt
@end iftex