1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403
|
\input texinfo @c -*-texinfo-*-
@setfilename dateutils.info
@comment node-name, next, previous, up
@ifinfo
@dircategory Miscellaneous
@direntry
* dateutils: (dateutils). Command line tools to fiddle with dates.
@end direntry
@dircategory Individual utilities
@direntry
* dateadd: (dateutils)dateadd. Add durations to dates or times.
* dateconv: (dateutils)dateconv. Convert dates between
calendars or time zones.
* datediff: (dateutils)datediff. Compute durations between dates
and times.
* dategrep: (dateutils)dategrep. Find date or time matches in
input stream.
* dateround: (dateutils)dateround. Round dates or times to
designated values.
* dateseq: (dateutils)dateseq. Sequences of dates or times.
* datesort: (dateutils)datesort. Sort chronologically.
* datetest: (dateutils)datetest. Compare dates or times.
* datezone: (dateutils)datezone. Convert date/times to
different timezones in bulk.
* strptime: (dateutils)strptime. Command line version
of the C function.
@end direntry
This manual documents the dateutils package.
Copyright @copyright{} 2011-2016 Sebastian Freundt.
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with no Front-Cover Texts,
and with no Back-Cover Texts. A copy of the license is included
in the section entitled "GNU Free Documentation License".
@end ifinfo
@c
@setchapternewpage odd
@settitle dateutils User's Manual
@c
@titlepage
@sp 6
@center @titlefont{dateutils User's Manual}
@sp 4
@sp 1
@sp 1
@center September 2011
@sp 5
@center Sebastian Freundt
@page
@vskip 0pt plus 1filll
Copyright @copyright{} 2011-2016 Sebastian Freundt.
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with no Front-Cover Texts,
and with no Back-Cover Texts. A copy of the license is included
in the section entitled "GNU Free Documentation License".
@end titlepage
@page
@node Top
@top dateutils
dateutils are a bunch of tools that revolve around fiddling with dates
and times in the command line with a strong focus on use cases that
arise when dealing with large amounts of financial data.
@menu
* Introduction:: Motivation, background, etc.
* Calendars:: Calendar concepts and format specs
* Format specifiers and units:: How to control input and output
* Localised input and output:: Date/times specific to regions
* Algorithms:: Some notes on the algorithms used
* dateadd:: Add durations to dates or times
* dateconv:: Convert dates between calendars or time zones
* datediff:: Compute durations between dates and times
* dategrep:: Find date or time matches in input stream
* dateround:: Round dates or times to designated values
* dateseq:: Generate sequences of dates or times
* datesort:: Sort the contents of files chronologically
* datetest:: Compare dates or times
* datezone:: Convert date/times to timezones in bulk
* strptime:: Command line version of the C function
@end menu
@node Introduction
@chapter Introduction
The dateutils package is an attempt to provide a comprehensive set of
command line tools that behave intuitively, support a similar variety of
inputs, allow for normalised output and, most importantly, cover nearly
all tasks that arise whilst dealing with huge amounts of dates and
unusual calendars.
In day to day business, the authors have run across a plethora of
approaches to tackle seemingly simple problems. When the command line
is a necessity most of these involve perl or awk, which suffer from bad
maintainability simply because those tools weren't made to process dates
and times in particular, and for the same reason those approaches are
anything but fast.
And conversely, good maintainability and fast processing is often seen
in integrated systems, such as R or database solutions but they come at
a price: little flexibility. It's very uncommon to import billions of
dates into a database just to compare if they are greater than or equal
to a given date. Besides, most of the time those billions of dates come
from external sources and have to be brought into shape before or whilst
importing them into an integrated system.
@node Calendars
@chapter Calendars
All of the dateutils command line apps are built around a library,
libdut. The textual representation of dates is converted to an
internal one first, then the date operation is performed and the
internal representation in the end is converted back to a textual
one.
While many of the tools that provide date arithmetic or conversions
employ a single standard internal representation (mostly days since a
reference date, or seconds since the epoch), the date-core library
provides a variety of calendars, each suitable for one or more
specific tasks.
Usually it's the tool's task to find out which calendar is the best
for the task at hand, but to use the library directly we give an
overview of the supported calendars here along with some of their
properties.
@section ymd, aka gregorian
This is the `ordinary' calendar, holding a year, a month and the day
of the month. Years are subdivided into 12 months, months are
subdivided into 28 to 31 days.
The canonical input and output format is @code{%Y-%m-%d} (ISO 8601).
For outputs the format string @code{ymd} can be used as well.
@section ymcw
This is a calendric system used at some security exchanges, and can be
thought of as a mixture of the year-month-day and year-week-day
calendars. It holds a year, a month, a weekday and the count of the
weekday within the month. So dates like the third Thursday in March
2011 can be expressed trivially in this system. Years are subdivided
into months as in the ymd calendar, months consist of at least 4 and up
to 5 weeks and weeks are subdivided into weekdays, Monday, Tuesday,
Wednesday, Thursday, Friday, Saturday and Sunday.
The canonical input and output format is @code{%Y-%m-%c-%w}. The
week is enumerated by 01 for Monday, 02 for Tuesday, @dots{}, 07 for Sunday,
as mandated by ISO 8601. For outputs the format string @code{ymcw} can
be used as well.
@section ywd
This calendar is ISO's 8601 year-week-day calendar. Every year is
divided into 52 weeks. Every week is divided into 7 days, starting with
Monday. To make up for ``missing days'' a leap week is introduced every
now and again, i.e. years with leap weeks are 53 weeks long and called
leap years (note: ISO does not use the term leap year or leap week).
The canonical input and output format is @code{%rY-W%V-%u}, so
e.g. 2012-W04-02. Note that ywd years don't necessarily correspond to
Gregorian/ymd years, hence the @code{r} modifier in the spec
(@code{%rY} is used in calendars to select the ``real'' calendar year).
The format string @code{ywd} can be used as well.
@section daisy
This is one of the most common calendric systems, it counts the days
since a given reference date (we originally went for daysi, short for
days since, but after the 100th spelling mistake, because daisy is so
much more natural, we decided to stick with daisy). In the Julian
calendar this was the founding of Rome, in many database systems this is
the 1st Jan 1, or other convenient dates like the 1st Jan 1970 or 1st
Jan 1900.
The (internal) dateutils reference day is the 31st Dec 1600, or
equivalently the Naught-th of Jan 1601. This may look like an
arbitrary date but it has some interesting properties:
- it is a Sunday, so weekday computation is easy as it is a mod 7
operation only
- the year is 1 mod 4, so the number of leap years between the reference
year and any other year Y is simply Y div 4.
In earlier versions we used 1917 for the reference year because it's the
first year after 1900 that meets the conditions above. But then we
relaxed leap year rules to support dates beyond 2100 (2100 isn't a leap
year) so for symmetry reasons we should support years before 1917 in a
similar fashion.
The daisy calendar has no notion of years or months.
The daisy calendar as such has no associated input or output format.
This is because we reserve the freedom to change the reference date in
the future (like we have done in the past). However, daisy is easily
converted to and from other similar day counting systems so their input
and output format may be used instead.
Also noteworthy on a more informative level, dateutils globally assumes
the Gregorian date switch to have taken place in October 1582. This is
unlike some other well-known tools (Unix cal for instance) which assume
the reformation to have occurred in September 1752.
@subsection julian, jdn
A close sibling to the daisy system described before is the Julian day
number, with reference date 01 Jan 4713 BC in the proleptic Julian
calendar.
The input and output format string is @code{jdn} or @code{julian}, there
is no format specifier.
@subsection lilian, ldn
Another close sibling to the daisy system is the Lilian day number, its
reference date is the first day of the Gregorian calendar, 15 Oct 1582.
The input and output format string is @code{ldn} or @code{lilian}, there
is no format specifier.
@subsection matlab, mdn
Taking the Lilian day number further (in a proleptic sense) to the
beginning of time, the First of January in year 0, you get the Matlab
day number.
The input and output format string is @code{mdn} or @code{matlab}, there
is no format specifier.
@section bizda
This calendric system is actually a whole tribe of calendars, in its
canonical form this tracks the number of business days after last
month's ultimo. Where business day means any non-weekend day. Years
are divided into 12 months, each of which consists of 20 to 23
business days.
The canonical input and output format is @code{%Y-%m-%db}, where
@code{b} is a suffix modifier meaning business days since last month's
ultimo. Similarly, @code{%Y-%m-%dB} stands for the business days until
this month's ultimo.
@node Format specifiers and units
@chapter Format specifiers and units
The tools provided by dateutils would be pretty meaningless if you could
not control how input is to be read or results are to be printed.
Following the footsteps of proven systems (such as libc) output
formatting is controlled through format specifiers (and flags), as
covered by the next 2 sections.
Nonetheless, some tools (@samp{dadd}, @samp{dseq}) require durations to
be input. Their syntax is covered in the final section.
@include format.texi
@include format-ddiff.texi
@include units.texi
@node Localised input and output
@chapter Localised input and output
Input and output may need to be refined even beyond what simple format
specifiers can accomplish, for example time zone information might not
be present in the input (or must not be present in the output); or using
named weekdays (@samp{%a}, @samp{%A}) or months (@samp{%b}, @samp{%B})
depending on region and language.
We present dateutils' solution in the next 2 sections.
@include zone.texi
@include locale.texi
@node Algorithms
@chapter Algorithms
Coming from a financial background, shaving milliseconds off of code is
daily core business. And that in turn is why some of the algorithms
used in dateutils might look different to what other people do.
In this chapter key concepts are briefly introduced. For implementation
details see the @file{*-core.c} and @file{*-core.h} files.
@section 32-adic year-days
Months, in the Gregorian sense, are roughly 30 days long. Some are
longer, one is shorter. It comes in handy though that the average month
length is close to a 2-power, 32. So we can introduce a 32-adic year as
follows:
@verbatim
year-day traditional 32-adic m/d
0 19 Jan|00 00|19
1 20 Jan|01 00|20
31 50 Jan|31 01|18
32 51 Feb|01 01|19
59 78 Feb|28 02|14
60 79 Mar|01 02|15
90 109 Mar|31 03|13
91 110 Apr|01 03|14
120 139 Apr|30 04|11
121 140 May|01 04|12
151 170 May|31 05|10
181 200 Jun|30 06|08
212 231 Jul|31 07|07
243 262 Aug|31 08|06
273 292 Sep|30 09|04
304 323 Oct|31 10|03
334 353 Nov|30 11|01
365 384 Dec|31 12|00
@end verbatim
As can be seen, the year has 19 unexplained days at the beginning, and,
more unintuitively, (traditional) months start in one (32-adic) month
and end in another. But the ending always coincides with the
traditional months.
Another nice thing about this year is that all months end on a
different day; this is nice because you can deduce the month just
by naming its ending (or the beginning respectively, as all months as a
consequence also start on a different day).
The final nice thing lies in the property, that months that end (or
start) on a day <=16, are affected by the leap day introduced in Feb in
the traditional Gregorian calendar. By affected we mean, that the
months start (and end) one day off in a leap year.
So to cut this story short, the 32-adic year makes computations easier
that need a lot of mapping between the day in the year and the month+day
representation, e.g. weekday computations (the 32-adic year preserves
weekdays as 32 and 7 are co-prime), day differences, etc.
The algorithm to get the day of the year from a 32-adic month+day is:
@verbatim
Input: m <- 32-adic month
d <- 32-adic day of month
Output: yd -> day of year
yd <- m * 32 + d - 19
@end verbatim
And conversely. The algorithm to get the 32-adic month+day is trivial.
To convert between 32-adic month+day and Gregorian month+day:
@verbatim
Input: m <- 32-adic month
d <- 32-adic day of month
Output: m -> 32-adic month
d -> 32-adic day of month
/* day of month of the beginning and end of m */
bom <- get_beg_of_month(m)
eom <- get_end_of_month(m)
/* get yd as described above */
yd <- get_yd(m, d)
if d <= eom
d <- yd - (32 * (m - 1) - 19 + bom - 1)
else
d <- yd - (32 * m - 19 + eom)
m <- m + 1
@end verbatim
@include dateadd.texi
@include dateconv.texi
@include datediff.texi
@include dategrep.texi
@include dateround.texi
@include dateseq.texi
@include datesort.texi
@include datetest.texi
@include datezone.texi
@include strptime.texi
@summarycontents
@contents
@bye
@c Remember to delete these lines before creating the info file.
@iftex
@bindingoffset = 0.5in
@parindent = 0pt
@end iftex
|