% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/sys-time.R
\name{sys-parsing}
\alias{sys-parsing}
\alias{sys_time_parse}
\alias{sys_time_parse_RFC_3339}
\title{Parsing: sys-time}
\usage{
sys_time_parse(
  x,
  ...,
  format = NULL,
  precision = "second",
  locale = clock_locale()
)

sys_time_parse_RFC_3339(
  x,
  ...,
  separator = "T",
  offset = "Z",
  precision = "second"
)
}
\arguments{
\item{x}{\verb{[character]}

A character vector to parse.}

\item{...}{These dots are for future extensions and must be empty.}

\item{format}{\verb{[character / NULL]}

A format string. A combination of the following commands, or \code{NULL},
in which case a default format string is used.

A vector of multiple format strings can be supplied. They will be tried in
the order they are provided.

\strong{Year}
\itemize{
\item \verb{\%C}: The century as a decimal number. The modified command \verb{\%NC} where
\code{N} is a positive decimal integer specifies the maximum number of
characters to read. If not specified, the default is \code{2}. Leading zeroes
are permitted but not required.
\item \verb{\%y}: The last two decimal digits of the year. If the century is not
otherwise specified (e.g. with \verb{\%C}), values in the range \verb{[69 - 99]} are
presumed to refer to the years \verb{[1969 - 1999]}, and values in the range
\verb{[00 - 68]} are presumed to refer to the years \verb{[2000 - 2068]}. The
modified command \verb{\%Ny}, where \code{N} is a positive decimal integer, specifies
the maximum number of characters to read. If not specified, the default is
\code{2}. Leading zeroes are permitted but not required.
\item \verb{\%Y}: The year as a decimal number. The modified command \verb{\%NY} where \code{N}
is a positive decimal integer specifies the maximum number of characters to
read. If not specified, the default is \code{4}. Leading zeroes are permitted
but not required.
}

\strong{Month}
\itemize{
\item \verb{\%b}, \verb{\%B}, \verb{\%h}: The \code{locale}'s full or abbreviated case-insensitive
month name.
\item \verb{\%m}: The month as a decimal number. January is \code{1}. The modified command
\verb{\%Nm} where \code{N} is a positive decimal integer specifies the maximum number
of characters to read. If not specified, the default is \code{2}. Leading zeroes
are permitted but not required.
}

\strong{Day}
\itemize{
\item \verb{\%d}, \verb{\%e}: The day of the month as a decimal number. The modified
command \verb{\%Nd} where \code{N} is a positive decimal integer specifies the maximum
number of characters to read. If not specified, the default is \code{2}. Leading
zeroes are permitted but not required.
}

\strong{Day of the week}
\itemize{
\item \verb{\%a}, \verb{\%A}: The \code{locale}'s full or abbreviated case-insensitive weekday
name.
\item \verb{\%w}: The weekday as a decimal number (\code{0-6}), where Sunday is \code{0}. The
modified command \verb{\%Nw} where \code{N} is a positive decimal integer specifies
the maximum number of characters to read. If not specified, the default is
\code{1}. Leading zeroes are permitted but not required.
}

\strong{ISO 8601 week-based year}
\itemize{
\item \verb{\%g}: The last two decimal digits of the ISO week-based year. The
modified command \verb{\%Ng} where \code{N} is a positive decimal integer specifies
the maximum number of characters to read. If not specified, the default is
\code{2}. Leading zeroes are permitted but not required.
\item \verb{\%G}: The ISO week-based year as a decimal number. The modified command
\verb{\%NG} where \code{N} is a positive decimal integer specifies the maximum number
of characters to read. If not specified, the default is \code{4}. Leading zeroes
are permitted but not required.
\item \verb{\%V}: The ISO week-based week number as a decimal number. The modified
command \verb{\%NV} where \code{N} is a positive decimal integer specifies the maximum
number of characters to read. If not specified, the default is \code{2}. Leading
zeroes are permitted but not required.
\item \verb{\%u}: The ISO weekday as a decimal number (\code{1-7}), where Monday is \code{1}.
The modified command \verb{\%Nu} where \code{N} is a positive decimal integer
specifies the maximum number of characters to read. If not specified, the
default is \code{1}. Leading zeroes are permitted but not required.
}

\strong{Week of the year}
\itemize{
\item \verb{\%U}: The week number of the year as a decimal number. The first Sunday
of the year is the first day of week \code{01}. Days of the same year prior to
that are in week \code{00}. The modified command \verb{\%NU} where \code{N} is a positive
decimal integer specifies the maximum number of characters to read. If not
specified, the default is \code{2}. Leading zeroes are permitted but not
required.
\item \verb{\%W}: The week number of the year as a decimal number. The first Monday
of the year is the first day of week \code{01}. Days of the same year prior to
that are in week \code{00}. The modified command \verb{\%NW} where \code{N} is a positive
decimal integer specifies the maximum number of characters to read. If not
specified, the default is \code{2}. Leading zeroes are permitted but not
required.
}

\strong{Day of the year}
\itemize{
\item \verb{\%j}: The day of the year as a decimal number. January 1 is \code{1}. The
modified command \verb{\%Nj} where \code{N} is a positive decimal integer specifies
the maximum number of characters to read. If not specified, the default is
\code{3}. Leading zeroes are permitted but not required.
}

\strong{Date}
\itemize{
\item \verb{\%D}, \verb{\%x}: Equivalent to \verb{\%m/\%d/\%y}.
\item \verb{\%F}: Equivalent to \verb{\%Y-\%m-\%d}. If modified with a width (like \verb{\%NF}),
the width is applied to only \verb{\%Y}.
}

\strong{Time of day}
\itemize{
\item \verb{\%H}: The hour (24-hour clock) as a decimal number. The modified command
\verb{\%NH} where \code{N} is a positive decimal integer specifies the maximum number
of characters to read. If not specified, the default is \code{2}. Leading zeroes
are permitted but not required.
\item \verb{\%I}: The hour (12-hour clock) as a decimal number. The modified command
\verb{\%NI} where \code{N} is a positive decimal integer specifies the maximum number
of characters to read. If not specified, the default is \code{2}. Leading zeroes
are permitted but not required.
\item \verb{\%M}: The minutes as a decimal number. The modified command \verb{\%NM} where
\code{N} is a positive decimal integer specifies the maximum number of
characters to read. If not specified, the default is \code{2}. Leading zeroes
are permitted but not required.
\item \verb{\%S}: The seconds as a decimal number. Leading zeroes are permitted but
not required. If encountered, the \code{locale} determines the decimal point
character. Generally, the maximum number of characters to read is
determined by the precision that you are parsing at. For example, a
precision of \code{"second"} would read a maximum of 2 characters, while a
precision of \code{"millisecond"} would read a maximum of 6 (2 for the values
before the decimal point, 1 for the decimal point, and 3 for the values
after it). The modified command \verb{\%NS}, where \code{N} is a positive decimal
integer, can be used to exactly specify the maximum number of characters to
read. This is only useful if you happen to have seconds with more than 1
leading zero.
\item \verb{\%p}: The \code{locale}'s equivalent of the AM/PM designations associated with
a 12-hour clock. The command \verb{\%I} must precede \verb{\%p} in the format string.
\item \verb{\%R}: Equivalent to \verb{\%H:\%M}.
\item \verb{\%T}, \verb{\%X}: Equivalent to \verb{\%H:\%M:\%S}.
\item \verb{\%r}: Equivalent to \verb{\%I:\%M:\%S \%p}.
}

\strong{Time zone}
\itemize{
\item \verb{\%z}: The offset from UTC in the format \verb{[+|-]hh[mm]}. For example,
\code{-0430} refers to 4 hours 30 minutes behind UTC, and \code{04} refers to 4 hours
ahead of UTC. The modified command \verb{\%Ez} parses a \code{:} between the hours and
minutes, and leading zeroes on the hour field are optional:
\verb{[+|-]h[h][:mm]}. For example, \code{-04:30} refers to 4 hours 30 minutes behind
UTC, and \code{4} refers to 4 hours ahead of UTC.
\item \verb{\%Z}: The full time zone name or the time zone abbreviation, depending on
the function being used. A single word is parsed. This word can only
contain characters that are alphanumeric, or one of \code{'_'}, \code{'/'}, \code{'-'} or
\code{'+'}.
}

\strong{Miscellaneous}
\itemize{
\item \verb{\%c}: A date and time representation. Equivalent to
\verb{\%a \%b \%d \%H:\%M:\%S \%Y}.
\item \code{\%\%}: A \verb{\%} character.
\item \verb{\%n}: Matches one whitespace character. \verb{\%n}, \verb{\%t}, and a space can be
combined to match a wide range of whitespace patterns. For example, \code{"\%n "}
matches one or more whitespace characters, and \code{"\%n\%t\%t"} matches one to
three whitespace characters.
\item \verb{\%t}: Matches zero or one whitespace characters.
}}

\item{precision}{\verb{[character(1)]}

A precision for the resulting time point. One of:
\itemize{
\item \code{"day"}
\item \code{"hour"}
\item \code{"minute"}
\item \code{"second"}
\item \code{"millisecond"}
\item \code{"microsecond"}
\item \code{"nanosecond"}
}

Setting the \code{precision} determines how much information \verb{\%S} attempts
to parse.}

\item{locale}{\verb{[clock_locale]}

A locale object created from \code{\link[=clock_locale]{clock_locale()}}.}

\item{separator}{\verb{[character(1)]}

The separator between the date and time components of the string. One of:
\itemize{
\item \code{"T"}
\item \code{"t"}
\item \code{" "}
}}

\item{offset}{\verb{[character(1)]}

The format of the offset from UTC contained in the string. One of:
\itemize{
\item \code{"Z"}
\item \code{"z"}
\item \code{"\%z"} to parse a numeric offset of the form \code{"+0430"}
\item \code{"\%Ez"} to parse a numeric offset of the form \code{"+04:30"}
}}
}
\value{
A sys-time.
}
\description{
There are two parsers into a sys-time, \code{sys_time_parse()} and
\code{sys_time_parse_RFC_3339()}.
\subsection{sys_time_parse()}{

\code{sys_time_parse()} is useful when you have date-time strings like
\code{"2020-01-01T01:04:30"} that you know should be interpreted as UTC, or like
\code{"2020-01-01T01:04:30-04:00"} with a UTC offset but no zone name. In the
latter case, parsing the string as a sys-time and using the \verb{\%Ez} command to
capture the offset is probably your best option. If you also know that the
string should be interpreted in a specific time zone, parse it as a sys-time
to get the UTC equivalent, then convert that with \code{\link[=as_zoned_time]{as_zoned_time()}}.
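
As a minimal sketch of that workflow, the following parses an offset-bearing
string with \verb{\%Ez} and then converts the result to a zone. The input string
and the \code{"America/New_York"} zone are illustrative assumptions, not values
taken from this page.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(clock)

x <- "2020-06-01T01:04:30-04:00"

# Parse to a sys-time. The -04:00 offset is subtracted, giving 05:04:30 UTC.
utc <- sys_time_parse(x, format = "\%Y-\%m-\%dT\%H:\%M:\%S\%Ez")

# Attach the zone you know the string belongs to (assumed here).
as_zoned_time(utc, zone = "America/New_York")
}\if{html}{\out{</div>}}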

The default options assume that \code{x} should be parsed at second precision,
using a \code{format} string of \code{"\%Y-\%m-\%dT\%H:\%M:\%S"}. This matches the default
result from calling \code{format()} on a sys-time.

\code{sys_time_parse()} is nearly equivalent to \code{\link[=naive_time_parse]{naive_time_parse()}}, except
that the \verb{\%z} command is actually used rather than ignored. Using \verb{\%z}
assumes that the rest of the date-time string should be interpreted as a
naive-time, which is then shifted by the UTC offset found in \verb{\%z}. The
returned time can then be validly interpreted as UTC.

\emph{\code{sys_time_parse()} ignores the \verb{\%Z} command.}
}

\subsection{sys_time_parse_RFC_3339()}{

\code{sys_time_parse_RFC_3339()} is a wrapper around \code{sys_time_parse()} that is
intended to parse the extremely common date-time format outlined by
\href{https://datatracker.ietf.org/doc/html/rfc3339}{RFC 3339}. That document
defines a profile of the ISO 8601 format that is more restrictive than the
full standard.

In particular, this function is intended to parse the following three
formats:

\if{html}{\out{<div class="sourceCode">}}\preformatted{2019-01-01T00:00:00Z
2019-01-01T00:00:00+0430
2019-01-01T00:00:00+04:30
}\if{html}{\out{</div>}}

This function defaults to parsing the first of these formats by using
a format string of \code{"\%Y-\%m-\%dT\%H:\%M:\%SZ"}.

If your date-time strings use offsets from UTC rather than \code{"Z"}, then set
\code{offset} to one of the following:
\itemize{
\item \code{"\%z"} if the offset is of the form \code{"+0430"}.
\item \code{"\%Ez"} if the offset is of the form \code{"+04:30"}.
}

The RFC 3339 standard allows for replacing the \code{"T"} with a \code{"t"} or a space
(\code{" "}). Set \code{separator} to adjust this as needed.

For this function, the \code{precision} must be \code{"second"} or finer.
}
}
\details{
If your date-time strings contain a full time zone name and a UTC offset, use
\code{\link[=zoned_time_parse_complete]{zoned_time_parse_complete()}}. If they contain a time zone abbreviation, use
\code{\link[=zoned_time_parse_abbrev]{zoned_time_parse_abbrev()}}.

If your date-time strings don't contain an offset from UTC and you aren't
sure if they should be treated as UTC or not, you might consider using
\code{\link[=naive_time_parse]{naive_time_parse()}}, since the resulting naive-time doesn't come with an
assumption of a UTC time zone.
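
A rough sketch of that alternative is shown below. It parses into a
naive-time first and only later commits to an interpretation. The input
string and the \code{"America/New_York"} zone are assumptions chosen for
illustration.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(clock)

# No UTC assumption is attached to a naive-time.
nt <- naive_time_parse("2020-01-01T05:06:07")

# Later, once the meaning is known, either treat the clock reading as UTC...
as_sys_time(nt)

# ...or as a local time in a specific zone (assumed here).
as_zoned_time(nt, zone = "America/New_York")
}\if{html}{\out{</div>}}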
}
\section{Full Precision Parsing}{


It is highly recommended to parse all of the information in the date-time
string into a type at least as precise as the string. For example, if your
string has fractional seconds, but you only require seconds, specify a
sub-second \code{precision}, then round to seconds manually using whatever
convention is appropriate for your use case. Parsing such a string directly
into a second precision result is ambiguous and undefined, and is unlikely to
work as you might expect.
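
The recommendation above might look like the following sketch, which parses
at millisecond precision and then reduces to seconds explicitly. The input
string is made up, and whether to floor or round is a choice for your use
case.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(clock)

x <- "2020-01-01T05:06:07.789"

# Parse everything the string contains.
ms <- sys_time_parse(x, precision = "millisecond")

# Then pick a rounding convention on purpose.
time_point_floor(ms, "second")
time_point_round(ms, "second")
}\if{html}{\out{</div>}}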
}

\examples{
sys_time_parse("2020-01-01T05:06:07")

# Day precision
sys_time_parse("2020-01-01", precision = "day")

# Nanosecond precision, but using a day based format
sys_time_parse("2020-01-01", format = "\%Y-\%m-\%d", precision = "nanosecond")

# Multiple format strings are allowed for heterogeneous times
sys_time_parse(
  c("2019-01-01", "2019/1/1"),
  format = c("\%Y/\%m/\%d", "\%Y-\%m-\%d"),
  precision = "day"
)
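
# A sketch of the two-digit year rule documented for `\%y`: when no century
# is given, 69-99 are read as 1969-1999 and 00-68 as 2000-2068. The inputs
# here are made up for illustration.
sys_time_parse(
  c("69-01-01", "68-01-01"),
  format = "\%y-\%m-\%d",
  precision = "day"
)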

# The `\%z` command shifts the date-time by subtracting the UTC offset so
# that the returned sys-time can be interpreted as UTC
sys_time_parse(
  "2020-01-01 02:00:00 -0400",
  format = "\%Y-\%m-\%d \%H:\%M:\%S \%z"
)

# Remember that the `\%Z` command is ignored entirely!
sys_time_parse("2020-01-01 America/New_York", format = "\%Y-\%m-\%d \%Z")

# ---------------------------------------------------------------------------
# RFC 3339

# Typical UTC format
x <- "2019-01-01T00:01:02Z"
sys_time_parse_RFC_3339(x)

# With a UTC offset containing a `:`
x <- "2019-01-01T00:01:02+02:30"
sys_time_parse_RFC_3339(x, offset = "\%Ez")

# With a space between the date and time and no `:` in the offset
x <- "2019-01-01 00:01:02+0230"
sys_time_parse_RFC_3339(x, separator = " ", offset = "\%z")
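
# Sub-second precision, assuming the string carries fractional seconds.
# `precision` controls how much of the seconds field is read.
x <- "2019-01-01T00:01:02.123Z"
sys_time_parse_RFC_3339(x, precision = "millisecond")

# The lowercase `t` separator and `z` offset are also accepted options
x <- "2019-01-01t00:01:02z"
sys_time_parse_RFC_3339(x, separator = "t", offset = "z")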
}