File: encoder.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/RcppExports.R
\name{url_decode}
\alias{url_decode}
\alias{url_encode}
\title{Encode or decode a URI}
\usage{
url_decode(urls)

url_encode(urls)
}
\arguments{
\item{urls}{a vector of URLs to decode or encode.}
}
\value{
a character vector containing the encoded (or decoded) versions of \code{urls}.
}
\description{
Encodes or decodes a URI/URL.
}
\details{
URL encoding and decoding are essential prerequisites to proper web interaction
and to analysing data such as server-side logs. The
\href{http://tools.ietf.org/html/rfc3986}{relevant IETF RFC} mandates the
percent-encoding of any character outside a small unreserved set - including
non-ASCII characters and delimiters such as slashes - whenever it is not being
used in its reserved capacity.
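
As a minimal illustration, a space falls outside the unreserved set and is
percent-encoded (the exact output shown here is an assumption and may differ
between encoders):

\preformatted{
url_encode("foo bar")  # "foo\%20bar"
}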

Base R provides \code{\link{URLdecode}} and \code{\link{URLencode}}, which handle
URL encoding - in theory. In practice, they have a set of substantial problems
that the urltools implementation solves:

\itemize{
\item{No vectorisation: }{Both base R functions operate on single URLs, not
      vectors of URLs. This means that, when confronted with a vector of URLs
      that need encoding or decoding, your only option is to loop from within
      R, which can be very computationally costly with large datasets.
      \code{url_encode} and \code{url_decode} are implemented in C++ and fully
      vectorised, allowing for a substantial performance improvement (see the
      sketch after this list).}
\item{No scheme recognition: }{Encoding the slashes in, say, \code{http://} is
      a good way of making sure your URL no longer works. Because of this, the
      only thing you can safely pass to \code{URLencode} (unless you refuse to
      encode reserved characters) is a partial URL lacking the initial scheme,
      which requires additional operations to set up and increases the
      complexity of encoding or decoding. \code{url_encode} detects the scheme
      and silently splits it off, leaving it unencoded so that the resulting
      URL remains valid.}
\item{ASCII NULs: }{Server-side data can get very messy and sometimes includes
      out-of-range characters. Unfortunately, \code{URLdecode}'s response to
      these characters is to convert them to NULs, which R cannot handle, at
      which point the \code{URLdecode} call breaks. \code{url_decode} simply
      ignores them.}
}
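
The difference in usage pattern is sketched below; \code{some_urls} is a
hypothetical character vector, and the \code{vapply} line is only one of
several ways of looping from base R:

\preformatted{
some_urls <- rep("https://example.com/a b/c d", 1000)

# Base R: URLencode takes a single URL, so we must loop from within R.
encoded_base <- vapply(some_urls, URLencode, character(1))

# urltools: one vectorised C++ call over the whole vector.
encoded_fast <- urltools::url_encode(some_urls)
}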
}
\examples{

url_decode("https://en.wikipedia.org/wiki/File:Vice_City_Public_Radio_\%28logo\%29.jpg")
url_encode("https://en.wikipedia.org/wiki/File:Vice_City_Public_Radio_(logo).jpg")

\dontrun{
# A demonstration of the contrasting behaviours around out-of-range characters
URLdecode("\%gIL")
url_decode("\%gIL")
}
}
\seealso{
\code{\link{puny_decode}} and \code{\link{puny_encode}}, for punycode decoding
and encoding.
}