\name{basicTextGatherer}
\alias{basicTextGatherer}
\alias{multiTextGatherer}
\alias{debugGatherer}
\title{Cumulate text across callbacks (from an HTTP response)}
\description{
  These functions create callback functions that can be used
  with the libcurl engine, which passes information to them
  as it becomes available as part of the HTTP response.

  \code{basicTextGatherer} is a generator function that returns a closure which is
  used to cumulate text provided in callbacks from the libcurl
  engine when it reads the response from an HTTP request.

  \code{debugGatherer} can be used with the \code{debugfunction}
  libcurl option in a call. The associated \code{update}
  function is called whenever libcurl has information
  about the headers, the data or general messages about the
  request.

  These functions return a list of functions.
  Each time one calls \code{basicTextGatherer} or
  \code{debugGatherer}, one gets a new, separate
  collection of functions.  However, each
  collection of functions (or instance) shares
  the variables across the functions and across calls.
  This allows them to store data persistently across
  the calls without using a global variable.
  In this way, we can have multiple instances of the collection
  of functions, with each instance updating its own local state
  and not interfering with those of the others.
  
  We use an S3 class named \code{RCurlCallbackFunction} to indicate
  that the collection of functions can be used as a callback.
  The \code{update} function is the one that is actually used
  as the callback function in the CURL option.
  The \code{value} function can be invoked to get the current
  state that has been accumulated by the
  \code{update} function.  This is typically used
  when the request is complete.
  One can reuse the same collection of functions across
  different requests. The information will be cumulated.
  Sometimes it is convenient to reuse the object but
  reset the state to its original empty value, as if it had
  been created afresh. The \code{reset} function in the collection
  permits this.

  \code{multiTextGatherer} is used when we are downloading multiple
  URIs concurrently in a single libcurl operation.  This merely
  uses the tools of \code{basicTextGatherer} applied to each of
  several URIs. See \code{\link{getURIAsynchronous}}.
}
\usage{
basicTextGatherer(txt = character(), max = NA, value = NULL,
                    .mapUnicode = TRUE)
multiTextGatherer(uris, binary = rep(NA, length(uris)))
debugGatherer()
}
%- maybe also 'usage' for other objects documented here.
\arguments{
  \item{txt}{an initial character vector to start things.
    We allow this to be specified so that one can initialize
    the content. 
  }
  \item{max}{if specified as an integer, this limits the total number
    of characters that will be read.  If more are read, the function
    tells libcurl to stop.}
  \item{uris}{for \code{multiTextGatherer}, this is either the number
    or the names of the uris being downloaded and for which we
    need a separate writer function.
  }
  \item{value}{if specified, a function that is called when retrieving
    the text, usually after the completion of the request and the
    processing of the response. This function can be used to convert the
    result into a different format, e.g. to parse an XML document or
    read values from a table in the text.}
  \item{.mapUnicode}{a logical value that controls whether the resulting
  text is processed to map components of the form \\uxxxx to their
  appropriate Unicode representation.}
 \item{binary}{a logical vector that indicates which URIs yield binary content}
}
\details{
  The callback is invoked when the libcurl engine finds sufficient
  data on the stream from which it is reading the response.
  libcurl cumulates these bytes and hands them to a C routine in
  this package, which calls the actual gathering function (or a suitable
  replacement) returned as the \code{update} component from this function.
  
}
\value{
  Both the \code{basicTextGatherer} and \code{debugGatherer}
  functions return an object of class
  \code{RCurlCallbackFunction}.
  \code{basicTextGatherer} extends this with the class
  \code{RCurlTextHandler}
  and 
  \code{debugGatherer} extends this with the class
  \code{RCurlDebugHandler}.
  Each of these has the same basic structure,
  being a list of 3 functions.
  \item{update}{the function that is called with the text from the
    callback routine and which processes this text by accumulating it
    into a vector}
  \item{value}{a function that returns the text cumulated across the
    callbacks. This takes an argument \code{collapse} (and additional ones)
    that is handed to \code{\link[base]{paste}}.
    If the value of \code{collapse} is given as \code{NULL},
    the vector of elements containing the different text for each
    callback is returned. This is convenient when debugging or if one
    knows something about the nature of the callbacks, e.g. a regular
    size that causes them to identify records in a natural way.
  }
  \item{reset}{a function that resets the internal state to its
    original, empty value. This can be used to reuse the same object
    across requests but to avoid cumulating new input with the material from previous requests.}


  \code{multiTextGatherer} returns a list with an element corresponding
  to each URI. Each element is an object obtained by calling
  \code{basicTextGatherer}, i.e. a collection of 3 functions with
  shared state.
   
}
\references{Curl homepage \url{http://curl.haxx.se}}
\author{Duncan Temple Lang <duncan@wald.ucdavis.edu>}


\seealso{
  \code{\link{getURL}}
  \code{\link{dynCurlReader}}  
}
\examples{
if(url.exists("http://www.omegahat.net/RCurl/index.html")) {
  txt = getURL("http://www.omegahat.net/RCurl/index.html", write = basicTextGatherer())

  h = basicTextGatherer()
  txt = getURL("http://www.omegahat.net/RCurl/index.html", write = h$update)
    # Cumulate across pages.
  txt = getURL("http://www.omegahat.net/index.html", write = h$update)


  headers = basicTextGatherer()
  txt = getURL("http://www.omegahat.net/RCurl/index.html",
               header = TRUE, headerfunction = headers$update)

     # Now read the headers.
  headers$value()
  headers$reset()


    # Debugging callback
  d = debugGatherer()
  x = getURL("http://www.omegahat.net/RCurl/index.html", debugfunction = d$update, verbose = TRUE)
  names(d$value())
  d$value()[["headerIn"]]


  uris = c("http://www.omegahat.net/RCurl/index.html",
           "http://www.omegahat.net/RCurl/philosophy.html")
  g = multiTextGatherer(uris)
  txt = getURIAsynchronous(uris,  write = g)
  names(txt)
  nchar(txt)

   # Now don't use names for the gatherer elements.
  g = multiTextGatherer(length(uris))
  txt = getURIAsynchronous(uris,  write = g)
  names(txt)
  nchar(txt)
}
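
  # A minimal, offline sketch of the closure pattern described in the
  # description section. sketchGatherer is a hypothetical name used only
  # for illustration; it is not part of RCurl and merely mimics the
  # update/value/reset layout of the returned list.
sketchGatherer = function(txt = character()) {
  update = function(str) {
    txt <<- c(txt, str)     # append to the state shared by the three functions
    nchar(str, "bytes")     # report how many bytes were consumed
  }
  value = function(collapse = "") {
    if(is.null(collapse)) txt else paste(txt, collapse = collapse)
  }
  reset = function() txt <<- character()
  list(update = update, value = value, reset = reset)
}

a = sketchGatherer()
b = sketchGatherer()
a$update("first chunk, ")
a$update("second chunk")
b$update("other request")
a$value()        # both chunks pasted together
a$value(NULL)    # one element per call to update
b$value()        # b has its own state, untouched by a
a$reset()
length(a$value(NULL))   # 0 after the reset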


\dontrun{
 Sys.setlocale(,"en_US.latin1")
 Sys.setlocale(,"en_US.UTF-8")
 uris = c("http://www.omegahat.net/RCurl/index.html",
          "http://www.omegahat.net/RCurl/philosophy.html")
 g = multiTextGatherer(uris)
 txt = getURIAsynchronous(uris,  write = g)
}
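
  # The update, value and reset functions of a gatherer, exercised
  # without any network access. This assumes that update can be called
  # by hand with a character string, in the way the package calls it on
  # behalf of libcurl; treat it as a sketch rather than documented usage.
h = basicTextGatherer()
h$update("<a>")
h$update("</a>")
h$value(NULL)    # the separate pieces, one per call to update
h$value()        # the pieces pasted into a single string
h$reset()
h$value(NULL)    # empty again after the reset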
}
\keyword{IO}