File: scrapingData.xml

<article xmlns:r="http://www.r-project.org">
<title>Scraping Data from the Web with R</title>

<section>
<title>Scraping Data from the Web with R</title>
<para>
Accessing data from Web sites is increasingly common, and this
activity is likely to grow as services and data become more
Web-based.  We have anticipated this for almost a decade and have
developed the XML package (initial release in 2000) and the RCurl
package (initial release in 2004).  The SSOAP, XMLRPC and RHTMLForms
packages build on top of these.
</para>
<para>
R provides some facilities for accessing data over the web,
specifically making HTTP or FTP requests.  In many cases, these are
sufficient.  One can use <r:func>download.file</r:func> to make an
HTTP/FTP request and save the result to a file on disk. Then one can
read the contents locally.
</para>
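<para>
The download-then-read pattern described above might look like the
following minimal sketch.  The helper name <r:func>fetch_csv</r:func>
is hypothetical, and a temporary local file addressed through a
file:// URL stands in for a remote resource so that the example runs
without network access; an http:// or ftp:// URL would be passed the
same way.
</para>
<r:code><![CDATA[
```r
# Sketch: download a resource to disk, then read the local copy.
# fetch_csv is a hypothetical helper, not part of base R.
fetch_csv <- function(u) {
  dest <- tempfile(fileext = ".csv")
  # download.file() returns 0 (invisibly) on success
  status <- download.file(u, destfile = dest, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  read.csv(dest)
}

# Stand-in for a remote file so the sketch runs offline.
# (On Windows the prefix would need to be "file:///".)
src <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), src, row.names = FALSE)

dat <- fetch_csv(paste0("file://", normalizePath(src, winslash = "/")))
```
]]></r:code>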
<para>
<r:func>url</r:func> is a more low-level, flexible mechanism
that allows one to make an HTTP request and read the result
through a connection, just as one would read a local file.
</para>
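<para>
As a rough illustration of the connection-based approach, the sketch
below opens a <r:func>url</r:func> connection and reads from it with
<r:func>readLines</r:func>.  The helper name
<r:func>read_remote_lines</r:func> is hypothetical, and a local file
addressed through a file:// URL again stands in for a Web resource so
that the example runs offline.
</para>
<r:code><![CDATA[
```r
# Sketch: read a resource through a url() connection.
# read_remote_lines is a hypothetical helper, not part of base R.
read_remote_lines <- function(u, n = -1L) {
  con <- url(u, open = "r")   # open the connection for reading
  on.exit(close(con))         # always release the connection
  readLines(con, n = n)       # n = -1L reads to the end
}

# Stand-in for a remote resource so the sketch runs offline.
src <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), src)

txt <- read_remote_lines(paste0("file://", normalizePath(src, winslash = "/")))
```
]]></r:code>
<para>
Because the result is an ordinary connection, it can also be handed
to functions such as <r:func>read.table</r:func> or
<r:func>scan</r:func> instead of <r:func>readLines</r:func>.
</para>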
<para>
While these two built-in facilities suffice for many situations
(the majority at present), they will not work when
<ul>
<li>you need to use HTTPS, i.e. a secure HTTP request using SSL,</li>
<li>you need to POST a form request rather than use a simple HTTP GET operation, or</li>
<li>you need to customize the request, e.g. to provide an authentication token.</li>
</ul>


If you are dealing with a simple situation,
<r:func>download.file</r:func> or <r:func>url</r:func> is sufficient.

</para>
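<para>
The three cases listed above could be handled with the RCurl package
along the following lines.  This is only a sketch: it assumes RCurl is
installed, the wrapper names are hypothetical, and the URLs, form
fields and token a caller would supply are placeholders rather than
real services.  The functions are defined but not called, since
calling them requires network access.
</para>
<r:code><![CDATA[
```r
# Sketch only: assumes the RCurl package is installed.
# All three wrapper names are hypothetical.

# 1. HTTPS: fetch a page over SSL, verifying the server certificate.
get_secure <- function(u) {
  RCurl::getURL(u, ssl.verifypeer = TRUE)
}

# 2. POST: submit form fields (a named list or character vector)
#    as an application/x-www-form-urlencoded request body.
submit_form <- function(u, fields) {
  RCurl::postForm(u, .params = fields, style = "POST")
}

# 3. Customized request: send an authentication token in a header.
#    The "Bearer" scheme is an assumption; services differ.
get_with_token <- function(u, token) {
  RCurl::getURL(u, httpheader = c(Authorization = paste("Bearer", token)))
}

# Example calls (placeholders, not run here):
#   get_secure("https://www.example.org/")
#   submit_form("https://www.example.org/search", list(q = "R", page = "1"))
#   get_with_token("https://www.example.org/api/data", "my-token")
```
]]></r:code>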

<para>
</para>

<section>
<title>Software</title>

<dl>
  <dt>
  <li> <a href="RSXML">XML package</a></li>
  </dt>
  <dd>
  </dd>

  <dt>
  <li> <a href="RCurl">RCurl package</a></li>
  </dt>
  <dd>
  </dd>
  <dt>
  <li> <a href="SSOAP">SSOAP package</a></li>
  </dt>
  <dd>
  </dd>
  <dt>
  <li> <a href="XMLRPC">XMLRPC package</a></li>
  </dt>
  <dd>
  </dd>

  <dt>
  <li> <a href="Rcompression">Rcompression package</a></li>
  </dt>
  <dd>      
  </dd>
</dl>



</section>
</section>
</article>