---
title: "bold introduction"
author: "Scott Chamberlain"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{bold introduction}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r eval=TRUE, echo=FALSE, warning = FALSE, message = FALSE}
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  collapse = TRUE,
  purl = NOT_CRAN,
  eval = NOT_CRAN
)
```

`bold` is an R package to connect to BOLD Systems (https://www.boldsystems.org/) via their API. Functions in `bold` let you search for sequence data, specimen data, sequence + specimen data, and download raw trace files.

### bold info

+ BOLD home page: https://boldsystems.org/
+ BOLD API docs: https://v4.boldsystems.org/index.php/api_home

See also the taxize book for more options for taxonomic workflows with BOLD: https://taxize.dev/

### Using bold

**Install**

Install `bold` from CRAN


```{r eval=FALSE}
install.packages("bold")
```

Or install the development version from GitHub

```{r, eval=FALSE}
remotes::install_github("ropensci/bold")
```

Load the package

```{r}
library("bold")
```


### Search for taxonomic names via names

`bold_tax_name` looks up taxa in BOLD by taxonomic name. Pass a single name or a vector of names.

```{r cache=TRUE}
bold_tax_name(name = 'Diplura')
```

```{r cache=TRUE}
bold_tax_name(name = c('Diplura', 'Osmia'))
```


### Search for taxonomic names via BOLD identifiers

`bold_tax_id` looks up taxonomic names from BOLD taxonomic identifiers. Pass a single identifier or a vector of identifiers.

```{r cache=TRUE}
bold_tax_id(id = 88899)
```

```{r cache=TRUE}
bold_tax_id(id = c(88899, 125295))
```
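
The two lookups can be chained: take the identifiers found by a name search and pass them to `bold_tax_id`. The sketch below uses a small hand-built data frame in place of a real `bold_tax_name()` result so that it runs offline. The column names (`input`, `taxid`), the use of `NA` for names that were not found, and the toy rows are assumptions about the shape of the result. The helper `found_taxids()` is hypothetical and not part of `bold`.

```{r}
# Toy stand-in for a bold_tax_name() result (hypothetical rows).
# Assumption: real results carry `input` and `taxid` columns, with NA
# in `taxid` for names that were not found.
tax <- data.frame(
  input = c("Genus_a", "Genus_b", "Not_a_taxon"),
  taxid = c(101L, 202L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the identifiers that were resolved.
found_taxids <- function(x) {
  ids <- x$taxid[!is.na(x$taxid)]
  unique(ids)
}

ids <- found_taxids(tax)
ids

# With network access, the identifiers could then be passed on:
# bold_tax_id(id = ids)
```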


### Search for sequence data only

The BOLD sequence API gives back sequence data, with a bit of metadata.

By default you get a list back

```{r cache=TRUE}
bold_seq(taxon = 'Coelioxys')[1:2]
```

You can optionally get back the `crul` response object

```{r cache=TRUE}
res <- bold_seq(taxon = 'Coelioxys', response = TRUE)
res$response_headers
```

You can do geographic searches

```{r cache=TRUE}
bold_seq(geo = "USA")
```

And you can search by researcher name

```{r cache=TRUE}
bold_seq(researchers = 'Thibaud Decaens')[[1]]
```

by record IDs (process IDs in this example)

```{r cache=TRUE}
bold_seq(ids = c('ACRJP618-11', 'ACRJP619-11'))
```

by container (containers include project codes and dataset codes)

```{r cache=TRUE}
bold_seq(container = 'ACRJP')[[1]]
```

by bin (a bin is a _Barcode Index Number_)

```{r cache=TRUE}
bold_seq(bin = 'BOLD:AAA5125')[[1]]
```

There are more ways to query; see `?bold_seq` for the full list.
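
Because the result is a plain list, it is easy to reshape. As a rough sketch, the code below turns a list shaped like a `bold_seq()` result into FASTA-formatted lines. It assumes each element is a list with `id`, `name`, `gene` and `sequence` entries. The records are made up so the example runs offline, and `to_fasta()` is a hypothetical helper, not a `bold` function.

```{r}
# Toy stand-in for a bold_seq() result (hypothetical records).
# Assumption: each element is a list with id, name, gene and sequence.
seqs <- list(
  list(id = "REC001-00", name = "Genus_a species_a",
       gene = "COI-5P", sequence = "ACGTACGT"),
  list(id = "REC002-00", name = "Genus_a species_b",
       gene = "COI-5P", sequence = "ACGTTTGA")
)

# Hypothetical helper: one header line and one sequence line per record.
to_fasta <- function(x) {
  headers <- vapply(
    x, function(z) paste0(">", z$id, "|", z$name, "|", z$gene), character(1)
  )
  bodies <- vapply(x, function(z) z$sequence, character(1))
  # rbind() + as.vector() interleaves header, sequence, header, sequence, ...
  as.vector(rbind(headers, bodies))
}

fasta_lines <- to_fasta(seqs)
fasta_lines

# The lines could then be written to disk:
# writeLines(fasta_lines, "sequences.fasta")
```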


### Search for specimen data only

The BOLD specimen API doesn't give back sequences, only specimen data. By default the data is downloaded in `tsv` format and returned to you as a `data.frame`.

```{r cache=TRUE}
res <- bold_specimens(taxon = 'Osmia')
head(res[,1:8])
```
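
Since the result is an ordinary `data.frame`, base R is enough for quick summaries. The sketch below counts records per BIN. It assumes the `tsv` result has `processid` and `bin_uri` columns and that records without a BIN have an empty or missing `bin_uri`. The rows are invented so the example runs offline.

```{r}
# Toy stand-in for a bold_specimens() data.frame (hypothetical rows).
# Assumption: the tsv columns include `processid` and `bin_uri`.
specimens <- data.frame(
  processid = c("REC001-00", "REC002-00", "REC003-00", "REC004-00"),
  bin_uri = c("BOLD:XXX0001", "BOLD:XXX0001", "BOLD:XXX0002", ""),
  stringsAsFactors = FALSE
)

# Drop records with no BIN assigned, then count records per BIN.
has_bin <- !is.na(specimens$bin_uri) & nzchar(specimens$bin_uri)
with_bin <- specimens[has_bin, ]
bin_counts <- sort(table(with_bin$bin_uri), decreasing = TRUE)
bin_counts
```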

You can optionally get back the data in `XML` format

```{r eval=FALSE}
bold_specimens(taxon = 'Osmia', format = 'xml')
```

```{r eval=FALSE, results='asis'}
<?xml version="1.0" encoding="UTF-8"?>
<bold_records  xsi:noNamespaceSchemaLocation="http://www.boldsystems.org/schemas/BOLDPublic_record.xsd"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <record>
    <record_id>1470124</record_id>
    <processid>BOM1525-10</processid>
    <bin_uri>BOLD:AAN3337</bin_uri>
    <specimen_identifiers>
      <sampleid>DHB 1011</sampleid>
      <catalognum>DHB 1011</catalognum>
      <fieldnum>DHB1011</fieldnum>
      <institution_storing>Marjorie Barrick Museum</institution_storing>
    </specimen_identifiers>
    <taxonomy>
```

You can choose to get the `crul` response object back if you'd rather work with the raw data returned from the BOLD API.

```{r}
res <- bold_specimens(taxon = 'Osmia', format = 'xml', response = TRUE)
res$url
res$status_code
res$response_headers
```
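
When working with the raw response, it can help to check the status code before parsing anything. The sketch below is one way to do that. `check_response()` is a hypothetical helper that reads only the `status_code` and `url` fields used above, and the plain lists stand in for real response objects so the example runs without a network connection.

```{r}
# Hypothetical helper: stop early on an HTTP error status.
# It only reads `status_code` and `url`, the fields shown above.
check_response <- function(res) {
  if (res$status_code >= 400) {
    stop("Request to ", res$url, " failed with status ", res$status_code,
         call. = FALSE)
  }
  invisible(res)
}

# Stand-ins for response objects (plain lists with made-up values).
ok <- list(url = "https://example.org/ok", status_code = 200L)
bad <- list(url = "https://example.org/missing", status_code = 404L)

check_response(ok)
msg <- tryCatch(check_response(bad), error = function(e) conditionMessage(e))
msg
```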

### Search for specimen plus sequence data

The combined specimen/sequence API gives back both specimen and sequence data. Like the specimen API, it returns `tsv` format data by default, which is given back to you as a `data.frame`. Here, we set `sepfasta=TRUE` so that the sequence data is taken out of the `data.frame` and returned separately as a list, which keeps the `data.frame` more manageable.

```{r cache=TRUE}
res <- bold_seqspec(taxon = 'Osmia', sepfasta = TRUE)
res$fasta[1:2]
```

Or you can index a specific sequence by its ID

```{r cache=TRUE}
res$fasta['GBAH0293-06']
```
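
A separate sequence list also makes simple quality checks straightforward. The sketch below computes sequence lengths, ignoring gap characters, and keeps only the longer records. It assumes `res$fasta` is a named list of sequence strings in which gaps appear as `-`. The IDs, sequences and length threshold are made up so the example runs offline.

```{r}
# Toy stand-in for res$fasta (hypothetical IDs and sequences).
# Assumption: a named list of sequence strings, possibly with "-" gaps.
fasta <- list(
  "REC001-00" = "ACGT--ACGT",
  "REC002-00" = "ACGTTTGACA",
  "REC003-00" = "AC"
)

# Sequence length per record, ignoring gap characters.
seq_len_nogap <- vapply(
  fasta, function(s) nchar(gsub("-", "", s, fixed = TRUE)), integer(1)
)
seq_len_nogap

# Keep only records with at least 8 bases (an arbitrary threshold).
long_enough <- fasta[seq_len_nogap >= 8]
names(long_enough)
```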

### Get trace files

`bold_trace` downloads trace files to your machine. It does not load them into your R session, but it prints where the files were saved.

```{r eval=FALSE}
bold_trace(taxon = 'Osmia', quiet = TRUE)
```