

---
title: Detecting all neighbors within range
author:
- name: Aaron Lun
  affiliation: Cancer Research UK Cambridge Institute, Cambridge, United Kingdom
date: "Revised: 28 September 2018"
output:
  BiocStyle::html_document:
    toc_float: true
package: BiocNeighbors
vignette: >
  %\VignetteIndexEntry{3. Detecting neighbors within range}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: ref.bib
---

```{r, echo=FALSE, results="hide", message=FALSE}
require(knitr)
opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
library(BiocNeighbors)
```
# Identifying all neighbors within range
Another application of the KMKNN or VP tree algorithms is to identify all neighboring points within a certain distance^[The default here is Euclidean, but again, we can set `distance="Manhattan"` in the `BNPARAM` object if so desired.] of the current point.
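To make the two distance choices concrete, the following base R sketch computes the Euclidean and Manhattan distances between two short vectors. The vectors `p` and `q` are made up for illustration and are not part of the data used in this vignette.

```{r}
# Two hypothetical 3-dimensional points, chosen only for illustration.
p <- c(0, 0, 0)
q <- c(3, 4, 12)

# Euclidean distance: square root of the sum of squared differences.
euclidean <- sqrt(sum((p - q)^2))

# Manhattan distance: sum of absolute differences.
manhattan <- sum(abs(p - q))

euclidean
manhattan
```

The same pair of points can therefore fall inside a given threshold under one metric and outside it under the other, so the threshold should be chosen with the metric in mind.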
We first mock up some data:
```{r}
nobs <- 10000
ndim <- 20
data <- matrix(runif(nobs*ndim), ncol=ndim)
```
We apply the `findNeighbors()` function to `data`:
```{r}
fout <- findNeighbors(data, threshold=1)
head(fout$index)
head(fout$distance)
```
Each entry of the `index` list corresponds to a point in `data` and contains the row indices of all points in `data` that lie within `threshold` of that point.
For example, the 3rd point in `data` has the following neighbors:
```{r}
fout$index[[3]]
```
... with the following distances to those neighbors:
```{r}
fout$distance[[3]]
```
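As a sanity check on what these lists contain, a brute-force search over a small matrix can reproduce the same kind of result for a single point. This is a base R sketch with hypothetical names (`toy`, `toy.threshold`); it does not use the KMKNN or VP tree algorithms, and it assumes that points at exactly the threshold distance count as neighbors.

```{r}
# Small hypothetical data set: 5 points in 2 dimensions.
toy <- rbind(c(0, 0), c(0.5, 0), c(0, 0.8), c(2, 2), c(0.6, 0.6))
toy.threshold <- 1

# Euclidean distance from the first point to every row, including itself.
i <- 1
d <- sqrt(rowSums(sweep(toy, 2, toy[i, ])^2))

# Rows within the threshold, and their distances.
toy.index <- which(d <= toy.threshold)
toy.distance <- d[toy.index]
toy.index
toy.distance
```

In this sketch the point itself appears in its own result with a distance of zero, because the search makes no attempt to exclude it.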
Note that, for this function, the reported neighbors are _not_ sorted by distance.
The order of the output is completely arbitrary and will vary depending on the random seed.
However, the identity of the neighbors is fully deterministic.
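If a distance-ordered result is needed downstream, the two lists can be reordered together after the search. The sketch below uses small hypothetical lists (`ex.index`, `ex.distance`) with the same shape as `fout$index` and `fout$distance`, rather than the actual output.

```{r}
# Hypothetical unsorted output for two points, mimicking the list structure.
ex.index <- list(c(7L, 2L, 5L), c(4L, 9L))
ex.distance <- list(c(0.9, 0.1, 0.4), c(0.3, 0.2))

# Compute one ordering per point from the distances...
ord <- lapply(ex.distance, order)

# ... and apply it to both lists so indices and distances stay paired.
sorted.index <- Map(function(x, o) x[o], ex.index, ord)
sorted.distance <- Map(function(x, o) x[o], ex.distance, ord)
sorted.index
sorted.distance
```

Applying the same ordering to both lists matters here: sorting the distances alone would break the correspondence between each neighbor and its distance.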

# Querying another data set for neighbors
The `queryNeighbors()` function is also provided for identifying all points within a certain distance of a query point.
Given a query data set:
```{r}
nquery <- 1000
ndim <- 20
query <- matrix(runif(nquery*ndim), ncol=ndim)
```
... we apply the `queryNeighbors()` function:
```{r}
qout <- queryNeighbors(data, query, threshold=1)
length(qout$index)
```
... where each entry of `qout$index` corresponds to a row of `query` and contains its neighbors in `data`.
Again, the order of the output is arbitrary but the identity of the neighbors is deterministic.
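The relationship between the rows of the query set and the entries of the output can be illustrated with a brute-force search in base R. The names below (`ref.pts`, `query.pts`, `within_range`) are hypothetical, and the sketch makes no attempt to match the efficiency of the tree-based algorithms.

```{r}
# Hypothetical reference and query sets in 2 dimensions.
ref.pts <- rbind(c(0, 0), c(1, 0), c(0, 1), c(3, 3))
query.pts <- rbind(c(0.1, 0.1), c(3, 2.5), c(10, 10))

# For one query point, return the rows of 'ref' within 'threshold'.
within_range <- function(q, ref, threshold) {
    d <- sqrt(rowSums(sweep(ref, 2, q)^2))
    which(d <= threshold)
}

# One list entry per row of the query set.
brute.index <- lapply(seq_len(nrow(query.pts)), function(j) {
    within_range(query.pts[j, ], ref.pts, threshold=1)
})
length(brute.index)
brute.index
```

In this sketch, a query point with no reference points in range (the third row here) yields an empty integer vector, so the output list always has one entry per query row.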

# Further options
Most of the options described for `findKNN()` are also applicable here.
For example:

- `subset` to identify neighbors for a subset of points.
- `get.distance` to avoid retrieving distances when unnecessary.
- `BPPARAM` to parallelize the calculations across multiple workers.
- `raw.index` to return the raw indices from a precomputed index.

Note that a precomputed index is passed through the `BNINDEX` argument:
```{r}
pre <- buildIndex(data, BNPARAM=KmknnParam())
fout.pre <- findNeighbors(BNINDEX=pre, threshold=1)
qout.pre <- queryNeighbors(BNINDEX=pre, query=query, threshold=1)
```
Users are referred to the documentation of each function for specific details.

# Session information
```{r}
sessionInfo()
```
