File: VIM.Rmd

package info (click to toggle)
r-cran-vim 6.1.0%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bullseye
  • size: 2,932 kB
  • sloc: cpp: 141; sh: 12; makefile: 2
file content (92 lines) | stat: -rw-r--r-- 2,932 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
---
title: "VIM"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{VIM}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

VIM introduces tools for visualization of missing and imputed values.
Forthermore, methods to impute missing values are featured.
This vignette will give a brief look at a common imputation scenario and
showcase how VIM can be used to both impute the data and also interpret
the results visually.

## Visualize missing values

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.align = "center"
)
```

```{r setup, message=FALSE, fig.height = 3}
library(VIM)
data(sleep)
a <- aggr(sleep, plot = FALSE)
plot(a, numbers = TRUE, prop = FALSE)
```

The left plot shows the amount of missings for each column in the dataset
`sleep` and the right plot shows how often each combination of missings occur.
For example, there are 9 rows wich contain a missing in both `NonD`
and `Dream`.

For simplicity, we will only look at the variables `Dream` and
`Sleep` for the remainer of this vignette. Bivariate datasets can be passed
to special functions that visualize the structure of missings such as
`marginplot()`.

```{r}
x <- sleep[, c("Dream", "Sleep")]
marginplot(x)
```

The __<font color="red">red</font>__ boxplot on the left shows the distrubution of all values of `Sleep`
where `Dream` contains a missing value. The __<font color="#87ceeb">blue</font>__ boxplot on the left shows
the distribution of the values of `Sleep` where `Dream` is observed.

## Impute missing values

In order to impute missing values, `VIM` offers a spectrum of imputation methods
like `kNN()` (k nearest neighbour), `hotdeck()` and so forth. Those functions
can be applied to a `data.frame` and return another `data.frame` where missings
are replaced by imputed values.

```{r}
x_imputed <- kNN(x)
```

To learn more about all implemented imputation methods, three vignettes are
available

- `vignette("donorImp")` explains the donor-based imputation methods `hotdeck()`
  and `kNN()`
- `vignette("modelImp")` gives insight into the model-based imputation methods
  `regressionImp()` and `matchImpute()`
- `vignette("irmi")` showcases the `irmi()` method.

## Visualize imputed values

The same functions that visualize missing values can also visualize the
imputed dataset.

```{r}
marginplot(x_imputed, delimiter = "_imp")
```

In this plot three differnt colors are used in the top-right.
These colors represent the structure of missings.

* __<font color="#8b5a00">brown</font>__ points represent values where `Dream` was missing initially
* __<font color="#ffa500">beige</font>__ points represent values where `Sleep` was missing initially
* __black__ points represent values where both `Dream` and `Sleep` were missing
  initially

The `kNN()` method seemingly preserves the correlation between `Dream` and
`Sleep`.