---
title: "Split Violin Plots"
author: "Tom Kelly"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
fig_width: 6 
fig_height: 3 
fig_align: 'center'
fig_keep: 'last'
vignette: >
  %\VignetteIndexEntry{vioplot: Split Violin Plots}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Violin Plots

Violin plots are a powerful tool to help researchers visualise data, particularly in the quality checking and exploratory parts of an analysis. They have several benefits:

- Greater flexibility for plotting variation than boxplots
- More familiarity to boxplot users than density plots
- Easier to directly compare data types than with existing plots

As shown below for the `iris` dataset, violin plots show distribution information that a boxplot cannot.

### General Set Up


```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
library("vioplot")
```

We split the data into two categories by Sepal Width (above the mean, and at or below the mean) as follows:

```{r, message=FALSE}
data(iris)
summary(iris$Sepal.Width)
table(iris$Sepal.Width > mean(iris$Sepal.Width))
iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ]
iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]
```
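
As a quick sanity check on the split above, the following sketch tabulates how many flowers of each species fall on either side of the mean and confirms that the two groups together cover every row of `iris`. The names `width_group` and `split_counts` are introduced here for illustration only and are not part of `vioplot`.

```{r}
data(iris)

# Label each flower by whether its Sepal Width is above the overall mean
# (width_group is a hypothetical helper variable, not part of vioplot)
width_group <- ifelse(iris$Sepal.Width > mean(iris$Sepal.Width), "large", "small")

# Counts per species in each half of the split
split_counts <- table(iris$Species, width_group)
split_counts

# Every row of iris lands in exactly one of the two groups
sum(split_counts) == nrow(iris)
```

If a species were missing from one group, its half of the split violin would have no data to draw, so it is worth checking that every cell of this table is non-zero.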

### Boxplots

First we plot Sepal Length on its own:

```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
boxplot(Sepal.Length~Species, data=iris, col="grey")
```


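An indirect comparison can be achieved by drawing the two subsets in separate panels with `par(mfrow = ...)`:

```{r,  fig.align = 'center', fig.height = 6, fig.width = 6, fig.keep = 'last'}
{
  # stack two panels vertically: small sepal widths on top, large below
  par(mfrow = c(2, 1))
  boxplot(Sepal.Length ~ Species, data = iris_small, col = "lightblue")
  boxplot(Sepal.Length ~ Species, data = iris_large, col = "palevioletred")
  # return to a single panel for later plots
  par(mfrow = c(1, 1))
}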
```
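
The chunk above resets the layout by hand once both panels are drawn. A minimal alternative sketch, relying on the fact that `par()` returns the previous values of the settings it changes, stores the old layout and restores it afterwards. The names `wide` and `old_par` are illustrative, and the subsets are rebuilt inline so the chunk stands on its own.

```{r, fig.align = 'center', fig.height = 6, fig.width = 6, fig.keep = 'last'}
data(iris)
# TRUE for flowers whose Sepal Width is above the overall mean (illustrative name)
wide <- iris$Sepal.Width > mean(iris$Sepal.Width)

{
  # par() returns the old values of the settings it changes
  old_par <- par(mfrow = c(2, 1))
  boxplot(Sepal.Length ~ Species, data = iris[!wide, ], col = "lightblue")
  boxplot(Sepal.Length ~ Species, data = iris[wide, ], col = "palevioletred")
  # restore whatever layout was active before this chunk
  par(old_par)
}
```

Restoring the saved value rather than hard-coding `c(1, 1)` keeps the chunk safe to reuse inside a document that already has a multi-panel layout.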

### Violin Plots

First we plot Sepal Length on its own:

```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
vioplot(Sepal.Length~Species, data=iris)
```


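An indirect comparison can be achieved by drawing the two subsets in separate panels with `par(mfrow = ...)`:

```{r,  fig.align = 'center', fig.height = 6, fig.width = 6, fig.keep = 'last'}
{
  # stack two panels vertically: small sepal widths on top, large below
  par(mfrow = c(2, 1))
  vioplot(Sepal.Length ~ Species, data = iris_small, col = "lightblue", plotCentre = "line")
  vioplot(Sepal.Length ~ Species, data = iris_large, col = "palevioletred", plotCentre = "line")
  # return to a single panel for later plots
  par(mfrow = c(1, 1))
}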
```

### Split Violin Plots

A more direct comparison can be made with the `side` argument and `add = TRUE` on the second plot:

```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "line", side = "right")
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "line", side = "left", add = TRUE)
title(xlab = "Species", ylab = "Sepal Length")
legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
```
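
Each half of a split violin is a density estimate for one group, mirrored about a shared axis. As a rough illustration of that idea only, the sketch below draws the two halves for a single species by hand with base R's `density()` and `polygon()`. This is not how `vioplot` is implemented: it uses its own density estimation and scaling, so the shapes will not match the plot above exactly. The names `wide`, `setosa`, `dens_small` and `dens_large` are illustrative.

```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
data(iris)
wide <- iris$Sepal.Width > mean(iris$Sepal.Width)
setosa <- iris$Species == "setosa"

# Kernel density estimates of Sepal Length for the two setosa groups
# (base R density(), used here only as a stand-in for vioplot's own estimate)
dens_small <- density(iris$Sepal.Length[setosa & !wide])
dens_large <- density(iris$Sepal.Length[setosa & wide])

# Empty plot with a symmetric density axis centred on zero
max_dens <- max(dens_small$y, dens_large$y)
plot(0, 0, type = "n",
     xlim = c(-max_dens, max_dens),
     ylim = range(dens_small$x, dens_large$x),
     xlab = "Density", ylab = "Sepal Length", main = "setosa")

# Small group mirrored to the left, large group drawn to the right
polygon(-dens_small$y, dens_small$x, col = "lightblue")
polygon(dens_large$y, dens_large$x, col = "palevioletred")
```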

### Median

The line option for the median (`plotCentre = "line"`) is better suited to side-by-side comparisons, but the point option (`plotCentre = "point"`) is still available:


```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "point", side = "right", pchMed = 21, colMed = "palevioletred4", colMed2 = "palevioletred2")
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "point", side = "left", pchMed = 21, colMed = "lightblue4", colMed2 = "lightblue2", add = TRUE)
title(xlab = "Species", ylab = "Sepal Length")
legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
```

It may be necessary to add a `points` command to redraw a median that has been overwritten by the violin plotted after it:

```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "point", side = "right", pchMed = 21, colMed = "palevioletred4", colMed2 = "palevioletred2")
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "point", side = "left", pchMed = 21, colMed = "lightblue4", colMed2 = "lightblue2", add = TRUE)
# redraw the medians of the first (right-hand) violins on top; tapply() returns one median per species in factor-level order
points(seq_along(levels(iris$Species)), tapply(iris_large$Sepal.Length, iris_large$Species, median), pch = 21, col = "palevioletred4", bg = "palevioletred2")
title(xlab = "Species", ylab = "Sepal Length")
legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
```

Similarly, points can be added where a line has been used previously:


```{r, fig.align = 'center', fig.height = 3, fig.width = 6, fig.keep = 'last'}
vioplot(Sepal.Length~Species, data=iris_large, col = "palevioletred", plotCentre = "line", side = "right", pchMed = 21, colMed = "palevioletred4", colMed2 = "palevioletred2")
vioplot(Sepal.Length~Species, data=iris_small, col = "lightblue", plotCentre = "line", side = "left", pchMed = 21, colMed = "lightblue4", colMed2 = "lightblue2", add = TRUE)
# overlay the per-species medians of each group as points on top of the median lines
points(seq_along(levels(iris$Species)), tapply(iris_large$Sepal.Length, iris_large$Species, median), pch = 21, col = "palevioletred4", bg = "palevioletred2")
points(seq_along(levels(iris$Species)), tapply(iris_small$Sepal.Length, iris_small$Species, median), pch = 21, col = "lightblue4", bg = "lightblue2")
title(xlab = "Species", ylab = "Sepal Length")
legend("topleft", fill = c("lightblue", "palevioletred"), legend = c("small", "large"), title = "Sepal Width")
```
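
The `points` calls above need one median per species for each group. One way to inspect those values before plotting, sketched here with base R's `tapply()`, is to collect them into a small matrix. The names `wide` and `group_medians` are illustrative, and the groups are rebuilt inline so the chunk stands on its own.

```{r}
data(iris)
wide <- iris$Sepal.Width > mean(iris$Sepal.Width)

# Median Sepal Length per species within each Sepal Width group;
# tapply() returns values in factor-level order, matching x positions 1, 2, 3
group_medians <- rbind(
  small = tapply(iris$Sepal.Length[!wide], iris$Species[!wide], median),
  large = tapply(iris$Sepal.Length[wide], iris$Species[wide], median)
)
group_medians
```

Each row can then be passed as the `y` values of a `points` call, with `seq_len(ncol(group_medians))` as the `x` positions.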

Plotted this way, differences between categories in the median and variation of a continuous variable are intuitive to interpret, and the result is aesthetically pleasing.


#### Sources
These extensions to `vioplot` are based on those provided here:

* https://gist.github.com/mbjoseph/5852613

These have previously been discussed on the following sites:

* https://mbjoseph.github.io/posts/2018-12-23-split-violin-plots/

* http://tagteam.harvard.edu/hub_feeds/1981/feed_items/209875

* https://www.r-bloggers.com/split-violin-plots/