<!--
%\VignetteEngine{knitr::docco_linear}
%\VignetteIndexEntry{An Introduction to the corrplot package}
-->

An Introduction to the **corrplot** Package
=======================================

```{r setup, include=FALSE}
set.seed(0) # we need reproducible results
knitr::opts_chunk$set(
  out.extra = 'style="display:block; margin: auto"',
  fig.align = "center",
  fig.path = "webimg/",
  dev = "png")
```

Introduction
------------
The **corrplot** package provides a graphical display of a correlation matrix
and its confidence intervals. It also contains some algorithms for matrix
reordering. In addition, corrplot handles details well, including the choice of
colors, text labels, color labels, layout, etc.


Visualization methods
----------------------------
There are seven visualization methods (parameter `method`) in the **corrplot** package, named `"circle"`, `"square"`, `"ellipse"`, `"number"`, `"shade"`, `"color"`, `"pie"`.

> Positive correlations are displayed in blue and negative correlations in red.
> Color intensity and the size of the circle are proportional to the
> correlation coefficients.

```{r methods}
library(corrplot)
M <- cor(mtcars)
corrplot(M, method = "circle")
corrplot(M, method = "square")
corrplot(M, method = "ellipse")
corrplot(M, method = "number") # Display the correlation coefficient
corrplot(M, method = "shade")
corrplot(M, method = "color")
corrplot(M, method = "pie")
```


Layout
-----------------------------
There are three layout types (parameter `type`):

- `"full"` (default): display the full **correlation matrix**
- `"upper"`: display the upper triangular part of the **correlation matrix**
- `"lower"`: display the lower triangular part of the **correlation matrix**

```{r layout}
corrplot(M, type = "upper")
corrplot(M, type = "lower")
```

`corrplot.mixed()` is a wrapper function for mixed visualization styles, with one method for the lower triangle and another for the upper triangle.
```{r mixed}
corrplot.mixed(M)
corrplot.mixed(M, lower.col = "black", number.cex = .7)
corrplot.mixed(M, lower = "ellipse", upper = "circle")
corrplot.mixed(M, lower = "square", upper = "circle", tl.col = "black")
```


Reorder a correlation matrix
----------------------------
The correlation matrix can be reordered according to the correlation
coefficients. This helps to identify hidden structure and patterns in
the matrix. There are four methods in corrplot (parameter `order`), named 
`"AOE"`, `"FPC"`, `"hclust"`, `"alphabet"`. More algorithms can be found in the
[seriation](https://cran.r-project.org/package=seriation) package.

You can also reorder the matrix "manually" via the function `corrMatOrder()`.

  - `"AOE"` is for the angular order of the eigenvectors. It is calculated from 
    the order of the angles $a_i$,
    
    $$
    a_i = 
    \begin{cases}
    \arctan (e_{i2}/e_{i1}), & \text{if $e_{i1}>0$;}
    \newline
    \arctan (e_{i2}/e_{i1}) + \pi, & \text{otherwise,}
    \end{cases}
    $$
    
    where $e_1$ and $e_2$ are the eigenvectors belonging to the two largest
    eigenvalues of the correlation matrix, and $e_{i1}$, $e_{i2}$ are their
    $i$-th elements.
    See [Michael Friendly (2002)](http://www.datavis.ca/papers/corrgram.pdf)
    for details.
  
  - `"FPC"` for the first principal component order.
  
  - `"hclust"` for hierarchical clustering order, with the parameter
    `hclust.method` selecting the agglomeration method to be used.
    `hclust.method` should be one of `"ward"`, `"single"`, `"complete"`,
    `"average"`, `"mcquitty"`, `"median"` or `"centroid"`.

  - `"alphabet"` for alphabetical order.
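
As a rough illustration of the `"AOE"` formula above, the angles can be computed by hand with base R. This is a minimal sketch rather than the package's own code: the helper name `aoe_order()` and the object `M_demo` are made up for this example, and because the sign of an eigenvector is arbitrary, the resulting order may differ between platforms.

```{r aoe-sketch}
# Hypothetical helper (not part of corrplot): angular order of the eigenvectors
aoe_order <- function(corr) {
  # eigen() sorts eigenvalues in decreasing order, so columns 1:2 hold the
  # eigenvectors that belong to the two largest eigenvalues
  e <- eigen(corr)$vectors[, 1:2]
  e1 <- e[, 1]
  e2 <- e[, 2]
  # angle a_i as in the formula above
  a <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)
  order(a)
}

M_demo <- cor(mtcars)
ord_aoe <- aoe_order(M_demo)
colnames(M_demo)[ord_aoe]  # variable names in angular order
```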

```{r order}
corrplot(M, order = "AOE")
corrplot(M, order = "hclust")
corrplot(M, order = "FPC")
corrplot(M, order = "alphabet")
```
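
"Manual" reordering with `corrMatOrder()` might look as follows. This sketch assumes the interface of the installed **corrplot** version (a correlation matrix plus an `order` argument, returning a permutation of the column indices); `M_demo`, `ord` and `M_reordered` are names chosen only for this example.

```{r manual-order}
library(corrplot)

M_demo <- cor(mtcars)
ord <- corrMatOrder(M_demo, order = "AOE")  # permutation of the column indices
M_reordered <- M_demo[ord, ord]             # apply it to rows and columns alike
corrplot(M_reordered)                       # plot without any further reordering
```

Computing the order separately is handy when the same permutation has to be applied to other matrices, for example a matrix of p-values.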

If using `"hclust"`, `corrplot()` can draw rectangles around groups of variables in the correlation matrix plot, based on the results of the hierarchical clustering.

```{r rectangles}
corrplot(M, order = "hclust", addrect = 2)
corrplot(M, order = "hclust", addrect = 3)
```
```{r hclust-lightblue}
# Change background color to lightblue
corrplot(M, type = "upper", order = "hclust",
         col = c("black", "white"), bg = "lightblue")
```
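
To see which variables end up inside each rectangle, the clustering can be reproduced by hand with base R. The sketch below assumes that the dissimilarity is `1 - r` and that the agglomeration method is `"complete"`; both are assumptions about the package internals, not something stated in this vignette. The names `M_demo`, `tree` and `groups` are illustrative only.

```{r hclust-sketch}
M_demo <- cor(mtcars)

# Assumed dissimilarity: highly correlated variables are "close"
tree <- hclust(as.dist(1 - M_demo), method = "complete")

# Cut the tree into two groups, analogous to addrect = 2
groups <- cutree(tree, k = 2)
split(names(groups), groups)
```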


Using different color spectra
------------------------------
As shown in the above section, the color of the correlogram can be customized.
The function `colorRampPalette()` is very convenient for generating a color
spectrum.
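
Note that `colorRampPalette()` does not return colors directly. It returns a function that interpolates the requested number of colors between the given anchors. A small sketch (the names `pal` and `five_cols` are illustrative only):

```{r palette-sketch}
pal <- colorRampPalette(c("red", "white", "blue"))  # pal is a function
five_cols <- pal(5)                                 # five interpolated colors
five_cols
```

This is why the `col` argument in the examples of this section is written as a call such as `col1(100)` rather than `col1`.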

```{r color}
col1 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "white",
                           "cyan", "#007FFF", "blue", "#00007F"))
col2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582",
                           "#FDDBC7", "#FFFFFF", "#D1E5F0", "#92C5DE",
                           "#4393C3", "#2166AC", "#053061"))
col3 <- colorRampPalette(c("red", "white", "blue"))	
col4 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "#7FFF7F",
                           "cyan", "#007FFF", "blue", "#00007F"))
whiteblack <- c("white", "black")

## using these color spectra
corrplot(M, order = "hclust", addrect = 2, col = col1(100))
corrplot(M, order = "hclust", addrect = 2, col = col2(50))
corrplot(M, order = "hclust", addrect = 2, col = col3(20))
corrplot(M, order = "hclust", addrect = 2, col = col4(10))
corrplot(M, order = "hclust", addrect = 2, col = whiteblack, bg = "gold2")
```

You can also use the standard color palettes (package `grDevices`):
```{r hclust-stdcolors}
corrplot(M, order = "hclust", addrect = 2, col = heat.colors(100))
corrplot(M, order = "hclust", addrect = 2, col = terrain.colors(100))
corrplot(M, order = "hclust", addrect = 2, col = cm.colors(100))
corrplot(M, order = "hclust", addrect = 2, col = gray.colors(100))
```

Another option is to use the `RColorBrewer` package.
```{r hclust-rcolorbrewer}
library(RColorBrewer)

corrplot(M, type = "upper", order = "hclust",
         col = brewer.pal(n = 8, name = "RdBu"))
corrplot(M, type = "upper", order = "hclust",
         col = brewer.pal(n = 8, name = "RdYlBu"))
corrplot(M, type = "upper", order = "hclust",
         col = brewer.pal(n = 8, name = "PuOr"))
```



Changing color and rotation of text labels and legend
-----------------------------------------------------
Parameters of the form `cl.*` control the color legend, and those of the form
`tl.*` control the text labels. For the text labels, `tl.col` (text label color)
and `tl.srt` (text label string rotation) change the color and rotation of the
text.

Here are some examples.
```{r color-label}
## remove color legend and text legend 
corrplot(M, order = "AOE", cl.pos = "n", tl.pos = "n")  

## bottom  color legend, diagonal text legend, rotate text label
corrplot(M, order = "AOE", cl.pos = "b", tl.pos = "d", tl.srt = 60)

## a wider color legend with numbers right aligned
corrplot(M, order = "AOE", cl.ratio = 0.2, cl.align.text = "r")

## text labels rotated 45 degrees
corrplot(M, type = "lower", order = "hclust", tl.col = "black", tl.srt = 45)
```


Dealing with a non-correlation matrix
-------------------------------------
```{r non-corr}
corrplot(abs(M), order = "AOE", col = col3(200), cl.lim = c(0, 1))
## visualize a matrix in [-100, 100]
ran <- round(matrix(runif(225, -100, 100), 15))
corrplot(ran, is.corr = FALSE, method = "square")
## a beautiful color legend 
corrplot(ran, is.corr = FALSE, method = "ellipse", cl.lim = c(-100, 100))
```

If your matrix is rectangular, you can adjust the aspect ratio with the
`win.asp` parameter so that the matrix is rendered as a square.
```{r non-corr-asp}
ran <- matrix(rnorm(70), ncol = 7)
corrplot(ran, is.corr = FALSE, win.asp = .7, method = "circle")
```

Dealing with missing (NA) values
--------------------------------
By default, **corrplot** renders NA values as `"?"` characters. With the
`na.label` parameter, a different label can be used (at most two characters are
supported).

```{r NAs}
M2 <- M
diag(M2) <- NA
corrplot(M2)
corrplot(M2, na.label = "o")
corrplot(M2, na.label = "NA")
```


Using "plotmath" expressions in labels
--------------------------------------
Since version `0.78`, it is possible to use
[plotmath](https://www.rdocumentation.org/packages/grDevices/topics/plotmath)
expressions in variable names. To activate plotmath rendering, prefix your label
with one of the characters `":"`, `"="` or `"$"`.

```{r plotmath}
M2 <- M[1:5,1:5]
colnames(M2) <- c("alpha", "beta", ":alpha+beta", ":a[0]", "=a[beta]")
rownames(M2) <- c("alpha", "beta", NA, "$a[0]", "$ a[beta]")
corrplot(M2)
```


Combining correlogram with the significance test
------------------------------------------------
```{r test}
res1 <- cor.mtest(mtcars, conf.level = .95)
res2 <- cor.mtest(mtcars, conf.level = .99)

## mark insignificant coefficients according to the significance level
corrplot(M, p.mat = res1$p, sig.level = .2)
corrplot(M, p.mat = res1$p, sig.level = .05)
corrplot(M, p.mat = res1$p, sig.level = .01)

## leave insignificant coefficients blank
corrplot(M, p.mat = res1$p, insig = "blank")

## add p-values to insignificant coefficients
corrplot(M, p.mat = res1$p, insig = "p-value")

## add all p-values
corrplot(M, p.mat = res1$p, insig = "p-value", sig.level = -1)

## add crosses to insignificant coefficients
corrplot(M, p.mat = res1$p, order = "hclust", insig = "pch", addrect = 3)
```
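
For readers wondering what the matrix passed as `p.mat` contains, the sketch below builds a comparable matrix of pairwise p-values with `stats::cor.test()`. It is an approximation of what `cor.mtest()` provides, written under the assumption that each entry is the p-value of a Pearson correlation test between two columns; the helper name `pairwise_p()` and the object `p_demo` are hypothetical.

```{r pmat-sketch}
# Hypothetical helper (not part of corrplot): pairwise correlation test p-values
pairwise_p <- function(data) {
  data <- as.matrix(data)
  n <- ncol(data)
  p <- matrix(NA_real_, n, n,
              dimnames = list(colnames(data), colnames(data)))
  diag(p) <- 0  # a variable is perfectly correlated with itself
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # the test is symmetric, so fill both triangles at once
      p[i, j] <- p[j, i] <- cor.test(data[, i], data[, j])$p.value
    }
  }
  p
}

p_demo <- pairwise_p(mtcars)
round(p_demo[1:3, 1:3], 4)
```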


Visualize confidence intervals
------------------------------
```{r ci}
corrplot(M, lowCI.mat = res1$lowCI, uppCI.mat = res1$uppCI, order = "hclust",
         rect.col = "navy", plotCI = "rect", cl.pos = "n")
corrplot(M, p.mat = res1$p, lowCI.mat = res1$lowCI, uppCI.mat = res1$uppCI,
         order = "hclust", pch.col = "red", sig.level = 0.01,
         addrect = 3, rect.col = "navy", plotCI = "rect", cl.pos = "n")
```

```{r ci_with_label}
res1 <- cor.mtest(mtcars, conf.level = .95)

corrplot(M, p.mat = res1$p, insig = "label_sig",
         sig.level = c(.001, .01, .05), pch.cex = .9, pch.col = "white")
corrplot(M, p.mat = res1$p, method = "color",
         insig = "label_sig", pch.col = "white")
corrplot(M, p.mat = res1$p, method = "color", type = "upper",
         sig.level = c(.001, .01, .05), pch.cex = .9,
         insig = "label_sig", pch.col = "white", order = "AOE")
corrplot(M, p.mat = res1$p, insig = "label_sig", pch.col = "white",
         pch = "p<.05", pch.cex = .5, order = "AOE")
```

Customize the correlogram
-------------------------
```{r pmat}
# matrix of the p-value of the correlation
p.mat <- cor.mtest(mtcars)$p
head(p.mat[, 1:5])

# Mark insignificant coefficients according to the significance level
corrplot(M, type = "upper", order = "hclust", 
         p.mat = p.mat, sig.level = 0.01)

# Leave insignificant coefficients blank
corrplot(M, type = "upper", order = "hclust", 
         p.mat = p.mat, sig.level = 0.01, insig = "blank")
```
In the above figures, correlations with **p-value > 0.01** are considered
insignificant. The corresponding correlation coefficients are either crossed
out (first plot) or left blank (second plot).

```{r customized}
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method = "color", col = col(200),
         type = "upper", order = "hclust", number.cex = .7,
         addCoef.col = "black", # Add coefficient of correlation
         tl.col = "black", tl.srt = 90, # Text label color and rotation
         # Combine with significance
         p.mat = p.mat, sig.level = 0.01, insig = "blank", 
         # hide correlation coefficient on the principal diagonal
         diag = FALSE)
```

**Note:** Some of the plots were taken from [this blog].

[this blog]: http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram

Explore large feature matrices
------------------------------
```{r large_matrix}

# generating large feature matrix (cols=features, rows=samples)
num_features <- 60 # how many features
num_samples <- 300 # how many samples
DATASET <- matrix(runif(num_features * num_samples),
               nrow = num_samples, ncol = num_features)

# setting some dummy names for the features e.g. f23
colnames(DATASET) <- paste0("f", 1:ncol(DATASET))

# make roughly 30% of all features correlated with feature "f1"
num_feat_corr <- num_features * .3
idx_correlated_features <- as.integer(seq(from = 1,
                                          to = num_features,
                                          length.out = num_feat_corr))[-1]
for (i in idx_correlated_features) {
  DATASET[,i] <- DATASET[,1] + runif(num_samples) # adding some noise
}

corrplot(cor(DATASET), diag = FALSE, order = "FPC",
         tl.pos = "td", tl.cex = 0.5, method = "color", type = "upper")
```