File: vig2_categorical.Rmd

package info (click to toggle)
r-cran-exactextractr 0.10.0-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 2,152 kB
  • sloc: cpp: 7,238; sh: 14; makefile: 2
file content (246 lines) | stat: -rw-r--r-- 8,885 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
---
title: "2. Summarizing categorical data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{2. Summarizing categorical data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7
)
knitr::opts_knit$set(
  global.par = TRUE
)
```

```{r, include = FALSE}
par(mar = c(1, 1, 1, 1))
```

## Introduction

This vignette builds upon the sample data for São Miguel (introduced in 
[the previous vignette](vig1_population.html)) to demonstrate the use of `exactextractr`
with categorical land cover data. A sample of the
[CORINE 2018](https://land.copernicus.eu/pan-european/corine-land-cover/clc2018)
raster dataset is included in `exactextractr`.

As in the previous vignette, the following packages are used:

```{r setup, message = FALSE}
library(exactextractr)
library(dplyr)
library(sf)
library(raster)
```

## Loading the sample data

First, we load the CORINE land cover data and the _concelho_ boundaries.

```{r}
clc <- raster(system.file('sao_miguel/clc2018_v2020_20u1.tif',
                     package = 'exactextractr'))
concelhos <- st_read(system.file('sao_miguel/concelhos.gpkg',
                                 package = 'exactextractr'),
                     quiet = TRUE)
```

```{r, echo = FALSE}
corine_palette <- c(
  "#e6004d", "#ff0000", "#cc4df2", "#cc0000", "#e6cccc", "#e6cce6", "#a600cc",
  "#a64d00", "#ff4dff", "#ffa6ff", "#ffe6ff", "#ffffa8", "#ffff00", "#e6e600",
  "#e68000", "#f2a64d", "#e6a600", "#e6e64d", "#ffe6a6", "#ffe64d", "#e6cc4d",
  "#f2cca6", "#80ff00", "#00a600", "#4dff00", "#ccf24d", "#a6ff80", "#a6e64d",
  "#a6f200", "#e6e6e6", "#cccccc", "#ccffcc", "#000000", "#a6e6cc", "#a6a6ff",
  "#4d4dff", "#ccccff", "#e6e6ff", "#a6a6e6", "#00ccf2", "#80f2e6", "#00ffa6",
  "#a6ffe6", "#e6f2ff", "#ffffff")

plot(clc, col = corine_palette,
     axes = FALSE, legend = FALSE)
plot(st_geometry(concelhos), add = TRUE)
```

The land cover class descriptions are provided in a separate DBF file. We read
this in to a data frame, then use `levels()` to associate the class descriptions
with the raster.

```{r}
clc_classes <- foreign::read.dbf(system.file('sao_miguel/clc2018_v2020_20u1.tif.vat.dbf',
                                             package = 'exactextractr'),
                                 as.is = TRUE) %>%
  dplyr::select(value = Value,
                landcov = LABEL3)

levels(clc) <- list(data.frame(ID = clc_classes$value,
                               landcov = clc_classes$landcov))
```

This association provides us with a way to look up the description for a given
ID. Alternatively, we can relate the values using `merge` or a ``dplyr` join.

```{r}
factorValues(clc, c(2, 18, 24))
```

## Summarizing land cover classifications

One of the most basic questions we might ask is which land cover classification
is predominant in each _concelho_. We can do this with the built-in `mode`
summary operation. The `minority` and `variety` operations are also applicable
to categorical data and provide the least-common classification and number of
distinct classifications, respectively.

```{r landcov-mode}
landcov_mode <- exact_extract(clc, concelhos, 'mode', 
                              append_cols = 'name', progress = FALSE) %>%
  inner_join(clc_classes, by=c(mode = 'value'))
```

```{r landcov-mode-table, echo = FALSE}
landcov_mode %>%
  dplyr::select(-mode) %>%
  knitr::kable()
```

### Summary functions

While `mode` provides a handy way to see the most common land cover category, we
need to write a custom summary function if we want to see the frequency of
different land cover types in an area.

Summary functions are called once per feature from the input `sf` object. They
can return either:

 * a scalar, in which case the return value of `exact_extract` will be a vector
   whose entries correspond with the rows of the input `sf` object, or
    
 * a data frame, in which case `exact_extract` will return a rowwise combination
   of the data frames for each feature. If the data frame returned by the
   summary function will have than a single row, it is useful for some
   identifying information to be included in the returned data frame.

If we are going to perform typical data frame operations on the raster values
and coverage fractions, it can be more convenient for the summary function to
accept a single data frame argument, instead of separate arguments for the cell
values and coverage fractions. This behavior can be enabled with the
`summarize_df` argument.

Using this method, we can calculate the fraction of each _concelho_ that is
covered by each land cover category:

```{r landcov-fracs, message = FALSE}
landcov_fracs <- exact_extract(clc, concelhos, function(df) {
  df %>%
    mutate(frac_total = coverage_fraction / sum(coverage_fraction)) %>%
    group_by(name, value) %>%
    summarize(freq = sum(frac_total))
}, summarize_df = TRUE, include_cols = 'name', progress = FALSE)
```

Here we use the `include_cols` argument here to include the `name` column from
`concelhos` in the data frame passed to the summary function. (Although the
value of `name` will be the same for all rows in  `df`, we include `name` in the
`group_by` expression so that it is not removed by `summarize`.) Other similar
arguments include `include_xy` to get the cell center coordinates,
`include_area` to get the cell area, and `include_cell` to get the cell index
used by the `raster` package.

This provides us with a correspondence between each numeric land cover category
and its frequency in each _concelho_:

```{r}
head(landcov_fracs)
```

Joining this table to `clc_classes`, we can associate the descriptions with the
numeric types and view the three most common land cover classes in each
_concelho_:

```{r landcov-fracs-table}
landcov_fracs %>%
  inner_join(clc_classes, by = 'value') %>%
  group_by(name) %>%
  arrange(desc(freq)) %>%
  slice_head(n = 3) %>%
  mutate(freq = sprintf('%0.1f%%', 100*freq)) %>%
  knitr::kable()
```
Similarly, we can find the top land covers by area:

```{r landcov-areas, message = FALSE}
landcov_areas <- exact_extract(clc, concelhos, function(df) {
  df %>%
    group_by(name, value) %>%
    summarize(area_km2 = sum(coverage_area) / 1e6)
}, summarize_df = TRUE, coverage_area = TRUE, include_cols = 'name', progress = FALSE)
```

```{r landcov-areas-table, echo = FALSE}
landcov_areas %>%
  inner_join(clc_classes, by = 'value') %>%
  dplyr::select(-value) %>%
  group_by(name) %>%
  arrange(desc(area_km2)) %>%
  slice_head(n = 3) %>%
  knitr::kable()
```

## Summarizing population land cover

One extension of the analysis above is to see which land covers are associated
with human population in a given _concelho_. Is the population primary urban or
rural?

As described in the previous vignette, the population density raster provides
the most robust results in the presence of partially-covered pixels.

```{r load-pop-density}
pop_density <- raster(system.file('sao_miguel/gpw_v411_2020_density_2020.tif',
                                  package = 'exactextractr'))
```

We are able to perform this analysis because the CORINE sample distributed with
`exactextractr` has been reprojected from its native Lambert Equal Area
projection into geographic coordinates consistent with GPW. Otherwise, working
with multiple rasters in different projections requires transformation to a
common grid using tools such as `raster::projectRaster` or the `gdalwarp`
command-line utility.

Very little about the call to `exact_extract` requires changing to incorporate
population. We set `weights = pop_density` and, since we are using the
`summarize_df` option, a column called `weight` will be added to the data frame
passed to the summary function. We also set `coverage_area = TRUE` so that we
can multiply the density by the covered pixel area to get a population count.

```{r landcov-pop-areas, message = FALSE, results = FALSE}
landcov_pop_areas <- exact_extract(clc, concelhos, function(df) {
  df %>%
    group_by(name, value) %>%
    summarize(pop = sum(coverage_area * weight / 1e6)) %>%
    mutate(pop_frac = pop / sum(pop))
}, weights = pop_density, coverage_area = TRUE,
   summarize_df = TRUE, include_cols = 'name')
```

Looking at the highest-population land cover type in each _concelho_, we can
can see that the western/central _concelhos_ of Lagoa, Ponta Delgada, Ribeira
Grande, and Vila Franca do Campo have a more urban population than Nordeste or
Povoação to the east.

```{r landcov-pop-areas-table, echo = FALSE}
landcov_pop_areas %>%
  inner_join(clc_classes, by = 'value') %>%
  group_by(name) %>%
  arrange(desc(pop_frac)) %>%
  slice_head(n = 1) %>%
  dplyr::select(name, landcov, pop, pop_frac) %>%
  mutate(pop = round(pop),
         pop_frac = round(pop_frac, 3)) %>%
  knitr::kable()
```