File: Edges.Rmd

package info (click to toggle)
r-cran-ggraph 2.2.1%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 1,832 kB
  • sloc: cpp: 1,630; makefile: 2
file content (493 lines) | stat: -rw-r--r-- 18,896 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
---
title: "Edges"
author: "Thomas Lin Pedersen"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Edges}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
set.seed(2022)
```

If the natural `ggplot2` equivalent to nodes is `geom_point()`, then surely the
equivalent to edges must be `geom_segment()`? Well, sort of, but there's a bit 
more to it than that.

![One does not simply draw a line between two nodes](edge_meme_wide.jpg)

While nodes are the sensible, mature, and predictably geoms, edges are the edgy 
(sorry), younger cousins that pushes the boundaries. To put it bluntly: 

> On the ggraph savannah you definitely want to be an edge!

## Meet the `geom_edge_*()` family
While the introduction might feel a bit over-the-top it is entirely true. An 
edge is an abstract concept denoting a relationship between two entities. A 
straight line is simply just one of many ways this relationship can be 
visualised. As we saw when [discussing nodes](Nodes.html)
sometimes it is not drawn at all but impied using containment or position 
(treemap, circle packing, and partition layouts), but more often it is shown
using a line of some sort. This use-case is handled by the large family of edge
geoms provided in `ggraph`. Some of the edges are general while others are 
dedicated to specific layouts. Let's creates some graphs for illustrative purposes
first:

```{r, message=FALSE}
library(ggraph)
library(tidygraph)
library(purrr)
library(rlang)

set_graph_style(plot_margin = margin(1,1,1,1))
hierarchy <- as_tbl_graph(hclust(dist(iris[, 1:4]))) |> 
  mutate(Class = map_bfs_back_chr(node_is_root(), .f = function(node, path, ...) {
    if (leaf[node]) {
      as.character(iris$Species[as.integer(label[node])])
    } else {
      species <- unique(unlist(path$result))
      if (length(species) == 1) {
        species
      } else {
        NA_character_
      }
    }
  }))

hairball <- as_tbl_graph(highschool) |> 
  mutate(
    year_pop = map_local(mode = 'in', .f = function(neighborhood, ...) {
      neighborhood %E>% pull(year) |> table() |> sort(decreasing = TRUE)
    }),
    pop_devel = map_chr(year_pop, function(pop) {
      if (length(pop) == 0 || length(unique(pop)) == 1) return('unchanged')
      switch(names(pop)[which.max(pop)],
             '1957' = 'decreased',
             '1958' = 'increased')
    }),
    popularity = map_dbl(year_pop, ~ .[1]) %|% 0
  ) |> 
  activate(edges) |> 
  mutate(year = as.character(year))
```

### Link
While you don't have to use a straight line for edges it is certainly possible
and `geom_edge_link()` is here to serve your needs:

```{r}
ggraph(hairball, layout = 'stress') + 
  geom_edge_link(aes(colour = year))
```

There's really not much more to it --- every edge is simply a straight line 
between the terminal nodes. Moving on...

### Fan
Sometimes the graph is not simple, i.e. it has multiple edges between the same
nodes. Using links is a bad choice here because edges will overlap and the
viewer will be unable to discover parallel edges. `geom_edge_fan()` got you 
covered here. If there are no parallel edges it behaves like `geom_edge_link()`
and draws a straight line, but if parallel edges exists it will spread them out
as arcs with different curvature. Parallel edges will be sorted by 
directionality prior to plotting so edges flowing in the same direction will be
plotted together:

```{r}
ggraph(hairball, layout = 'stress') + 
  geom_edge_fan(aes(colour = year))
```

### Parallel
An alternative to `geom_edge_fan()` is `geom_edge_parallel()`. It will draw 
edges as straight lines but in the case of multi-edges it will offset each edge 
a bit so they run parallel to each other. As with `geom_edge_fan()` the edges 
will be sorted by direction first. The offset is done at draw time and will thus
remain constant even during resizing:

```{r}
ggraph(hairball, layout = 'stress') + 
  geom_edge_parallel(aes(colour = year))
```

### Loops
Loops cannot be shown with regular edges as they have no length. A dedicated 
`geom_edge_loop()` exists for these cases:

```{r}
# let's make some of the student love themselves
loopy_hairball <- hairball |> 
  bind_edges(tibble::tibble(from = 1:5, to = 1:5, year = rep('1957', 5)))
ggraph(loopy_hairball, layout = 'stress') + 
  geom_edge_link(aes(colour = year), alpha = 0.25) + 
  geom_edge_loop(aes(colour = year))
```

The direction, span, and strength of the loop can all be controlled, but in 
general loops will add a lot of visual clutter to your plot unless the graph is
very simple.

### Density
This one is definitely strange, and I'm unsure of it's usefulness, but it is here
and it deserves an introduction. Consider the case where it is of interest to
see which types of edges dominates certain areas of the graph. You can colour
the edges, but edges can tend to get overplotted, thus reducing readability.
`geom_edge_density()` lets you add a shading to your plot based on the density
of edges in a certain area:

```{r}
ggraph(hairball, layout = 'stress') + 
  geom_edge_density(aes(fill = year)) + 
  geom_edge_link(alpha = 0.25)
```

### Arcs
While some insists that curved edges should be used in standard *"hairball"* 
graph visualisations it really is a poor choice, as it increases overplotting
and decreases interpretability for virtually no gain (unless complexity is 
your thing). That doesn't mean arcs have no use in graph visualizations. Linear 
and circular layouts can benefit greatly from them and `geom_edge_arc()` is 
provided precisely for this scenario:

```{r}
ggraph(hairball, layout = 'linear') + 
  geom_edge_arc(aes(colour = year))
```

Arcs behave differently in circular layouts as they will always bend towards
the center no matter the direction of the edge (the same thing can be achieved
in a linear layout by setting `fold = TRUE`).

```{r}
ggraph(hairball, layout = 'linear', circular = TRUE) + 
  geom_edge_arc(aes(colour = year)) + 
  coord_fixed()
```

### Bundling
Edge bundling is a technique to reduce clutter in a network visualization by
bundling edges that flows in the same direction. There are various ways of doing
this, many with heavy computational cost and the potential to mislead. The 
technique were initially confined to connections between nodes with a 
[hierarchical structure](#Connections) but has been expanded to general graphs.
ggraph provides 3 different bundling geoms with various up- and downsides. 

#### Force directed
This is perhaps the most classic. It treats the edges as an array of points with
the propensity to attract each other if edges are parallel. It suffers from bad
performance (though the edge bundling geoms uses memoisation to avoid 
recomputations) and can also be misleading as it doesn't use the underlying 
topology of the graph to determine if edges should be bundled, only whether they
are parallel.

```{r}
ggraph(hairball) + 
  geom_edge_bundle_force(n_cycle = 2, threshold = 0.4)
```

#### Edge path
An alternative is to let the edges follow the shortest paths rather than 
attract each other. This means the topology is being used in the bundling and
in theory lead to less misleading results. It also has the upside of being 
faster. The algorithm is iterative so that if an edge has been bundled it is
deleted from the graph where the shortest path is being searched in. In this
way the edges naturally converge towards a few "highways".

```{r}
ggraph(hairball) + 
  geom_edge_bundle_path()
```

#### Minimal
In the same vein as edge path bundling but even simpler, you can use the minimal
spanning tree of the graph as the scaffold to bundle edges along. As such, it 
changes to the hierarchical edge bundling approach, just with an implicit 
hierarchy calculated on the graph. This method is very fast but does create bias
in the output as edges will (obviously) travel along the minimal spanning tree 
thus amplifying that topology.

```{r}
ggraph(hairball) + 
  geom_edge_bundle_minimal()
```


### Elbow
Aah... The classic dendrogram with its right angle bends. Of course such
visualizations are also supported with the `geom_edge_elbow()`. It goes without
saying that this type of edge requires a layout that flows in a defined 
direction, such as a tree:

```{r}
ggraph(hierarchy, layout = 'dendrogram', height = height) + 
  geom_edge_elbow()
```

### Diagonals
If right angles aren't really your thing `ggraph` provides a smoother version in
the form of `geom_edge_diagonal()`. This edge is a quadratic bezier with control
points positioned at the same x-value as the terminal nodes and halfway 
in-between the nodes on the y-axis. The result is more organic than the elbows:

```{r}
ggraph(hierarchy, layout = 'dendrogram', height = height) + 
  geom_edge_diagonal()
```

It tends to look a bit weird with hugely unbalanced trees so use with care...

### Bends
An alternative to diagonals are bend edges which are elbow edges with a 
smoothed corner. It is implemented as a quadratic bezier with control points at
the location of the expected elbow corner:

```{r}
ggraph(hierarchy, layout = 'dendrogram', height = height) + 
  geom_edge_bend()
```

### Hive
This is certainly a very specific type of edge, intended only for use with hive
plots. It draws edges as quadratic beziers with control point positioned 
perpendicular to the axes of the hive layout:

```{r}
ggraph(hairball, layout = 'hive', axis = pop_devel, sort.by = popularity) + 
  geom_edge_hive(aes(colour = year)) + 
  geom_axis_hive(label = FALSE) + 
  coord_fixed()
```

### Span
As with the hive edge the `geom_edge_span()` is made in particular for a 
specific layout - the fabric layout. It draws the edge as a vertical line 
connecting the horizontal node lines of the layout, potentially with a terminal
shape.

```{r}
ggraph(hairball, layout = 'fabric', sort.by = node_rank_fabric()) + 
  geom_node_range(colour = 'grey') + 
  geom_edge_span(end_shape = 'circle') + 
  coord_fixed()
```

### Point and tile
It may seem weird to have edge geoms that doesn't have any span, but the matrix
layout calls for exactly that. The terminal nodes of the edge are determined by
the vertical and horizontal position of the mark, and for that reason the geom 
doesn't need any extend. The point and tile geoms serve the same purpose but are
simply different geometry types:

```{r}
ggraph(hairball, layout = 'matrix', sort.by = bfs_rank()) + 
  geom_edge_point() + 
  coord_fixed()
```

```{r}
ggraph(hairball, layout = 'matrix', sort.by = bfs_rank()) + 
  geom_edge_tile() + 
  coord_fixed()
```

## The three types of edge geoms
Almost all edge geoms comes in three variants. The basic variant (no suffix) as 
well as the variant suffixed with 2 (e.g. `geom_edge_link2()`) calculates a 
number (`n`) of points along the edge and draws it as a path. The variant 
suffixed with 0 (e.g. `geom_edge_diagonal0()`) uses the build in grid grobs to
draw the edges directly (in case of a diagonal it uses `bezierGrob()`). It might
seem strange to have so many different implementations of the same geoms but 
there's a reason to the insanity...

### Base variant
The basic edge geom is drawn by calculating a number of points along the edge
path and draw a line between these. This means that you're in control of the
detail level of curved edges and that all complex calculations happens up front.
Generally you will see better performance using the base variant rather than the
0-variant that uses grid grobs, unless you set the number of points to calculate
to something huge (50--100 is usually sufficient for a smooth look). Apart from
better performance you also get a nice bonus (you actually get several, but only
one is discussed here): The possibility of drawing a gradient along the edge.
Each calculated point gets an index value between 0 and 1 that specifies how
far along the edge it is positioned and this value can be used to e.g. map to
an alpha level to show the direction of the edge:

```{r}
ggraph(hairball, layout = 'linear') + 
  geom_edge_arc(aes(colour = year, alpha = after_stat(index))) + 
  scale_edge_alpha('Edge direction', guide = 'edge_direction')
```

### 2-variant
Like the base variant the 2-variant calculates points along the edge and draws
a path along them. The difference here is that in this variant you can map node
attributes to the edge and the aesthetics are then interpolated along the edge.
This is easier to show than to explain:

```{r}
ggraph(hierarchy, layout = 'dendrogram', height = height) + 
  geom_edge_elbow2(aes(colour = node.Class))
```

There are considerably more computation going on than in the base variant so 
unless you need to interpolate values between the terminal nodes you should go
with the base variant.

### 0-variant
This is, sadly, the boring one at the end. You don't get the luxury of smooth
gradients over the edge and often you have a considerably worse performance.
What you gain though is tack sharp resolution in the curves so if this is of 
utmost importance you are covered by this variant.

## Edge strength
Many of the edge geoms takes a strength argument that denotes their deviation 
from a straight line. Setting `strength = 0` will always result in a straight
line, while `strength = 1` is the default look. Anything in between can be used
to modify the look of the edge, while values outside that range will probably 
result in some weird looks. Some examples are shown below:

```{r}
small_tree <- create_tree(5, 2)

ggraph(small_tree, 'dendrogram') + 
  geom_edge_elbow(strength = 0.75)
```

```{r}
ggraph(small_tree, 'dendrogram') + 
  geom_edge_diagonal(strength = 0.5)
```


## Decorating edges
An edge is so much more than a line... Well at least it is also potentially an
arrow and a label. This section will go into how these can be added. To clearly
see the effect here we will use a slightly simpler graph

```{r}
# Random names - I swear
simple <- create_notable('bull') |> 
  mutate(name = c('Thomas', 'Bob', 'Hadley', 'Winston', 'Baptiste')) |> 
  activate(edges) |> 
  mutate(type = sample(c('friend', 'foe'), 5, TRUE))
```

### Arrows
While we saw above that direction can be encoded as a gradient, the good old 
arrow is still available. As with the standard `ggplot2` geoms an arrow can be 
added using the arrow argument:

```{r}
ggraph(simple, layout = 'graphopt') + 
  geom_edge_link(arrow = arrow(length = unit(4, 'mm'))) + 
  geom_node_point(size = 5)
```

I hope you think *Ugh* at the sight of this. The edges naturally extend to the
node center and nodes are thus drawn on top of the arrow heads. There's a 
solution to this in the form of the `start_cap` and `end_cap` aesthetics in the
base and 2-variant edge geoms (sorry 0-variant). This can be used to start and
stop the edge drawing at an absolute distance from the terminal nodes. Watch this:

```{r}
ggraph(simple, layout = 'graphopt') + 
  geom_edge_link(arrow = arrow(length = unit(4, 'mm')), 
                 end_cap = circle(3, 'mm')) + 
  geom_node_point(size = 5)
```

Using the `circle()`, `square()`, `ellipsis()`, and `rectangle()` helpers it is
possible to get a lot of control over how edges are capped at either end. This 
works for any edge, curved or not:

```{r}
ggraph(simple, layout = 'linear', circular = TRUE) + 
  geom_edge_arc(arrow = arrow(length = unit(4, 'mm')), 
                start_cap = circle(3, 'mm'),
                end_cap = circle(3, 'mm')) + 
  geom_node_point(size = 5) + 
  coord_fixed()
```

When plotting node labels you often want to avoid that incoming and outgoing 
edges overlaps with the labels. `ggraph` provides a helper that calculates the 
bounding rectangle of the labels and cap edges based on that:

```{r}
ggraph(simple, layout = 'graphopt') + 
  geom_edge_link(aes(start_cap = label_rect(node1.name),
                     end_cap = label_rect(node2.name)), 
                 arrow = arrow(length = unit(4, 'mm'))) + 
  geom_node_text(aes(label = name))
```

The capping of edges is dynamic and responds to resizing of the plot so the 
absolute size of the cap areas are maintained at all time.

#### A quick note on directionality
In `ggraph` there is no such thing as an undirected graph. Every edge has a 
start and an end node. For undirected graphs the start and end of edges is 
arbitrary but still exists and it is thus possible to add arrowheads to 
undirected graphs as well. This should not be done of course, but this is the
responsibility of the user as `ggraph` does not make any checks during 
rendering.

### Labels
You would expect that edge labels would be their own geom(s), but `ggraph` 
departs from the stringent grammar interpretation here. This is because the 
label placement is dependent on the choice of edge. Because of this edge 
labeling is bundled with each edge geom (but not the 0-variant) through the 
label aesthetic

```{r}
ggraph(simple, layout = 'graphopt') + 
  geom_edge_link(aes(label = type), 
                 arrow = arrow(length = unit(4, 'mm')), 
                 end_cap = circle(3, 'mm')) + 
  geom_node_point(size = 5)
```

Usually you would like the labels to run along the edges, but providing a 
fixed angle will only work at a very specific aspect ratio. Instead `ggraph`
offers to calculate the correct angle dynamically so the labels always runs
along the edge. Furthermore it can offset the label by an absolute length:

```{r}
ggraph(simple, layout = 'graphopt') + 
  geom_edge_link(aes(label = type), 
                 angle_calc = 'along',
                 label_dodge = unit(2.5, 'mm'),
                 arrow = arrow(length = unit(4, 'mm')), 
                 end_cap = circle(3, 'mm')) + 
  geom_node_point(size = 5)
```

`ggraph` offers a lot of additional customization of the edge labels but this
shows the main features. As with arrowheads labels can severely clutter your
visualization so it is only advisable on very simple graphs.

## Connections
The estranged cousin of edges are connections. While edges show the relational 
nature of the nodes in the graph structure, connections connect nodes that are 
not connected in the graph. This is done by finding the shortest path between
the two nodes. Currently the only connection geom available is 
`geom_conn_bundle()` that implements the hierarchical edge bundling technique:

```{r}
flaregraph <- tbl_graph(flare$vertices, flare$edges)
from <- match(flare$imports$from, flare$vertices$name)
to <- match(flare$imports$to, flare$vertices$name)
ggraph(flaregraph, layout = 'dendrogram', circular = TRUE) + 
  geom_conn_bundle(data = get_con(from = from, to = to), alpha = 0.1) + 
  coord_fixed()
```

The connection concept is underutilized at the moment but I expect to add more
support for this in coming releases.

## Want more?
Check out the other vignettes for more information on how to specify 
[layouts](Layouts.html) and draw [nodes](Nodes.html)...