"""
Dendrogram of Hierarchical Clustering
-------------------------------------
This example draws a dendrogram from the result of a hierarchical clustering. It is based on the example from
https://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_dendrogram.html
"""
# category: case studies
import pandas as pd
import altair as alt
import numpy as np
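
# The variable `den` below is sample output of `scipy.cluster.hierarchy.dendrogram`
# (https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.dendrogram.html#scipy.cluster.hierarchy.dendrogram),
# with the dendrogram truncated so that no more than 3 levels of the tree are shown.
# "icoord" and "dcoord" hold the x and y coordinates of each merge, and "ivl" holds
# the leaf labels: "(n)" marks a collapsed node containing n points, while a bare
# number is the index of a single point.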
den = {
    'dcoord': [[0.0, 0.8187388676087964, 0.8187388676087964, 0.0],
               [0.0, 1.105139508538779, 1.105139508538779, 0.0],
               [0.8187388676087964,
                1.3712698320830048,
                1.3712698320830048,
                1.105139508538779],
               [0.0, 0.9099819926189507, 0.9099819926189507, 0.0],
               [0.0, 1.2539936203984452, 1.2539936203984452, 0.0],
               [0.9099819926189507,
                1.9187528699821954,
                1.9187528699821954,
                1.2539936203984452],
               [1.3712698320830048,
                3.828052620290243,
                3.828052620290243,
                1.9187528699821954],
               [0.0, 1.7604450194955439, 1.7604450194955439, 0.0],
               [0.0, 1.845844754344974, 1.845844754344974, 0.0],
               [1.7604450194955439,
                4.847708507921838,
                4.847708507921838,
                1.845844754344974],
               [0.0, 2.8139388316471536, 2.8139388316471536, 0.0],
               [0.0, 2.8694176394568705, 2.8694176394568705, 0.0],
               [2.8139388316471536,
                6.399406819518539,
                6.399406819518539,
                2.8694176394568705],
               [4.847708507921838,
                12.300396052792589,
                12.300396052792589,
                6.399406819518539],
               [3.828052620290243,
                32.44760699959244,
                32.44760699959244,
                12.300396052792589]],
    'icoord': [[5.0, 5.0, 15.0, 15.0],
               [25.0, 25.0, 35.0, 35.0],
               [10.0, 10.0, 30.0, 30.0],
               [45.0, 45.0, 55.0, 55.0],
               [65.0, 65.0, 75.0, 75.0],
               [50.0, 50.0, 70.0, 70.0],
               [20.0, 20.0, 60.0, 60.0],
               [85.0, 85.0, 95.0, 95.0],
               [105.0, 105.0, 115.0, 115.0],
               [90.0, 90.0, 110.0, 110.0],
               [125.0, 125.0, 135.0, 135.0],
               [145.0, 145.0, 155.0, 155.0],
               [130.0, 130.0, 150.0, 150.0],
               [100.0, 100.0, 140.0, 140.0],
               [40.0, 40.0, 120.0, 120.0]],
    'ivl': [
        '(7)', '(8)', '41', '(5)', '(10)', '(7)', '(4)', '(8)',
        '(9)', '(15)', '(5)', '(7)', '(4)', '(22)', '(15)', '(23)'
    ],
}
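

# A minimal sketch of how to read the leaf labels in `ivl`, assuming scipy's
# default `show_leaf_counts=True`: a label in parentheses such as "(7)" is a
# collapsed node holding that many original points, while a bare number such
# as "41" is the index of one point. `leaf_sizes` is a hypothetical helper for
# illustration only; it is not part of Altair or scipy.
def leaf_sizes(labels):
    """Return the number of original points behind each leaf label."""
    sizes = []
    for label in labels:
        if label.startswith("(") and label.endswith(")"):
            # Collapsed node: the count is written inside the parentheses.
            sizes.append(int(label[1:-1]))
        else:
            # Singleton leaf: the label is a point index, so it holds one point.
            sizes.append(1)
    return sizes

# Usage: sum(leaf_sizes(den["ivl"])) gives the total number of clustered
# points, 150 for this `den`.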
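

def get_leaf_loc(den):
    """
    Get the x locations of the leaves.

    scipy places the leaves 10 units apart (at x = 5, 15, 25, ...), so the
    locations run from the smallest to the largest value in ``icoord``.
    """
    _from = int(np.array(den["icoord"]).min())
    _to = int(np.array(den["icoord"]).max() + 1)
    return range(_from, _to, 10)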
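

# scipy's dendrogram places leaf i (counting from the left) at x = 10 * i + 5,
# which is why `get_leaf_loc` steps by 10. A minimal sketch of an equivalent
# helper that derives the positions from the number of leaf labels instead of
# from `icoord`; the name `leaf_loc_from_count` is hypothetical.
def leaf_loc_from_count(n_leaves):
    """Return the x position of each of `n_leaves` leaves, left to right."""
    return [10 * i + 5 for i in range(n_leaves)]

# Usage: leaf_loc_from_count(len(den["ivl"])) matches list(get_leaf_loc(den)),
# i.e. [5, 15, ..., 155] for the 16 leaves in this example.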


def get_df_coord(den):
    """
    Return a DataFrame with one row per upside-down "U" in the dendrogram:
    columns xk1..xk4 hold its x coordinates and yk1..yk4 its y coordinates.
    """
    # View the dendrogram as a collection of upside-down "U" shapes and label
    # the 4 corners of each "U" as points 1 (bottom left), 2 (top left),
    # 3 (top right) and 4 (bottom right).
    cols_xk = ["xk1", "xk2", "xk3", "xk4"]
    cols_yk = ["yk1", "yk2", "yk3", "yk4"]
    df_coord = pd.merge(
        pd.DataFrame(den["icoord"], columns=cols_xk),
        pd.DataFrame(den["dcoord"], columns=cols_yk),
        left_index=True,
        right_index=True
    )
    return df_coord
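

# A minimal sketch, in plain Python, of the decomposition the chart below
# relies on: each upside-down "U" with corners 1-4 becomes a vertical left arm
# (1 -> 2), a horizontal shoulder (2 -> 3) and a vertical right arm (3 -> 4).
# `u_segments` is a hypothetical helper for illustration only.
def u_segments(xs, ys):
    """Split one "U" (4 x and 4 y coordinates) into three line segments."""
    corners = list(zip(xs, ys))
    left_arm = (corners[0], corners[1])
    shoulder = (corners[1], corners[2])
    right_arm = (corners[2], corners[3])
    return [left_arm, shoulder, right_arm]

# Usage: u_segments(den["icoord"][0], den["dcoord"][0]) returns the three
# segments of the first merge.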
source = get_df_coord(den)
base = alt.Chart(source)
# Each "U" is drawn as a horizontal shoulder (corner 2 to corner 3) plus two
# vertical arms (corner 1 to corner 2, and corner 3 to corner 4).
shoulder = base.mark_rule().encode(
    alt.X("xk2:Q", title=""),
    alt.X2("xk3:Q"),
    alt.Y("yk2:Q", title="")
)
arm1 = base.mark_rule().encode(
    alt.X("xk1:Q"),
    alt.Y("yk1:Q"),
    alt.Y2("yk2:Q")
)
arm2 = base.mark_rule().encode(
    alt.X("xk3:Q"),
    alt.Y("yk3:Q"),
    alt.Y2("yk4:Q")
)
chart_den = shoulder + arm1 + arm2
df_text = pd.DataFrame(dict(labels=den["ivl"], x=get_leaf_loc(den)))
chart_text = alt.Chart(
    df_text
).mark_text(
    dy=0, angle=0, align="center"
).encode(
    x=alt.X("x:Q", axis={
        "grid": False,
        "title": "Number of points in node (or index of point if no parentheses)"
    }),
    text=alt.Text("labels:N")
)
(chart_den & chart_text).resolve_scale(
    x="shared"
).configure(
    padding={"top": 10, "left": 10}
).configure_concat(
    spacing=0
).configure_axis(
    labels=False,
    ticks=False,
    grid=False
).properties(
    title="Hierarchical Clustering Dendrogram"
)