# Model class selection

When averaging over partitions, we may be interested in evaluating which
**model class** provides a better fit to the data, considering all
possible parameter choices. This is done by evaluating the model
evidence, summed over all possible partitions {cite}`inf-peixoto_nonparametric_2017`:

$$
P(\boldsymbol A) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol A,\boldsymbol\theta, \boldsymbol b) =  \sum_{\boldsymbol b}P(\boldsymbol A,\boldsymbol b).
$$

This quantity is analogous to a [partition function](<https://en.wikipedia.org/wiki/Partition_function_(statistical_mechanics)>)
in statistical physics, which we can write more conveniently as a
negative [free energy](https://en.wikipedia.org/wiki/Thermodynamic_free_energy) by taking
its logarithm

$$
\ln P(\boldsymbol A) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(\boldsymbol A,\boldsymbol b)}_{-\left<\Sigma\right>}\;
           \underbrace{- \sum_{\boldsymbol b}q(\boldsymbol b)\ln q(\boldsymbol b)}_{\mathcal{S}}
$$ (free-energy)

where

$$
q(\boldsymbol b) = \frac{P(\boldsymbol A,\boldsymbol b)}{\sum_{\boldsymbol b'}P(\boldsymbol A,\boldsymbol b')}
$$

is the posterior probability of partition $\boldsymbol b$. The
first term of Eq. {eq}`free-energy` (the "negative energy") is minus the
average of the description length $\left<\Sigma\right>$, weighted
according to the posterior distribution. The second term
$\mathcal{S}$ is the [entropy](<https://en.wikipedia.org/wiki/Entropy_(information_theory)>) of the
posterior distribution, and measures, in a sense, the "quality of fit"
of the model: If the posterior is very "peaked", i.e. dominated by a
single partition with a very large probability, the entropy will tend to
zero. However, if there are many partitions with similar probabilities
--- meaning that there is no single partition that describes the network
uniquely well --- it will take a large value instead.
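
As a quick numerical illustration (a generic sketch, not graph-tool code, with made-up probabilities), the entropy of a hypothetical "peaked" posterior over four partitions is much smaller than that of a "flat" one:

```python
import numpy as np

def posterior_entropy(q):
    """Shannon entropy -sum q ln q (in nats), ignoring zero entries."""
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return -np.sum(q * np.log(q))

# A hypothetical posterior dominated by a single partition ("peaked"):
peaked = [0.97, 0.01, 0.01, 0.01]

# A hypothetical posterior with many equally plausible partitions ("flat"):
flat = [0.25, 0.25, 0.25, 0.25]

print(posterior_entropy(peaked))  # ≈ 0.17, close to zero
print(posterior_entropy(flat))    # ln(4) ≈ 1.386, the maximum for 4 outcomes
```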

Since the MCMC algorithm samples partitions from the distribution
$q(\boldsymbol b)$, it can be used to compute
$\left<\Sigma\right>$ easily, simply by averaging the description
length values encountered by sampling from the posterior distribution
many times.
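
Schematically (using a made-up toy posterior, not graph-tool output), the sample average of description lengths drawn from $q(\boldsymbol b)$ converges to the exact posterior average $\left<\Sigma\right>$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior q(b) over three partitions, with description
# lengths Σ(b) (all numbers made up for illustration):
q = np.array([0.7, 0.2, 0.1])
sigma = np.array([100.0, 110.0, 130.0])

# Exact posterior average <Σ> = sum_b q(b) Σ(b)
exact = (q * sigma).sum()

# Monte Carlo estimate: sample partitions from q and average Σ over them,
# just as the description lengths are averaged over MCMC samples
samples = rng.choice(len(q), size=20000, p=q)
estimate = sigma[samples].mean()

print(exact)     # 105.0
print(estimate)  # ≈ 105, within sampling error
```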

The computation of the posterior entropy $\mathcal{S}$, however,
is significantly more difficult, since it involves measuring the precise
value of $q(\boldsymbol b)$. A direct "brute force" computation of
$\mathcal{S}$ is implemented via
{meth}`~graph_tool.inference.BlockState.collect_partition_histogram`
and {func}`~graph_tool.inference.microstate_entropy`, however
this is only feasible for very small networks. For larger networks, we
are forced to perform approximations. One possibility is to employ the
method described in {cite}`inf-peixoto_revealing_2021`, based on fitting a
mixture "random label" model to the posterior distribution, which allows
us to compute its entropy. In graph-tool this is done by using
{class}`~graph_tool.inference.ModeClusterState`, as we
show in the example below.

```{testsetup} model-evidence
np.random.seed(43)
gt.seed_rng(43)
```

```{testcode} model-evidence
from scipy.special import gammaln

g = gt.collection.data["lesmis"]

for deg_corr in [True, False]:
    state = gt.minimize_blockmodel_dl(g, state_args=dict(deg_corr=deg_corr))  # Initialize the Markov
                                                                              # chain from the "ground
                                                                              # state"
    dls = []         # description length history
    bs = []          # partitions

    def collect_partitions(s):
        global bs, dls
        bs.append(s.get_state().a.copy())
        dls.append(s.entropy())

    # Now we collect 2000 partitions; the larger this number, the
    # more accurate the calculation will be

    gt.mcmc_equilibrate(state, force_niter=2000, mcmc_args=dict(niter=10),
                        callback=collect_partitions)

    # Infer partition modes
    pmode = gt.ModeClusterState(bs)

    # Minimize the mode state itself
    gt.mcmc_equilibrate(pmode, wait=1, mcmc_args=dict(niter=1, beta=np.inf))

    # Posterior entropy
    H = pmode.posterior_entropy()

    # log(B!) term, with ln B! computed as gammaln(B + 1)
    logB = np.mean(gammaln(np.array([len(np.unique(b)) for b in bs]) + 1))

    # Evidence
    L = -np.mean(dls) + logB + H

    print(f"Model log-evidence for deg_corr = {deg_corr}: {L}")
```

```{testoutput} model-evidence
Model log-evidence for deg_corr = True: -680.709748...
Model log-evidence for deg_corr = False: -670.79211...
```
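
An aside on the `logB` term used above: since $B!$ overflows quickly for even moderate numbers of groups, its logarithm is obtained from the log-gamma function via the identity $\ln B! = \Gamma(B+1)$ in log space. A minimal check (with an arbitrary $B$):

```python
import math

B = 12  # an arbitrary number of groups, for illustration

# ln B! via the log-gamma function; scipy.special.gammaln used in the
# listing above is the array-capable equivalent of math.lgamma
print(math.lgamma(B + 1))           # ln(12!) ≈ 19.9872
print(math.log(math.factorial(B)))  # same value, computed directly
```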

The outcome shows a slight preference for the non-degree-corrected model.

When using the nested model, the approach is entirely analogous, as we
show below for the same network.

```{testsetup} nested-model-evidence
np.random.seed(43)
gt.seed_rng(42)
```

```{testcode} nested-model-evidence
from scipy.special import gammaln

g = gt.collection.data["lesmis"]

for deg_corr in [True, False]:
    state = gt.NestedBlockState(g, state_args=dict(deg_corr=deg_corr))

    # Equilibrate
    gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10))

    dls = []         # description length history
    bs = []          # partitions

    def collect_partitions(s):
        global bs, dls
        bs.append(s.get_state())
        dls.append(s.entropy())

    # Now we collect 5000 partitions; the larger this number, the
    # more accurate the calculation will be

    gt.mcmc_equilibrate(state, force_niter=5000, mcmc_args=dict(niter=10),
                        callback=collect_partitions)

    # Infer partition modes
    pmode = gt.ModeClusterState(bs, nested=True)

    # Minimize the mode state itself
    gt.mcmc_equilibrate(pmode, wait=1, mcmc_args=dict(niter=1, beta=np.inf))

    # Posterior entropy
    H = pmode.posterior_entropy()

    # log(B!) term, summing one ln B! per level of the hierarchy
    logB = np.mean([sum(gammaln(len(np.unique(bl)) + 1) for bl in b) for b in bs])

    # Evidence
    L = -np.mean(dls) + logB + H

    print(f"Model log-evidence for deg_corr = {deg_corr}: {L}")
```

```{testoutput} nested-model-evidence
Model log-evidence for deg_corr = True: -670.572583...
Model log-evidence for deg_corr = False: -662.13989...
```

The situation is the same: the non-degree-corrected model has the larger
evidence. Note also that the nested models yield larger evidences than
their non-nested counterparts, which is not surprising, since the
non-nested model is a special case of the nested one.