.. _checkpointing-optimisation:

Checkpointing optimisation runs
===============================

.. sectionauthor:: Gavin Huttley

A common problem on HPC systems is ensuring that a long-running process can restart after an interruption by restoring its last checkpointed state. The optimiser code has this capability, which we illustrate here. We first construct a likelihood function object.

.. doctest::
    
    >>> from cogent import LoadSeqs, LoadTree
    >>> from cogent.evolve.models import F81
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sub_model = F81()
    >>> lf = sub_model.makeLikelihoodFunction(tree)
    >>> lf.setAlignment(aln)

We then start an optimisation, providing a filename for checkpointing and specifying a time interval between checkpoints (which we make very short here to ensure something gets written; for longer-running functions the default ``interval`` setting is fine). Calling ``optimise`` then prints a notice that checkpoints are being written.

.. doctest::
    
    >>> checkpoint_fn = 'checkpoint_this.txt'
    >>> lf.optimise(filename=checkpoint_fn, interval=100, show_progress = False)
    CHECKPOINTING to file 'checkpoint_this.txt'...
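
The idea of interval-based checkpointing can be sketched independently of the library. The following is a minimal illustration, not PyCogent's implementation: the class ``IntervalCheckpointer``, its ``record`` method, the ``clock`` argument and the 60 second default are all hypothetical names and values chosen for this example. It is written for Python 3 (it relies on ``os.replace``), unlike the Python 2 doctests on this page. State is pickled only when the interval has elapsed, and the file is written via a temporary file so that an interruption mid-write should not leave a truncated checkpoint.

```python
import os
import pickle
import tempfile
import time


class IntervalCheckpointer(object):
    """Hypothetical helper: write state to ``filename`` at most once
    every ``interval`` seconds."""

    def __init__(self, filename, interval=60.0, clock=time.time):
        self.filename = filename
        self.interval = interval
        self._clock = clock          # injectable so the timing can be tested
        self._last_write = clock()   # the interval is measured from creation

    def record(self, state, always=False):
        """Save ``state`` if the interval has elapsed (or ``always`` is set).

        Returns True if a checkpoint was written, False otherwise.
        """
        now = self._clock()
        if not always and now - self._last_write < self.interval:
            return False
        # Write to a temporary file in the same directory, then rename it
        # over the target, so readers never see a half-written checkpoint.
        directory = os.path.dirname(os.path.abspath(self.filename))
        fd, tmp_name = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(state, handle)
        os.replace(tmp_name, self.filename)
        self._last_write = now
        return True
```

A long-running loop would call ``record(state)`` on every iteration and let the helper decide whether enough time has passed; passing ``always=True`` forces a final write when the run finishes.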

Recovering from a real run that was interrupted generates an additional notification: ``RESUMING from file ..``. For the purpose of this snippet we just show that the checkpoint file exists by loading it.

.. doctest::
    
    >>> import cPickle
    >>> data = cPickle.load(open(checkpoint_fn))
    >>> print data
    <cogent.maths.simannealingoptimiser.AnnealingRun object...
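
The resume step itself follows a simple pattern: if a checkpoint file exists, load the saved state and continue from it; otherwise start fresh. The sketch below illustrates that pattern with a toy computation and is not how the optimiser restores its ``AnnealingRun`` object. The function ``run_with_resume`` and its ``stop_after`` argument (which simulates an interruption) are hypothetical and exist only for this example, which is written for Python 3.

```python
import os
import pickle


def run_with_resume(filename, total_steps, stop_after=None):
    """Sum the integers below ``total_steps``, checkpointing after each step.

    ``stop_after`` simulates an interruption: the call returns None once
    that many steps have been completed during this call.
    """
    # Resume from the checkpoint if one exists, otherwise start from scratch.
    if os.path.exists(filename):
        with open(filename, "rb") as handle:
            state = pickle.load(handle)
    else:
        state = {"step": 0, "total": 0}

    done_this_call = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_call >= stop_after:
            return None  # "interrupted"; progress so far is in the checkpoint
        state["total"] += state["step"]
        state["step"] += 1
        done_this_call += 1
        # Persist progress so a later call can pick up where this one stopped.
        with open(filename, "wb") as handle:
            pickle.dump(state, handle)
    return state["total"]
```

Calling the function once with ``stop_after`` set and then again without it should give the same result as a single uninterrupted run, because the second call starts from the saved step rather than from zero.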

Checkpointing phylogenetic optimisation runs
============================================

The built-in phylogeny code can also checkpoint its internal state. We illustrate this for the least-squares approach, but the same approach holds for maximum likelihood. We first load some stored distances.

.. doctest::

    >>> import cPickle
    >>> dists = cPickle.load(open('data/dists_for_phylo.pickle'))

We make the weighted least-squares calculator.

.. doctest::

    >>> from cogent.phylo import distance, least_squares
    >>> ls = least_squares.WLS(dists)

We start searching for trees, providing the name of the file to checkpoint to.

.. doctest::
    
    >>> checkpoint_phylo_fn = 'checkpoint_phylo.txt'
    >>> score, tree = ls.trex(a = 5, k = 1, filename=checkpoint_phylo_fn, interval=100)

.. following cleans up files

.. doctest::
    :hide:
    
    >>> from cogent.util.misc import remove_files
    >>> remove_files([checkpoint_fn, checkpoint_phylo_fn], error_on_missing=False)