File: workflow.h

gromacs 2025.4-1
/*
 * This file is part of the GROMACS molecular simulation package.
 *
 * Copyright 2018- The GROMACS Authors
 * and the project initiators Erik Lindahl, Berk Hess and David van der Spoel.
 * Consult the AUTHORS/COPYING files and https://www.gromacs.org for details.
 *
 * GROMACS is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public License
 * as published by the Free Software Foundation; either version 2.1
 * of the License, or (at your option) any later version.
 *
 * GROMACS is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with GROMACS; if not, see
 * https://www.gnu.org/licenses, or write to the Free Software Foundation,
 * Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
 *
 * If you want to redistribute modifications to GROMACS, please
 * consider that scientific software is very special. Version
 * control is crucial - bugs must be traceable. We will be happy to
 * consider code for inclusion in the official distribution, but
 * derived work must not be called official GROMACS. Details are found
 * in the README & COPYING files - if they are missing, get the
 * official version at https://www.gromacs.org.
 *
 * To help us fund GROMACS development, we humbly ask that you cite
 * the research papers on the package. Check out https://www.gromacs.org.
 */

#ifndef GMXAPI_WORKFLOW_H
#define GMXAPI_WORKFLOW_H

/*! \internal \file
 * \brief Declare public interface for Workflow and related infrastructure.
 *
 * \ingroup gmxapi
 */

#include <forward_list>
#include <map>
#include <memory>
#include <string>

namespace gmxapi
{

/*!
 * \brief Uniquely identify a workflow node in the graph.
 *
 * The key probably needs a human-readable aspect, some machine-decipherable encoding of roles taken by the node,
 * and a hash to uniquely identify the output of the node (i.e. deterministic input parameters). It is probably not
 * necessary for nodes to refer to the consumers of their output by key, but they should abstractly refer to their
 * inputs by a key that is not dependent on a currently-running workflow.
 *
 * Requirements and roles:
 *
 * * serve as a key for use by other nodes to name their inputs
 * * encode workflow scheduling hints (TBD)
 * * provide robust assurance of reproducible results and restartability
 * * allow nodes to specify only their immediately dependent nodes (inwards directed edges)
 *
 * Workflow specifications need to be serializable and portable across job restarts and migration to other computing
 * resources. The data graph manager and/or work scheduler need to be able to look at the inputs specified for a node
 * and be able to determine that the required node or its output is available. If a node is used as the input for
 * multiple other nodes, it should be clear how to avoid wasting resources when meeting the data requirement. If
 * similar looking nodes have different inputs or parameters, they must not be mistaken to be equivalent.
 *
 * Context-dependent aspects of the workflow specification cannot be included in a hash, then, but context-independent
 * aspects that affect the output of a node must be reflected.
 *
 * For example, an input filename should be included as identifying information, but the absolute path should not,
 * though path hints or conventions should be clear in the context. The filename is sufficient as a parameter with which
 * to construct the workflow node in an execution context, but is insufficient to uniquely identify the file since
 * some file names are reused frequently. Some sort of checksum of the file should also be included so that the inputs of the
 * workflow at execution time can be checked against the inputs when the workflow was specified.
 *
 * Uniqueness of inputs could be more elaborate. For instance, a node may require the trajectory of a specific
 * simulation as input, but flexibly handle starting from an arbitrary step in that trajectory to allow check-pointed
 * workflows.
 *
 * The workflow object can have a list of keys that can be instantiated with no input dependencies, the scheduler could
 * scan for keys that represent source nodes, or workflow containers could be turned into graphs through an additional
 * preprocessing or clustering phase, but it will be easiest if we assert a protocol, such as that a node is not
 * instantiated or activated until its inputs are ready.
 *
 * This is just a type alias until more elaborate implementation is needed.
 */
using NodeKey = std::string;
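
/*
 * Illustrative sketch only, not part of the gmxapi API. The comment above asks
 * for a key with a human-readable part plus a hash of the context-independent
 * inputs (file name and checksum, but not the absolute path). One possible
 * layout is shown here. The names fnv1a(), baseName() and makeNodeKey(), the
 * "<label>:<hex digest>" format, and the choice of FNV-1a as the hash are all
 * assumptions made for this example.
 *
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

namespace sketch
{

using NodeKey = std::string;

// 64-bit FNV-1a: a simple hash whose result does not depend on the platform
// or the program run, which std::hash does not guarantee.
inline std::uint64_t fnv1a(const std::string& data)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV offset basis
    for (unsigned char c : data)
    {
        hash ^= c;
        hash *= 1099511628211ULL; // FNV prime
    }
    return hash;
}

// Strip the directory part of a path (POSIX-style '/' separators only).
inline std::string baseName(const std::string& path)
{
    const std::size_t slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Hypothetical key layout: "<label>:<hex digest>".
// Only context-independent data is hashed: the bare file name, a
// caller-supplied content checksum, and the node parameters.
inline NodeKey makeNodeKey(const std::string& label,
                           const std::string& inputPath,
                           const std::string& fileChecksum,
                           const std::string& params)
{
    const std::string  identity = baseName(inputPath) + '\n' + fileChecksum + '\n' + params;
    std::ostringstream key;
    key << label << ':' << std::hex << fnv1a(identity);
    return key.str();
}

// Self-check of the properties described in the NodeKey documentation.
inline void demoNodeKey()
{
    const NodeKey a = makeNodeKey("md", "/home/alice/run1/topol.tpr", "abc123", "nsteps=1000");
    const NodeKey b = makeNodeKey("md", "/scratch/job42/topol.tpr", "abc123", "nsteps=1000");
    const NodeKey c = makeNodeKey("md", "/home/alice/run1/topol.tpr", "abc123", "nsteps=2000");
    assert(a == b); // same file name, checksum, and parameters: same key
    assert(a != c); // different parameters: different key
}

} // namespace sketch
```
 *
 * Because the directory is dropped before hashing, the same work specified on
 * two machines yields the same key, while a changed parameter or checksum
 * yields a different one.
 */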

// Forward declarations for definitions below.
class NodeSpecification;
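
/*
 * Illustrative sketch only, not part of the gmxapi API. The NodeKey notes
 * suggest a protocol in which a node is not instantiated or activated until
 * its inputs are ready. Assuming each node lists only its direct inputs by
 * key, a scheduler could select runnable nodes roughly as follows. InputMap
 * and readyNodes() are hypothetical names introduced for this example.
 *
```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

namespace sketch
{

using NodeKey = std::string;

// Hypothetical edge list: each node key maps to the keys of its direct inputs.
using InputMap = std::map<NodeKey, std::vector<NodeKey>>;

// Return the nodes that have not finished and whose inputs have all finished.
// Source nodes (no inputs) are ready immediately.
inline std::vector<NodeKey> readyNodes(const InputMap& inputs, const std::set<NodeKey>& finished)
{
    std::vector<NodeKey> ready;
    for (const auto& entry : inputs)
    {
        if (finished.count(entry.first) != 0)
        {
            continue; // already done
        }
        bool allInputsDone = true;
        for (const auto& input : entry.second)
        {
            if (finished.count(input) == 0)
            {
                allInputsDone = false;
                break;
            }
        }
        if (allInputsDone)
        {
            ready.push_back(entry.first);
        }
    }
    return ready;
}

// Walk a three-node chain: load_tpr -> md -> analysis.
inline void demoReadyNodes()
{
    const InputMap inputs = { { "load_tpr", {} }, { "md", { "load_tpr" } }, { "analysis", { "md" } } };
    assert((readyNodes(inputs, {}) == std::vector<NodeKey>{ "load_tpr" }));
    assert((readyNodes(inputs, { "load_tpr" }) == std::vector<NodeKey>{ "md" }));
    assert((readyNodes(inputs, { "load_tpr", "md" }) == std::vector<NodeKey>{ "analysis" }));
}

} // namespace sketch
```
 *
 * Only inward-directed edges are stored, matching the requirement that nodes
 * name their immediate dependencies and nothing else.
 */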

/*!
 * \brief Recipe for a computational workflow.
 *
 * Provides a lightweight and portable container defining the nodes and edges in a workflow with
 * enough information for the workflow to be instantiated and run.
 *
 * \ingroup gmxapi
 */
class Workflow final
{
public:
    //! In the initial version, the implementation class is just a type alias.
    using Impl = typename std::map<NodeKey, std::unique_ptr<NodeSpecification>>;

    /*! \brief Use create() to get Workflow objects.
     *
     * An empty workflow is not meaningful except to a builder, which does not
     * yet exist. Even a builder, though, will probably create the implementation
     * object directly and the Workflow object from that.
     */
    Workflow() = delete;

    /*!
     * \brief Construct by transferring ownership of an implementation object.
     *
     * \param impl Implementation object to wrap.
     *
     * Usage:
     *
     *     gmxapi::Workflow::Impl newGraph;
     *     // ...
     *     // configure graph...
     *     // ...
     *     // Create workflow container
     *     gmxapi::Workflow work {std::move(newGraph)};
     *     gmxapi::launchSession(&context, work);
     *
     */
    explicit Workflow(Impl&& impl);

    /*!
     * \brief Add a node to the workflow graph.
     *
     * The work specification must already have its inputs assigned to existing
     * nodes. This operation should only be permitted if it does not render a
     * valid workflow invalid.
     *
     * \param spec Operational node to add to the Workflow.
     *
     * \return Key for the new node in the Workflow container.
     *
     * \todo Not yet implemented.
     */
    static NodeKey addNode(std::unique_ptr<NodeSpecification> spec);

    /*!
     * \brief Get the node specification for a provided key.
     *
     * \param key Unique identifier for a node in the graph.
     * \return copy of the node specification.
     */
    std::unique_ptr<NodeSpecification> getNode(const gmxapi::NodeKey& key) const noexcept;

    /*!
     * \brief Get an iterator to the node key--value pairs.
     *
     * \return iterator across nodes in container.
     *
     * The order in which the nodes are returned is unspecified. Only a forward iterator is provided.
     * \{
     */
    Impl::const_iterator cbegin() const;
    Impl::const_iterator cend() const;
    // Allow range based for loop to work before C++17
    Impl::const_iterator begin() const;
    Impl::const_iterator end() const;
    /*! \} */

    /*!
     * \brief Create a new workflow.
     *
     * \param filename TPR filename accessible both to the client and library.
     * \return Ownership of a new Workflow instance.
     */
    static std::unique_ptr<Workflow> create(const std::string& filename);

private:
    /*!
     * \brief Storage structure.
     */
    Impl graph_;
};
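
/*
 * Illustrative sketch only, not part of the gmxapi API. It mirrors the shape
 * of Workflow with stand-in types so that the ownership transfer and the
 * read-only iteration can be tried in isolation. Spec and Graph are
 * hypothetical names, and returning nullptr for an unknown key is an
 * assumption of this example rather than documented behavior.
 *
```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

namespace sketch
{

using NodeKey = std::string;

// Stand-in for gmxapi::NodeSpecification, reduced to a copyable parameter string.
struct Spec
{
    std::string params;
};

// Stand-in for gmxapi::Workflow: owns the graph and exposes read-only iteration.
class Graph final
{
public:
    using Impl = std::map<NodeKey, std::unique_ptr<Spec>>;

    // Take ownership of an already configured graph.
    explicit Graph(Impl&& impl) : graph_(std::move(impl)) {}

    // Return a copy of the node, or nullptr if the key is unknown (assumption).
    std::unique_ptr<Spec> getNode(const NodeKey& key) const
    {
        const auto it = graph_.find(key);
        if (it == graph_.end())
        {
            return nullptr;
        }
        return std::make_unique<Spec>(*it->second);
    }

    Impl::const_iterator begin() const { return graph_.cbegin(); }
    Impl::const_iterator end() const { return graph_.cend(); }

private:
    Impl graph_;
};

// Build a one-node graph, wrap it, and read it back.
inline void demoGraph()
{
    Graph::Impl newGraph;
    newGraph.emplace("md", std::make_unique<Spec>(Spec{ "topol.tpr" }));
    Graph work{ std::move(newGraph) };

    int count = 0;
    for (const auto& node : work) // uses begin() and end()
    {
        assert(node.second != nullptr);
        ++count;
    }
    assert(count == 1);
    assert(work.getNode("md")->params == "topol.tpr");
    assert(work.getNode("missing") == nullptr);
}

} // namespace sketch
```
 *
 * Handing out copies from getNode() keeps the stored graph immutable from the
 * caller's point of view, at the cost of one allocation per lookup.
 */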

/*!
 * \brief Portable specification to define work and inform instantiation by the library.
 *
 * The GROMACS library creates the objects it needs to run as late as possible while
 * optimizing parallel resources at run time. The specifications provide a way for
 * client code to interact with the definition of the work to be performed while carrying
 * enough information for GROMACS to launch.
 *
 * Client input is translated into serializable parameters sufficient to instantiate
 * the node at runtime.
 *
 * On the library side, the spec should have a pointer to a factory function for
 * the library object(s) it represents that is valid in the current Context. Thus,
 * when a workflow specification (and thus its Node Specifications) is cloned to new
 * Contexts, the Contexts must resolve an appropriate function pointer or raise an
 * appropriate exception indicating the specified work is not possible on the targeted
 * execution context.
 *
 * Different node types will have different kinds of parameters.
 * \todo Clarify chain of responsibility for defining param type.
 */
class NodeSpecification
{
public:
    //! Base class is heritable.
    virtual ~NodeSpecification();

    //! Nodes can use arbitrary param type, but string is default.
    using paramsType = std::string;

    /*!
     * \brief Get an equivalent node for a new graph.
     *
     * \return ownership of a new node specification
     *
     * Allows a derived class to define its own copy behavior when accessed
     * through a base class pointer.
     *
     * \internal
     * Future versions may use this function to translate a node spec from one
     * context to another, in which case the context would likely be passed
     * as an argument. E.g. clone(&context) or cloneTo(&workspec). It may
     * be confusing for developers to manage the distinction between replicating
     * a node in a graph versus using helper methods to copy the node-specific
     * parameters to a node in a new graph, so it is probably better to
     * reserve copy/move construction/assignment for internal code and use
     * well-named well-documented free functions for such higher level operations.
     * Furthermore, it is not universally intuitive what is meant by copying
     * a node without specifying what happens to edges and connected nodes.
     */
    virtual std::unique_ptr<NodeSpecification> clone() = 0;

    /*!
     * \brief Fetch current params value.
     *
     * \return copy of internal params value.
     */
    virtual paramsType params() const noexcept = 0;

    //! Parameters for the operation represented by this node.
    paramsType params_{};
};
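
/*
 * Illustrative sketch only, not part of the gmxapi API. It shows how a derived
 * node type might implement clone() and params() so that a copy can be made
 * through a base class pointer. NodeSpec is a stand-in mirroring the shape of
 * NodeSpecification, and MdNodeSpec is a hypothetical node type that stores a
 * file name as its parameters.
 *
```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch
{

// Stand-in mirroring the shape of gmxapi::NodeSpecification.
class NodeSpec
{
public:
    virtual ~NodeSpec() = default;

    using paramsType = std::string;

    virtual std::unique_ptr<NodeSpec> clone()                 = 0;
    virtual paramsType                params() const noexcept = 0;
};

// Hypothetical node type whose parameters are a single file name.
class MdNodeSpec final : public NodeSpec
{
public:
    explicit MdNodeSpec(std::string filename) : filename_(std::move(filename)) {}

    // Return a new, independently owned node with the same parameters.
    std::unique_ptr<NodeSpec> clone() override { return std::make_unique<MdNodeSpec>(filename_); }

    // Return a copy of the stored parameters.
    paramsType params() const noexcept override { return filename_; }

private:
    std::string filename_;
};

// Copy a node while holding only a base class pointer.
inline void demoClone()
{
    std::unique_ptr<NodeSpec> original = std::make_unique<MdNodeSpec>("topol.tpr");
    std::unique_ptr<NodeSpec> copy     = original->clone();
    assert(copy.get() != original.get());         // distinct objects
    assert(copy->params() == original->params()); // same parameters
}

} // namespace sketch
```
 *
 * The copy carries no edges: what happens to connected nodes is left to
 * whichever code assembles the new graph, as the clone() notes caution.
 */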

} // end namespace gmxapi

#endif // GMXAPI_WORKFLOW_H