Common Commandline Interface
============================


Motivation And High Level Example
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Provide a common interface for executables to expose options
- Provide a common interface for executables to be called
- Provide a common interface for exposing metadata about a tool, such as memory usage, CPU usage, and required temp files

Benefits
~~~~~~~~

- A consistent, concrete, common interface for shelling out to executables
- Task options have a consistent model for validation
- Task versioning is supported
- A principled model for wrapping tools. For example, pbalign would "inherit" blasr options and extend or wrap them.
- Once a manifest has been defined and registered with pbsmrtpipe, the task/manifest can be referenced in pipelines with no additional work


Terms
~~~~~

- 'Tool Contract' is a single file that exposes the exe interface. It
  contains metadata about the task, such as input and output file
  types and nproc.
- 'Resolved Tool Contract' is a single file that contains the resolved values for the Tool Contract (also referred to as the manifest)
- 'Driver' is the general interface for calling a commandline exe. It can be called from the commandline or directly as an API call (via any language that supports the manifest interface).

Hello World Example
~~~~~~~~~~~~~~~~~~~

Tool Contract file for 'my-exe'


.. literalinclude:: ../../tests/data/dev_example_tool_contract.json
    :language: javascript


Details of Tool Contract
~~~~~~~~~~~~~~~~~~~~~~~~

- The Tool Contract id can be referenced globally (e.g., within a pipeline template)
- Input file types have a file type id, an id that can be referenced within the driver, and a brief description
- Output file types have a file type id and a default output file name
- The number of processors is defined by "$nproc". "$"-prefixed values are symbols that have well-defined semantic meaning
- Temp files and log files are defined using "$" symbols and can have multiple items
- The exe options are exposed via the JSON Schema standard. Each option has an id and maps to a single schema definition. Each option must have a default value.
- The exe section of the "driver" is the commandline interface, which is called with the resolved manifest as a positional argument (e.g., "my-exe resolved-manifest.json")
- The task type describes whether the task should be submitted to the cluster resources
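
The "$"-prefixed symbols in the list above can be thought of as placeholders that a resolver replaces with concrete values. The following is a minimal sketch of that idea and not pbcommand's implementation: the "resolve_symbol" helper, the "$tmpdir" symbol, and the example values are assumptions made for illustration.

.. code-block:: python

    # Hypothetical helper; not part of pbcommand.
    def resolve_symbol(value, symbols):
        """Return the concrete value for a "$"-prefixed symbol.

        Values that are not symbols are returned unchanged.
        """
        if isinstance(value, str) and value.startswith("$"):
            if value not in symbols:
                raise KeyError("Unknown symbol: %s" % value)
            return symbols[value]
        return value


    # Values a resolver might supply at run time (illustrative only).
    symbols = {"$nproc": 4, "$tmpdir": "/tmp/my-tmp"}

    nproc = resolve_symbol("$nproc", symbols)  # symbol is replaced
    fixed = resolve_symbol(2, symbols)         # literal is kept as-is

Raising on an unknown symbol, rather than passing it through, is a design choice of this sketch so that a typo in a contract fails early.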


.. note:: A single driver can reference many manifests. For example, "pbreports" would have a single driver exe. From the "task_manifest_id", the driver would dispatch to the correct function call.
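
One way a single driver could dispatch on the manifest id is a lookup table from id to function. This is only a sketch under stated assumptions: the ids, the "run_*" functions, and the "dispatch" helper are hypothetical and are not taken from pbreports or pbcommand.

.. code-block:: python

    import sys


    # Hypothetical report functions; each returns a process exit code.
    def run_mapping_stats(rtc):
        return 0


    def run_filter_stats(rtc):
        return 0


    # Maps a manifest id to the function that implements it.
    # The ids below are made up for this example.
    DISPATCH = {
        "pbreports.tasks.mapping_stats": run_mapping_stats,
        "pbreports.tasks.filter_stats": run_filter_stats,
    }


    def dispatch(task_id, rtc):
        """Call the function registered for task_id and return its exit code."""
        try:
            func = DISPATCH[task_id]
        except KeyError:
            sys.stderr.write("Unknown manifest id: %s\n" % task_id)
            return 1
        return func(rtc)

A driver's main function would read the id from the resolved manifest and pass it, together with the loaded manifest, to "dispatch".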

Programmatically defining a Parser to Emit a Tool Contract
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pbcommand provides an API to create a tool contract and an argparse instance from a single interface. This gives a single point for defining options and keeps the standard commandline entry point and the tool contract in sync. It also allows your tool to emit the tool contract to stdout using "--emit-tool-contract" and to be run from a **Resolved Tool Contract** using the "--resolved-tool-contract /path/to/resolved-tool-contract.json" commandline argument.

The complete app is shown below.


.. literalinclude:: ../../pbcommand/cli/examples/dev_app.py
    :language: python

.. note:: Option ids must use the {pbcommand}.task_options.{option_id} format.
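
The idea of defining options once and deriving both a commandline parser and a contract from them can be illustrated with the standard library alone. The sketch below does not use pbcommand's parser API: the "OPTIONS" table, the "my_tool" namespace, and both helper functions are assumptions made for this example.

.. code-block:: python

    import argparse

    # Hypothetical option table: (option id, default, help text).
    OPTIONS = [
        ("my_tool.task_options.max_records", 100, "Maximum number of records"),
    ]


    def to_parser(options):
        """Build an argparse parser with one flag per option."""
        parser = argparse.ArgumentParser(description="Example tool")
        for option_id, default, help_text in options:
            # "my_tool.task_options.max_records" -> "--max-records"
            flag = "--" + option_id.split(".")[-1].replace("_", "-")
            parser.add_argument(flag, type=type(default), default=default,
                                help=help_text)
        return parser


    def to_contract_options(options):
        """Build a contract-like dict keyed by the full option id."""
        return {
            option_id: {"default": default, "description": help_text}
            for option_id, default, help_text in options
        }


    args = to_parser(OPTIONS).parse_args(["--max-records", "25"])
    contract_options = to_contract_options(OPTIONS)

Because both views are generated from the same table, a renamed option or a changed default cannot drift between the commandline and the contract.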

Details of Resolved Tool Contract
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Input and output file types are resolved to file paths
- nproc and other resources are resolved to concrete values


.. literalinclude:: ../../tests/data/dev_example_resolved_tool_contract.json
    :language: javascript


Library usage
~~~~~~~~~~~~~

(language API example)


Example of using a manifest in a tool, such as a mapping stats report.

.. code-block:: python

    from pbcommand.pb_io import load_tool_contract_from

    # Your application was called via "pbreports resolved-manifest.json"
    p = "/path/to/resolved-tool-contract.json"
    # Load the resolved manifest from the path
    rtc = load_tool_contract_from(p)

    # General call to the mapping stats report main, equivalent to:
    # mapping_stats_main("/path/to/align.dataset.xml", "/path/to/reference.dataset.xml", "/path/to/output.json", my_option=1235)
    # mapping_stats_main is assumed to be provided by the report module.
    exit_code = mapping_stats_main(rtc.input_files[0], rtc.input_files[1], rtc.output_files[0], rtc.opts["pbreports.task_options.my_option"])


Example of resolving the Tool Contract
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The resolver must have assigned values for max nproc, the root temp
dir, and the output dir. The output dir can be used to assign the
paths of the output files.

.. code-block:: python

    # Simple Python example; the Scala or C++ API would be similar

    import os

    from pbcommand.pb_io import load_tool_contract_from
    from pbcommand.cli import resolve_tool_contract

    # Load the tool contract from a file ...
    tool_contract = load_tool_contract_from("/path/to/tool-contract.json")
    # ... or look up one that is registered to your Python package
    # tool_contract = ToolContractRegistry.get("pbsmrtpipe.tasks.dev_static_task")

    max_nproc = 3
    tmp_dir = "/tmp/my-tmp"
    output_dir = os.getcwd()

    input_files = ("/path/to/file.csv", "/path/to/dataset.subreads.xml")
    options = {"pbsmrtpipe.task_options.my_option": 1234}

    # Create an instance of a Resolved Tool Contract
    rtc = resolve_tool_contract(tool_contract, input_files, output_dir, tmp_dir, max_nproc, options)

    # TODO. Not implemented yet
    # The driver will run the tool, validate that the output files exist, and
    # clean up any temp files/resources.
    result = run_tool_contract_driver(rtc, cleanup=False)

    print(result.exit_code)
    print(result.error_message)
    print(result.host_name)
    print(result.run_time)

    # Sugar to persist results
    result.write_json("output-results.json")
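
The statement in this section that the output dir can be used to assign output paths can be sketched as below. This is an illustration only, not the resolver's actual logic: the "assign_output_paths" helper and the file names are hypothetical, and the default names stand in for the default output file names declared in a Tool Contract.

.. code-block:: python

    import os


    # Hypothetical helper; not part of pbcommand.
    def assign_output_paths(output_dir, default_names):
        """Join each default output file name onto output_dir."""
        return [os.path.join(output_dir, name) for name in default_names]


    paths = assign_output_paths("/path/to/job", ["report.json", "summary.csv"])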