1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195
|
.. _defining-new-commands:
Defining new commands
=====================
A pyqi ``Command`` is a class that accepts inputs, does some work, and produces outputs. A ``Command`` is designed to be interface agnostic, so ideally should not be tied to a filesystem (i.e., it shouldn't do I/O or take filepaths) though there are some exceptions. Your ``Command`` class ultimately defines an API for your ``Command`` that can then easily be wrapped in other interface types (for example, a command line interface and/or a web interface) which handle input and output in an interface-specific way. This strategy also facilitates unit testing of your ``Command`` (by separating core functionality, which is essential to test, from interfaces, which can be very difficult to test in an automated fashion), parallel processing with your ``Command``, and constructing workflows that chain multiple ``Commands`` together. In general, your ``Command`` should take structured input (for example, a list of tuples or a numpy array), not a file that needs to be parsed.
This document describes how to create your first ``pyqi`` ``Command``.
Stubbing a new command
----------------------
After installing pyqi, you can easily stub (i.e., create templates for) new commands using ``pyqi make-command``. You can get usage information by calling::
pyqi make-command -h
To create our sequence collection summarizer, we can start by stubbing a ``SequenceCollectionSummarizer`` class::
pyqi make-command -n SequenceCollectionSummarizer --credits "Greg Caporaso" -o sequence_collection_summarizer.py
If you run this command locally, substituting your own name where applicable, you'll have a new file called ``sequence_collection_summarizer.py``, which will look roughly like the following::
#!/usr/bin/python3
from __future__ import division
__credits__ = ["Greg Caporaso"]
from pyqi.core.command import (Command, CommandIn, CommandOut,
ParameterCollection)
class SequenceCollectionSummarizer(Command):
BriefDescription = "FILL IN A 1 SENTENCE DESCRIPTION"
LongDescription = "GO INTO MORE DETAIL"
CommandIns = ParameterCollection([
CommandIn(Name='foo', DataType=str,
Description='some required parameter', Required=True),
CommandIn(Name='bar', DataType=int,
Description='some optional parameter', Required=False,
Default=1)
])
CommandOuts = ParameterCollection([
CommandOut(Name="result_1", DataType=str, Description="xyz"),
CommandOut(Name="result_2", DataType=str, Description="123"),
])
def run(self, **kwargs):
# EXAMPLE:
# return {'result_1': kwargs['foo'] * kwargs['bar'],
# 'result_2': "Some output bits"}
raise NotImplementedError("You must define this method")
CommandConstructor = SequenceCollectionSummarizer
Defining a command
------------------
There are several values that you'll need to fill in to define your command based on the stub that is created by ``make-command``. The first, which are the easiest, are ``BriefDescription`` and ``LongDescription``. ``BriefDescription`` should be a one sentence description of your command, and ``LongDescription`` should be a more detailed explanation (usually 2-3 sentences). These are used in auto-generated documentation.
Next, you'll need to define the parameters that your new command can take as input. Each of these parameters will be an instance of the ``pyqi.core.command.CommandIn`` class.
Our ``SequenceCollectionSummarizer`` command will take one required parameter and one optional parameter. The required parameter will be called ``seqs``, and will be a list (or some other iterable type) of tuples of (sequence identifier, sequence) pairs. For example::
[('sequence1','ACCGTGGACCAA'),('sequence2','TGTGGA'), ...]
We'll also need to provide a description of this parameter (used in documentation), its type, and indicate that it is required. The final ``CommandIn`` definition should look like this::
CommandIn(Name='seqs', DataType=list,
Description='sequences to be summarized', Required=True)
The optional parameter will be called ``suppress_length_summary``, and if passed will indicate that we don't want information on sequence lengths included in our output summary. The ``Parameter`` definition in this case should look like this::
CommandIn(Name='suppress_length_summary', DataType=bool,
Description='do not generate summary information on the sequence lengths',
Required=False, Default=False)
The only additional parameter that is passed here, relative to our ``seqs`` parameter, is ``Default``. Because this parameter isn't required, it's necessary to give it a default value here. All of the ``CommandIns`` should be included in a ``pyqi.core.command.ParameterCollection`` object (as in the stubbed file).
.. note:: There are a few restrictions on what ``Name`` can be set to for a ``Parameter`` (e.g., a ``CommandIn`` or a ``CommandOut``). It must be a `valid python identifier <http://docs.python.org/2/reference/lexical_analysis.html#identifiers>`_ (e.g., it cannot contain ``-`` characters or begin with a number) so the ``Command`` can be called with named options instead of passing a dict. ``Parameter`` names also must be unique for a ``Command``.
Next, you'll need to define the results that the ``Command`` generates as output. In this example, our ``Command`` will generate three results: the number of sequences, the minimum sequence length, and the maximum sequence length. Each of these results will be an instance of the ``pyqi.core.command.CommandOut`` class. We define the name of the result, its type, and a description. The final ``CommandOuts`` should look like this::
CommandOut(Name='num_seqs', DataType=int, Description='number of sequences'),
CommandOut(Name='min_length', DataType=int, Description='minimum sequence length'),
CommandOut(Name='max_length', DataType=int, Description='maximum sequence length')
All of the ``CommandOuts`` should be included in a ``pyqi.core.command.ParameterCollection`` object (as in the stubbed file).
Next, we'll need to define what our ``Command`` will actually do. This is done in the ``run`` method, and all results are returned in a dictionary. The run method for our ``SequenceCollectionSummarizer`` object would look like the following::
def run(self, **kwargs):
"""
"""
num_seqs = 0
sequence_lengths = []
for seq_id, seq in kwargs['seqs']:
num_seqs += 1
sequence_lengths.append(len(seq))
if kwargs['suppress_length_summary']:
min_length = None
max_length = None
else:
min_length = min(sequence_lengths)
max_length = max(sequence_lengths)
return {'num_seqs':num_seqs,
'min_length':min_length,
'max_length':max_length}
In practice, if your ``Command`` is more complex than our ``SequenceCollectionSummarizer`` (which it probably is), you can define other methods that are called by ``run``. These should likely be private methods.
.. note:: ``kwargs`` is validated prior to ``run`` being called, so that any required ``kwargs`` that are missing will raise an error, and any optional ``kwargs`` that are missing will have their default values filled in. To customize the validation that is performed on ``kwargs`` for your ``Command`` you should override ``_validate_kwargs`` in your ``Command``.
A complete example Command
--------------------------
The following illustrates a complete python file defining a new pyqi ``Command``::
#!/usr/bin/python3
from __future__ import division
__credits__ = ["Greg Caporaso"]
from pyqi.core.command import (Command, CommandIn, CommandOut,
ParameterCollection)
class SequenceCollectionSummarizer(Command):
BriefDescription = "Generate summary statistics on a collection of sequences."
LongDescription = "Provide the number of sequences, the minimum sequence length, and the maximum sequence length given a collection of sequences. Sequences should be provided as a list (or other iterable object) of tuples of (sequence id, sequence) pairs."
CommandIns = ParameterCollection([
CommandIn(Name='seqs', DataType=list,
Description='sequences to be summarized', Required=True),
CommandIn(Name='suppress_length_summary', DataType=bool,
Description='do not generate summary information on the sequence lengths',
Required=False, Default=False)
])
CommandOuts = ParameterCollection([
CommandOut(Name='num_seqs', DataType=int, Description='number of sequences'),
CommandOut(Name='min_length', DataType=int, Description='minimum sequence length'),
CommandOut(Name='max_length', DataType=int, Description='maximum sequence length')
])
def run(self, **kwargs):
"""
"""
num_seqs = 0
sequence_lengths = []
for seq_id, seq in kwargs['seqs']:
num_seqs += 1
sequence_lengths.append(len(seq))
if kwargs['suppress_length_summary']:
min_length = None
max_length = None
else:
min_length = min(sequence_lengths)
max_length = max(sequence_lengths)
return {'num_seqs':num_seqs,
'min_length':min_length,
'max_length':max_length}
CommandConstructor = SequenceCollectionSummarizer
At this stage you have defined a new command and its API. To access the API in the python terminal, you could do the following::
# Import your new class
>>> from sequence_collection_summarizer import SequenceCollectionSummarizer
# Instantiate it
>>> s = SequenceCollectionSummarizer()
# Call the command, passing a list of (seq id, sequence) tuples as input.
# Note that because the parameters are provided as kwargs, you need to
# pass the parameter with a keyword.
>>> r = s(seqs=[('sequence1','ACCGTGGACCAA'),('sequence2','TGTGGA')])
# You can now see the full output of the command by inspecting the
# result dictionary.
>>> r
{'max_length': 12, 'min_length': 6, 'num_seqs': 2}
# Alternatively, you can access each value independently, as with any dictionary.
>>> print(r['num_seqs'])
2
>>> print(r['min_length'])
6
>>> print(r['max_length'])
12
# You can call this command again with different input.
# For example, we can call the command again passing the
# suppress_length_summary parameter.
>>> r = s(seqs=[('sequence1','ACCGTGGACCAA'),('sequence2','TGTGGA')],suppress_length_summary=True)
>>> r
{'max_length': None, 'min_length': None, 'num_seqs': 2}
|