.. _defining-new-commands:

Defining new commands
=====================

A pyqi ``Command`` is a class that accepts inputs, does some work, and produces outputs. A ``Command`` is designed to be interface agnostic, so it ideally should not be tied to a filesystem (i.e., it shouldn't do I/O or take filepaths), though there are some exceptions. Your ``Command`` class ultimately defines an API that can then easily be wrapped in other interface types (for example, a command line interface and/or a web interface), which handle input and output in an interface-specific way. This strategy also facilitates unit testing of your ``Command`` (by separating core functionality, which is essential to test, from interfaces, which can be very difficult to test in an automated fashion), parallel processing with your ``Command``, and constructing workflows that chain multiple ``Commands`` together. In general, your ``Command`` should take structured input (for example, a list of tuples or a numpy array), not a file that needs to be parsed.
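As a rough illustration of this separation, the sketch below keeps file parsing in an interface-side function and gives the core logic only structured input. Both ``parse_fasta`` and ``count_sequences`` are hypothetical names chosen for this example (they are not part of pyqi), and FASTA is assumed here only as a familiar text format for sequences.

```python
def parse_fasta(lines):
    """Hypothetical interface-side helper: turn FASTA-formatted lines into a
    list of (sequence identifier, sequence) tuples."""
    records = []
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header line closes the previous record, if any.
            if seq_id is not None:
                records.append((seq_id, ''.join(chunks)))
            seq_id, chunks = line[1:].strip(), []
        else:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, ''.join(chunks)))
    return records


def count_sequences(seqs):
    """Core logic: works on structured input only and never touches a file."""
    return sum(1 for _seq_id, _seq in seqs)


seqs = parse_fasta(">sequence1\nACCGTGGACCAA\n>sequence2\nTGT\nGGA\n".splitlines())
# seqs == [('sequence1', 'ACCGTGGACCAA'), ('sequence2', 'TGTGGA')]
```

Because ``parse_fasta`` accepts any iterable of lines, an interface could pass it an open file object, while tests of the core function can pass a literal list of tuples directly.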

This document describes how to create your first ``pyqi`` ``Command``.

Stubbing a new command
----------------------

After installing pyqi, you can easily stub (i.e., create templates for) new commands using ``pyqi make-command``. You can get usage information by calling::

	pyqi make-command -h

In this document we'll build a command that summarizes a collection of sequences. We can start by stubbing a ``SequenceCollectionSummarizer`` class::

	pyqi make-command -n SequenceCollectionSummarizer --credits "Greg Caporaso" -o sequence_collection_summarizer.py

If you run this command locally, substituting your own name where applicable, you'll have a new file called ``sequence_collection_summarizer.py``, which will look roughly like the following::

	#!/usr/bin/python3
	from __future__ import division

	__credits__ = ["Greg Caporaso"]

	from pyqi.core.command import (Command, CommandIn, CommandOut, 
	    ParameterCollection)

	class SequenceCollectionSummarizer(Command):
	    BriefDescription = "FILL IN A 1 SENTENCE DESCRIPTION"
	    LongDescription = "GO INTO MORE DETAIL"
	    CommandIns = ParameterCollection([
	        CommandIn(Name='foo', DataType=str,
	                  Description='some required parameter', Required=True),
	        CommandIn(Name='bar', DataType=int,
	                  Description='some optional parameter', Required=False,
	                  Default=1)
	    ])

	    CommandOuts = ParameterCollection([
	        CommandOut(Name="result_1", DataType=str, Description="xyz"),
	        CommandOut(Name="result_2", DataType=str, Description="123"),
	    ])

	    def run(self, **kwargs):
	        # EXAMPLE:
	        # return {'result_1': kwargs['foo'] * kwargs['bar'],
	        #         'result_2': "Some output bits"}
	        raise NotImplementedError("You must define this method")

	CommandConstructor = SequenceCollectionSummarizer

Defining a command
------------------

There are several values that you'll need to fill in to define your command based on the stub that is created by ``make-command``. The first two, and the easiest, are ``BriefDescription`` and ``LongDescription``. ``BriefDescription`` should be a one-sentence description of your command, and ``LongDescription`` should be a more detailed explanation (usually 2-3 sentences). Both are used in auto-generated documentation.
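The sketch below shows one way such auto-generated documentation could be assembled from these two attributes: the brief description as a one-line summary, followed by the wrapped long description. ``format_help`` is a hypothetical helper and the class is a plain stand-in that only carries the two attributes; pyqi's own documentation generation is not reproduced here.

```python
import textwrap


class SequenceCollectionSummarizer:
    # Stand-in carrying only the two description attributes (no pyqi import).
    BriefDescription = ("Generate summary statistics on a collection of "
                        "sequences.")
    LongDescription = ("Provide the number of sequences, the minimum sequence "
                       "length, and the maximum sequence length given a "
                       "collection of sequences.")


def format_help(command_class, width=70):
    """Hypothetical helper: the one-line summary, a blank line, then the long
    description wrapped to ``width`` columns."""
    long_text = textwrap.fill(command_class.LongDescription, width=width)
    return "%s\n\n%s" % (command_class.BriefDescription, long_text)
```

Keeping the brief description to a single sentence is what lets it work as a one-line summary in listings, while the long description can be reflowed to any width.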

Next, you'll need to define the parameters that your new command can take as input. Each of these parameters will be an instance of the ``pyqi.core.command.CommandIn`` class.

Our ``SequenceCollectionSummarizer`` command will take one required parameter and one optional parameter. The required parameter will be called ``seqs``, and will be a list (or some other iterable type) of tuples of (sequence identifier, sequence) pairs. For example::

	[('sequence1','ACCGTGGACCAA'),('sequence2','TGTGGA'), ...]

We'll also need to provide a description of this parameter (used in documentation), its type, and indicate that it is required. The final ``CommandIn`` definition should look like this::

	CommandIn(Name='seqs', DataType=list,
	          Description='sequences to be summarized', Required=True)

The optional parameter will be called ``suppress_length_summary``; if set to ``True``, it indicates that we don't want information on sequence lengths included in our output summary. The ``CommandIn`` definition in this case should look like this::

	CommandIn(Name='suppress_length_summary', DataType=bool,
	          Description='do not generate summary information on the sequence lengths', 
	          Required=False, Default=False)

The only additional argument passed here, relative to our ``seqs`` parameter, is ``Default``. Because this parameter isn't required, we must give it a default value. All of the ``CommandIns`` should be included in a ``pyqi.core.command.ParameterCollection`` object (as in the stubbed file).

.. note:: There are a few restrictions on what ``Name`` can be set to for a ``Parameter`` (e.g., a ``CommandIn`` or a ``CommandOut``). It must be a `valid Python identifier <http://docs.python.org/2/reference/lexical_analysis.html#identifiers>`_ (e.g., it cannot contain ``-`` characters or begin with a number) so that the ``Command`` can be called with keyword arguments instead of being passed a dict. ``Parameter`` names must also be unique within a ``Command``.
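A minimal sketch of checking these rules up front, using only the standard library. ``check_parameter_names`` is a hypothetical helper, not a pyqi function. It also rejects reserved words such as ``class``: these pass ``str.isidentifier()`` but still cannot be used as keyword arguments in a call.

```python
import keyword


def check_parameter_names(names):
    """Hypothetical helper: raise ValueError if any name is not a usable
    keyword-argument name or if a name appears more than once."""
    seen = set()
    for name in names:
        # isidentifier() rejects '-' and a leading digit; iskeyword() catches
        # reserved words like 'class', which are identifiers but not usable
        # as keyword arguments.
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError("invalid parameter name: %r" % name)
        if name in seen:
            raise ValueError("duplicate parameter name: %r" % name)
        seen.add(name)


check_parameter_names(['seqs', 'suppress_length_summary'])  # passes silently
```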

Next, you'll need to define the results that the ``Command`` generates as output. In this example, our ``Command`` will generate three results: the number of sequences, the minimum sequence length, and the maximum sequence length. Each of these results will be an instance of the ``pyqi.core.command.CommandOut`` class. We define the name of the result, its type, and a description. The final ``CommandOuts`` should look like this::

	CommandOut(Name='num_seqs', DataType=int, Description='number of sequences'),
	CommandOut(Name='min_length', DataType=int, Description='minimum sequence length'),
	CommandOut(Name='max_length', DataType=int, Description='maximum sequence length')

All of the ``CommandOuts`` should be included in a ``pyqi.core.command.ParameterCollection`` object (as in the stubbed file).
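In the examples in this document, the keys of the dictionary returned by ``run`` match the ``CommandOut`` names. The sketch below compares the two sets of names so that a typo in either place is easy to spot; ``check_result`` is a hypothetical helper for illustration and does not describe whatever result checks pyqi itself performs.

```python
def check_result(result, declared_outputs):
    """Hypothetical helper: compare a run() result with declared output names.

    Returns a (missing, unexpected) pair of sorted lists: names declared as
    outputs but absent from the result, and result keys never declared.
    """
    declared = set(declared_outputs)
    produced = set(result)
    return sorted(declared - produced), sorted(produced - declared)


declared = ['num_seqs', 'min_length', 'max_length']
# A result with a misspelled key ('min_len' instead of 'min_length').
missing, unexpected = check_result({'num_seqs': 2, 'min_len': 6,
                                    'max_length': 12}, declared)
# missing == ['min_length'], unexpected == ['min_len']
```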

Next, we'll need to define what our ``Command`` will actually do. This is done in the ``run`` method, which returns all results in a dictionary keyed by the ``CommandOut`` names. The ``run`` method for our ``SequenceCollectionSummarizer`` class would look like the following::

	def run(self, **kwargs):
	    """Summarize a collection of (sequence id, sequence) pairs."""
	    num_seqs = 0
	    sequence_lengths = []
	    for seq_id, seq in kwargs['seqs']:
	        num_seqs += 1
	        sequence_lengths.append(len(seq))

	    if kwargs['suppress_length_summary']:
	        min_length = None
	        max_length = None
	    else:
	        # min() and max() raise ValueError if seqs was empty
	        min_length = min(sequence_lengths)
	        max_length = max(sequence_lengths)

	    return {'num_seqs': num_seqs,
	            'min_length': min_length,
	            'max_length': max_length}

In practice, if your ``Command`` is more complex than our ``SequenceCollectionSummarizer`` (which it probably is), you can define other methods that are called by ``run``. These should generally be private methods (i.e., their names should begin with an underscore).
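As a sketch of that pattern, the class below splits the summarizer's logic into two private helpers called by ``run``. It is a plain Python class rather than a pyqi ``Command`` subclass so that it runs without pyqi installed; the class name and the helper names ``_sequence_lengths`` and ``_length_range`` are hypothetical, and the returned values match those of the ``run`` method above.

```python
class SequenceCollectionSummarizerSketch:
    """Hypothetical stand-in (no pyqi base class) showing run() delegating
    to private helper methods."""

    def run(self, **kwargs):
        """Summarize a collection of (sequence id, sequence) pairs."""
        lengths = self._sequence_lengths(kwargs['seqs'])
        result = {'num_seqs': len(lengths),
                  'min_length': None,
                  'max_length': None}
        if not kwargs['suppress_length_summary']:
            result['min_length'], result['max_length'] = \
                self._length_range(lengths)
        return result

    def _sequence_lengths(self, seqs):
        """Return the length of each sequence, ignoring the identifiers."""
        return [len(seq) for _seq_id, seq in seqs]

    def _length_range(self, lengths):
        """Return (min, max) of the lengths; raises ValueError if empty."""
        return min(lengths), max(lengths)
```

With a real pyqi ``Command`` you would normally call the instance itself (e.g., ``s(seqs=...)``) rather than ``run``, so that validation and defaults are applied first; this stand-in has neither, so both inputs are passed explicitly.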

.. note:: ``kwargs`` is validated prior to ``run`` being called, so any required ``kwargs`` that are missing will raise an error, and any optional ``kwargs`` that are missing will have their default values filled in. To customize the validation performed on ``kwargs`` for your ``Command``, override ``_validate_kwargs`` in your ``Command``.
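The behavior described in this note can be imitated in a few lines of plain Python, which may help when deciding what an override should do. ``MiniCommand`` and ``Summarizer`` below are hypothetical stand-ins, not pyqi classes, and inputs are represented as simple ``(name, required, default)`` tuples instead of ``CommandIn`` objects. The override runs the generic checks first and then adds a command-specific rule rejecting empty input.

```python
class MiniCommand:
    """Hypothetical stand-in for a command base class (not pyqi's Command)."""

    # Each input is described as a (name, required, default) tuple.
    CommandIns = []

    def __call__(self, **kwargs):
        # Validate (and fill in defaults) before running, as the note describes.
        self._validate_kwargs(kwargs)
        return self.run(**kwargs)

    def _validate_kwargs(self, kwargs):
        for name, required, default in self.CommandIns:
            if name in kwargs:
                continue
            if required:
                raise ValueError("missing required input: %s" % name)
            # Missing optional input: fill in its default value.
            kwargs[name] = default

    def run(self, **kwargs):
        raise NotImplementedError("subclasses must define run")


class Summarizer(MiniCommand):
    CommandIns = [('seqs', True, None),
                  ('suppress_length_summary', False, False)]

    def _validate_kwargs(self, kwargs):
        # Apply the generic checks first, then a command-specific rule.
        # len() assumes seqs is a list, matching DataType=list above.
        super()._validate_kwargs(kwargs)
        if len(kwargs['seqs']) == 0:
            raise ValueError("seqs must contain at least one sequence")

    def run(self, **kwargs):
        lengths = [len(seq) for _seq_id, seq in kwargs['seqs']]
        if kwargs['suppress_length_summary']:
            return {'num_seqs': len(lengths), 'min_length': None,
                    'max_length': None}
        return {'num_seqs': len(lengths), 'min_length': min(lengths),
                'max_length': max(lengths)}


s = Summarizer()
result = s(seqs=[('sequence1', 'ACCGTGGACCAA'), ('sequence2', 'TGTGGA')])
# result == {'num_seqs': 2, 'min_length': 6, 'max_length': 12}
```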

A complete example Command
--------------------------

The following illustrates a complete Python file defining a new pyqi ``Command``::

	#!/usr/bin/python3
	from __future__ import division

	__credits__ = ["Greg Caporaso"]

	from pyqi.core.command import (Command, CommandIn, CommandOut, 
	    ParameterCollection)

	class SequenceCollectionSummarizer(Command):
	    BriefDescription = "Generate summary statistics on a collection of sequences."
	    LongDescription = "Provide the number of sequences, the minimum sequence length, and the maximum sequence length given a collection of sequences. Sequences should be provided as a list (or other iterable object) of tuples of (sequence id, sequence) pairs."

	    CommandIns = ParameterCollection([
	        CommandIn(Name='seqs', DataType=list,
	                  Description='sequences to be summarized', Required=True),
	        CommandIn(Name='suppress_length_summary', DataType=bool,
	                  Description='do not generate summary information on the sequence lengths',
	                  Required=False, Default=False)
	    ])

	    CommandOuts = ParameterCollection([
	        CommandOut(Name='num_seqs', DataType=int, Description='number of sequences'),
	        CommandOut(Name='min_length', DataType=int, Description='minimum sequence length'),
	        CommandOut(Name='max_length', DataType=int, Description='maximum sequence length')
	    ])

	    def run(self, **kwargs):
	        """Summarize a collection of (sequence id, sequence) pairs."""
	        num_seqs = 0
	        sequence_lengths = []
	        for seq_id, seq in kwargs['seqs']:
	            num_seqs += 1
	            sequence_lengths.append(len(seq))

	        if kwargs['suppress_length_summary']:
	            min_length = None
	            max_length = None
	        else:
	            # min() and max() raise ValueError if seqs was empty
	            min_length = min(sequence_lengths)
	            max_length = max(sequence_lengths)

	        return {'num_seqs': num_seqs,
	                'min_length': min_length,
	                'max_length': max_length}

	CommandConstructor = SequenceCollectionSummarizer

At this stage you have defined a new command and its API. To access the API from the Python interpreter, you could do the following::

	# Import your new class
	>>> from sequence_collection_summarizer import SequenceCollectionSummarizer
	# Instantiate it
	>>> s = SequenceCollectionSummarizer()
	# Call the command, passing a list of (seq id, sequence) tuples as input. 
	# Note that because the parameters are provided as kwargs, you need to 
	# pass the parameter with a keyword.
	>>> r = s(seqs=[('sequence1','ACCGTGGACCAA'),('sequence2','TGTGGA')])
	# You can now see the full output of the command by inspecting the 
	# result dictionary.
	>>> r
	{'num_seqs': 2, 'min_length': 6, 'max_length': 12}
	# Alternatively, you can access each value independently, as with any dictionary.
	>>> print(r['num_seqs'])
	2
	>>> print(r['min_length'])
	6
	>>> print(r['max_length'])
	12
	# You can call this command again with different input.
	# For example, we can call the command again passing the
	# suppress_length_summary parameter.
	>>> r = s(seqs=[('sequence1','ACCGTGGACCAA'),('sequence2','TGTGGA')], suppress_length_summary=True)
	>>> r
	{'num_seqs': 2, 'min_length': None, 'max_length': None}