###################
Channels and Tokens
###################


As far as channels go, our data sampler widget was rather simple
and linear: it was designed to receive a token from one widget and
send an output token to another widget, just like in the example
schema below:

.. image:: images/schemawithdatasamplerB.png

There's quite a bit more to channels and the management of tokens, and
in this section we will go over most of what you need to know to build
more complex widgets.

********************
Multi-Input Channels
********************

In essence, a "multi-input" channel is an input channel that can be
connected to several output channels. That is, if a widget supports
such a channel, several widgets can feed their output to it
simultaneously.

Say we want to build a widget that takes a dataset and tests
various predictive modeling techniques on it. The widget has to have an
input data channel, and we know how to deal with this from our
:doc:`previous <tutorial-settings>` lesson. But, somewhat differently, we
want to connect any number of widgets which define learners to our
testing widget, just like in the schema below, where three different
learners are used:

.. image:: images/learningcurve.png

Here we will take a look at how we define the channels for a learning
curve widget and how we manage its input tokens. But first, in brief:
a learning curve is something you can use to test a machine learning
algorithm and see how its performance depends on the size of the
training set. For this, one can draw a smaller subset of the data,
learn the classifier, and test it on the remaining data. To do this in
a fair way (by Salzberg, 1997), we perform k-fold cross validation but
use only a proportion of the data for training. The output of the
widget should then look something like:

.. image:: images/learningcurve-output.png
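
To make the procedure concrete, here is a minimal sketch in plain Python
that does not use Orange at all. The names ``learning_curve`` and
``MajorityLearner`` and the toy labels are assumptions made for this
illustration; the widget's own code is organized differently.

```python
import random
from collections import Counter


class MajorityLearner:
    """Hypothetical toy learner: predicts the most common training label."""

    def fit(self, labels):
        self.prediction = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n):
        return [self.prediction] * n


def learning_curve(labels, proportions, k=5, seed=0):
    """Return one mean accuracy per proportion of the training data used."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint test folds

    curve = []
    for p in proportions:
        accuracies = []
        for i, test in enumerate(folds):
            # Training pool: every fold except the current test fold.
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            # Train on only a p-fraction of the pool (at least one instance).
            subset = train[:max(1, int(p * len(train)))]
            model = MajorityLearner().fit([labels[j] for j in subset])
            predicted = model.predict(len(test))
            correct = sum(a == labels[j] for a, j in zip(predicted, test))
            accuracies.append(correct / len(test))
        curve.append(sum(accuracies) / k)
    return curve


labels = ["a"] * 70 + ["b"] * 30
curve = learning_curve(labels, proportions=[0.1, 0.5, 1.0])
```

Each entry of ``curve`` is the cross-validated accuracy for one training
set proportion, which corresponds to one point on the learning curve.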


Now back to channels and tokens. Input and output channels for our
widget are defined by

.. literalinclude:: orange-demo/orangedemo/OWLearningCurveA.py
   :start-after: start-snippet-1
   :end-before: end-snippet-1


Notice that everything is pretty much the same as it was with
widgets from previous lessons, the only difference being the additional argument
``multiple=True``, which says that this input can be connected to outputs of
multiple widgets.

Handlers of multiple-input signals must accept two arguments: the sent object
and the id of the sending widget.

.. literalinclude:: orange-demo/orangedemo/OWLearningCurveA.py
   :pyobject: OWLearningCurveA.set_learner

OK, this looks like one long and complicated function. But be
patient! Learning curve is not the simplest widget there is, so
there's some extra code in the function above to manage the
information it handles in the appropriate way. To understand the
signals, though, you only need to understand the following. We store
the learners (objects that learn from data) in an
:class:`~collections.OrderedDict` :obj:`self.learners`. This dictionary
is a mapping of input *id* to the input value (the input learner itself).
The reason this is an :class:`~collections.OrderedDict` is that the order
of the input learners is important as we want to maintain a consistent column
order in the table view of the learning curve point scores.

The function above first checks if the channel ``id`` is already in
:obj:`self.learners`. If so, it either deletes the corresponding entry if
``learner`` is ``None`` (remember that receiving a ``None`` value means the
link was removed or closed), or invalidates the cross validation results
and curve point for that channel id, marking them for update in
:func:`~Orange.widgets.widget.OWWidget.handleNewSignals`. The case where
we receive a learner for a new channel id is handled similarly.
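
Stripped of everything specific to the widget, the bookkeeping described
above might be sketched as follows. ``LearnerBook``, its ``results``
dictionary and the ``pending`` helper are hypothetical names invented for
this sketch, and strings stand in for real learner objects.

```python
from collections import OrderedDict


class LearnerBook:
    """Hypothetical stand-in for the widget's learner bookkeeping."""

    def __init__(self):
        self.learners = OrderedDict()  # input id -> learner
        self.results = OrderedDict()   # input id -> results, None if stale

    def set_learner(self, learner, id):
        if id in self.learners:
            if learner is None:
                # Link removed: forget the learner and its results.
                del self.learners[id]
                del self.results[id]
            else:
                # Learner replaced: keep its position, invalidate results.
                self.learners[id] = learner
                self.results[id] = None
        elif learner is not None:
            # New link: append at the end, nothing computed yet.
            self.learners[id] = learner
            self.results[id] = None

    def pending(self):
        """Ids whose results still have to be (re)computed."""
        return [id for id, res in self.results.items() if res is None]


book = LearnerBook()
book.set_learner("tree", 1)
book.set_learner("bayes", 2)
book.results[1] = "scores"     # pretend the evaluation ran for input 1
book.set_learner("forest", 1)  # replaced: stays first, results invalidated
book.set_learner(None, 2)      # link closed
```

Because an existing key keeps its position when its value is reassigned,
a replaced learner stays in the same column of the score table.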

Note that in this widget the evaluation (k-fold cross
validation) is carried out just once for a given learner, dataset and
evaluation parameters, and scores are then derived from the class
probability estimates obtained from the evaluation procedure. This
essentially means that switching from one scoring function to another
(and displaying the result in the table) takes only a split
second. To see the rest of the widget, check out
:download:`its code <orange-demo/orangedemo/OWLearningCurveA.py>`.
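
The idea of evaluating once and scoring many times can be illustrated
with a small sketch in plain Python. The probabilities below are made-up
toy values, and the two scoring functions are simple stand-ins chosen for
this example rather than the scorers the widget uses.

```python
# Cached once per learner: the predicted probability of the positive class
# for each test instance, together with the true labels (toy values).
probabilities = [0.9, 0.8, 0.3, 0.6, 0.2]
actual = [1, 1, 0, 0, 0]


def accuracy(probs, truth):
    """Share of instances whose thresholded prediction matches the label."""
    predicted = [int(p >= 0.5) for p in probs]
    return sum(a == b for a, b in zip(predicted, truth)) / len(truth)


def brier(probs, truth):
    """Mean squared difference between probability and true label."""
    return sum((p - t) ** 2 for p, t in zip(probs, truth)) / len(truth)


SCORERS = {"Accuracy": accuracy, "Brier score": brier}

# Switching the scoring function only re-reads the cached probabilities;
# no model is trained again.
scores = {name: scorer(probabilities, actual)
          for name, scorer in SCORERS.items()}
```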


*****************************
Using Several Output Channels
*****************************

There's nothing new here, only that we need a widget that has
several output channels of the same type to illustrate the idea of the
default channels in the next section. For this purpose, we will modify
our sampling widget as defined in previous lessons such that it will
send out the sampled data to one channel, and all other data to
another channel. The corresponding channel definition of this widget
is

.. literalinclude:: orange-demo/orangedemo/OWDataSamplerC.py
   :start-after: start-snippet-1
   :end-before: end-snippet-1


We used this in the third incarnation of the :download:`data sampler widget
<orange-demo/orangedemo/OWDataSamplerC.py>`; essentially the only
other changes in the code are in the :func:`selection` and :func:`commit`
functions:

.. literalinclude:: orange-demo/orangedemo/OWDataSamplerC.py
   :pyobject: OWDataSamplerC.selection

.. literalinclude:: orange-demo/orangedemo/OWDataSamplerC.py
   :pyobject: OWDataSamplerC.commit
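
The core of the change is that the widget now keeps both halves of the
split. A minimal sketch of that idea on a plain list, assuming a
hypothetical helper ``split_sample`` that is not part of the widget:

```python
import random


def split_sample(data, proportion, seed=0):
    """Return (sample, remaining); together they cover all of data."""
    rng = random.Random(seed)
    n_selected = int(round(proportion * len(data)))
    selected = set(rng.sample(range(len(data)), n_selected))
    sample = [row for i, row in enumerate(data) if i in selected]
    remaining = [row for i, row in enumerate(data) if i not in selected]
    return sample, remaining


sample, remaining = split_sample(list(range(10)), proportion=0.3)
```

In the widget, the first list would go to the sampled data output and
the second to the output carrying all other data.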


If a widget that has multiple channels of the same type is
connected to a widget that accepts such tokens, Orange Canvas opens a
window asking the user to confirm which channels to connect. Hence,
if we have just connected the *Data Sampler (C)* widget to a Data Table
widget in the schema below:

.. image:: images/datasampler-totable.png

we would get the following window asking the user which channels to
connect:

.. image:: images/datasampler-channelquerry.png


*************************************************************
Default Channels (When Using Input Channels of the Same Type)
*************************************************************

Now, let's say we want to extend our learning curve widget such
that it does the learning the same way as it used to, but can,
provided that such a dataset is given, (always) test the
learners on the same, external dataset. That is, besides the
training dataset, we need another channel of the same type but used
for the test dataset. Notice, however, that most often we will only
provide the training dataset, so we would not like to be bothered (in
Orange Canvas) with a dialog asking which channel to connect to, as the
training dataset channel will be the default one.

When listing input channels of the same type, the default
channels have a special flag in the channel specification list. So for
our new :download:`learning curve <orange-demo/orangedemo/OWLearningCurveB.py>`
widget, the channel specification is

.. literalinclude:: orange-demo/orangedemo/OWLearningCurveB.py
   :start-after: start-snippet-1
   :end-before: end-snippet-1

That is, the :obj:`Train Data` channel is a single-token
channel which is a default one (third parameter). Note that the flags can
be added (or OR-d) together so ``Default + Multiple`` is a valid flag.
To test how this works, connect a File widget to a learning curve widget;
nothing special will happen:

.. image:: images/file-to-learningcurveb.png

That is, no window asking which channels to connect will open, as the
default *"Train Data"* channel was selected.
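
The effect of the default flag can be modeled with a few lines of plain
Python. This is a deliberately simplified model and not the algorithm
Orange Canvas actually uses; ``InputChannel``, ``pick_input`` and the
placeholder ``Table`` class are all assumptions of this sketch.

```python
from dataclasses import dataclass


class Table:
    """Placeholder for a data table type (not Orange.data.Table)."""


@dataclass
class InputChannel:
    name: str
    accepts: type
    default: bool = False


def pick_input(output_type, inputs):
    """Return the input to connect silently, or None to ask the user."""
    compatible = [c for c in inputs if issubclass(output_type, c.accepts)]
    if len(compatible) == 1:
        return compatible[0]
    # Several inputs accept this type: only a single default avoids a dialog.
    defaults = [c for c in compatible if c.default]
    return defaults[0] if len(defaults) == 1 else None


inputs = [
    InputChannel("Train Data", Table, default=True),
    InputChannel("Test Data", Table),
]
chosen = pick_input(Table, inputs)
ambiguous = pick_input(
    Table, [InputChannel("A", Table), InputChannel("B", Table)]
)
```

With one default among the two compatible inputs, ``chosen`` is the
*"Train Data"* channel; without any default, the model returns ``None``,
which stands for the case where the user would be asked.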


*****************
Explicit Channels
*****************

Sometimes when a widget has multiple outputs of different types, some
of them should not be subject to this automatic default connection selection.
An example of this is Orange's `Logistic Regression` widget, which outputs
a supplementary 'Coefficients' data table. Such outputs can be marked with
an :attr:`~Orange.widgets.widget.Explicit` flag, which ensures they are never
selected for a default connection.