File: BadValues.pod

package info (click to toggle)
pdl 1%3A2.100-1
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 6,816 kB
sloc: perl: 22,587; ansic: 14,969; sh: 31; makefile: 30; sed: 6
file content (259 lines) | stat: -rw-r--r-- 9,537 bytes
=head1 NAME

PDL::BadValues - Discussion of bad value support in PDL

=head1 DESCRIPTION

=head2 What are bad values and why should I bother with them?

Sometimes it's useful to be able to specify a certain value is 'bad' or
'missing'; for example CCDs used in astronomy produce 2D images which are not
perfect since certain areas contain invalid data due to imperfections in the
detector.  Whilst PDL's powerful index
routines and all the complicated business with dataflow, slices, etc etc mean
that these regions can be ignored in processing, it's awkward to do. It would
be much easier to be able to say C<$c = $x + $y> and leave all the hassle to
the computer.

If you're not interested in this, then you may (rightly) be concerned
with how this affects the speed of PDL, since the overhead of checking for a
bad value at each operation can be large.
Because of this, the code has been written to be as fast as possible -
particularly when operating on ndarrays which do not contain bad values.
In fact, you should notice essentially no speed difference when working
with ndarrays which do not contain bad values.

You may also ask 'well, my computer supports IEEE NaN, so I already have this'.
They are different things; a bad value signifies "leave this out of
processing", whereas NaN is the result of a mathematically-invalid
operation.

Many routines, such as C<y=sin(x)>, will propagate NaN's
without the user having to code differently, but routines such as C<qsort>, or
finding the median of an array, need to be re-coded to handle bad values.
For floating-point datatypes, C<NaN> and C<Inf> can be used to flag bad
values, but by default
special values are used (L<Default bad values|/Default bad values>).

There is one default bad value for each datatype, but
as of PDL 2.040, you can have different bad values for separate ndarrays of the
same type.

You can use C<NaN> as the bad value for any floating-point type,
including complex.

=head2 A quick overview

 pdl> $x = sequence(4,3);
 pdl> p $x
 [
  [ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]
 ]
 pdl> $x = $x->setbadif( $x % 3 == 2 )
 pdl> p $x
 [
  [  0   1 BAD   3]
  [  4 BAD   6   7]
  [BAD   9  10 BAD]
 ]
 pdl> $x *= 3
 pdl> p $x
 [
  [  0   3 BAD   9]
  [ 12 BAD  18  21]
  [BAD  27  30 BAD]
 ]
 pdl> p $x->sum
 120

C<demo bad>
within L<perldl|PDL::perldl> gives a demonstration of some of the things
possible with bad values.  These are also available on PDL's web-site,
at F<http://pdl.perl.org/demos/>.  See L<PDL::Bad> for useful routines for working
with bad values and F<t/bad.t> to see them in action.

To find out if a routine supports bad values, use the C<badinfo> command in
L<perldl> or the C<-b> option to L<pdldoc>.

Each ndarray contains a flag - accessible via C<< $pdl->badflag >> - to say
whether there's any bad data present:

=over 4

=item *

If B<false/0>, which means there's no bad data here, the code supplied by the
C<Code> option to C<pp_def()> is executed.

=item *

If B<true/1>, then this says there I<MAY> be bad data in the ndarray, so use the
code in the C<BadCode> option (assuming that the C<pp_def()> for this routine
has been updated to have a BadCode key).
You get all the advantages of broadcasting, as with the C<Code> option,
but it will run slower since you are going to have to handle the presence of bad values.

=back

If you create an ndarray, it will have its bad-value flag set to 0. To change
this, use C<< $pdl->badflag($new_bad_status) >>, where C<$new_bad_status> can be 0 or 1.
When a routine creates an ndarray, its bad-value flag will depend on the input
ndarrays: the
bad-value flag will be set true if any of the input ndarrays have the bad-value flag.
To check that an ndarray really contains bad data, use the C<check_badflag> method.

I<NOTE>: propagation of the badflag

If you change the badflag of an ndarray, this change is propagated to all
the I<children> of an ndarray, so

   pdl> $x = zeroes(20,30);
   pdl> $y = $x->slice('0:10,0:10');
   pdl> $c = $y->slice(',(2)');
   pdl> print ">>c: ", $c->badflag, "\n";
   >>c: 0
   pdl> $x->badflag(1);
   pdl> print ">>c: ", $c->badflag, "\n";
   >>c: 1

This is also propagated to the parents of an ndarray, so

   pdl> print ">>a: ", $x->badflag, "\n";
   >>a: 1
   pdl> $c->badflag(0);
   pdl> print ">>a: ", $x->badflag, "\n";
   >>a: 0

There's also
the issue of what happens if you change the badvalue of an ndarray - should
these propagate to children/parents (yes) or whether you should only be
able to change the badvalue at the 'top' level - i.e. those ndarrays which do
not have parents.

The C<orig_badvalue()> method returns the compile-time value for a given
datatype. It works on ndarrays, PDL::Type objects, and numbers - eg

  $pdl->orig_badvalue(), byte->orig_badvalue(), and orig_badvalue(4).

To get the current bad value, use the C<badvalue()> method - it has the same
syntax as C<orig_badvalue()>.

To change the current bad value, supply the new number to badvalue - eg

  $pdl->badvalue(2.3), byte->badvalue(2), badvalue(5,-3e34).

I<Note>: the value is silently converted to the correct C type, and
returned - i.e. C<< byte->badvalue(-26) >> returns 230 on my Linux machine.

Note that changes to the bad value are I<NOT> propagated to previously-created
ndarrays - they will still have the bad flag set, but suddenly the elements
that were bad will become 'good', but containing the old bad value.
See discussion below.

=head2 Bad values and boolean operators

For those boolean operators in L<PDL::Ops>, evaluation
on a bad value returns the bad value. This:

 $mask = $img > $thresh;

correctly propagates bad values. This will omit any bad values, but
return a bad value if there are no good ones:

 $bool = any( $img > $thresh );

As of 2.077, a bad value used as a boolean will throw an exception.

When using one of the 'projection' functions in L<PDL::Ufunc> - such as
L<orover|PDL::Ufunc/orover> -
bad values are skipped over (see the documentation of these
functions for the current handling of the case when
all elements are bad).

=head1 IMPLEMENTATION DETAILS

A new flag has been added to the state of an ndarray - C<PDL_BADVAL>. If unset, then
the ndarray does not contain bad values, and so all the support code can be
ignored. If set, it does not guarantee that bad values are present, just that
they should be checked for.

The C<pdl_trans> structure has been extended to include an integer value,
C<bvalflag>, which acts as a switch to tell the code whether to handle bad values
or not. This value is set if any of the input ndarrays have their C<PDL_BADVAL>
flag set (although this code can be replaced by setting C<FindBadStateCode> in
pp_def).

The per-ndarray badvalue is a C<PDL_Anyval>, a struct with an
integer-valued type (like an ndarray), and a union-value that can
hold one of any value that PDL can handle. As of 2.086, this value
must always have the same type as the ndarray, and this is enforced
in the PDL core.

=head2 Default bad values

The default bad values
are now stored in a structure within the Core PDL structure
- C<PDL.bvals> (eg F<lib/PDL/Core/pdlcore.h>); see also
C<typedef pdl_badvals> in F<lib/PDL/Core/pdl.h.PL> and the
BOOT code of F<lib/PDL/Core.xs> where the values are initialised to
(hopefully) sensible values.
See L<PDL::Bad/badvalue> and L<PDL::Bad/orig_badvalue> for read/write routines to the values.

The default/original bad values are set to the C type's maximum (unsigned
integers) or the minimum (floating-point and signed integers).

=head2 How do I change a routine to handle bad values?

See L<PDL::PP/BadCode> and L<PDL::PP/HandleBad>.

If you have a routine that you want to be able to use as in-place, look
at the routines in F<bad.pd> (or F<ops.pd>)
which use the C<in-place> option to see how the
bad flag is propagated to children using the C<xxxBadStatusCode> options.
I decided not to automate this as rules would be a
little complex, since not every in-place op will need to propagate the
badflag (eg unary functions).

This all means that you can change (as of 2.073)

   Code => '$a() = $b() + $c();'

to

   Code => 'PDL_IF_BAD(if ( $ISBAD(b()) || $ISBAD(c()) ) {
                 $SETBAD(a());
               } else,) {
                 $a() = $b() + $c();
               }'

which means you only need to maintain the logic in one place, rather
than having a separate C<BadCode>. C<PDL_IF_BAD> evaluates to its
first argument if the C<badvalue> flag is true, or its second if not.
PDL::PP::PDLCode will then create code something like

   if ( __trans->bvalflag ) {
        broadcastloop over Code with PDL_IF_BAD set appropriately
   } else {
        broadcastloop over Code with PDL_IF_BAD set appropriately
   }

=head1 WHAT ABOUT DOCUMENTATION?

One of the strengths of PDL is its on-line documentation. The aim is to use
this system to provide information on how/if a routine supports bad values:
in many cases C<pp_def()> contains all the information anyway, so the
function-writer doesn't need to do anything at all! For the cases when this is
not sufficient, there's the C<BadDoc> option. For code written at
the Perl level - i.e. in a .pm file - use the C<=for bad> pod directive.

This information will be available via man/pod2man/html documentation. It's also
accessible from the C<perldl> shell - using the C<badinfo> command - and the C<pdldoc>
shell command - using the C<-b> option.

=head1 AUTHOR

Copyright (C) Doug Burke (djburke@cpan.org), 2000, 2006.

The per-ndarray bad value support is by Heiko Klein (2006).