1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281
|
@c DO NOT EDIT! Generated automatically by munge-texi.pl.
@c Copyright (C) 2012-2015 Jordi GutiƩrrez Hermoso
@c
@c This file is part of Octave.
@c
@c Octave is free software; you can redistribute it and/or modify it
@c under the terms of the GNU General Public License as published by the
@c Free Software Foundation; either version 3 of the License, or (at
@c your option) any later version.
@c
@c Octave is distributed in the hope that it will be useful, but WITHOUT
@c ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
@c FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
@c for more details.
@c
@c You should have received a copy of the GNU General Public License
@c along with Octave; see the file COPYING. If not, see
@c <http://www.gnu.org/licenses/>.
@node Vectorization and Faster Code Execution
@chapter Vectorization and Faster Code Execution
@cindex vectorization
@cindex vectorize
Vectorization is a programming technique that uses vector operations
instead of element-by-element loop-based operations. Besides frequently
producing more succinct Octave code, vectorization also allows for better
optimization in the subsequent implementation. The optimizations may occur
either in Octave's own Fortran, C, or C++ internal implementation, or even at a
lower level depending on the compiler and external numerical libraries used to
build Octave. The ultimate goal is to make use of your hardware's vector
instructions if possible or to perform other optimizations in software.
Vectorization is not a concept unique to Octave, but it is particularly
important because Octave is a matrix-oriented language. Vectorized
Octave code will see a dramatic speed up (10X--100X) in most cases.
This chapter discusses vectorization and other techniques for writing faster
code.
@menu
* Basic Vectorization:: Basic techniques for code optimization
* Broadcasting:: Broadcasting operations
* Function Application:: Applying functions to arrays, cells, and structs
* Accumulation:: Accumulation functions
* JIT Compiler:: Just-In-Time Compiler for loops
* Miscellaneous Techniques:: Other techniques for speeding up code
* Examples::
@end menu
@node Basic Vectorization
@section Basic Vectorization
To a very good first approximation, the goal in vectorization is to
write code that avoids loops and uses whole-array operations. As a
trivial example, consider
@example
@group
for i = 1:n
for j = 1:m
c(i,j) = a(i,j) + b(i,j);
endfor
endfor
@end group
@end example
@noindent
compared to the much simpler
@example
c = a + b;
@end example
@noindent
This isn't merely easier to write; it is also internally much easier to
optimize. Octave delegates this operation to an underlying
implementation which, among other optimizations, may use special vector
hardware instructions or could conceivably even perform the additions in
parallel. In general, if the code is vectorized, the underlying
implementation has more freedom about the assumptions it can make
in order to achieve faster execution.
This is especially important for loops with "cheap" bodies. Often it
suffices to vectorize just the innermost loop to get acceptable
performance. A general rule of thumb is that the "order" of the
vectorized body should be greater or equal to the "order" of the
enclosing loop.
As a less trivial example, instead of
@example
@group
for i = 1:n-1
a(i) = b(i+1) - b(i);
endfor
@end group
@end example
@noindent
write
@example
a = b(2:n) - b(1:n-1);
@end example
This shows an important general concept about using arrays for indexing
instead of looping over an index variable. @xref{Index Expressions}.
Also use boolean indexing generously. If a condition needs to be tested,
this condition can also be written as a boolean index. For instance,
instead of
@example
@group
for i = 1:n
if (a(i) > 5)
a(i) -= 20
endif
endfor
@end group
@end example
@noindent
write
@example
a(a>5) -= 20;
@end example
@noindent
which exploits the fact that @code{a > 5} produces a boolean index.
Use elementwise vector operators whenever possible to avoid looping
(operators like @code{.*} and @code{.^}). @xref{Arithmetic Ops}. For
simple inline functions, the @code{vectorize} function can do this
automatically.
@c vectorize libinterp/octave-value/ov-fcn-inline.cc
@anchor{XREFvectorize}
@deftypefn {Built-in Function} {} vectorize (@var{fun})
Create a vectorized version of the inline function @var{fun} by replacing
all occurrences of @code{*}, @code{/}, etc., with @code{.*}, @code{./}, etc.
This may be useful, for example, when using inline functions with numerical
integration or optimization where a vector-valued function is expected.
@example
@group
fcn = vectorize (inline ("x^2 - 1"))
@result{} fcn = f(x) = x.^2 - 1
quadv (fcn, 0, 3)
@result{} 6
@end group
@end example
@seealso{@ref{XREFinline,,inline}, @ref{XREFformula,,formula}, @ref{XREFargnames,,argnames}}
@end deftypefn
Also exploit broadcasting in these elementwise operators both to avoid
looping and unnecessary intermediate memory allocations.
@xref{Broadcasting}.
Use built-in and library functions if possible. Built-in and compiled
functions are very fast. Even with an m-file library function, chances
are good that it is already optimized, or will be optimized more in a
future release.
For instance, even better than
@example
a = b(2:n) - b(1:n-1);
@end example
@noindent
is
@example
a = diff (b);
@end example
Most Octave functions are written with vector and array arguments in
mind. If you find yourself writing a loop with a very simple operation,
chances are that such a function already exists. The following functions
occur frequently in vectorized code:
@itemize @bullet
@item
Index manipulation
@itemize
@item
find
@item
sub2ind
@item
ind2sub
@item
sort
@item
unique
@item
lookup
@item
ifelse / merge
@end itemize
@item
Repetition
@itemize
@item
repmat
@item
repelems
@end itemize
@item
Vectorized arithmetic
@itemize
@item
sum
@item
prod
@item
cumsum
@item
cumprod
@item
sumsq
@item
diff
@item
dot
@item
cummax
@item
cummin
@end itemize
@item
Shape of higher dimensional arrays
@itemize
@item
reshape
@item
resize
@item
permute
@item
squeeze
@item
deal
@end itemize
@end itemize
@node Broadcasting
@section Broadcasting
@cindex broadcast
@cindex broadcasting
@cindex BSX
@cindex recycling
@cindex SIMD
Broadcasting refers to how Octave binary operators and functions behave
when their matrix or array operands or arguments differ in size. Since
version 3.6.0, Octave now automatically broadcasts vectors, matrices,
and arrays when using elementwise binary operators and functions.
Broadly speaking, smaller arrays are ``broadcast'' across the larger
one, until they have a compatible shape. The rule is that corresponding
array dimensions must either
@enumerate
@item
be equal, or
@item
one of them must be 1.
@end enumerate
@noindent
In case all dimensions are equal, no broadcasting occurs and ordinary
element-by-element arithmetic takes place. For arrays of higher
dimensions, if the number of dimensions isn't the same, then missing
trailing dimensions are treated as 1. When one of the dimensions is 1,
the array with that singleton dimension gets copied along that dimension
until it matches the dimension of the other array. For example, consider
@example
@group
x = [1 2 3;
4 5 6;
7 8 9];
y = [10 20 30];
x + y
@end group
@end example
@noindent
Without broadcasting, @code{x + y} would be an error because the dimensions
do not agree. However, with broadcasting it is as if the following
operation were performed:
@example
@group
x = [1 2 3
4 5 6
7 8 9];
y = [10 20 30
10 20 30
10 20 30];
x + y
@result{} 11 22 33
14 25 36
17 28 39
@end group
@end example
@noindent
That is, the smaller array of size @code{[1 3]} gets copied along the
singleton dimension (the number of rows) until it is @code{[3 3]}. No
actual copying takes place, however. The internal implementation reuses
elements along the necessary dimension in order to achieve the desired
effect without copying in memory.
Both arrays can be broadcast across each other, for example, all
pairwise differences of the elements of a vector with itself:
@example
@group
y - y'
@result{} 0 10 20
-10 0 10
-20 -10 0
@end group
@end example
@noindent
Here the vectors of size @code{[1 3]} and @code{[3 1]} both get
broadcast into matrices of size @code{[3 3]} before ordinary matrix
subtraction takes place.
A special case of broadcasting that may be familiar is when all
dimensions of the array being broadcast are 1, i.e., the array is a
scalar. Thus for example, operations like @code{x - 42} and @code{max
(x, 2)} are basic examples of broadcasting.
For a higher-dimensional example, suppose @code{img} is an RGB image of
size @code{[m n 3]} and we wish to multiply each color by a different
scalar. The following code accomplishes this with broadcasting,
@example
img .*= permute ([0.8, 0.9, 1.2], [1, 3, 2]);
@end example
@noindent
Note the usage of permute to match the dimensions of the
@code{[0.8, 0.9, 1.2]} vector with @code{img}.
For functions that are not written with broadcasting semantics,
@code{bsxfun} can be useful for coercing them to broadcast.
@c bsxfun libinterp/corefcn/bsxfun.cc
@anchor{XREFbsxfun}
@deftypefn {Built-in Function} {} bsxfun (@var{f}, @var{A}, @var{B})
The binary singleton expansion function performs broadcasting,
that is, it applies a binary function @var{f} element-by-element to two
array arguments @var{A} and @var{B}, and expands as necessary
singleton dimensions in either input argument.
@var{f} is a function handle, inline function, or string containing the name
of the function to evaluate. The function @var{f} must be capable of
accepting two column-vector arguments of equal length, or one column vector
argument and a scalar.
The dimensions of @var{A} and @var{B} must be equal or singleton. The
singleton dimensions of the arrays will be expanded to the same
dimensionality as the other array.
@seealso{@ref{XREFarrayfun,,arrayfun}, @ref{XREFcellfun,,cellfun}}
@end deftypefn
Broadcasting is only applied if either of the two broadcasting
conditions hold. As usual, however, broadcasting does not apply when two
dimensions differ and neither is 1:
@example
@group
x = [1 2 3
4 5 6];
y = [10 20
30 40];
x + y
@end group
@end example
@noindent
This will produce an error about nonconformant arguments.
Besides common arithmetic operations, several functions of two arguments
also broadcast. The full list of functions and operators that broadcast
is
@example
plus + .+
minus - .-
times .*
rdivide ./
ldivide .\
power .^ .**
lt <
le <=
eq ==
gt >
ge >=
ne != ~=
and &
or |
atan2
hypot
max
min
mod
rem
xor
+= -= .+= .-= .*= ./= .\= .^= .**= &= |=
@end example
Beware of resorting to broadcasting if a simpler operation will suffice.
For matrices @var{a} and @var{b}, consider the following:
@example
@var{c} = sum (permute (@var{a}, [1, 3, 2]) .* permute (@var{b}, [3, 2, 1]), 3);
@end example
@noindent
This operation broadcasts the two matrices with permuted dimensions
across each other during elementwise multiplication in order to obtain a
larger 3-D array, and this array is then summed along the third dimension.
A moment of thought will prove that this operation is simply the much
faster ordinary matrix multiplication, @code{@var{c} = @var{a}*@var{b};}.
A note on terminology: ``broadcasting'' is the term popularized by the
Numpy numerical environment in the Python programming language. In other
programming languages and environments, broadcasting may also be known
as @emph{binary singleton expansion} (BSX, in @sc{matlab}, and the
origin of the name of the @code{bsxfun} function), @emph{recycling} (R
programming language), @emph{single-instruction multiple data} (SIMD),
or @emph{replication}.
@subsection Broadcasting and Legacy Code
The new broadcasting semantics almost never affect code that worked
in previous versions of Octave. Consequently, all code inherited from
@sc{matlab} that worked in previous versions of Octave should still work
without change in Octave. The only exception is code such as
@example
@group
try
c = a.*b;
catch
c = a.*a;
end_try_catch
@end group
@end example
@noindent
that may have relied on matrices of different size producing an error.
Because such operation is now valid Octave syntax, this will no longer
produce an error. Instead, the following code should be used:
@example
@group
if (isequal (size (a), size (b)))
c = a .* b;
else
c = a .* a;
endif
@end group
@end example
@node Function Application
@section Function Application
@cindex map
@cindex apply
@cindex function application
As a general rule, functions should already be written with matrix
arguments in mind and should consider whole matrix operations in a
vectorized manner. Sometimes, writing functions in this way appears
difficult or impossible for various reasons. For those situations,
Octave provides facilities for applying a function to each element of an
array, cell, or struct.
@c arrayfun libinterp/corefcn/cellfun.cc
@anchor{XREFarrayfun}
@deftypefn {Function File} {} arrayfun (@var{func}, @var{A})
@deftypefnx {Function File} {@var{x} =} arrayfun (@var{func}, @var{A})
@deftypefnx {Function File} {@var{x} =} arrayfun (@var{func}, @var{A}, @var{b}, @dots{})
@deftypefnx {Function File} {[@var{x}, @var{y}, @dots{}] =} arrayfun (@var{func}, @var{A}, @dots{})
@deftypefnx {Function File} {} arrayfun (@dots{}, "UniformOutput", @var{val})
@deftypefnx {Function File} {} arrayfun (@dots{}, "ErrorHandler", @var{errfunc})
Execute a function on each element of an array.
This is useful for functions that do not accept array arguments. If the
function does accept array arguments it is better to call the function
directly.
The first input argument @var{func} can be a string, a function
handle, an inline function, or an anonymous function. The input
argument @var{A} can be a logic array, a numeric array, a string
array, a structure array, or a cell array. By a call of the function
@command{arrayfun} all elements of @var{A} are passed on to the named
function @var{func} individually.
The named function can also take more than two input arguments, with
the input arguments given as third input argument @var{b}, fourth
input argument @var{c}, @dots{} If given more than one array input
argument then all input arguments must have the same sizes, for
example:
@example
@group
arrayfun (@@atan2, [1, 0], [0, 1])
@result{} [ 1.5708 0.0000 ]
@end group
@end example
If the parameter @var{val} after a further string input argument
@qcode{"UniformOutput"} is set @code{true} (the default), then the named
function @var{func} must return a single element which then will be
concatenated into the return value and is of type matrix. Otherwise,
if that parameter is set to @code{false}, then the outputs are
concatenated in a cell array. For example:
@example
@group
arrayfun (@@(x,y) x:y, "abc", "def", "UniformOutput", false)
@result{}
@{
[1,1] = abcd
[1,2] = bcde
[1,3] = cdef
@}
@end group
@end example
If more than one output arguments are given then the named function
must return the number of return values that also are expected, for
example:
@example
@group
[A, B, C] = arrayfun (@@find, [10; 0], "UniformOutput", false)
@result{}
A =
@{
[1,1] = 1
[2,1] = [](0x0)
@}
B =
@{
[1,1] = 1
[2,1] = [](0x0)
@}
C =
@{
[1,1] = 10
[2,1] = [](0x0)
@}
@end group
@end example
If the parameter @var{errfunc} after a further string input argument
@qcode{"ErrorHandler"} is another string, a function handle, an inline
function, or an anonymous function, then @var{errfunc} defines a
function to call in the case that @var{func} generates an error.
The definition of the function must be of the form
@example
function [@dots{}] = errfunc (@var{s}, @dots{})
@end example
@noindent
where there is an additional input argument to @var{errfunc}
relative to @var{func}, given by @var{s}. This is a structure with
the elements @qcode{"identifier"}, @qcode{"message"}, and
@qcode{"index"} giving, respectively, the error identifier, the error
message, and the index of the array elements that caused the error. The
size of the output argument of @var{errfunc} must have the same size as the
output argument of @var{func}, otherwise a real error is thrown. For
example:
@example
@group
function y = ferr (s, x), y = "MyString"; endfunction
arrayfun (@@str2num, [1234],
"UniformOutput", false, "ErrorHandler", @@ferr)
@result{}
@{
[1,1] = MyString
@}
@end group
@end example
@seealso{@ref{XREFspfun,,spfun}, @ref{XREFcellfun,,cellfun}, @ref{XREFstructfun,,structfun}}
@end deftypefn
@c spfun scripts/sparse/spfun.m
@anchor{XREFspfun}
@deftypefn {Function File} {@var{y} =} spfun (@var{f}, @var{S})
Compute @code{f(@var{S})} for the nonzero values of @var{S}.
This results in a sparse matrix with the same structure as @var{S}. The
function @var{f} can be passed as a string, a function handle, or an
inline function.
@seealso{@ref{XREFarrayfun,,arrayfun}, @ref{XREFcellfun,,cellfun}, @ref{XREFstructfun,,structfun}}
@end deftypefn
@c cellfun libinterp/corefcn/cellfun.cc
@anchor{XREFcellfun}
@deftypefn {Built-in Function} {} cellfun (@var{name}, @var{C})
@deftypefnx {Built-in Function} {} cellfun ("size", @var{C}, @var{k})
@deftypefnx {Built-in Function} {} cellfun ("isclass", @var{C}, @var{class})
@deftypefnx {Built-in Function} {} cellfun (@var{func}, @var{C})
@deftypefnx {Built-in Function} {} cellfun (@var{func}, @var{C}, @var{D})
@deftypefnx {Built-in Function} {[@var{a}, @dots{}] =} cellfun (@dots{})
@deftypefnx {Built-in Function} {} cellfun (@dots{}, "ErrorHandler", @var{errfunc})
@deftypefnx {Built-in Function} {} cellfun (@dots{}, "UniformOutput", @var{val})
Evaluate the function named @var{name} on the elements of the cell array
@var{C}.
Elements in @var{C} are passed on to the named function individually. The
function @var{name} can be one of the functions
@table @code
@item isempty
Return 1 for empty elements.
@item islogical
Return 1 for logical elements.
@item isnumeric
Return 1 for numeric elements.
@item isreal
Return 1 for real elements.
@item length
Return a vector of the lengths of cell elements.
@item ndims
Return the number of dimensions of each element.
@item numel
@itemx prodofsize
Return the number of elements contained within each cell element. The
number is the product of the dimensions of the object at each cell element.
@item size
Return the size along the @var{k}-th dimension.
@item isclass
Return 1 for elements of @var{class}.
@end table
Additionally, @code{cellfun} accepts an arbitrary function @var{func}
in the form of an inline function, function handle, or the name of a
function (in a character string). The function can take one or more
arguments, with the inputs arguments given by @var{C}, @var{D}, etc.
Equally the function can return one or more output arguments. For example:
@example
@group
cellfun ("atan2", @{1, 0@}, @{0, 1@})
@result{} [ 1.57080 0.00000 ]
@end group
@end example
The number of output arguments of @code{cellfun} matches the number of output
arguments of the function. The outputs of the function will be collected
into the output arguments of @code{cellfun} like this:
@example
@group
function [a, b] = twoouts (x)
a = x;
b = x*x;
endfunction
[aa, bb] = cellfun (@@twoouts, @{1, 2, 3@})
@result{}
aa =
1 2 3
bb =
1 4 9
@end group
@end example
Note that per default the output argument(s) are arrays of the same size as
the input arguments. Input arguments that are singleton (1x1) cells will be
automatically expanded to the size of the other arguments.
If the parameter @qcode{"UniformOutput"} is set to true (the default),
then the function must return scalars which will be concatenated into the
return array(s). If @qcode{"UniformOutput"} is false, the outputs are
concatenated into a cell array (or cell arrays). For example:
@example
@group
cellfun ("tolower", @{"Foo", "Bar", "FooBar"@},
"UniformOutput", false)
@result{} @{"foo", "bar", "foobar"@}
@end group
@end example
Given the parameter @qcode{"ErrorHandler"}, then @var{errfunc} defines a
function to call in case @var{func} generates an error. The form of the
function is
@example
function [@dots{}] = errfunc (@var{s}, @dots{})
@end example
@noindent
where there is an additional input argument to @var{errfunc} relative to
@var{func}, given by @var{s}. This is a structure with the elements
@qcode{"identifier"}, @qcode{"message"} and @qcode{"index"}, giving
respectively the error identifier, the error message, and the index into the
input arguments of the element that caused the error. For example:
@example
@group
function y = foo (s, x), y = NaN; endfunction
cellfun ("factorial", @{-1,2@}, "ErrorHandler", @@foo)
@result{} [NaN 2]
@end group
@end example
Use @code{cellfun} intelligently. The @code{cellfun} function is a
useful tool for avoiding loops. It is often used with anonymous
function handles; however, calling an anonymous function involves an
overhead quite comparable to the overhead of an m-file function.
Passing a handle to a built-in function is faster, because the
interpreter is not involved in the internal loop. For example:
@example
@group
a = @{@dots{}@}
v = cellfun (@@(x) det (x), a); # compute determinants
v = cellfun (@@det, a); # faster
@end group
@end example
@seealso{@ref{XREFarrayfun,,arrayfun}, @ref{XREFstructfun,,structfun}, @ref{XREFspfun,,spfun}}
@end deftypefn
@c structfun scripts/general/structfun.m
@anchor{XREFstructfun}
@deftypefn {Function File} {} structfun (@var{func}, @var{S})
@deftypefnx {Function File} {[@var{A}, @dots{}] =} structfun (@dots{})
@deftypefnx {Function File} {} structfun (@dots{}, "ErrorHandler", @var{errfunc})
@deftypefnx {Function File} {} structfun (@dots{}, "UniformOutput", @var{val})
Evaluate the function named @var{name} on the fields of the structure
@var{S}. The fields of @var{S} are passed to the function @var{func}
individually.
@code{structfun} accepts an arbitrary function @var{func} in the form of an
inline function, function handle, or the name of a function (in a character
string). In the case of a character string argument, the function must
accept a single argument named @var{x}, and it must return a string value.
If the function returns more than one argument, they are returned as
separate output variables.
If the parameter @qcode{"UniformOutput"} is set to true (the default), then
the function must return a single element which will be concatenated into
the return value. If @qcode{"UniformOutput"} is false, the outputs are
placed into a structure with the same fieldnames as the input structure.
@example
@group
s.name1 = "John Smith";
s.name2 = "Jill Jones";
structfun (@@(x) regexp (x, '(\w+)$', "matches")@{1@}, s,
"UniformOutput", false)
@result{}
@{
name1 = Smith
name2 = Jones
@}
@end group
@end example
Given the parameter @qcode{"ErrorHandler"}, @var{errfunc} defines a function
to call in case @var{func} generates an error. The form of the function is
@example
function [@dots{}] = errfunc (@var{se}, @dots{})
@end example
@noindent
where there is an additional input argument to @var{errfunc} relative to
@var{func}, given by @nospell{@var{se}}. This is a structure with the
elements @qcode{"identifier"}, @qcode{"message"} and @qcode{"index"},
giving respectively the error identifier, the error message, and the index
into the input arguments of the element that caused the error. For an
example on how to use an error handler, @pxref{XREFcellfun,,cellfun}.
@seealso{@ref{XREFcellfun,,cellfun}, @ref{XREFarrayfun,,arrayfun}, @ref{XREFspfun,,spfun}}
@end deftypefn
Consistent with earlier advice, seek to use Octave built-in functions whenever
possible for the best performance. This advice applies especially to the four
functions above. For example, when adding two arrays together
element-by-element one could use a handle to the built-in addition function
@code{@@plus} or define an anonymous function @code{@@(x,y) x + y}. But, the
anonymous function is 60% slower than the first method.
@xref{Operator Overloading}, for a list of basic functions which might be used
in place of anonymous ones.
@node Accumulation
@section Accumulation
Whenever it's possible to categorize according to indices the elements
of an array when performing a computation, accumulation functions can be
useful.
@c accumarray scripts/general/accumarray.m
@anchor{XREFaccumarray}
@deftypefn {Function File} {} accumarray (@var{subs}, @var{vals}, @var{sz}, @var{func}, @var{fillval}, @var{issparse})
@deftypefnx {Function File} {} accumarray (@var{subs}, @var{vals}, @dots{})
Create an array by accumulating the elements of a vector into the
positions defined by their subscripts.
The subscripts are defined by the rows of the matrix @var{subs} and the
values by @var{vals}. Each row of @var{subs} corresponds to one of the
values in @var{vals}. If @var{vals} is a scalar, it will be used for each
of the row of @var{subs}. If @var{subs} is a cell array of vectors, all
vectors must be of the same length, and the subscripts in the @var{k}th
vector must correspond to the @var{k}th dimension of the result.
The size of the matrix will be determined by the subscripts
themselves. However, if @var{sz} is defined it determines the matrix
size. The length of @var{sz} must correspond to the number of columns
in @var{subs}. An exception is if @var{subs} has only one column, in
which case @var{sz} may be the dimensions of a vector and the
subscripts of @var{subs} are taken as the indices into it.
The default action of @code{accumarray} is to sum the elements with
the same subscripts. This behavior can be modified by defining the
@var{func} function. This should be a function or function handle
that accepts a column vector and returns a scalar. The result of the
function should not depend on the order of the subscripts.
The elements of the returned array that have no subscripts associated
with them are set to zero. Defining @var{fillval} to some other value
allows these values to be defined. This behavior changes, however,
for certain values of @var{func}. If @var{func} is @code{min}
(respectively, @code{max}) then the result will be filled with the
minimum (respectively, maximum) integer if @var{vals} is of integral
type, logical false (respectively, logical true) if @var{vals} is of
logical type, zero if @var{fillval} is zero and all values are
non-positive (respectively, non-negative), and NaN otherwise.
By default @code{accumarray} returns a full matrix. If
@var{issparse} is logically true, then a sparse matrix is returned
instead.
The following @code{accumarray} example constructs a frequency table
that in the first column counts how many occurrences each number in
the second column has, taken from the vector @var{x}. Note the usage
of @code{unique} for assigning to all repeated elements of @var{x}
the same index (@pxref{XREFunique,,unique}).
@example
@group
@var{x} = [91, 92, 90, 92, 90, 89, 91, 89, 90, 100, 100, 100];
[@var{u}, ~, @var{j}] = unique (@var{x});
[accumarray(@var{j}', 1), @var{u}']
@result{} 2 89
3 90
2 91
2 92
3 100
@end group
@end example
Another example, where the result is a multi-dimensional 3-D array and
the default value (zero) appears in the output:
@example
@group
accumarray ([1, 1, 1;
2, 1, 2;
2, 3, 2;
2, 1, 2;
2, 3, 2], 101:105)
@result{} ans(:,:,1) = [101, 0, 0; 0, 0, 0]
@result{} ans(:,:,2) = [0, 0, 0; 206, 0, 208]
@end group
@end example
The sparse option can be used as an alternative to the @code{sparse}
constructor (@pxref{XREFsparse,,sparse}). Thus
@example
sparse (@var{i}, @var{j}, @var{sv})
@end example
@noindent
can be written with @code{accumarray} as
@example
accumarray ([@var{i}, @var{j}], @var{sv}', [], [], 0, true)
@end example
@noindent
For repeated indices, @code{sparse} adds the corresponding value. To
take the minimum instead, use @code{min} as an accumulator function:
@example
accumarray ([@var{i}, @var{j}], @var{sv}', [], @@min, 0, true)
@end example
The complexity of accumarray in general for the non-sparse case is
generally O(M+N), where N is the number of subscripts and M is the
maximum subscript (linearized in multi-dimensional case). If
@var{func} is one of @code{@@sum} (default), @code{@@max},
@code{@@min} or @code{@@(x) @{x@}}, an optimized code path is used.
Note that for general reduction function the interpreter overhead can
play a major part and it may be more efficient to do multiple
accumarray calls and compute the results in a vectorized manner.
@seealso{@ref{XREFaccumdim,,accumdim}, @ref{XREFunique,,unique}, @ref{XREFsparse,,sparse}}
@end deftypefn
@c accumdim scripts/general/accumdim.m
@anchor{XREFaccumdim}
@deftypefn {Function File} {} accumdim (@var{subs}, @var{vals}, @var{dim}, @var{n}, @var{func}, @var{fillval})
Create an array by accumulating the slices of an array into the
positions defined by their subscripts along a specified dimension.
The subscripts are defined by the index vector @var{subs}.
The dimension is specified by @var{dim}. If not given, it defaults
to the first non-singleton dimension. The length of @var{subs} must
be equal to @code{size (@var{vals}, @var{dim})}.
The extent of the result matrix in the working dimension will be
determined by the subscripts themselves. However, if @var{n} is
defined it determines this extent.
The default action of @code{accumdim} is to sum the subarrays with the
same subscripts. This behavior can be modified by defining the
@var{func} function. This should be a function or function handle
that accepts an array and a dimension, and reduces the array along
this dimension. As a special exception, the built-in @code{min} and
@code{max} functions can be used directly, and @code{accumdim}
accounts for the middle empty argument that is used in their calling.
The slices of the returned array that have no subscripts associated
with them are set to zero. Defining @var{fillval} to some other
value allows these values to be defined.
An example of the use of @code{accumdim} is:
@example
@group
accumdim ([1, 2, 1, 2, 1], [ 7, -10, 4;
-5, -12, 8;
-12, 2, 8;
-10, 9, -3;
-5, -3, -13])
@result{} [-10,-11,-1;-15,-3,5]
@end group
@end example
@seealso{@ref{XREFaccumarray,,accumarray}}
@end deftypefn
@node JIT Compiler
@section JIT Compiler
Vectorization is the preferred technique for eliminating loops and speeding up
code. Nevertheless, it is not always possible to replace every loop. In such
situations it may be worth trying Octave's @strong{experimental} Just-In-Time
(JIT) compiler.
A JIT compiler works by analyzing the body of a loop, translating the Octave
statements into another language, compiling the new code segment into an
executable, and then running the executable and collecting any results. The
process is not simple and there is a significant amount of work to perform for
each step. It can still make sense, however, if the number of loop iterations
is large. Because Octave is an interpreted language every time through a
loop Octave must parse the statements in the loop body before executing them.
With a JIT compiler this is done just once when the body is translated to
another language.
The JIT compiler is a very new feature in Octave and not all valid Octave
statements can currently be accelerated. However, if no other technique
is available it may be worth benchmarking the code with JIT enabled. The
function @code{jit_enable} is used to turn compilation on or off. The
function @code{jit_startcnt} sets the threshold for acceleration. Loops
with iteration counts above @code{jit_startcnt} will be accelerated. The
functions @code{jit_failcnt} and @code{debug_jit} are not likely to be of use
to anyone not working directly on the implementation of the JIT compiler.
@c jit_enable libinterp/corefcn/pt-jit.cc
@anchor{XREFjit_enable}
@deftypefn {Built-in Function} {@var{val} =} jit_enable ()
@deftypefnx {Built-in Function} {@var{old_val} =} jit_enable (@var{new_val})
@deftypefnx {Built-in Function} {} jit_enable (@var{new_val}, "local")
Query or set the internal variable that enables Octave's JIT compiler.
When called from inside a function with the @qcode{"local"} option, the
variable is changed locally for the function and any subroutines it calls.
The original variable value is restored when exiting the function.
@seealso{@ref{XREFjit_startcnt,,jit_startcnt}, @ref{XREFdebug_jit,,debug_jit}}
@end deftypefn
@c jit_startcnt libinterp/corefcn/pt-jit.cc
@anchor{XREFjit_startcnt}
@deftypefn {Built-in Function} {@var{val} =} jit_startcnt ()
@deftypefnx {Built-in Function} {@var{old_val} =} jit_startcnt (@var{new_val})
@deftypefnx {Built-in Function} {} jit_startcnt (@var{new_val}, "local")
Query or set the internal variable that determines whether JIT compilation
will take place for a specific loop.
Because compilation is a costly operation it does not make sense to employ
JIT when the loop count is low. By default only loops with greater than
1000 iterations will be accelerated.
When called from inside a function with the @qcode{"local"} option, the
variable is changed locally for the function and any subroutines it calls.
The original variable value is restored when exiting the function.
@seealso{@ref{XREFjit_enable,,jit_enable}, @ref{XREFjit_failcnt,,jit_failcnt}, @ref{XREFdebug_jit,,debug_jit}}
@end deftypefn
@c jit_failcnt libinterp/corefcn/pt-jit.cc
@anchor{XREFjit_failcnt}
@deftypefn {Built-in Function} {@var{val} =} jit_failcnt ()
@deftypefnx {Built-in Function} {@var{old_val} =} jit_failcnt (@var{new_val})
@deftypefnx {Built-in Function} {} jit_failcnt (@var{new_val}, "local")
Query or set the internal variable that counts the number of JIT fail
exceptions for Octave's JIT compiler.
When called from inside a function with the @qcode{"local"} option, the
variable is changed locally for the function and any subroutines it calls.
The original variable value is restored when exiting the function.
@seealso{@ref{XREFjit_enable,,jit_enable}, @ref{XREFjit_startcnt,,jit_startcnt}, @ref{XREFdebug_jit,,debug_jit}}
@end deftypefn
@c debug_jit libinterp/corefcn/pt-jit.cc
@anchor{XREFdebug_jit}
@deftypefn {Built-in Function} {@var{val} =} debug_jit ()
@deftypefnx {Built-in Function} {@var{old_val} =} debug_jit (@var{new_val})
@deftypefnx {Built-in Function} {} debug_jit (@var{new_val}, "local")
Query or set the internal variable that determines whether
debugging/tracing is enabled for Octave's JIT compiler.
When called from inside a function with the @qcode{"local"} option, the
variable is changed locally for the function and any subroutines it calls.
The original variable value is restored when exiting the function.
@seealso{@ref{XREFjit_enable,,jit_enable}, @ref{XREFjit_startcnt,,jit_startcnt}}
@end deftypefn
@node Miscellaneous Techniques
@section Miscellaneous Techniques
@cindex execution speed
@cindex speedups
@cindex optimization
Here are some other ways of improving the execution speed of Octave
programs.
@itemize @bullet
@item Avoid computing costly intermediate results multiple times.
Octave currently does not eliminate common subexpressions. Also, certain
internal computation results are cached for variables. For instance, if
a matrix variable is used multiple times as an index, checking the
indices (and internal conversion to integers) is only done once.
@item Be aware of lazy copies (copy-on-write).
@cindex copy-on-write
@cindex COW
@cindex memory management
When a copy of an object is created, the data is not immediately copied, but
rather shared. The actual copying is postponed until the copied data needs to
be modified. For example:
@example
@group
a = zeros (1000); # create a 1000x1000 matrix
b = a; # no copying done here
b(1) = 1; # copying done here
@end group
@end example
Lazy copying applies to whole Octave objects such as matrices, cells,
struct, and also individual cell or struct elements (not array
elements).
Additionally, index expressions also use lazy copying when Octave can
determine that the indexed portion is contiguous in memory. For example:
@example
@group
a = zeros (1000); # create a 1000x1000 matrix
b = a(:,10:100); # no copying done here
b = a(10:100,:); # copying done here
@end group
@end example
This applies to arrays (matrices), cell arrays, and structs indexed
using @samp{()}. Index expressions generating comma-separated lists can also
benefit from shallow copying in some cases. In particular, when @var{a} is a
struct array, expressions like @code{@{a.x@}, @{a(:,2).x@}} will use lazy
copying, so that data can be shared between a struct array and a cell array.
Most indexing expressions do not live longer than their parent
objects. In rare cases, however, a lazily copied slice outlasts its
parent, in which case it becomes orphaned, still occupying unnecessarily
more memory than needed. To provide a remedy working in most real cases,
Octave checks for orphaned lazy slices at certain situations, when a
value is stored into a "permanent" location, such as a named variable or
cell or struct element, and possibly economizes them. For example:
@example
@group
a = zeros (1000); # create a 1000x1000 matrix
b = a(:,10:100); # lazy slice
a = []; # the original "a" array is still allocated
c@{1@} = b; # b is reallocated at this point
@end group
@end example
@item Avoid deep recursion.
Function calls to m-file functions carry a relatively significant overhead, so
rewriting a recursion as a loop often helps. Also, note that the maximum level
of recursion is limited.
@item Avoid resizing matrices unnecessarily.
When building a single result matrix from a series of calculations, set the
size of the result matrix first, then insert values into it. Write
@example
@group
result = zeros (big_n, big_m)
for i = over:and_over
ridx = @dots{}
cidx = @dots{}
result(ridx, cidx) = new_value ();
endfor
@end group
@end example
@noindent
instead of
@example
@group
result = [];
for i = ever:and_ever
result = [ result, new_value() ];
endfor
@end group
@end example
Sometimes the number of items can not be computed in advance, and
stack-like operations are needed. When elements are being repeatedly
inserted or removed from the end of an array, Octave detects it as stack
usage and attempts to use a smarter memory management strategy by
pre-allocating the array in bigger chunks. This strategy is also applied
to cell and struct arrays.
@example
@group
a = [];
while (condition)
@dots{}
a(end+1) = value; # "push" operation
@dots{}
a(end) = []; # "pop" operation
@dots{}
endwhile
@end group
@end example
@item Avoid calling @code{eval} or @code{feval} excessively.
Parsing input or looking up the name of a function in the symbol table are
relatively expensive operations.
If you are using @code{eval} merely as an exception handling mechanism, and not
because you need to execute some arbitrary text, use the @code{try}
statement instead. @xref{The try Statement}.
@item Use @code{ignore_function_time_stamp} when appropriate.
If you are calling lots of functions, and none of them will need to change
during your run, set the variable @code{ignore_function_time_stamp} to
@qcode{"all"}. This will stop Octave from checking the time stamp of a function
file to see if it has been updated while the program is being run.
@end itemize
@node Examples
@section Examples
The following are examples of vectorization questions asked by actual
users of Octave and their solutions.
@c FIXME: We need a lot more examples here.
@itemize @bullet
@item
For a vector @code{A}, the following loop
@example
@group
n = length (A);
B = zeros (n, 2);
for i = 1:length (A)
## this will be two columns, the first is the difference and
## the second the mean of the two elements used for the diff.
B(i,:) = [A(i+1)-A(i), (A(i+1) + A(i))/2];
endfor
@end group
@end example
@noindent
can be turned into the following one-liner:
@example
B = [diff(A)(:), 0.5*(A(1:end-1)+A(2:end))(:)]
@end example
Note the usage of colon indexing to flatten an intermediate result into
a column vector. This is a common vectorization trick.
@end itemize
|