1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
|
Notes on using Atlas libs with GNU Octave and GNU R
I. Overview
===========
As of the Debian releases 2.1.34-6 (for GNU Octave) and 1.3.0-3 (for GNU R),
both Octave and R can be used with Atlas, the Automatically Tuned Linear
Algebra Software, in order to obtain much faster linear algebra operations.
To make use of Atlas, Debian users need to install the Atlas libraries for
their given cpu architecture. Concretely, one of
libatlas3gf-base - Automatically Tuned Linear Algebra Software, generic
libatlas3gf-sse - Automatically Tuned Linear Algebra Software, SSE1
libatlas3gf-sse2 - Automatically Tuned Linear Algebra Software, SSE2
libatlas3gf-3dnow - Automatically Tuned Linear Algebra Software, 3dnow
must be installed. Here, 'base' provides generic libraries which run on all
platforms whereas 'sse' and 'sse2' stand for the SSE (v1 and v2) extension
available on most Intel cpus, and '3dnow' for those made by AMS.
The Atlas libraries can be loaded dynamically instead of the (non-optimised)
blas and lapack libraries against which both Octave and R are compiled.
Section III below briefly describes how Atlas libraries can be compiled for
your specific machine to further optimise performance.
II. Using the Atlas libraries
=============================
II.A New default behaviour with automatic loading of the Atlas libraries
------------------------------------------------------------------------
Users have to make no changes whatsoever as the Atlas library will be loaded
instead of the standard blas and lapack libraries.
An example such as the script below can be useful to test the performance.
II.B Old behaviour requiring LD_LIBRARY_PATH for Octave
-------------------------------------------------------
[ NB: This section has long been obsolete is kept for the historical record
only. ]
For Octave, use the variable LD_LIBRARY_PATH. On a computer with the
atlas2-base package:
$ LD_LIBRARY_PATH=/usr/lib/atlas octave2.1 -q
octave2.1:1> X=randn(1000,1000);t=cputime();Y=X'*X;cputime-t
ans = 7.9600
$ edd@homebud:~> octave2.1 -q
octave2.1:1> X=randn(1000,1000);t=cputime();Y=X'*X;cputime-t
ans = 61.520
For R version 1.3.0-4, the R_LD_LIBRARY_PATH variable has to be used, and its
value needs to be copied out of /usr/bin/R (or edited therein). For R version
1.3.1 or later this is done automatically in the R startup shell script. For
an Athlon machine, and with the explicit definition which is no longer needed
as of R 1.3.1, the example becomes
$ R_LD_LIBRARY_PATH=/usr/lib/R/bin:/usr/local/lib:/usr/X11R6/lib:/usr/lib/3dnow/atlas:/usr/lib:/usr/X11R6/lib:/usr/lib/gcc-lib/i386-linux/2.95.4:. R --vanilla -q
> mm <- matrix(rnorm(10^6), ncol = 10^3)
> system.time(crossprod(mm))
[1] 2.38 0.04 2.84 0.00 0.00
$ R --vanilla -q
> mm <- matrix(rnorm(10^6), ncol = 10^3)
> system.time(crossprod(mm))
[1] 28.28 0.08 33.54 0.00 0.00
>
Running such a small example is highly recommded to ascertain that the
libraries are indeed found, and to "prove" that the speed gain is real (and
significant) for problems of at least a medium size as the 1000x1000 examples
above.
Note that the example use "/usr/lib/atlas" for the atlas2-base package;
Athlon users should employ "/usr/lib/3dnow/atlas", Pentium III users should
employ "/usr/lib/xmm/atlas" and Pentium IV users should employ
"/usr/lib/26/atlas".
Lastly, it should be pointed out that it is probably worthwhile to locally
compile, and thereby optimise, the Atlas libraries if at least a moderately
intensive load is expected. This is described in the next section.
III. Locally compiling the Atlas libraries
==========================================
The Debian Atlas packages have been setup to allow for local recompilation of
the Atlas libraries. This way the behaviour will be tuned exactly to the
specific CPU rather than the broader class of CPUs. It has been reported that
this can increase performance by a further 12% on the examples above.
Detailed instructions are in /usr/share/doc/atlas2-base/README.debian.gz but
the process is essentially the following [ courtesy of Doug Bates ]
apt-get source libatlas3gf-base
cd atlas-$VERSION
fakeroot debian/rules custom
# wait for a *very* long time
dpkg -i ../libatlas3gf-base*.deb
IV. See also
=============
The Atlas packages have a very detailed README.Debian file which should be
consulted; it also details local recompilation. Sources and documentation for
Atlas are at http://www.netlib.org/atlas.
V. Acknowledgements
====================
Camm Maguire developed the scheme of overloading Atlas over the default blas
libraries and deserves all the credit. Many thanks to John Eaton for helping
debug some errors in the initial setup, and to Doug Bates for work on the R
package. Special thanks to Ben Collins for providing a patched ldconfig as
part of the libc6 package.
VI. Appendix
============
A simple test script such as the following
-----------------------------------------------------------------------------
#!/bin/sh
# remove Atlas et al
sudo dpkg --purge libatlas3gf-base > /dev/null
dpkg -l | grep "blas\|lapack\|atlas"
: ${SIZE=1500}
: ${REPS=20}
r_cmd="m<-matrix(rnorm(${SIZE}*${SIZE}),ncol=${SIZE}); cat(mean(replicate($REPS,system.time(crossprod(m))[1]),trim=0.1),'\n')"
echo ""
echo "GNU R: $r_cmd"
echo -n " Without Atlas: "
echo -n "$r_cmd" | R --vanilla --slave
sudo apt-get install libatlas3gf-base > /dev/null
echo -n " With Atlas base : "
echo -n "$r_cmd" | R --vanilla --slave
sudo dpkg --purge libatlas3gf-base > /dev/null
sudo apt-get install libatlas3gf-sse > /dev/null
echo -n " With Atlas sse : "
echo -n "$r_cmd" | R --vanilla --slave
sudo dpkg --purge libatlas3gf-sse > /dev/null
sudo apt-get install libatlas3gf-sse2 > /dev/null
echo -n " With Atlas sse2 : "
echo -n "$r_cmd" | R --vanilla --slave
sudo dpkg --purge libatlas3gf-sse2 > /dev/null
-----------------------------------------------------------------------------
can be used to test the setup. On a dual Xeon running Debian testing (x86, ie
32bit) in May 2009, I see the following output:
GNU R: m<-matrix(rnorm(1500*1500),ncol=1500); cat(mean(replicate(20,system.time(crossprod(m))[1]),trim=0.1),'\n')
Without Atlas: 4.146438
With Atlas base : 1.788125
With Atlas sse : 1.573625
With Atlas sse2 : 0.8465625
indicating about a three-fold increase in speed for the 'base' version and
even a five-fold increase for the sse2 version.
VI. Revision History
====================
Initial version
-- Dirk Eddelbuettel <edd@debian.org> Tue, 21 Aug 2001 21:37:15 -0500
First updated
-- Dirk Eddelbuettel <edd@debian.org> Sun, 11 Nov 2001 11:03:19 -0600
Major update
-- Dirk Eddelbuettel <edd@debian.org> Sun, 31 May 2009 21:04:47 -0500
|