irlba

Implicitly-restarted Lanczos methods for fast truncated singular value
decomposition of sparse and dense matrices (also referred to as partial SVD).
IRLBA stands for Augmented, Implicitly Restarted Lanczos Bidiagonalization
Algorithm. The package provides the following functions (see help on each for
details and examples).

* irlba() partial SVD function
* ssvd() l1-penalized matrix decomposition for sparse PCA (based on Shen and
  Huang's algorithm)
* prcomp_irlba()  principal components function similar to the prcomp
  function in stats package for computing the first few principal components
  of large matrices
* svdr() alternate partial SVD function based on randomized SVD (see also the
  [rsvd](https://cran.r-project.org/package=rsvd) package by N. Benjamin
  Erichson for an alternative implementation)
* partial_eigen() a very limited partial eigenvalue decomposition for symmetric
  matrices (see the [RSpectra](https://cran.r-project.org/package=RSpectra)
  package for more comprehensive truncated eigenvalue decomposition)
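
As a minimal sketch of the two entry points most users start with, the
following compares irlba() and prcomp_irlba() against their base R
counterparts. The test matrix, its size, and the comparison tolerance are
arbitrary choices made for illustration, not recommendations from the package.

```r
library(irlba)
set.seed(1)
# Hypothetical test matrix: 100 observations of 20 variables.
A <- matrix(rnorm(2000), nrow = 100, ncol = 20)

# Partial SVD: the three largest singular values and vectors.
s <- irlba(A, nv = 3)
# The full SVD from base R, truncated to three values, should agree closely.
sv_ok <- isTRUE(all.equal(s$d, svd(A)$d[1:3], tolerance = 1e-4))

# First three principal components (columns are centered by default).
p <- prcomp_irlba(A, n = 3)
pc_ok <- isTRUE(all.equal(p$sdev, prcomp(A)$sdev[1:3], tolerance = 1e-4))

print(c(irlba = sv_ok, prcomp_irlba = pc_ok))
```

On a matrix this small the full svd() is the better tool; the partial
methods pay off when only a few components of a large matrix are needed.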

The help page for each function includes extensive documentation and
examples. Also see the package vignette: vignette("irlba", package="irlba").
An overview web page is here: https://bwlewis.github.io/irlba/.


New in 2.3.3

- Fixed several important reported problems with sparse matrices of classes
  other than dgCMatrix and with prcomp_irlba(), along with many bug fixes
  contributed by Aaron Lun; see for instance
  https://github.com/bwlewis/irlba/issues/47.

New in 2.3.2

- Fixed a regression in prcomp_irlba() discovered by Xiaojie Qiu, see
  https://github.com/bwlewis/irlba/issues/25, and other related problems
  reported in https://github.com/bwlewis/irlba/issues/32.
- Added rchk testing to pre-CRAN submission tests.
- Fixed a sign bug in ssvd() found by Alex Poliakov.

New in 2.3.1

- Fixed an irlba() bug associated with centering (PCA),
  see https://github.com/bwlewis/irlba/issues/21.
- Fixed irlba() scaling to conform to scale,
  see https://github.com/bwlewis/irlba/issues/22.
- Improved prcomp_irlba() from a suggestion by N. Benjamin Erichson,
  see https://github.com/bwlewis/irlba/issues/23.
- Significantly changed/improved svdr() convergence criterion.
- Added a version of Shen and Huang's Sparse PCA/SVD L1-penalized
  matrix decomposition (ssvd()).
- Fixed valgrind errors.


Deprecated features

I will remove partial_eigen() in a future version. As its documentation
states, users are better off using the RSpectra package for eigenvalue
computations (although not generally for singular value computations).
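
A rough migration sketch, assuming a small symmetric positive semidefinite
test matrix invented for illustration: it checks partial_eigen() against base
R's eigen() and, only if RSpectra happens to be installed, runs the
corresponding RSpectra::eigs_sym() call.

```r
library(irlba)
set.seed(1)
# Hypothetical symmetric positive semidefinite test matrix (20 x 20).
X <- crossprod(matrix(rnorm(2000), nrow = 100, ncol = 20))

# Deprecated: the three largest eigenvalues via partial_eigen().
pe <- partial_eigen(X, n = 3)

# Base R reference: full decomposition, truncated to three values.
ref <- eigen(X, symmetric = TRUE)$values[1:3]
pe_ok <- isTRUE(all.equal(pe$values, ref, tolerance = 1e-4))
print(pe_ok)

# Suggested replacement; skipped when RSpectra is not available.
if (requireNamespace("RSpectra", quietly = TRUE)) {
  rs <- RSpectra::eigs_sym(X, k = 3, which = "LA")
  print(rs$values)
}
```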

The mult argument of irlba() is deprecated and will be removed in a future
version. We now recommend defining a custom class with a custom multiplication
operator instead. The example below illustrates the old and new approaches:

library(irlba)
set.seed(1)
A <- matrix(rnorm(100), 10)

# ------------------ old way ----------------------------------------------
# A custom matrix multiplication function that scales the columns of A
# to unit norm (cf. the scale option).
col_scale <- sqrt(apply(A, 2, crossprod))
mult <- function(x, y)
{
  # x is a vector: compute x %*% A, then scale the resulting columns
  if (is.vector(x))
  {
    return((x %*% y) / col_scale)
  }
  # otherwise x is the matrix A: scale y, then compute A %*% y
  x %*% (y / col_scale)
}
irlba(A, 3, mult=mult)$d
## [1] 1.820227 1.622988 1.067185

# Compare with:
irlba(A, 3, scale=col_scale)$d
## [1] 1.820227 1.622988 1.067185

# Compare with:
svd(sweep(A, 2, col_scale, FUN=`/`))$d[1:3]
## [1] 1.820227 1.622988 1.067185

# ------------------ new way ----------------------------------------------
setClass("scaled_matrix", contains="matrix", slots=c(scale="numeric"))
setMethod("%*%", signature(x="scaled_matrix", y="numeric"),
          function(x, y) x@.Data %*% (y / x@scale))
setMethod("%*%", signature(x="numeric", y="scaled_matrix"),
          function(x, y) (x %*% y@.Data) / y@scale)
a <- new("scaled_matrix", A, scale=col_scale)

irlba(a, 3)$d
## [1] 1.820227 1.622988 1.067185


We have learned that using R's existing S4 system is simpler, easier, and more
flexible than using custom arguments with idiosyncratic syntax and behavior.
We've even used the new approach to implement distributed parallel matrix
products for very large problems with amazingly little code.
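
To illustrate that flexibility, here is a sketch of a second custom class
built the same way as scaled_matrix above. The class name centered_matrix and
its center slot are hypothetical, invented for this example. The class behaves
as if its columns were mean-centered without ever forming the centered matrix,
which matters when centering would destroy sparsity.

```r
library(irlba)
set.seed(1)
A <- matrix(rnorm(2000), nrow = 100, ncol = 20)

# Hypothetical class: a matrix with an implicit column-centering vector.
setClass("centered_matrix", contains = "matrix", slots = c(center = "numeric"))
# (A - 1 c') y  =  A y - (c' y) 1
setMethod("%*%", signature(x = "centered_matrix", y = "numeric"),
          function(x, y) x@.Data %*% y - sum(x@center * y))
# x' (A - 1 c')  =  x' A - sum(x) c'
setMethod("%*%", signature(x = "numeric", y = "centered_matrix"),
          function(x, y) (x %*% y@.Data) - sum(x) * y@center)
a <- new("centered_matrix", A, center = colMeans(A))

d_s4 <- irlba(a, 3)$d
# Reference: center explicitly, then use base R's svd().
d_ref <- svd(sweep(A, 2, colMeans(A)))$d[1:3]
print(rbind(d_s4, d_ref))
```

For plain column centering the built-in center argument of irlba() is the
simpler route; the class-based form is shown only because the same pattern
extends to operators that have no built-in option.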


Wishlist / help wanted...

- More Matrix classes supported in the fast code path
- Help improving the solver for singular values in tricky cases
  (basically, for ill-conditioned problems and especially for the smallest
  singular values); in general this may require a combination of more careful
  convergence criteria and use of harmonic Ritz values; Dmitriy Selivanov
  has proposed alternative convergence criteria in
  https://github.com/bwlewis/irlba/issues/29 for example.
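
When investigating such cases, one simple diagnostic is to measure how well
each computed triplet satisfies the defining relations A v = d u and
A' u = d v. The sketch below does that on an arbitrary, well-conditioned test
matrix; it is a generic residual check written for this README, not the
convergence test used inside the solver.

```r
library(irlba)
set.seed(1)
# Hypothetical test matrix; any numeric matrix could be substituted.
A <- matrix(rnorm(2000), nrow = 100, ncol = 20)
s <- irlba(A, nv = 3)

# Residual norms of the two defining relations, one value per triplet,
# scaled by the largest computed singular value.
res_right <- sqrt(colSums((A %*% s$v - s$u %*% diag(s$d))^2)) / s$d[1]
res_left  <- sqrt(colSums((crossprod(A, s$u) - s$v %*% diag(s$d))^2)) / s$d[1]
print(rbind(res_right, res_left))
```

Large residuals for the trailing triplets are a hint that tol or maxit need
adjusting, or that the problem is one of the tricky cases described above.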


References

* Baglama, James, and Lothar Reichel.
  "Augmented implicitly restarted Lanczos bidiagonalization methods."
  SIAM Journal on Scientific Computing 27.1 (2005): 19-42.
* Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp.
  "Finding structure with randomness: Stochastic algorithms for constructing
  approximate matrix decompositions." (2009).
* Shen, Haipeng, and Jianhua Z. Huang. "Sparse principal component analysis
  via regularized low rank matrix approximation." Journal of Multivariate
  Analysis 99.6 (2008): 1015-1034.
* Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. "A penalized
  matrix decomposition, with applications to sparse principal components and
  canonical correlation analysis." Biostatistics 10.3 (2009): 515-534.