- Block bidiagonal
- Explore having the various implementations of MatrixMult_* internally invoke an interface that can be swapped
  * Performance hit for small matrices?
  *

-------------------

- Unroll symmetric inverse by minor
- Unroll multiplication for square matrices
- Make QrUpdate more friendly and accessible

For some future Release
- Block SVD
- Block Hessenberg
- Adapt QR-tran into LQ
- Remove QRDecompositionHouseholder?
- Remove chol-block for dense with chol-block64?
  * reduce cache misses in invert and see if its faster
- Merge inner triangular solver code
- Require LinearSolverFactory to take in a matrix so it can figure out alg to use?
- Improve cholesky block inverse by taking advantage of symmetry
- Add a function for sorting eigenvalues.

----------------------------------------------------------

- Sparse
  - Look into supernodal methods. Apparently they are faster since they expoit dense matrix kernels

- LU
  - block

- Cholesky
  - unwrap for small matrices.  improve accuracy
  - improve stability

- Linear Solver
  * Iterative
  * Add condition(), use Hager's method? pg 132
  * Put this new condition into NormOps since it should be much faster

- SVD
  - Save up rotators, multiply against each other, then multiply against U and V
  - Divide and conquer algorithm
  - An implementation that just finds zero singular values
  - Bidiagonal decompositions have a lot of inefficient code

- Incremental SVD

- Eigen decomposition
  - Divide and conquer algorithm.

- Accurate version of symmetric eigenvalue for 2 by 2
  - SVD
  - SymmEig

- Fast Matrix Multiply

- hard code cholesky decomposition for small matrices
- hard code symmetric inverse for small matrices

- Matrix Multiplication:
  - Try a variant on mult_aux that does the vector mult up to block size then goes down a row.
  - Finish vector vector multiply
  - Code generator for matrix vector ops
  - Add matrix vector multiply
  - Auto switch to all of above in CommonOps