A fast Householder bidiagonalization algorithm

Please note that this is not regarding the Lanczos or the PLS bidiagonalization algorithm. This is about the Householder bidiagonalization algorithm. The difference is huge. For example, there are no y‑variables in the Householder decomposition. It is a decomposition of X that results in a factorization with a bidiagonal matrix in the middle, similar to singular value decomposition (SVD, which has a diagonal matrix in the middle). The Householder bidiagonalization decomposition has excellent numerical properties and that is why I wanted to make a fast algorithm and implement it in C followed by compiling it so it could be used as a binary file in MATLAB. That said…

MATLAB has built-in functions for matrix factorizations. Examples are SVD, QR and LU decomposition, but there was no algorithm for the Householder bidiagonalization. It is quite straightforward to implement the Householder bidiagonalization MATLAB [1], especially if someone did exactly this before so you can copy it (smile). The implementation of Householder factorization in MATLAB by Hansen [2] is very good because it calculates the outputs necessary, nothing more, nothing less, thus minimizing the amount of calculations to be performed. That MATLAB code can be downloaded from here.

Although this was a very elegant implementation of Householder factorization, it ran very slowly in MATLAB because of its loop-in-loop structure. Furthermore, it was limited to handle only tall (portrait) matrices. Even if a simple transposition would make a workaround for this, I thought it would be nice to let the algorithm take care of any size of matrix, tall, wide or square — any shape. In this way, it would be easier to use for chemometric applications where all sizes of matrices do appear. Another point that could be of theoretical interest as well as for compatibility would be to allow for both full Householder decomposition as well as economy decomposition, making the algorithm behave very similarly to the factorizations that are delivered with MATLAB, for example QR, LU and SVD.

The work was done by extending the MATLAB-code by Hansen [2] to get the desired behavior, while at the same time keeping the elegant properties of the original implementation by Hansen. An attempt to convert the Householder code using the MATLAB compiler for embedded systems resulted in terrible and inefficient code to my surprise. The MATLAB code was therefore ported to C‑code manually.

Implementing Householder decomposition in C is not is rocket science, and it is probably not the most beautiful coding according to C‑purists, but I think it’s efficient. All of this has already been done in the LAPACK implementions, so it is definitely also a reinvention of the wheel, BUT to get an algorithm that is good at doing exactly the Householder bidiagonalization and nothing more and nothing less was what I wanted. After 16 days of irregular work last year (2010), the algorithm was in place. The resulting C‑code is presented below and can be downloaded from here too as a text file. If you prefer just to put precompiled files into MATLAB, you may prefer do download only this file instead. EDIT: It now contains precompiled files for Windows 32 and 64 bit operating systems as well as 32 and 64 bit operating systems for Mac OS, added 2014-03-19.

It may seem strange that I started looking into the Householder bidiagonaliztion algorithmm. The reason is that I have found a new way to make accurate PLS-calibrations, especially suitable for loads of data. This fast Householder algorithms, together with the ones that are already implemented in MATLAB like SVD, QR and LU, the will provide new possibilities. I think it will be especially useful for process data.

References
1. L. Elden, “Algorithms for regularization of ill-conditioned least-squares problems”, BIT 17 (1977), 134–145.
2. http://www.math.wsu.edu/faculty/tsat/files/matlab/contributed/bidiag.m

/*
BIDIAG Bidiagonalization of a matrix.
B = bidiag(A);
[U,B,V] = bidiag(A);
[U,B] = bidiag(A);
the m‑times‑n matrix A:
A = U*B*V’ ,
where B is an upper bidiagonal square matrix, and U and
V have orthogonal columns. Economy size matrices are obatained by adding a zeros for the second input argument.

L. Elden, “Algorithms for regularization of ill-
conditioned least-squares problems”, BIT 17 (1977), 134–145.

Revision history:
o For Matalb, m >= n:
Per Christian Hansen, UNI‑C, 03/11/92.
o For Matlab, m >= n and m < n:
Martin Andersson, Foss Japan, 2011-08-11
o C‑code compiled Matlab mex-file:
Martin Andersson, Foss Japan, 2011-09-06
o C‑code for full/economy size done:
Martin Andersson, Foss Japan, 2011-09-10
*/

#include “stdafx.h”
#include “mex.h”
#include “limits.h”
#include “mexFunction.h”
#include “math.h”

// This is for Windows
#pragma comment(lib, “libmx.lib”)
#pragma comment(lib, “libmat.lib”)
#pragma comment(lib, “libmex.lib”)
//#pragma comment(lib, “libmatlb.lib”)

// This is also for Windows
BOOL APIENTRY DllMain( HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}

void initializeU(int m, int n, mxArray *pU);
void initializeV(int m, int n, mxArray *pV);
void copy_vector_to_col_in_matrix(int row, int col, mxArray *pA, mxArray *col_pv);
void copy_vector_to_row_in_matrix(int row, int col, mxArray *pA, mxArray *row_pv);
void copy_col_in_matrix_to_vector(int row, int col, mxArray *col_pv, mxArray *pA);
void copy_row_in_matrix_to_vector(int row, int col, mxArray *row_pv, mxArray *pA);
double norm(int indStart, mxArray *pv);

void col_gen_hh(int row, int col, mxArray *pv, double& x1, double& beta, mxArray *pA);
void row_gen_hh(int row, int col, mxArray *pv, double& x1, double& beta, mxArray *pA);
void col_app_hh(int row, int col, mxArray *pA, double& beta, mxArray *pv);
void row_app_hh(int row, int col, mxArray *pA, double& beta, mxArray *pv);

void fliupud_fliplr_Btransp(mxArray *pB);
void fliplr(int lastCol, mxArray *pA);

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// Check the inputs and the ouputs
bool EconomySizeDecomposition = (nlhs == 1);
if(nlhs > 3) mexErrMsgTxt(“Too many output arguments.”);
if(nrhs > 2) mexErrMsgTxt(“Too many input arguments.”);
if(nrhs ==2)
if (*mxGetPr(prhs[1]) == 0)
EconomySizeDecomposition = true;
else
mexErrMsgTxt(“Use bidiag(X,0) for economy size decomposition.”);

//Create scalars, vectors and matrices
double *origA, *A, *B, *betaU, *betaV, *col_v, *row_v;
int m, n, ind;
mxArray *pU, *pB, *pV, *col_pv, *row_pv, *pbetaU, *pbetaV, *pA;
origA = mxGetPr(prhs[0]);
int M = mxGetM(prhs[0]);
int N = mxGetN(prhs[0]);
if (M >= N)
{
m = M;
n = N;
if (EconomySizeDecomposition)
pA = mxCreateDoubleMatrix(m,n,mxREAL);
else
pA = mxCreateDoubleMatrix(m,m,mxREAL);
A = mxGetPr(pA);
memcpy(A, origA, m*n*sizeof(double));
}
else
{
m = N;
n = M;
if (EconomySizeDecomposition)
pA = mxCreateDoubleMatrix(m,n,mxREAL);
else
pA = mxCreateDoubleMatrix(m,m,mxREAL);
A = mxGetPr(pA);
int i = 0;
for (int r = 0; r < m; r++)
{
ind = r;
for (int c = 0; c < n; c++)
{
A[ind]= origA[i++];
ind += m;
}
}
}
int indB =0;
col_pv = mxCreateDoubleMatrix(m,1,mxREAL);
pbetaU = mxCreateDoubleMatrix(m,1,mxREAL);
if (EconomySizeDecomposition)
{
row_pv = mxCreateDoubleMatrix(n,1,mxREAL);
pbetaV = mxCreateDoubleMatrix(n,1,mxREAL);
pB = mxCreateDoubleMatrix(n,n,mxREAL);
}
else
{
row_pv = mxCreateDoubleMatrix(m,1,mxREAL);
pbetaV = mxCreateDoubleMatrix(m,1,mxREAL);
pB = mxCreateDoubleMatrix(M,N,mxREAL);
}
B = mxGetPr(pB);
betaU = mxGetPr(pbetaU);
betaV = mxGetPr(pbetaV);
col_v = mxGetPr(col_pv);
row_v = mxGetPr(row_pv);

//Initialization done.

//Start the main looping
for (int k = 0; k < n; k++)
{
col_gen_hh(k, k, col_pv, B[indB], betaU[k], pA);
if (nlhs>1) copy_vector_to_col_in_matrix(k, k, pA, col_pv);
col_app_hh(k, k+1, pA, betaU[k], col_pv);
if (k < (n-2))
{
indB += mxGetM(pB);
row_gen_hh(k, k+1, row_pv, B[indB++], betaV[k], pA);
if (nlhs>1) copy_vector_to_row_in_matrix(k, k+1, pA, row_pv);
row_app_hh(k+1, k+1, pA, betaV[k], row_pv);
}
else if (k == (n-2))
{
indB += mxGetM(pB);
B[indB++] = A[(m+1)*(n-1)-1];
}
}
if (((nlhs > 1) && (M >= N)) || ((nlhs == 3) && (M < N)))
{
if (EconomySizeDecomposition)
{
pU = mxCreateDoubleMatrix(m,n,mxREAL);
initializeU(m, n, pU);
}
else
{
pU = mxCreateDoubleMatrix(m,m,mxREAL);
initializeU(m, m, pU);
}

for (int k = n-1; k >= 0 ; k–)
{
copy_col_in_matrix_to_vector(k, k, col_pv, pA);
col_app_hh(k, k, pU, betaU[k], col_pv);
}
}
if (((nlhs == 3) && (M >= N)) || ((nlhs > 1) && (M < N)))
{
pV = mxCreateDoubleMatrix(n,n,mxREAL);
initializeV(m, n, pV);
for (int k = n-3; k >= 0 ; k–)
{
copy_row_in_matrix_to_vector(k, k, row_pv, pA);
col_app_hh(k+1, k, pV, betaV[k], row_pv);
}
}

//Main looping done, assign outputs and flip matrices if M < N.
if (M >= N)
{
if (nlhs <= 1)
plhs[0] = pB;
else
{
plhs[0] = pU;
plhs[1] = pB;
if (nlhs == 3) plhs[2] = pV;
}
}
else
{
fliupud_fliplr_Btransp(pB);
if (nlhs <= 1)
plhs[0] = pB;
else
{
fliplr(n,pV);
plhs[0] = pV;
plhs[1] = pB;
if (nlhs == 3)
{
fliplr(n,pU);
plhs[2] = pU;
}
}
}
}

void initializeU(int m, int n, mxArray *pU)
{
double *U;
U = mxGetPr(pU);
int kLim = m*n;
for (int k = 0; k < kLim; k += (m+1))
U[k] = 1.0;
}

void initializeV(int m, int n, mxArray *pV)
{
double *V;
V = mxGetPr(pV);
int kLim = n*n;
for (int k = 0; k < kLim; k += (n+1))
V[k] = 1.0;
}

double norm(int indStart, mxArray *pv)
{
int i,m;
double *v;
double aux = 0.0;
m = mxGetM(pv);
v = mxGetPr(pv);
for (i = indStart; i < m; i++)
aux += v[i]*v[i];
return sqrt(aux);
}

void copy_vector_to_col_in_matrix(int fromRow, int col, mxArray *pA, mxArray *col_pv)
{
double *v, *A;
v = mxGetPr(col_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int indA = fromRow+col*m;
for (int i = fromRow; i < m; i++)
A[indA++] = v[i];
}

void copy_vector_to_row_in_matrix(int row, int fromCol, mxArray *pA, mxArray *row_pv)
{
double *v, *A;
v = mxGetPr(row_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int n = mxGetN(pA);
int indv = fromCol;
int indAstart = row + fromCol*m;
int indAlimit = m*n;
for (int indA = indAstart; indA < indAlimit; indA += m)
A[indA] = v[indv++];
}

void copy_col_in_matrix_to_vector(int fromRow, int col, mxArray *col_pv, mxArray *pA)
{
double *v, *A;
v = mxGetPr(col_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int indA = fromRow+col*m;
for (int i = fromRow; i < m; i++)
v[i] = A[indA++];
}

void copy_row_in_matrix_to_vector(int row, int fromCol, mxArray *row_pv, mxArray *pA)
{
double *v, *A;
v = mxGetPr(row_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int n = mxGetN(pA);
int indv = fromCol;
int indAstart = row + fromCol*m;
int indAlimit = m*n;
for (int indA = indAstart; indA < indAlimit; indA += m)
v[indv++] = A[indA];
}

void col_gen_hh(int row, int col, mxArray *col_pv, double& x1, double& beta, mxArray *pA)
{
double *v, *A;
double alpha;
v = mxGetPr(col_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int n = mxGetN(pA);

int r = row;
int indAstart = row + col*m;
int indAlimit = (col+1)*m;
for (int indA = indAstart; indA < indAlimit; indA++)
v[r++] = A[indA];

alpha = norm(row,col_pv);
if (alpha == 0) beta = 0.0;
else beta = 1/(alpha*(alpha + abs(v[row])));

if (v[row] < 0) alpha = -alpha;
v[row] = v[row] + alpha;
x1 = -alpha;
}

void row_gen_hh(int row, int col, mxArray *row_pv, double& x1, double& beta, mxArray *pA)
{
double *v, *A;
double alpha;
v = mxGetPr(row_pv);
A = mxGetPr(pA);
int n = mxGetN(pA);
int m = mxGetM(pA);

int indv = col;
int indAstart = row + col*m;
int indAlimit = m*n;
for (int indA = indAstart; indA < indAlimit; indA += m)
v[indv++] = A[indA];

alpha = norm(col, row_pv);
if (alpha == 0) beta = 0.0;
else beta = 1/(alpha*(alpha + abs(v[col])));

if (v[col] < 0) alpha = -alpha;
v[col] = v[col] + alpha;
x1 = -alpha;
}

void col_app_hh(int row, int col, mxArray *pA, double& beta, mxArray *pv)
{
int r, c, m, n, indA;
double *A, *v, d;

m = mxGetM(pA);
n = mxGetN(pA);
A = mxGetPr(pA);
v = mxGetPr(pv);

for (c = col; c < n; c++)
{
d = 0.0;
indA = row + c*m;
for (r = row; r < m; r++)
d = d + v[r]*A[indA++];
indA = row + c*m;
for (r = row; r < m; r++)
{
A[indA] = A[indA] - beta*v[r]*d;
indA++;
}
}
}

void row_app_hh(int row, int col, mxArray *pA, double& beta, mxArray *pv)
{
int r, c, m, n;
double *A, *v, d;
int indAstart;

m = mxGetM(pA);
n = mxGetN(pA);
int indAlimit = m*n;
A = mxGetPr(pA);
v = mxGetPr(pv);

for (r = row; r < m; r++)
{
d = 0.0;
c = col;
indAstart = r + col*m;
for (int indA = indAstart; indA < indAlimit; indA += m)
d = d + A[indA]*v[c++];
c = col;
indAstart = r + col*m;
for (int indA = indAstart; indA < indAlimit; indA += m)
A[indA] = A[indA] - beta*v[c++]*d;
}
}

void fliupud_fliplr_Btransp(mxArray *pB)
{
int m = mxGetM(pB);
int n = mxGetN(pB);
double *B = mxGetPr(pB);
double tmp;
int halfway = min(m,n)/2;
int indD1 = 0;
int indD2;
if (m > n)
indD2 = (m + 1)*(n - 1);
else
indD2 = m*m-1;
for (int i = 0; i < halfway; i++)
{
tmp = B[indD1];
B[indD1] = B[indD2];
B[indD2] = tmp;
if (i < halfway)
{
indD1 += m;
indD2–;
tmp = B[indD1];
B[indD1] = B[indD2];
B[indD2] = tmp;
indD1++;
indD2 -= m;
}
}
}

void fliplr(int lastCol, mxArray *pA)
{
int m = mxGetM(pA);
int n = mxGetN(pA);
double *A = mxGetPr(pA);
double tmp;
int halfway = m*(lastCol/2);
int rightInd = m*(lastCol-1);
for (int i = 0; i < halfway; i++)
{
tmp = A[i];
A[i] = A[rightInd];
A[rightInd] = tmp;
rightInd ++;
if ((rightInd % m) == 0) rightInd = rightInd - 2*m;
}
}

Leave a Comment Cancel Reply