A fast Householder bidiagonalization algorithm

Please note that this is not regard­ing the Lanc­zos or the PLS bidi­ag­o­nal­iza­tion algo­rithm. This is about the House­holder bidi­ag­o­nal­iza­tion algo­rithm. The dif­fer­ence is huge. For exam­ple, there are no y‑variables in the House­holder decom­po­si­tion. It is a decom­po­si­tion of X that results in a fac­tor­iza­tion with a bidi­ag­o­nal matrix in the mid­dle, sim­i­lar to sin­gu­lar value decom­po­si­tion (SVD, which has a diag­o­nal matrix in the mid­dle). The House­holder bidi­ag­o­nal­iza­tion decom­po­si­tion has excel­lent numer­i­cal prop­er­ties and that is why I wanted to make a fast algo­rithm and imple­ment it in C fol­lowed by com­pil­ing it so it could be used as a binary file in MATLAB. That said…

MATLAB has built-in func­tions for matrix fac­tor­iza­tions. Exam­ples are SVD, QR and LU decom­po­si­tion, but there was no algo­rithm for the House­holder bidi­ag­o­nal­iza­tion. It is quite straight­for­ward to imple­ment the House­holder bidi­ag­o­nal­iza­tion MATLAB [1], espe­cially if some­one did exactly this before so you can copy it (smile). The imple­men­ta­tion of House­holder fac­tor­iza­tion in MATLAB by Hansen [2] is very good because it cal­cu­lates the out­puts nec­es­sary, noth­ing more, noth­ing less, thus min­i­miz­ing the amount of cal­cu­la­tions to be per­formed. That MATLAB code can be down­loaded from here.

Although this was a very ele­gant imple­men­ta­tion of House­holder fac­tor­iza­tion, it ran very slowly in MATLAB because of its loop-in-loop struc­ture. Fur­ther­more, it was lim­ited to han­dle only tall (por­trait) matri­ces. Even if a sim­ple trans­po­si­tion would make a workaround for this, I thought it would be nice to let the algo­rithm take care of any size of matrix, tall, wide or square — any shape. In this way, it would be eas­ier to use for chemo­met­ric appli­ca­tions where all sizes of matri­ces do appear. Another point that could be of the­o­ret­i­cal inter­est as well as for com­pat­i­bil­ity would be to allow for both full House­holder decom­po­si­tion as well as econ­omy decom­po­si­tion, mak­ing the algo­rithm behave very sim­i­larly to the fac­tor­iza­tions that are deliv­ered with MATLAB, for exam­ple QR, LU and SVD.

The work was done by extend­ing the MAT­LAB-code by Hansen [2] to get the desired behav­ior, while at the same time keep­ing the ele­gant prop­er­ties of the orig­i­nal imple­men­ta­tion by Hansen. An attempt to con­vert the House­holder code using the MATLAB com­piler for embed­ded sys­tems resulted in ter­ri­ble and inef­fi­cient code to my sur­prise. The MATLAB code was there­fore ported to C‑code manually.

Imple­ment­ing House­holder decom­po­si­tion in C is not is rocket sci­ence, and it is prob­a­bly not the most beau­ti­ful cod­ing accord­ing to C‑purists, but I think it’s effi­cient. All of this has already been done in the LAPACK imple­men­tions, so it is def­i­nitely also a rein­ven­tion of the wheel, BUT to get an algo­rithm that is good at doing exactly the House­holder bidi­ag­o­nal­iza­tion and noth­ing more and noth­ing less was what I wanted. After 16 days of irreg­u­lar work last year (2010), the algo­rithm was in place. The result­ing C‑code is pre­sented below and can be down­loaded from here too as a text file. If you pre­fer just to put pre­com­piled files into MATLAB, you may pre­fer do down­load only this file instead. EDIT: It now con­tains pre­com­piled files for Win­dows 32 and 64 bit oper­at­ing sys­tems as well as 32 and 64 bit oper­at­ing sys­tems for Mac OS, added 2014-03-19.

It may seem strange that I started look­ing into the House­holder bidi­ag­o­nal­iz­tion algo­rithmm. The rea­son is that I have found a new way to make accu­rate PLS-cal­i­bra­tions, espe­cially suit­able for loads of data. This fast House­holder algo­rithms, together with the ones that are already imple­mented in MATLAB like SVD, QR and LU, the will pro­vide new pos­si­bil­i­ties. I think it will be espe­cially use­ful for process data.

Ref­er­ences
1. L. Elden, “Algo­rithms for reg­u­lar­iza­tion of ill-con­di­tioned least-squares prob­lems”, BIT 17 (1977), 134–145.
2. http://www.math.wsu.edu/faculty/tsat/files/matlab/contributed/bidiag.m

/*
BIDIAG Bidi­ag­o­nal­iza­tion of a matrix.
B = bidiag(A);
[U,B,V] = bidiag(A);
[U,B] = bidiag(A);
the m‑times‑n matrix A:
A = U*B*V’ ,
where B is an upper bidi­ag­o­nal square matrix, and U and
V have orthog­o­nal columns. Econ­omy size matri­ces are obatained by adding a zeros for the sec­ond input argument.

L. Elden, “Algo­rithms for reg­u­lar­iza­tion of ill-
con­di­tioned least-squares prob­lems”, BIT 17 (1977), 134–145.

Revi­sion history:
o For Matalb, m >= n:
Per Chris­t­ian Hansen, UNI‑C, 03/11/92.
o For Mat­lab, m >= n and m < n:
Mar­tin Ander­s­son, Foss Japan, 2011-08-11
o C‑code com­piled Mat­lab mex-file:
Mar­tin Ander­s­son, Foss Japan, 2011-09-06
o C‑code for full/economy size done:
Mar­tin Ander­s­son, Foss Japan, 2011-09-10
*/


#include “stdafx.h”
#include “mex.h”
#include “limits.h”
#include “mexFunction.h”
#include “math.h”

// This is for Windows
#pragma comment(lib, “libmx.lib”)
#pragma comment(lib, “libmat.lib”)
#pragma comment(lib, “libmex.lib”)
//#pragma comment(lib, “libmatlb.lib”)

// This is also for Windows
BOOL APIENTRY Dll­Main( HANDLE hMod­ule, DWORD ul_reason_for_call, LPVOID lpRe­served)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}

void ini­tial­izeU(int m, int n, mxAr­ray *pU);
void ini­tial­izeV(int m, int n, mxAr­ray *pV);
void copy_vector_to_col_in_matrix(int row, int col, mxAr­ray *pA, mxAr­ray *col_pv);
void copy_vector_to_row_in_matrix(int row, int col, mxAr­ray *pA, mxAr­ray *row_pv);
void copy_col_in_matrix_to_vector(int row, int col, mxAr­ray *col_pv, mxAr­ray *pA);
void copy_row_in_matrix_to_vector(int row, int col, mxAr­ray *row_pv, mxAr­ray *pA);
dou­ble norm(int ind­Start, mxAr­ray *pv);

void col_gen_hh(int row, int col, mxAr­ray *pv, dou­ble& x1, dou­ble& beta, mxAr­ray *pA);
void row_gen_hh(int row, int col, mxAr­ray *pv, dou­ble& x1, dou­ble& beta, mxAr­ray *pA);
void col_app_hh(int row, int col, mxAr­ray *pA, dou­ble& beta, mxAr­ray *pv);
void row_app_hh(int row, int col, mxAr­ray *pA, dou­ble& beta, mxAr­ray *pv);

void fliupud_fliplr_Btransp(mxAr­ray *pB);
void fli­plr(int last­Col, mxAr­ray *pA);

void mex­Func­tion(int nlhs, mxAr­ray *plhs[], int nrhs, const mxAr­ray *prhs[])
{
// Check the inputs and the ouputs
bool Econ­o­my­SizeDe­com­po­si­tion = (nlhs == 1);
if(nlhs > 3) mex­Er­rMs­gTxt(“Too many out­put argu­ments.”);
if(nrhs > 2) mex­Er­rMs­gTxt(“Too many input argu­ments.”);
if(nrhs ==2)
if (*mxGetPr(prhs[1]) == 0)
Econ­o­my­SizeDe­com­po­si­tion = true;
else
mex­Er­rMs­gTxt(“Use bidiag(X,0) for econ­omy size decom­po­si­tion.”);

//Create scalars, vec­tors and matrices
dou­ble *origA, *A, *B, *betaU, *betaV, *col_v, *row_v;
int m, n, ind;
mxAr­ray *pU, *pB, *pV, *col_pv, *row_pv, *pbe­taU, *pbe­taV, *pA;
origA = mxGetPr(prhs[0]);
int M = mxGetM(prhs[0]);
int N = mxGetN(prhs[0]);
if (M >= N)
{
m = M;
n = N;
if (Econ­o­my­SizeDe­com­po­si­tion)
pA = mxCre­ate­Dou­bleMa­trix(m,n,mxREAL);
else
pA = mxCre­ate­Dou­bleMa­trix(m,m,mxREAL);
A = mxGetPr(pA);
mem­cpy(A, origA, m*n*sizeof(dou­ble));
}
else
{
m = N;
n = M;
if (Econ­o­my­SizeDe­com­po­si­tion)
pA = mxCre­ate­Dou­bleMa­trix(m,n,mxREAL);
else
pA = mxCre­ate­Dou­bleMa­trix(m,m,mxREAL);
A = mxGetPr(pA);
int i = 0;
for (int r = 0; r < m; r++)
{
ind = r;
for (int c = 0; c < n; c++)
{
A[ind]= origA[i++];
ind += m;
}
}
}
int indB =0;
col_pv = mxCre­ate­Dou­bleMa­trix(m,1,mxREAL);
pbe­taU = mxCre­ate­Dou­bleMa­trix(m,1,mxREAL);
if (Econ­o­my­SizeDe­com­po­si­tion)
{
row_pv = mxCre­ate­Dou­bleMa­trix(n,1,mxREAL);
pbe­taV = mxCre­ate­Dou­bleMa­trix(n,1,mxREAL);
pB = mxCre­ate­Dou­bleMa­trix(n,n,mxREAL);
}
else
{
row_pv = mxCre­ate­Dou­bleMa­trix(m,1,mxREAL);
pbe­taV = mxCre­ate­Dou­bleMa­trix(m,1,mxREAL);
pB = mxCre­ate­Dou­bleMa­trix(M,N,mxREAL);
}
B = mxGetPr(pB);
betaU = mxGetPr(pbe­taU);
betaV = mxGetPr(pbe­taV);
col_v = mxGetPr(col_pv);
row_v = mxGetPr(row_pv);

//Initialization done.

//Start the main looping
for (int k = 0; k < n; k++)
{
col_gen_hh(k, k, col_pv, B[indB], betaU[k], pA);
if (nlhs>1) copy_vector_to_col_in_matrix(k, k, pA, col_pv);
col_app_hh(k, k+1, pA, betaU[k], col_pv);
if (k < (n-2))
{
indB += mxGetM(pB);
row_gen_hh(k, k+1, row_pv, B[indB++], betaV[k], pA);
if (nlhs>1) copy_vector_to_row_in_matrix(k, k+1, pA, row_pv);
row_app_hh(k+1, k+1, pA, betaV[k], row_pv);
}
else if (k == (n-2))
{
indB += mxGetM(pB);
B[indB++] = A[(m+1)*(n-1)-1];
}
}
if (((nlhs > 1) && (M >= N)) || ((nlhs == 3) && (M < N)))
{
if (Econ­o­my­SizeDe­com­po­si­tion)
{
pU = mxCre­ate­Dou­bleMa­trix(m,n,mxREAL);
ini­tial­izeU(m, n, pU);
}
else
{
pU = mxCre­ate­Dou­bleMa­trix(m,m,mxREAL);
ini­tial­izeU(m, m, pU);
}

for (int k = n-1; k >= 0 ; k)
{
copy_col_in_matrix_to_vector(k, k, col_pv, pA);
col_app_hh(k, k, pU, betaU[k], col_pv);
}
}
if (((nlhs == 3) && (M >= N)) || ((nlhs > 1) && (M < N)))
{
pV = mxCre­ate­Dou­bleMa­trix(n,n,mxREAL);
ini­tial­izeV(m, n, pV);
for (int k = n-3; k >= 0 ; k)
{
copy_row_in_matrix_to_vector(k, k, row_pv, pA);
col_app_hh(k+1, k, pV, betaV[k], row_pv);
}
}

//Main loop­ing done, assign out­puts and flip matri­ces if M < N.
if (M >= N)
{
if (nlhs <= 1)
plhs[0] = pB;
else
{
plhs[0] = pU;
plhs[1] = pB;
if (nlhs == 3) plhs[2] = pV;
}
}
else
{
fliupud_fliplr_Btransp(pB);
if (nlhs <= 1)
plhs[0] = pB;
else
{
fli­plr(n,pV);
plhs[0] = pV;
plhs[1] = pB;
if (nlhs == 3)
{
fli­plr(n,pU);
plhs[2] = pU;
}
}
}
}

void ini­tial­izeU(int m, int n, mxAr­ray *pU)
{
dou­ble *U;
U = mxGetPr(pU);
int kLim = m*n;
for (int k = 0; k < kLim; k += (m+1))
U[k] = 1.0;
}

void ini­tial­izeV(int m, int n, mxAr­ray *pV)
{
dou­ble *V;
V = mxGetPr(pV);
int kLim = n*n;
for (int k = 0; k < kLim; k += (n+1))
V[k] = 1.0;
}

dou­ble norm(int ind­Start, mxAr­ray *pv)
{
int i,m;
dou­ble *v;
dou­ble aux = 0.0;
m = mxGetM(pv);
v = mxGetPr(pv);
for (i = ind­Start; i < m; i++)
aux += v[i]*v[i];
return sqrt(aux);
}

void copy_vector_to_col_in_matrix(int from­Row, int col, mxAr­ray *pA, mxAr­ray *col_pv)
{
dou­ble *v, *A;
v = mxGetPr(col_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int indA = from­Row+col*m;
for (int i = from­Row; i < m; i++)
A[indA++] = v[i];
}

void copy_vector_to_row_in_matrix(int row, int from­Col, mxAr­ray *pA, mxAr­ray *row_pv)
{
dou­ble *v, *A;
v = mxGetPr(row_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int n = mxGetN(pA);
int indv = from­Col;
int indAs­tart = row + from­Col*m;
int indAlimit = m*n;
for (int indA = indAs­tart; indA < indAlimit; indA += m)
A[indA] = v[indv++];
}

void copy_col_in_matrix_to_vector(int from­Row, int col, mxAr­ray *col_pv, mxAr­ray *pA)
{
dou­ble *v, *A;
v = mxGetPr(col_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int indA = from­Row+col*m;
for (int i = from­Row; i < m; i++)
v[i] = A[indA++];
}

void copy_row_in_matrix_to_vector(int row, int from­Col, mxAr­ray *row_pv, mxAr­ray *pA)
{
dou­ble *v, *A;
v = mxGetPr(row_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int n = mxGetN(pA);
int indv = from­Col;
int indAs­tart = row + from­Col*m;
int indAlimit = m*n;
for (int indA = indAs­tart; indA < indAlimit; indA += m)
v[indv++] = A[indA];
}

void col_gen_hh(int row, int col, mxAr­ray *col_pv, dou­ble& x1, dou­ble& beta, mxAr­ray *pA)
{
dou­ble *v, *A;
dou­ble alpha;
v = mxGetPr(col_pv);
A = mxGetPr(pA);
int m = mxGetM(pA);
int n = mxGetN(pA);

int r = row;
int indAs­tart = row + col*m;
int indAlimit = (col+1)*m;
for (int indA = indAs­tart; indA < indAlimit; indA++)
v[r++] = A[indA];

alpha = norm(row,col_pv);
if (alpha == 0) beta = 0.0;
else beta = 1/(alpha*(alpha + abs(v[row])));

if (v[row] < 0) alpha = -alpha;
v[row] = v[row] + alpha;
x1 = -alpha;
}

void row_gen_hh(int row, int col, mxAr­ray *row_pv, dou­ble& x1, dou­ble& beta, mxAr­ray *pA)
{
dou­ble *v, *A;
dou­ble alpha;
v = mxGetPr(row_pv);
A = mxGetPr(pA);
int n = mxGetN(pA);
int m = mxGetM(pA);

int indv = col;
int indAs­tart = row + col*m;
int indAlimit = m*n;
for (int indA = indAs­tart; indA < indAlimit; indA += m)
v[indv++] = A[indA];

alpha = norm(col, row_pv);
if (alpha == 0) beta = 0.0;
else beta = 1/(alpha*(alpha + abs(v[col])));

if (v[col] < 0) alpha = -alpha;
v[col] = v[col] + alpha;
x1 = -alpha;
}

void col_app_hh(int row, int col, mxAr­ray *pA, dou­ble& beta, mxAr­ray *pv)
{
int r, c, m, n, indA;
dou­ble *A, *v, d;

m = mxGetM(pA);
n = mxGetN(pA);
A = mxGetPr(pA);
v = mxGetPr(pv);

for (c = col; c < n; c++)
{
d = 0.0;
indA = row + c*m;
for (r = row; r < m; r++)
d = d + v[r]*A[indA++];
indA = row + c*m;
for (r = row; r < m; r++)
{
A[indA] = A[indA] - beta*v[r]*d;
indA++;
}
}
}

void row_app_hh(int row, int col, mxAr­ray *pA, dou­ble& beta, mxAr­ray *pv)
{
int r, c, m, n;
dou­ble *A, *v, d;
int indAs­tart;

m = mxGetM(pA);
n = mxGetN(pA);
int indAlimit = m*n;
A = mxGetPr(pA);
v = mxGetPr(pv);

for (r = row; r < m; r++)
{
d = 0.0;
c = col;
indAs­tart = r + col*m;
for (int indA = indAs­tart; indA < indAlimit; indA += m)
d = d + A[indA]*v[c++];
c = col;
indAs­tart = r + col*m;
for (int indA = indAs­tart; indA < indAlimit; indA += m)
A[indA] = A[indA] - beta*v[c++]*d;
}
}

void fliupud_fliplr_Btransp(mxAr­ray *pB)
{
int m = mxGetM(pB);
int n = mxGetN(pB);
dou­ble *B = mxGetPr(pB);
dou­ble tmp;
int halfway = min(m,n)/2;
int indD1 = 0;
int indD2;
if (m > n)
indD2 = (m + 1)*(n - 1);
else
indD2 = m*m-1;
for (int i = 0; i < halfway; i++)
{
tmp = B[indD1];
B[indD1] = B[indD2];
B[indD2] = tmp;
if (i < halfway)
{
indD1 += m;
indD2;
tmp = B[indD1];
B[indD1] = B[indD2];
B[indD2] = tmp;
indD1++;
indD2 -= m;
}
}
}

void fli­plr(int last­Col, mxAr­ray *pA)
{
int m = mxGetM(pA);
int n = mxGetN(pA);
dou­ble *A = mxGetPr(pA);
dou­ble tmp;
int halfway = m*(last­Col/2);
int rightInd = m*(last­Col-1);
for (int i = 0; i < halfway; i++)
{
tmp = A[i];
A[i] = A[rightInd];
A[rightInd] = tmp;
rightInd ++;
if ((rightInd % m) == 0) rightInd = rightInd - 2*m;
}
}

Leave a Comment

Your email address will not be published. Required fields are marked *

This website uses cookies. By continuing to use this site, you accept our use of cookies. 

Scroll to Top