# Linear Algebra with BLAS/CBLAS

I’m intending to learn some basics about basic linear algebra operations using BLAS and CUDA BLAS (CuBLAS). As usual, the post is actually my notes as I read various articles and experimenting.

## BLAS

BLAS (Basic Linear Algebra Subprograms) has been originally developed in FORTRAN for Linear algebra operations. There are many platform-specific optimized implementations under various licenses, from hardware vendors, compiler vendors and open source alternatives. I am particularly interested on Linux and MacOS. In Linux, ATLAS (Automatically Tuned Linear Algebra Software) is an open source optimized library, with multicore implementations. On Mac OSX, Apple’s Accelerate framework provides full BLAS functionality. Since BLAS is originally developed in Fortran, CBLAS takes care of the various nitty-gritty interfacing issues. Finally, Nvidia provides a BLAS alternative for their CUDA GPUs, named CuBLAS.

BLAS operations are divided into three levels. Level 1 BLAS operations are $O(n)$ algorithms, level 2 operations are $O(n^2)$ and level 3 operations are $O(n^3)$. To get an idea, level 1 covers vector dot product, level 2 covers matrix by vector multiply operations and level 3 covers matrix by matrix multiply operations.

BLAS originates from FORTRAN, where matrices are stored in column-major storage with 1-based indexing. C language stores arrays in row major format with 0-based indexing, while dynamically allocated arrays are usually stored as an array of pointers to row vectors.

## CBLAS Example

I have installed the following libraries a long time ago, now I just searched them using  dpkg, so it is possible that I  have forgotten something.

libblas3gf
libatlas3gf-corei7sse3

In our code,we need to include the cblas header file:

#include


What we are going to do, is to multiply a matrix by a vector.
This operation is carried out by the level 2 BLAS operation dgemv.
Actually, dgemv performs one of the following operations, depending on the transpose flag – if it is set, it transposes the A matrix (A is a matrix, x and y are vectors, alpha and beta are constants):

$y = \alpha A x + \beta y$

or

$y = \alpha A^T x + \beta y$

The structure of the operation shows that BLAS functions are designed for use in iterative numerical methods. We are going to set the constants alpha to 1.0 and beta to 0, in order to have a matrix-vector product.

Let’s see the full example:

#include
#include

double a[] = {
8, 4, 7,
3, 5, 1,
1, 3, 2
};

double x[] = {
-1, 2, 1
};

double y[] = {
0, 0, 0
};

int main()
{
int i, j;

// Print original matrix
for (i=0; i		for (j=0; j			printf("%5.1f", a[i*3+j]);
printf("\n");
}
// y = Ax
cblas_dgemv(CblasRowMajor, CblasNoTrans, 3, 3, 1.0, a, 3, x, 1, 0.0, y, 1);

// Print resulting vector
for (i=0; i		printf("%5.1f\n", y[i]);

return 0;
}


The prototype for cblas_dgemv is:

void cblas_dgemv(const enum CBLAS_ORDER order,
const enum CBLAS_TRANSPOSE TransA,
const int M, const int N,
const double alpha, const double  *A, const int lda,
const double  *X, const int incX, const double beta,
double  *Y, const int incY)


The variables that are not discussed are the incX and incY that define a stride for x and y vectors. Also, lda defines the leading dimension of matrix A.

Let’s also introduce our Makefile, to avoid any confusion on compilation and the like:

TARGET	:= testblas
OBJ	:= testblas.o

LDFLAGS		+= -L /usr/lib/atlas-base
LDFLAGS		+= -llapack -lcblas -lf77blas -latlas

$(TARGET): clean:$(RM) -f $(OBJ)$(TARGET)


Let’s see what happens when we run this:

$make cc -L /usr/lib/atlas-base -llapack -lcblas -lf77blas -latlas testblas.c -o testblas$ ./testblas
8.0  4.0  7.0
3.0  5.0  1.0
1.0  3.0  2.0
7.0
8.0
7.0

It is easy to observe in the build command that our code is linked to the ATLAS library. Also, according to manuals online, the ordering is important. That means I even need to profile for various orders of library definitions in the build command… Oh, well.

Now, let’s change CblasRowMajor to CblasColMajor in the source code above.

	cblas_dgemv(CblasColMajor, CblasNoTrans, 3, 3, 1.0, a, 3, x, 1, 0.0, y, 1);


The result is

\$ ./testblas
8.0  4.0  7.0
3.0  5.0  1.0
1.0  3.0  2.0
-1.0
9.0
-3.0

So, let’s see what has happened to understand row and column major formats. We have defined our array as:

double a[] = {
8, 4, 7,
3, 5, 1,
1, 3, 2
};


When we use row major format, we get the result of:

$\begin{bmatrix} 8 & 4 & 7 \\ 3 & 5 & 1 \\ 1 & 3 & 2 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 7 \\ 8 \\ 7 \end{bmatrix}$

When we use the column major format, we get the result of:

$\begin{bmatrix} 8 & 3 & 1 \\ 4 & 5 & 3 \\ 7 & 1 & 2 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 9 \\ 3 \end{bmatrix}$

Note that using the above code, we use CBLAS that follows 0-based indexing and not 1-based indexing.

## Closing notes

For C developers, the main confusion with CBLAS is the indexing and row-column major representation. On the other hand, the column major format is very attractive for operations that need column wise access to matrices. For example the Simplex method routinely accesses specific columns of an input matrix. A column major format follows the algorithm access pattern, resulting in a very cache friendly alternative.

References