OSKI, an automatically tuned sparse kernel library.

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

OSKI, an automatically tuned sparse kernel library.

Richard Vuduc
Octave developers,

My colleagues and I have released an initial implementation of OSKI, a
library that provides automatically tuned sparse matrix kernels like
sparse matrix-vector multiply and triangular solve. It's designed to
have BLAS-style functionality, and is similar in spirit to ATLAS and
FFTW, which I'm sure are familiar to this group.

    http://bebop.cs.berkeley.edu/oski

I just wanted to mention OSKI, in case it might be of future interest to
you. I imagine it would be a lot of work to provide an option to use
OSKI within Octave, but at the very least, I invite your comments on
whether the OSKI interface (described in its User's Guide) would provide
adequate useful functionality.

Best regards,
--rich

Reply | Threaded
Open this post in threaded view
|

Re: OSKI, an automatically tuned sparse kernel library.

David Bateman-3
Richard Vuduc wrote:

> Octave developers,
>
> My colleagues and I have released an initial implementation of OSKI, a
> library that provides automatically tuned sparse matrix kernels like
> sparse matrix-vector multiply and triangular solve. It's designed to
> have BLAS-style functionality, and is similar in spirit to ATLAS and
> FFTW, which I'm sure are familiar to this group.
>
>    http://bebop.cs.berkeley.edu/oski

>
> I just wanted to mention OSKI, in case it might be of future interest
> to you. I imagine it would be a lot of work to provide an option to
> use OSKI within Octave, but at the very least, I invite your comments
> on whether the OSKI interface (described in its User's Guide) would
> provide adequate useful functionality.


It looks interesting. Octave at the moment has a pretty good sparse
matrix mul code, that is significantly faster than matlab for most
densities. Matlab beats octave for very low densities. This was written
by Andy Adler and so perhaps he can comment on whether OSKI makes sense
as a replacement for his code or not. As for the triangular solve code I
just copied the algorithm from
the blas {d,z}trsm function and adpated it for sparse matrices.... So
you might be faster there as
well. Motivation to try your code might be lacking however due to a
severe lack of time :-)

One question I have is the issue of tuning. The workload limits you set
seems to imply that the code in OSKI is supposed to be used in a kernel
of an iterative technique. This is one area where at the moment octave
hasn't got much code... I have an implementation of eigs (that is buggy)
where a mul tuned function for use with ARPACK would help. However, if
the workload limits on the tuning in OSKI implies that the tuning is a
significant cost in the calculation then perhaps for the basic mul and
triangular solve functions, its better to avoid tuning. What are the
gains you might expect from OSKI in the case where there is very little
tuning, and the calculation is performed once? Is it in general better
than what we already have?

Regards
David

>
> Best regards,
> --rich
>


--
David Bateman                                [hidden email]
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary

Reply | Threaded
Open this post in threaded view
|

Re: OSKI, an automatically tuned sparse kernel library.

Richard Vuduc
David Bateman wrote:
> It looks interesting. Octave at the moment has a pretty good sparse
> matrix mul code, that is significantly faster than matlab for most
> densities. Matlab beats octave for very low densities.

This is very interesting, and is the sort of issue OSKI addresses
since tuning is both matrix- and machine-specific.

 > This was written
> by Andy Adler and so perhaps he can comment on whether OSKI makes sense
> as a replacement for his code or not. As for the triangular solve code I
> just copied the algorithm from
> the blas {d,z}trsm function and adpated it for sparse matrices.... So
> you might be faster there as  well. Motivation to try your code > might be lacking however
due to a severe lack of time :-)

OSKI is designed to "piggyback" on existing code that already
uses compressed sparse row (CSR) or column (CSC) sparse matrix
formats. You provide OSKI with your CSR/CSC representation, and
the library returns a handle. There is a special "sharing" mode
in which you allow OSKI to keep pointers to your CSR/CSC data
structure, and OSKI promises not to change it. By sharing, there
are no extra copies of the matrix floating around. You'd carry
the handle around with your own data structure, and selectively
either use your own routines, or call OSKI's routines on the handle.

As I recall, Matlab uses CSC, and Octave uses either CSR or CSC
for compatibility with SuperLU and UMFPACK, so OSKI is at least
relevant.

I'm not claiming that using OSKI is a piece of cake, and I
certainly wasn't expect any of you to say, "Oh yeah, I'm gonna
start using it right away!" :-) I will certainly try to interest
an OSKI developer into doing this as a proof-of-concept. I mostly
just wanted your team to be aware of OSKI.

> One question I have is the issue of tuning. The workload limits you set
> seems to imply that the code in OSKI is supposed to be used in a kernel

You are right that we target iterative methods, and it seems
reasonable that modifying your dependent libraries (e.g., ARPACK)
could be a better way to for Octave to experiment with OSKI
indirectly.

> triangular solve functions, its better to avoid tuning. What are the
> gains you might expect from OSKI in the case where there is very little
> tuning, and the calculation is performed once? Is it in general better
> than what we already have?

Since OSKI piggybacks on your code, if no tuning has happened and
you use the "shared" mode, then OSKI just uses your data
structure. So there should be no performance hit in this case.
One of our important design goals is that if you bothered to use
OSKI at all, it shouldn't hurt you.

The potential benefits depend strongly on the types of matrices
you users are using. Much of the tuning in OSKI currently targets
applications that use the finite-element method, where 1.5--4x
speedups over CSR for mat-vec are possible on a variety of
platforms. We also have some techniques for getting a similar
range of improvements when the matrix really is randomly
structured with a sufficient number (maybe 20 or more?) non-zeros
per row. We don't yet have proven techniques to handle extremely
sparse matrices (just a few non-zeros per row) with no diagonal
or dense block substructure whatsoever. If you can say what
"typical" Octave users deal with, that might be useful
information for us.

--rich

Reply | Threaded
Open this post in threaded view
|

Re: OSKI, an automatically tuned sparse kernel library.

Andy Adler
On Wed, 29 Jun 2005, Richard Vuduc wrote:

> David Bateman wrote:
> > It looks interesting. Octave at the moment has a pretty good sparse
> > matrix mul code, that is significantly faster than matlab for most
> > densities. Matlab beats octave for very low densities.
>
> This is very interesting, and is the sort of issue OSKI addresses
> since tuning is both matrix- and machine-specific.

I've been trying to work on the octave sparse matrix multiply code;
I have a patch for low density sparse matrices that should
increase the performance by about 4x. When I get back from my
vacation (end-July), I'll test and submit it.

I wanted to try to compare this to the OSKI performance. OSKI
provides a "make benchmarks" function with looks promising.
Unfortunately, I couldn't find a way to compare to octave
performance easily.

1. OSKI benchmarks creates its own test matrices. There doesn't
   seem to be a way to enter or save these to be tested elsewhere.

2. I wasn't able to find test code for sparse x sparse function.

Richard, would you be able to offer some advice?

--
Andy Adler <[hidden email]> 1(613)562-5800x6218


Reply | Threaded
Open this post in threaded view
|

Re: OSKI, an automatically tuned sparse kernel library.

Richard Vuduc
Andy Adler wrote:
> I've been trying to work on the octave sparse matrix multiply code;
> I have a patch for low density sparse matrices that should
> increase the performance by about 4x. When I get back from my
> vacation (end-July), I'll test and submit it.

I would be interested in hearing more about this too when you return.

> I wanted to try to compare this to the OSKI performance. OSKI
> provides a "make benchmarks" function with looks promising.
> Unfortunately, I couldn't find a way to compare to octave
> performance easily.
>
> 1. OSKI benchmarks creates its own test matrices. There doesn't
>    seem to be a way to enter or save these to be tested elsewhere.

There are some benchmark programs you can try running which
accept Harwell-Boeing matrices as input. There is a very short
description in Section 6.4 (p. 31) of the OSKI user's guide. The
benchmark you want is probably "oskibench_autotune_Txy (where
"xy" specifies a particular precision combination liked "id" for
'int' indices and 'double' floating point values). This benchmark
reads in a matrix pattern from a Harwell-Boeing file and reports
performance for a particular workload. Run it without arguments
to see its usage, or drop me a line when you return from vacation.

> 2. I wasn't able to find test code for sparse x sparse function.

Unfortunately, OSKI does not have a sparse mat x sparse mat function.

Best,
--rich