Trade-off with looping


Heber Farnsworth

Given z which is 1 by n and x which is m by 1 consider the following two
pieces of code which accomplish the same thing:

        for i = 1:n
          u = z(i) - x;
          k = exp (u.^2);
          f(i) = mean (k);
        endfor

and

        u = z(ones(m,1),:) - x(:,ones(1,n));
        k = exp(u.^2);
        f = mean(k);

The second example contains no loops and I would expect it to be faster.
This is indeed the case for smaller values of m (less than 1000).  But
when m is 5000 the first example is about twice as fast.  In both cases n
is 40 (and for my purposes will always be less than 100).

There is apparently a trade-off between the speed of loops and the
difficulty of working with very large matrices (200,000 elements).  Can
anyone shed some light on where the optimal trade-off is?  What is the
largest matrix that Octave can handle comfortably (or is that system
dependent?).
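
For reference, the comparison can be reproduced with a small timing
harness like the following (a sketch only; randn data stands in for the
real x and z, and cputime measures CPU rather than wall-clock time):

        m = 5000; n = 40;
        x = randn (m, 1);
        z = randn (1, n);

        ## Loop version: one column of m exponentials per iteration.
        t0 = cputime ();
        for i = 1:n
          u = z(i) - x;
          f1(i) = mean (exp (u.^2));
        endfor
        printf ("loop:    %g s\n", cputime () - t0);

        ## No-loop version: one m-by-n temporary, all at once.
        t0 = cputime ();
        u = z(ones (m, 1), :) - x(:, ones (1, n));
        f2 = mean (exp (u.^2));
        printf ("no-loop: %g s\n", cputime () - t0);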



  Heber Farnsworth                               | Department of Finance
  University of Washington                       | Box 353200
  tele:  (206) 528-0793 home                     | Seattle, WA 98195-3200
  tele:  (206) 543-4773 finance     web: http://weber.u.washington.edu/~heberf
  fax:   (206) 685-9392             email: [hidden email]



Trade-off with looping

John W. Eaton-6
On 16-Aug-1996, Heber Farnsworth <[hidden email]> wrote:

: Given z which is 1 by n and x which is m by 1 consider the following two
: pieces of code which accomplish the same thing:
:
: for i = 1:n
:   u = z(i) - x;
:   k = exp (u.^2);
:   f(i) = mean (k);
: endfor
:
: and
:
: u = z(ones(m,1),:) - x(:,ones(1,n));
: k = exp(u.^2);
: f = mean(k);
:
: The second example contains no loops and I would expect it to be faster.
: This is indeed the case for smaller values of m (less than 1000).  But
: when m is 5000 the first example is about twice as fast.  In both cases n
: is 40 (and for my purposes will always be less than 100).

I think the time required for the loop version is not much different
from that of the no-loop version because the loop only runs over 40
elements, while the computation involves tens or hundreds of thousands
of exponentials.  If you were to switch those numbers, you would see
much different results (with the loop version being much slower, I'm
sure).

However, the memory requirements are much different.  A quick check
with top and Octave 1.1.1 on an Alpha shows that the no-loop version
takes about 12MB with 6MB max resident, but the loop version only
requires about 7MB with 1.3MB max resident.  Both execute in about the
same amount of time, even when m is 5000.

The Alpha I'm using has 256MB of physical memory, so paging isn't much
of a problem with this example.  If you have less memory available or
you increase the problem size much and paging begins to dominate, I
suspect that you will see a significant decrease in the performance
of the no-loop version before you see a similar decrease in the
performance of the loop version.  :-)
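
One way to split the difference (a sketch only; the block size of 1000
is an arbitrary choice, and rows() is just Octave's row-count function)
is to vectorize over blocks of x, so the temporary u never has more
than blk rows:

        blk = 1000;
        s = zeros (1, n);
        for i = 1:blk:m
          j = min (i+blk-1, m);
          xb = x(i:j);
          ## blk-by-n (or smaller) temporary instead of m-by-n.
          u = z(ones (rows (xb), 1), :) - xb(:, ones (1, n));
          s = s + sum (exp (u.^2), 1);
        endfor
        f = s / m;

This keeps the peak temporary at roughly blk*n elements instead of m*n,
at the cost of a short loop.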

: There is apparently a tradeoff between the speed of loops and the
: difficulty in working with very large matrices (200,000 elements).  Can
: anyone shed some light on where the optimal tradeoff is?  What is the
: largest matrix that octave can handle comfortably (or is that system
: dependent?).

I think it is highly dependent on the amount of physical memory you
have available and the types of computations you are trying to do.

jwe