cputime and parallel computing

cputime and parallel computing

Anton Dereventsov
I've noticed that cputime() does not measure time correctly with the parallel
package.
For instance, in the following script I multiply two random 1000-by-1000
matrices 10 times and time the execution with tic-toc and cputime. I use a
standard 'for' loop, an 'arrayfun' call, and a 'pararrayfun' call:

        N = 1000;
        K = 10;
       
        t0 = tic; t1 = cputime;
                for i = 1 : K
                        rand(N)*rand(N);
                endfor
        printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);
       
        t0 = tic; t1 = cputime;
                arrayfun(@(n) rand(N)*rand(N), 1:K, 'UniformOutput', 0);
        printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);
       
        pkg load parallel
        t0 = tic; t1 = cputime;
                pararrayfun(nproc, @(n) rand(N)*rand(N), 1:K, 'VerboseLevel', 0);
        printf('wall time: %.6f | cpu time: %.6f\n', toc(t0), cputime-t1);

Here is the output:
        wall time: 5.190973 | cpu time: 5.228000
        wall time: 5.373872 | cpu time: 5.404000
        wall time: 3.517548 | cpu time: 0.116000

This code was executed in Octave 4.4.1 on Ubuntu 16.04 and the results are
consistent across runs; however, the last cpu time measurement does not
seem to be correct. I would expect all three cpu times to be very similar,
since the computed problem is the same. Moreover, the last measurement is
far too small (this becomes even more obvious if N is increased).

Is there a way to correctly measure cpu time on parallelized chunks of code?
Also, how come the cpu time is slightly bigger than the wall time in
the first two measurements, even though they are executed on a single core?

Thanks,
Anton



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html



Re: cputime and parallel computing

Olaf Till
On Thu, Sep 06, 2018 at 02:35:03PM -0500, Anton Dereventsov wrote:

> I've noticed that cputime() does not measure time correctly with parallel
> package.
> For instance, in the following script I multiply two random 1000-by-1000
> matrices 10 times and time the execution using tic-toc and cputime. I use a
> standard 'for' cycle, an 'arrayfun' function, and a 'pararrayfun' function:
> [code example and output elided; quoted in full in the original message above]
> This code was executed in Octave 4.4.1 on Ubuntu 16.04 and the results are
> consistent with each run, however the last cpu time measurement does not
> seem to be correct. I would expect all three cpu times to be very similar
> since the computed problem is the same. Moreover, the last measurement is
> far too small (it becomes even more obvious if N is increased).
No, that's what is to be expected: with pararrayfun, the code is
executed in nproc extra processes. Your 'cputime-t1' after calling
pararrayfun measures the cpu usage of the outer process, which doesn't
do the multiplications but only waits for the results.

> Is there a way to correctly measure cpu time on parallelized chunks
> of code?

Currently there isn't; this would have to be done inside (the code of)
pararrayfun. But the sum of the cpu usage times of the different worker
processes should be similar to the single-core cpu usage without
parallelization, even if the overall wall-clock time is shorter. Is
there really a need for measuring cpu usage within
parcellfun/pararrayfun?
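[Editor's note: one possible workaround, without changes to the parallel package itself, is to take the measurement inside the function that pararrayfun calls, so that each worker process reports its own cpu usage; summing the per-call times then approximates the total cpu cost of the parallelized work. A minimal, untested sketch along those lines (the helper name worker_cpu is made up for illustration, and the per-call times also include any per-call overhead inside the worker):]

```octave
pkg load parallel

N = 1000;
K = 10;

## Measure cputime inside the worker: each pararrayfun call runs in a
## child process, so this captures that process's own cpu usage.
function dt = worker_cpu (N)
  t = cputime;
  rand (N) * rand (N);   # the actual work being timed
  dt = cputime - t;
endfunction

## Collect one cpu-time sample per call and sum them.
cpu_per_call = pararrayfun (nproc, @(n) worker_cpu (N), 1:K);
printf ('summed worker cpu time: %.6f\n', sum (cpu_per_call));
```

[If the reasoning above holds, the summed worker cpu time should come out close to the single-process cpu times of the first two measurements, even though the wall time stays shorter.]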

> Also, how come the cpu time is slightly bigger than the wall time in
> the first two measurements, even though they are executed on a single core?

I'm not sure, but the difference is so small that I would attribute
it to calling cputime() _after_ toc(). Have you tried it in the reversed
order?

Olaf

--
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net




Re: cputime and parallel computing

Anton Dereventsov
Thanks, that makes sense.

> Is there really a need for measuring cpu usage within
> parcellfun/pararrayfun?
 
I'm trying to compare the computational complexity of two algorithms in a simple
and intuitive way, and I think that cpu time fits this purpose: it reflects
the computational cost of an algorithm and is somewhat less dependent on
the implementation (as opposed to wall-clock time).

> I'm not sure, but the difference is so small that I would attribute
> it to calling cputime() _after_ toc(). Have you tried it in the reversed
> order?

I tried that, but the cpu time is still slightly bigger.


Anton

On Sat, Sep 8, 2018 at 7:58 AM, Olaf Till <[hidden email]> wrote:
> [quoted message elided]