parallel 4.0.1 released


parallel 4.0.1 released

Olaf Till-2
Hi,

a bug-fix release of package parallel has been made, version 4.0.1.

https://octave.sourceforge.io/parallel/NEWS.html

Thanks, Kai, for the fast release procedure.

Olaf

--
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net




Re: parallel 4.0.1 released

sshah
I find that on my machine (Mac mini M1), parallel 4.0.1 runs 3x slower than
parallel 4.0.0. What has changed?
My parallel code on the M1 used to run faster than most current-generation
Intel processor instances on AWS with similar core counts. However, after
the 4.0.1 upgrade it is substantially slower.



--
Sent from: https://octave.1599824.n4.nabble.com/Octave-General-f1599825.html



Re: parallel 4.0.1 released

Olaf Till-2
On Tue, Mar 23, 2021 at 12:41:38AM -0500, sshah wrote:
> I find that on my machine (Mac mini M1), parallel 4.0.1 runs 3 x slower than
> parallel 4.0.0.  What has changed?

Please provide more detail, including how you made sure that only the
versions of 'parallel' were different and everything else was the
same. And please consider that the first call to parcellfun() or
pararrayfun() is slower than the following calls, since the parallel
Octave instances have to be started first. The same holds for calls with a
greater number of instances than the calls before (e.g. parcellfun(8,
...) if the previous calls were all parcellfun(4, ...)).

Olaf


Re: parallel 4.0.1 released

Marius Schamschula-5
In reply to this post by sshah
Sshah,

What do you get if you run

file *.oct

on any of the files installed by the parallel package?

For example, on my Intel machine (running MacPorts) I get

marius$ file /opt/local/lib/octave/packages/parallel-4.0.1/x86_64-apple-darwin18.7.0-api-v55/fsave.oct
/opt/local/lib/octave/packages/parallel-4.0.1/x86_64-apple-darwin18.7.0-api-v55/fsave.oct: Mach-O 64-bit bundle x86_64

You should get a bundle for arm64, as I do on my M1 Mac mini:

% file /opt/local/lib/octave/packages/parallel-4.0.1/aarch64-apple-darwin20.3.0-api-v55/fsave.oct
/opt/local/lib/octave/packages/parallel-4.0.1/aarch64-apple-darwin20.3.0-api-v55/fsave.oct: Mach-O 64-bit bundle arm64

If you get an x86_64 bundle, it will run a lot slower because it runs under Rosetta.
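To run this check over every compiled file of an installed package in one go, the output of 'file' can be filtered for the architecture string. This is a minimal sketch, not part of Octave or the parallel package: the function names are hypothetical, and it only recognizes the 'bundle arm64' and 'bundle x86_64' strings quoted above.

```shell
# Hypothetical helper: read lines of 'file' output such as
#   /path/fsave.oct: Mach-O 64-bit bundle arm64
# from stdin and print "<path> <architecture>" for each one.
classify_oct_arch() {
  while IFS= read -r line; do
    path=${line%%: *}   # everything before the first ": "
    case $line in
      *"bundle arm64"*)  echo "$path arm64" ;;
      *"bundle x86_64"*) echo "$path x86_64" ;;
      *)                 echo "$path unknown" ;;
    esac
  done
}

# Hypothetical wrapper: run 'file' on every .oct file below a package directory.
report_oct_arch() {
  find "$1" -name '*.oct' -exec file {} + | classify_oct_arch
}
```

For example, report_oct_arch /opt/local/lib/octave/packages/parallel-4.0.1 on an M1 with a native build should list every .oct file as arm64.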


Marius
--
Marius Schamschula






Re: parallel 4.0.1 released

sshah
In reply to this post by sshah
Sure, Olaf.

I run code that calls pararrayfun(ncpu, ...) about 750 times, so the first-time
effect is greatly reduced. ncpu is taken from nproc(). I normally run
this code on my in-house macOS machines to test the integrity of changes
after any update to Octave and its packages. I also use it to test
Amazon AWS instance configurations. The underlying functions that run in
parallel use relatively little memory, so memory pressure is low.
I can also see CPU/core usage in top or Activity Monitor.

Here are some numbers:

- Apple M1, Big Sur, under Rosetta2, Octave 6.2, parallel 4.0.0: about
  14 seconds.
- MacBook Pro 2017 (2.3 GHz), Big Sur, Octave 6.2, parallel 4.0.0: about
  26 seconds.
- m5zn.6xlarge (12 cores, second-gen Cascade Lake), Ubuntu 20.04 LTS,
  Octave 5.2, parallel 3.1.3: 15 seconds.
- Apple M1, Big Sur, under Rosetta2, Octave 6.2, after updating parallel
  from 4.0.0 to 4.0.1: 48 seconds.


However, I did not do a run just before I upgraded the parallel package on
the M1. I could downgrade parallel to 4.0.0 and retest on the M1. Perhaps you can
help me through the steps. I have lost the link to the instructions for
creating a local tarball from the repository so that I can do a local install
on the M1. Can you please post those instructions?

I recall the following:
****
hg clone http://hg.code.sf.net/p/octave/parallel octave-parallel

cd octave-parallel
make clean
make check
make dist

Then copy the tarball from target/*.tar.gz to the Octave directory and run pkg install.
***
Is that correct? If so, how do I get version 4.0.0 from the hg clone?

I did try:

hg revert --all --rev 333
make clean
make check
make dist

However, make check failed with this error:

For information about changes from previous versions of the parallel
package, run 'news parallel'.

touch /Users/sunilshah/octave-parallel/target/.installation/.install_stamp
octave --no-gui --silent --norc --eval ' pkg ("local_list",
"/Users/sunilshah/octave-parallel/target/.installation/.octave_packages"); '
--eval ' pkg ("load", "parallel"); ' --eval ' args = {"inst"}; args(cellfun
(@ (x) isempty (a = stat (x)) || ! S_ISDIR (a.mode), args)) = []; if
(isempty (args)) error ("no \"inst\" or \"src\" directory"); exit (1); else
cellfun(@runtests, args); endif '
error: parse error:

  invalid use of operator = in anonymous function

>>>  pkg ("local_list",
>>> "/Users/sunilshah/octave-parallel/target/.installation/.octave_packages");  
>>> pkg ("load", "parallel");   args = {"inst"}; args(cellfun (@ (x) isempty
>>> (a = stat (x)) || ! S_ISDIR (a.mode), args)) = []; if (isempty (args))
>>> error ("no \"inst\" or \"src\" directory"); exit (1); else
>>> cellfun(@runtests, args); endif
                                                                                                                                                                               
^

make: *** [check] Error 1

It did generate target/parallel-4.0.0.tar.gz. However, when I installed it in
Octave (after uninstalling 4.0.1), the installed package was parallel-4.0.1
instead, and the speed did not change.
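Since the tarball's file name and the version reported after installation disagreed here, it may help to read the Version field inside the tarball before running pkg install. A minimal sketch, assuming the tarball contains the package's DESCRIPTION file (where Octave packages declare their version); the function name is hypothetical.

```shell
# Hypothetical helper: print the Version field of the DESCRIPTION file
# packed inside an Octave package tarball, without unpacking it to disk.
tarball_pkg_version() {
  tgz=$1
  # Locate the DESCRIPTION member (its top-level directory name may vary).
  member=$(tar -tzf "$tgz" | grep -E '(^|/)DESCRIPTION$' | head -n 1)
  [ -n "$member" ] || { echo "no DESCRIPTION in $tgz" >&2; return 1; }
  # Extract only that member to stdout and keep the text after "Version:".
  tar -xzOf "$tgz" "$member" | sed -n 's/^Version:[[:space:]]*//p'
}
```

For example, tarball_pkg_version target/parallel-4.0.0.tar.gz should print 4.0.0 if the tarball really holds that version.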
 

Thanks.




Re: parallel 4.0.1 released

sshah
In reply to this post by Marius Schamschula-5
Thanks, Marius.

I am aware of the architecture differences on the M1. If an .oct file is
compiled for the wrong architecture, Octave complains (the OS raises an
exception) and does not allow it to run.

I have not yet been able to install any Octave packages in native mode in my
Homebrew-installed Octave on the M1. I have tested that mkoctfile works fine in
native mode for simple test code exercising CBLAS and threading. The Homebrew
bottle for native-mode Octave is very recent.

I do have many Python packages installed in both Rosetta2 and native modes. In my
experience, the Rosetta2 versions of that code have not been slow. In fact, in
some cases the native versions were slower, perhaps because the package
wheels or installation scripts are still immature.
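To confirm which mode a shell is running in, macOS on Apple silicon provides the sysctl key 'sysctl.proc_translated' (1 under Rosetta 2, 0 when native). A minimal sketch with a hypothetical function name; it reports on the shell process itself, not necessarily on an Octave binary that shell launches, whose architecture 'file' shows.

```shell
# Hypothetical helper: report whether the current shell runs natively or
# under Rosetta 2. sysctl.proc_translated is 1 under Rosetta, 0 when native
# on Apple silicon, and absent on Intel Macs and other systems.
translation_status() {
  t=$(sysctl -n sysctl.proc_translated 2>/dev/null)
  case $t in
    1) echo "rosetta ($(uname -m))" ;;
    0) echo "native ($(uname -m))" ;;
    *) echo "not-applicable ($(uname -m))" ;;
  esac
}
```

Under Rosetta 2, uname -m reports x86_64, so the architecture in parentheses is a second hint.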




Re: parallel 4.0.1 released

sshah
I did manage to install the Octave packages control-3.2.0 and parallel-4.0.1 in
native mode. The parallel test code ran about 1.5x faster than the Rosetta2
version. However, it is still 2x slower than the original parallel-4.0.0 in
Rosetta2 mode.




Re: parallel 4.0.1 released

Olaf Till-2
In reply to this post by sshah
On Tue, Mar 23, 2021 at 02:28:01PM -0500, sshah wrote:

> However, I did not have a run just before I upgraded the parallel package on
> M1.  I could downgrade parallel to 4.0.0 and retest on M1. Perhaps you can
> help me through the steps.   For some reason, I have lost the link to
> instructions for creating local tar ball from it so I can do a local install
> from it on M1.  Can you please post those instructions?

Sorry, I couldn't attend to this over the last few days.

As for the instructions:

After changing the current directory to the one you cloned 'parallel'
into:

rm -r target # but make sure you have nothing valuable there
export PREBUILD=no # makes the next step faster
make dist

Then, e.g., change into the directory 'target', start Octave, and run

pkg install parallel-4.0.0.tar.gz

But if you have installed parallel-4.0.1.tar.gz in a different way,
you should probably uninstall it first...
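The steps above can be wrapped for a chosen revision. A minimal sketch, assuming a clean clone (as fresh from 'hg clone') and that the revision you pass really is 4.0.0; revision 333 from earlier in this thread is not verified here, which is why the sketch prints the Version line of DESCRIPTION before building. It uses 'hg update -r' rather than 'hg revert', since revert only rewrites file contents and leaves the working directory's parent revision unchanged. The function name and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper: build a package tarball from a given Mercurial
# revision of an existing clean clone. With DRY_RUN=1, the hg/rm/make
# commands are printed instead of run.
build_tarball_at_rev() {
  repo=$1   # path of the hg clone, e.g. octave-parallel
  rev=$2    # revision to build; check that it really is the wanted release
  (
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }
    cd "$repo" || exit 1
    # Move the working copy (and its parent revision) to the requested revision.
    run hg update -r "$rev" || exit 1
    # Show the version the tarball will carry.
    grep '^Version:' DESCRIPTION
    # Start from an empty 'target' directory, then build without prebuilding.
    run rm -rf target
    run env PREBUILD=no make dist
  )
}
```

For example, DRY_RUN=1 build_tarball_at_rev octave-parallel 333 prints the commands, and the Version line, without changing anything.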

As for the original issue:

Execution speed depends on many things; I can't help much without
a reproducible example (and I have no macOS). There were changes
from 4.0.0 to 4.0.1 (passing of the current search path) which might
make a small difference in some cases, but only at the start of
parallel execution (i.e. at the start of a call of parcellfun()).

Maybe seeing your test code would help. You should call parcellfun
with 'ncpu' once before performing your test to get rid of the 'first time
effect'. Generally, parcellfun is probably not well suited to problems
that involve many repeated calls (750 in your test), each of which
lasts only a short time (0.064 s on average in your test with
4.0.1). In such a case, the small difference between 4.0.0 and 4.0.1
might have a noticeable effect.
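The 0.064 s figure is the 48 s total divided by the 750 calls. A small sketch of that arithmetic (helper names are hypothetical) also spreads the difference between the 14 s and 48 s runs evenly over the calls; that assumes both runs did identical work, which, as noted earlier in the thread, was not checked with a back-to-back comparison.

```shell
# Mean wall time per call, in milliseconds, for a run of N calls.
per_call_ms() {
  awk -v total="$1" -v n="$2" 'BEGIN { printf "%.1f\n", total / n * 1000 }'
}

# Extra time per call, in milliseconds, between a slow and a fast run
# with the same number of calls N.
extra_per_call_ms() {
  awk -v slow="$1" -v fast="$2" -v n="$3" \
    'BEGIN { printf "%.1f\n", (slow - fast) / n * 1000 }'
}
```

per_call_ms 48 750 prints 64.0, per_call_ms 14 750 prints 18.7, and extra_per_call_ms 48 14 750 prints 45.3, i.e. roughly 45 ms extra per call if the whole slowdown were overhead at the start of each parcellfun call.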

One further thought: Have you possibly made the mistake of calling
parcellfun_set_nproc(0) before or after each call to parcellfun? (This
shouldn't normally be done.)

Olaf
