Performance problem running multiple copies of Octave on a multicore processor

Ian McCallion
I have an audio signal processing system that I've parallelised by
dividing up the audio data into several parts, launching a separate
copy of Octave to run each part, waiting until all have ended, and
collating the results. I divide the audio into one part for each
hardware "thread" of the processor, so I run 8 copies on my Core i7
laptop. This system has worked well with Octave 3.8.2 and earlier for
5 years, but it has a major performance problem on Octave 4.0.0,
taking up to 100 percent longer than with Octave 3.8.2!!!

I first suspected it was due to Octave 4.0.0 using a different version
of OpenBLAS, but I swapped in the older version and nothing changed.
I then tried launching different numbers of copies of Octave.
Surprisingly, this changed the performance comparison dramatically, as
shown in this table:

Processes    Octave 3.8.2    Octave 4.0.0    Percent improvement
----------------------------------------------------------------
    1        67.8 secs       66.0 secs          3
    2        36.9 secs       36.8 secs          0
    4        21.9 secs       26.2 secs        -20
    8        15.8 secs       31.8 secs       -100

Essentially, with one active copy, Octave 4.0.0 is slightly faster.
With two active copies, the two versions perform essentially the same.
Beyond that, Octave 4.0.0 gets dramatically slower the more copies
are run.
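
The last column of the table appears to be the relative change in run time, (t_3.8.2 - t_4.0.0) / t_3.8.2 x 100, so a negative value means Octave 4.0.0 is slower. As a quick arithmetic check, here is a small shell/awk sketch (the function names are purely illustrative) that reproduces that column and also computes how much each version speeds up going from 1 to 8 processes:

```shell
# Illustrative helpers for the timings in the table above (not part of Octave).

# Relative change in run time, in percent, rounded to an integer:
# positive means the newer version is faster, negative means slower.
pct_improvement() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (old - new) / old * 100 }'
}

# Speed-up of a parallel run relative to the single-process run.
speedup() {
  awk -v one="$1" -v many="$2" 'BEGIN { printf "%.2f\n", one / many }'
}

pct_improvement 67.8 66.0   # 1 process:   prints 3
pct_improvement 15.8 31.8   # 8 processes: prints -101
speedup 67.8 15.8           # Octave 3.8.2, 1 -> 8 processes: prints 4.29
speedup 66.0 31.8           # Octave 4.0.0, 1 -> 8 processes: prints 2.08
```

The last row comes out at about -101, which the table rounds to -100. The speed-ups make the same point from another angle: going from 1 to 8 processes, 3.8.2 gets about 4.3 times faster, whereas 4.0.0 only gets about 2.1 times faster.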

As far as I can see this proves that there is some serialisation going
on in Octave 4.0.0 that was not going on in Octave 3.8.2. It sounds
like a bug (system wide serialisation protecting a local resource?).
Has anyone got any suggestions for a workaround, alternative
explanations for the cause of the problem, or other tests I can run to
elucidate the problem?

Cheers... Ian

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave

Re: Performance problem running multiple copies of Octave on a multicore processor

Olaf Till-2
On Mon, Nov 30, 2015 at 11:39:02PM +0000, Ian McCallion wrote:
> ...
> As far as I can see this proves that there is some serialisation going
> on in Octave 4.0.0 that was not going on in Octave 3.8.2. It sounds
> like a bug (system wide serialisation protecting a local resource?).

Sounds improbable.

> Has anyone got any suggestions for a workaround, alternative
> explanations for the cause of the problem

An explanation could be that the processes use up the shared cache of
the processor cores, so that they must compete for the main memory
bus. Maybe some calculation in the newer Octave is spread over a
larger memory area.

> or other tests I can run to
> elucidate the problem.

It could be useful to find the smallest possible unit (e.g. a native
Octave function, or a builtin function called by it) which shows your
problem.

Your idea of trying different BLAS libraries seemed good to me; make
sure you tested this influence correctly.

Olaf

--
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net

Re: Performance problem running multiple copies of Octave on a multicore processor

Ian McCallion
Hi Olaf,

Thanks for helping me on this. Yes, I agree serialisation is highly
improbable, especially because (as I realised after posting) the CPU
cores are all at 100% during the computation. For the avoidance of
doubt, the laptop has plenty of memory (although memory usage is a bit
higher with Octave 4.0.0) and the disk is not touched during
processing.

So far I have (I believe) eliminated OpenBLAS as a cause by repeating
my first experiments, trying both versions of Octave with both
versions of OpenBLAS. The bad performance definitely follows Octave
4.0.0.

I have also trimmed down my code greatly, checking at each stage that
the performance problem is still present. I am planning to instrument
the code to find where the excess time is being consumed before
trimming it further.

I'm not sure what your role is, so this may be a silly question, but if
this turns out to be a real issue, would it be helpful for you to have
the code at some point?

Cheers... Ian

On 1 December 2015 at 08:09, Olaf Till <[hidden email]> wrote:

> <snip>

Re: Performance problem running multiple copies of Octave on a multicore processor

Michael Creel
In reply to this post by Ian McCallion
Hi,
You need to do the following before starting the copies of Octave:

    export OPENBLAS_NUM_THREADS=1

Also, make sure to use octave-cli.
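
Michael's advice can be turned into a small POSIX shell launcher. This is a minimal sketch, not Ian's actual setup: the worker command is passed in as arguments, the 0-based part index is appended as its last argument, and both OPENBLAS_NUM_THREADS and OMP_NUM_THREADS are set to 1 in each child's environment so that every copy keeps to a single BLAS thread. The launcher then waits for all copies before returning:

```shell
# Illustrative launcher: run N copies of a command in parallel, each with
# single-threaded BLAS, and wait for all of them to finish.
# Usage: run_single_threaded N command [args...]
# The 0-based part index is appended as the last argument of each copy.
run_single_threaded() {
  n=$1
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    # Pin BLAS/OpenMP to one thread in this child's environment only.
    OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 "$@" "$i" &
    i=$((i + 1))
  done
  # Block until every background copy has exited.
  wait
}
```

For example, `run_single_threaded 8 octave-cli worker.m` would start eight single-threaded copies, where worker.m is a hypothetical script that reads its part index from the command line.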

HTH,
Michael

Re: Performance problem running multiple copies of Octave on a multicore processor

Ian McCallion
In reply to this post by Ian McCallion
Hi Olaf,

As a reminder of the topic here, Octave 4.0.0 is around 50% slower
than Octave 3.8.2 when a large signal processing task is shared across
multiple copies of Octave running in parallel on a multicore CPU. I'm
looking either for a way round the problem or for recognition that
this is an Octave bug.

On 1 December 2015 at 08:09, Olaf Till <[hidden email]> wrote:

><snip>
> An explanation could be using up the common L2 cache of the processor
> cores, so that they must share the main memory bus. Maybe some
> calculation in the newer Octave is spread over a larger memory area.
>
> It could be useful to find the smallest possible unit (e.g a native
> Octave function, or a builtin function called by it) which shows your
> problem.
>
> Your idea with different BLAS libraries seemed good to me, make sure
> you tested this influence correctly.

On 5 December 2015 at 11:30, Ian McCallion <[hidden email]> wrote:
> <snip>
>The bad performance definitely follows Octave 4.0.0.
>
> I am planning to instrument the code to find where the excess time is
> being consumed

I instrumented the code. This showed that matrix multiplication and
log(matrix) are affected, whereas element-wise ("dot") multiplication
and division do not seem to be affected.

I have written demonstrator functions that show the problem for matrix
multiplication:

    perf1(8,1000,12,500,1000);
    Running 8 processes.
    4.0.0,  libopenblas382, rand(1000,12)*rand(12,500) 1000 times
               took 11.07 seconds
    3.8.2,  libopenblas382, rand(1000,12)*rand(12,500) 1000 times
               took 7.22 seconds

The same magnitude of task on one processor:

    perf1(1,1000,12,500,8000)
    Running 1 process
    4.0.0,  libopenblas382, rand(1000,12)*rand(12,500) 8000 times
               took 13.04 seconds
    3.8.2,  libopenblas382, rand(1000,12)*rand(12,500) 8000 times
               took 21.14 seconds

I will happily share the demonstrator with anyone who wants it. It may
be useful to know whether the problem shows up on other processors and
other motherboards. Mine is an Intel i7-3610QM ASUS laptop with 8 GB
RAM running Windows 7.

Cheers... Ian

Re: Performance problem running multiple copies of Octave on a multicore processor

Olaf Till-2
On Thu, Dec 10, 2015 at 11:47:02AM +0000, Ian McCallion wrote:

> <snip>
> I have written demonstrator functions that show the problem for matrix
> multiplication:
> <snip>

The code for matrix multiplication (in liboctave/array/dMatrix.cc,
function xgemm and how it is called) has only code-formatting changes
between Octave 3.8.2 and 4.0.0, so the same BLAS functions should be
called. This indicates that the cause lies in differences in the
(LAPACK or) BLAS library used. (I know you stated that the same BLAS
was used in both, but you didn't state how you checked it.)

As for your test: your code surely called rand() _before_ the loop,
not within it? You had better post the code, including any shell
scripts or similar used to call it.

What do

    ldd /path/to/octave-3.8.0 | grep lapack
    ldd /path/to/octave-3.8.0 | grep blas

and

    ldd /path/to/octave-4.0.0 | grep lapack
    ldd /path/to/octave-4.0.0 | grep blas

output on your system? (Replace octave-... with octave-...-cli or
whatever the (symlink to the) final executable is named.)
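
On Linux, the checks above can be wrapped in a small helper. This is a minimal sketch assuming ldd is available (it is not on Windows); ldd lists every shared library the dynamic loader resolves, including indirect dependencies, so BLAS and LAPACK pulled in through Octave's own libraries should show up too. The function name is illustrative and the executable paths remain placeholders:

```shell
# Illustrative helper: list the BLAS/LAPACK libraries that an executable
# would load, as resolved by the dynamic loader (Linux/Unix only).
# Usage: blas_libs /path/to/executable
blas_libs() {
  # ldd prints one "name => path (address)" line per shared library;
  # keep only the BLAS/LAPACK ones, case-insensitively.
  ldd "$1" 2>/dev/null | grep -Ei 'blas|lapack' ||
    echo "no BLAS/LAPACK libraries found for $1"
}
```

Running `blas_libs` on each Octave executable shows whether both resolve to the same files. On Debian-like systems libblas.so.3 is typically a symlink managed by update-alternatives, so `readlink -f` on the listed path shows which implementation is really behind it.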

Olaf

Re: Performance problem running multiple copies of Octave on a multicore processor

Ian McCallion
Hi Olaf,

Here is the code. ldd is Unix-only, so I haven't run the commands you
suggested. However, I will send you the Windows equivalent shortly.

The cmd script is how I switch BLAS versions.

One important thing I note (which I thought was a testing error and
ignored earlier) is that the only combination that runs fast is
Octave 3.8.0 with the BLAS shipped with it.

I will experiment with different versions of LAPACK here. Please take
a quick look at the code to ensure there are no silly errors. You will
need source changes for your environment to run it.

Cheers... Ian
P.S. Rename the attachment to .zip. Bloody Google!

On 10 December 2015 at 13:16, Olaf Till <[hidden email]> wrote:

> <snip>

Attachment: perf.zzz (2K)

Re: Performance problem running multiple copies of Octave on a multicore processor

Olaf Till-2
On Thu, Dec 10, 2015 at 01:46:53PM +0000, Ian McCallion wrote:
> Hi Olaf,
>
> Here is the code. ldd is unix-only so haven't done the commands you
> suggested. However I will send you the win equivalent shortly.
>
>  The cmd script is how I switch blas versions.

Since you are using Windows, I couldn't check whether your procedure
really provides the same library to both Octave versions. You seem to
assume that Octave uses the .dll that is in the same directory as its
own executable file; I have no idea whether this is the case on
Windows. But the point is that physically identical BLAS library files
must be used by both Octave versions for the test. I still suspect
different libraries could have been used.
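
One way to confirm that both installations really use physically identical library files is a byte-for-byte comparison. A minimal sketch in POSIX shell (on Windows this would need something like Cygwin or MSYS); the helper name is illustrative, and the arguments would be the libblas.dll copies from each Octave installation:

```shell
# Illustrative helper: report whether two library files are identical.
# Usage: same_library path/to/first.dll path/to/second.dll
same_library() {
  cmp -s "$1" "$2"      # exit status: 0 = same, 1 = differ, >1 = error
  case $? in
    0) echo "identical" ;;
    1) echo "different" ;;
    *) echo "cannot compare (missing or unreadable file?)" ;;
  esac
}
```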

I didn't run your code, but you should measure the duration not of
only the first process (in perf2) but of all parallel processes, i.e.
the duration from before 'for i = 1:nProc' until after

    for i = 1:nProc
      mwait (pid(i));
    endfor

in perf1. I daresay the outcome will probably not be different,
though. Otherwise your code seems to correctly test the duration of
matrix multiplication.
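
Olaf's suggestion, timing from before the first copy is started until the last one has been waited for, can be sketched in POSIX shell. This is an illustrative stand-in for the perf1/perf2 code, not that code itself; it assumes `date +%s` (supported by GNU and BSD date, though not required by POSIX), so the resolution is one second, which is adequate for runs lasting several seconds:

```shell
# Illustrative helper: wall-clock time of a whole parallel run, from just
# before the first copy starts until the last copy has exited.
# Usage: time_parallel N command [args...]
# The 0-based part index is appended as the last argument of each copy.
time_parallel() {
  n=$1
  shift
  start=$(date +%s)          # whole seconds since the epoch
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" "$i" &
    i=$((i + 1))
  done
  wait                       # includes the slowest copy, not just the first
  end=$(date +%s)
  echo "$n processes took $((end - start)) s"
}
```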

Olaf

Re: Performance problem running multiple copies of Octave on a multicore processor

Mike Miller-4
In reply to this post by Ian McCallion
On Thu, Dec 10, 2015 at 13:46:53 +0000, Ian McCallion wrote:

> Hi Olaf,
>
> Here is the code. ldd is unix-only so haven't done the commands you
> suggested. However I will send you the win equivalent shortly.
>
>  The cmd script is how I switch blas versions.
>
> One important thing I note (which I thought was a testing error and
> ignored earlier), is that the only combination that runs fast is
> Octave 3.8.0 and the blas shipped with it.
>
> I will experiment with different versions of lapack here. Please take
> a quick look at the code to ensure there are no siily errors. You will
> need source chnges for your environment to run it.

For comparison, here is what I get calling your functions with Octave
3.8.2 and 4.0.0 in my Debian environment (using OpenBLAS). I also upped
the count a little to something meaningful for my system.

Without OMP_NUM_THREADS set:

  >> perf1 (8, 1000, 12, 500, 5000)
  4.0.0,                , rand(1000,12)*rand(12,500) 5000 times took 14.52 seconds
  3.8.2,                , rand(1000,12)*rand(12,500) 5000 times took 19.39 seconds
  Running 8 processes
  >> perf1 (1, 1000, 12, 500, 8*5000)
  4.0.0,                , rand(1000,12)*rand(12,500) 40000 times took 13.92 seconds
  3.8.2,                , rand(1000,12)*rand(12,500) 40000 times took 13.98 seconds
  Running 1 processes

With OMP_NUM_THREADS set to 1 (disabling multithreading within
OpenBLAS):

  >> perf1 (8, 1000, 12, 500, 5000)
  4.0.0,           1.0.0, rand(1000,12)*rand(12,500) 5000 times took 10.16 seconds
  3.8.2,           1.0.0, rand(1000,12)*rand(12,500) 5000 times took 13.02 seconds
  Running 8 processes
  >> perf1 (1, 1000, 12, 500, 8*5000)
  4.0.0,           1.0.0, rand(1000,12)*rand(12,500) 40000 times took 29.05 seconds
  3.8.2,           1.0.0, rand(1000,12)*rand(12,500) 40000 times took 29.23 seconds
  Running 1 processes


So my observations are:

 • No significant change between 3.8.2 and 4.0.0; 4.0.0 is maybe
   slightly faster when running multiple jobs
 • Make sure to take into account the interaction between running
   parallel Octave jobs and the use of OpenMP within OpenBLAS
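
The interaction in the second point is largely a matter of counting threads. Assuming, as is OpenBLAS's usual default, that each process starts one BLAS thread per logical CPU, 8 copies of Octave on a 4-core / 8-thread Core i7 can request 64 compute threads for 8 hardware threads; with OMP_NUM_THREADS (or OPENBLAS_NUM_THREADS) set to 1 that drops to 8. A trivial sketch of the count (the helper name is illustrative):

```shell
# Illustrative helper: how many compute threads N copies of Octave may
# request, compared with the hardware threads available.
# The ratio is integer arithmetic, i.e. rounded down.
# Usage: oversubscription processes blas_threads_per_process hw_threads
oversubscription() {
  total=$(( $1 * $2 ))
  echo "$total threads on $3 hardware threads (ratio $(( total / $3 )))"
}

# Ian's laptop: 8 processes on a 4-core / 8-thread Core i7.
oversubscription 8 8 8   # default BLAS threading
oversubscription 8 1 8   # OMP_NUM_THREADS=1 or OPENBLAS_NUM_THREADS=1
```

This is consistent with Mike's 8-process timings improving from 14.52 s to 10.16 s once OMP_NUM_THREADS was set to 1, although the thread count alone does not prove that oversubscription is the only factor.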

Are you using the 4.0.0 official binary? Which 3.8.2 binary are you
using?

--
mike

Re: Performance problem running multiple copies of Octave on a multicore processor

Ian McCallion
In reply to this post by Olaf Till-2
On 10 December 2015 at 15:13, Olaf Till <[hidden email]> wrote:
> On Thu, Dec 10, 2015 at 01:46:53PM +0000, Ian McCallion wrote:

> Since you are using Windows, I couldn't check if your procedure indeed
> provides the same library to both Octave versions. You seem to assume
> that Octave uses the .dll version that is in the same directory as its
> own executable file;

Windows does not require it to be so, but it is not an assumption
either: if I delete libblas.dll, Octave does not start up.

> I have no notion if this is the case in
> Windows. But the point is that indeed physically identical blas
> library files have to be used by both Octave versions for testing. I
> still suspect different libraries could have been used.

Suspect no longer :-)

However, the versions of LAPACK are different. I will try swapping
LAPACK the same way I am swapping BLAS.

> I didn't run your code, but you should not measure the duration of
> only the first process (in perf2), but of all parallel processes
> But I daresay the outcome will probably not be
> different.

I timed within the subprocesses to avoid the process start/stop
overhead. I checked that all processes take about the same time, but
on Windows, fprintf output from the parallel processes sometimes
interleaves at the character level, making the output hard to
decipher, which is why I only display the results from process 1. (I
would have written the output to files and merged the results in
perf1 if it had been necessary.)
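
Ian's fallback, writing each process's output to its own file and merging afterwards, avoids the character-level interleaving entirely. A minimal POSIX shell sketch with an illustrative helper name and file layout: each copy's standard output goes to outdir/part_<index>.log, and the files are printed in part order once every copy has exited:

```shell
# Illustrative helper: give each parallel copy its own output file so
# lines cannot interleave, then merge the files in part order.
# Usage: run_and_merge N outdir command [args...]
# The 0-based part index is appended as the last argument of each copy.
run_and_merge() {
  n=$1
  outdir=$2
  shift 2
  mkdir -p "$outdir"
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" "$i" > "$outdir/part_$i.log" &
    i=$((i + 1))
  done
  wait
  # Concatenate in index order (0, 1, 2, ...), not glob order.
  i=0
  while [ "$i" -lt "$n" ]; do
    cat "$outdir/part_$i.log"
    i=$((i + 1))
  done
}
```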

Cheers... Ian

Fwd: Performance problem running multiple copies of Octave on a multicore processor

Ian McCallion
In reply to this post by Mike Miller-4
Hi Mike,

On 10 December 2015 at 18:11, Mike Miller <[hidden email]> wrote:
> Are you using the 4.0.0 official binary?
  Yes.
> Which 3.8.2 binary are you using?
  Not sure - it was a while ago. The binaries are timestamped
2014-09-04 20:15 and I downloaded the installer from the official
site. I am pretty sure I re-downloaded it to get the second iteration
of 3.8.2 when it appeared.

Regarding controlling the number of threads used by BLAS and LAPACK:
- I have looked for references to OMP_NUM_THREADS in the documentation
  and found nothing.
- I have tried setting the OMP_NUM_THREADS environment variable inside
  or outside Octave, to no effect.
Have the mechanisms for controlling the number of threads used by
OpenMP changed somewhere in Octave, LAPACK, BLAS or OpenMP?

Cheers... Ian

Re: Fwd: Performance problem running multiple copies of Octave on a multicore processor

Mike Miller-4
On Thu, Dec 10, 2015 at 23:37:58 +0000, Ian McCallion wrote:
> Regarding controlling the number of threads used by blas and lapack:
> -  I have looked for references to OMP_NUM_THREADS in documentation
> and found nothing.
> -  I have tried setting OMP_NUM_THREADS environment variable inside or
> outside octave to no effect.
> Have mechanisms for controlling the number of threads used by openMP
> been changed somewhere in Octave lapack blas and openMP?

No, it's controlled via the environment variables documented for
OpenBLAS, either OMP_NUM_THREADS or OPENBLAS_NUM_THREADS.

I used

  >> setenv OMP_NUM_THREADS 1
  >> perf1(...)

to achieve the results I reported earlier.
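
Two details may explain why setting the variable seemed to have no effect. First, as far as I know OpenBLAS reads these variables when the library is initialised, so a setenv inside an Octave session that is already running does not change the BLAS already loaded in that session; it only affects Octave processes started afterwards, which in Mike's case are the copies that perf1 launches. Second, the OpenBLAS README (at least in versions from around this time) gives the precedence OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS, and says that builds made with USE_OPENMP=1 honour only OMP_NUM_THREADS. The helper below is a hypothetical illustration of that documented rule; it only inspects the environment and does not query OpenBLAS:

```shell
# Hypothetical helper mimicking the thread-count precedence described in
# the OpenBLAS README; it inspects the environment only.
# Usage: requested_blas_threads pthreads|openmp
requested_blas_threads() {
  if [ "$1" = "openmp" ]; then
    # USE_OPENMP=1 builds only honour OMP_NUM_THREADS.
    echo "${OMP_NUM_THREADS:-default}"
  elif [ -n "${OPENBLAS_NUM_THREADS:-}" ]; then
    echo "$OPENBLAS_NUM_THREADS"
  elif [ -n "${GOTO_NUM_THREADS:-}" ]; then
    echo "$GOTO_NUM_THREADS"
  else
    # "default" stands for OpenBLAS's own choice, normally the CPU count.
    echo "${OMP_NUM_THREADS:-default}"
  fi
}
```

With OPENBLAS_NUM_THREADS=2 and OMP_NUM_THREADS=4 in the environment, the helper reports 2 for a pthreads build and 4 for an OpenMP build, so setting both variables to 1 is the safe choice when the build type is unknown.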

--
mike

Re: Performance problem running multiple copies of Octave on a multicore processor

Ian McCallion
In reply to this post by Ian McCallion
Hi Olaf and Mike,

Thank you both for your help. I have now worked out in more detail
what has been causing the anomalous performance difference between
Octave 3.8.2 and Octave 4.0.0 in my signal processing system:

     Octave 3.8.2 and earlier releases, with the supplied OpenBLAS
     and LAPACK, only ever used a single thread, whether by design
     or by accident, whatever value (if any) was set for
     OPENBLAS_NUM_THREADS.

     This was changed/fixed in Octave 4.0.0.

I did not find any reference to this change in the documentation, and
it may be worth adding a few lines to the NEWS file to explain it.

Cheers... Ian

On 10 December 2015 at 23:37, Ian McCallion <[hidden email]> wrote:

> <snip>

Re: Performance problem running multiple copies of Octave on a multicore processor

Mike Miller-4
On Mon, Dec 14, 2015 at 16:54:38 +0000, Ian McCallion wrote:

> Hi Olaf and Mike,
>
> Thank you both for your help. I have now worked out in more detail
> what has been happening to cause anomalous performance between Octave
> 3.8.2 and Octave 4.0.0 in my signal processing system:
>
>      Octave 3.8.2 and earlier releases, with the supplied openblas
>      and lapack, only ever used a single thread, whether by design
>      or accident and whatever value (or none) at all was set for
>          OPENBLAS_NUM_THREADS
>
>     This was changed/fixed in Octave 4.0.0.
>
> I did not find any reference to this change in the documentation, and
> it may be worth adding a few lines in news to explain this.

I'm glad you were able to work out the problem with your system.

This change, if true, is likely due to either the way OpenBLAS is
built for the Octave 4.0.0 bundle, the version of OpenBLAS that is now
used, or perhaps OpenBLAS not even being present or used in the
earlier release that you were running.

I don't know whether we currently have a place to document changes or
release notes specific to the binary packaging of Octave for Windows;
maybe we should.

--
mike

Re: Performance problem running multiple copies of Octave on a multicore processor

John Donoghue-3
In reply to this post by Ian McCallion
On Mon, 14 Dec 2015 13:10:04 -0500, Mike Miller wrote:
> This change, if true, is likely due to either the way OpenBLAS is built
> for the Octave 4.0.0 bundle, or the version of OpenBLAS that is now
> used, or perhaps that OpenBLAS was not even present or being used in
> the earlier release that you were running.

It was probably changed as a result of this bug report:
https://savannah.gnu.org/bugs/?43525


