

I've noticed that indexing expressions on large arrays can be quite slow, even if the indexes are mostly sequential. In the code below, the first loop takes 1.91 seconds while the second takes 0.11 seconds. Any suggestions on how to get around this would be much appreciated, even looking at Octave's source.
a = rand(300);
tic
for i = 1:100
  a(1:(size(a, 1) - 1), :) += a(2:size(a, 1), :);
endfor
toc

tic
for i = 1:100
  a += a;
endfor
toc

thanks,
Nicholas
_______________________________________________
Help-octave mailing list
[hidden email]
https://www-old.cae.wisc.edu/mailman/listinfo/help-octave


On Tue, Jan 13, 2009 at 7:13 PM, Nicholas Tung < [hidden email]> wrote:
> I've noticed that indexing expressions on large arrays can be quite slow,
> even if the indexes are mostly sequential. In the code below, the first loop
> takes 1.91 seconds while the second takes 0.11 seconds. Any suggestions on
> how to get around this would be much appreciated, even looking at Octave's
> source.
>
> a = rand(300);
> tic
> for i = 1:100
> a(1:(size(a, 1) - 1), :) += a(2:size(a, 1), :);
> endfor
> toc
>
> tic
> for i = 1:100
> a += a;
> endfor
> toc
>
> thanks,
> Nicholas
>
What version of Octave's sources did you use? Dense indexing speed has
been vastly improved in the development sources (but after 3.1.51,
IIRC).

RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


>I've noticed that indexing expressions on large arrays can be quite slow,
Have a look at this, which I wrote in November but for some reason does
not appear in the list archives. In an application of mine, I wrote an
ad hoc function to access a 5-dim matrix using the last method below,
getting a significant speedup. I usually run 3.0.1, and I did not make
any check to compare its speed with the more recent Octave versions.
Date: 11 Nov 2008 10:44:45 +0100
From: Francesco Potortì < [hidden email]>
To: Jaroslav Hajek < [hidden email]>
CC: Octave help list < [hidden email]>
Subject: Re: slow access to consecutive matrix elements

>On Sat, Oct 11, 2008 at 10:22 AM, Francesco Potorti`
>< [hidden email]> wrote:
>> This is a real life example that demonstrates how access to matrix
>> elements is not optimised for the case of complete ranges in the first
>> dimensions. I am sending it here and not to the bug list because this
>> example may be useful to others, at least until this case is optimised
>> in the sources.
>>
>> # This is a big 5-dim matrix, about 100MB size
>> octave> kk=rand(156,222,1,44,8);
>>
>> # I access a slice of it, which is sequential in memory
>> octave> t=cputime; for ii=1:44, for jj=1:8
>> mm=kk(:,:,:,ii,jj); endfor, endfor, cputime-t
>> ans = 5.9124
>>
>> # Using a linear index is much faster
>> octave> span=(1:156*222);
>> t=cputime; for ii=1:44, for jj=1:8
>> mm=kk(sub2ind(size(kk),1,1,1,ii,jj)+span-1); endfor, endfor, cputime-t
>> ans = 2.6642
>>
>> # Removing the call to sub2ind reaches top speed
>> octave> cp=[1,cumprod(size(kk)(1:end-1))]; span=(1:156*222);
>> t=cputime; for ii=1:44, for jj=1:8
>> mm=kk(sum(([1,1,1,ii,jj]-1).*cp)+span); endfor, endfor, cputime-t
>> ans = 1.4001
>>
>> Still, I wish access were faster yet. Is there a reason why copying
>> consecutive memory is so slow? I wish I could help with optimising
>> this, even if I am certainly not the most qualified person to do it.
>>
>
>I guess the indexing routines do deserve some attention w.r.t
>performance. Reducing code duplication would also be nice. I have this
>on my TODO list, but I don't think it's a good idea to do it before
>3.2.x is forked, as such changes are, IMO, likely to introduce bugs.

Following up on this, I realised that there is room for further
significant speedup:

# Be careful to not convert ranges into matrices
octave> cp=[1,cumprod(size(kk)(1:end-1))]; len=156*222;
t=cputime; for ii=1:44, for jj=1:8
base=sum(([1,1,1,ii,jj]-1).*cp); mm=kk(base+1:base+len); endfor, endfor, cputime-t
ans = 0.26402

The fact is, I was assuming that a range remains a range even after a
linear transformation, while this does not appear to be the case:

octave3.1> typeinfo(1:4)
ans = range
octave3.1> typeinfo(4+(1:4))
ans = matrix
octave3.1> typeinfo(4*(1:4))
ans = matrix

From the depth of my ignorance about Octave's internals, I dare say that
it should not be too difficult to keep ranges as ranges even after sum
or product with a scalar. Maybe even after sum with a range with the
same number of elements. Am I wrong?


Francesco Potortì (ricercatore) Voice: +39 050 315 3058 (op.2111)
ISTI - Area della ricerca CNR Fax: +39 050 315 2040
via G. Moruzzi 1, I-56124 Pisa Email: [hidden email]
(entrance 20, 1st floor, room C71) Web: http://fly.isti.cnr.it/



On Tue, Jan 13, 2009 at 16:46, Francesco Potortì <[hidden email]> wrote:
[...]
Neat, I didn't know that 1-d vector indexing worked on matrices. However,
I think it may be faster for you because of the multidimensional aspect,
whereas I am working with more values (potentially large image samples).
On Octave 3.1.51+ (built today) the following code runs in about the same
time. If you're doing something more complicated, please tell me. I know
the two snippets below don't compute exactly the same value...

tic, x1 = 1:299; x2 = 2:300; for i = 1:100, a(:, x1) += a(:, x2); endfor, toc
x1 = 1:89700; x2 = 301:90000; tic, for i = 1:100, a(x1) += a(x2); endfor, toc

As another note, it doesn't seem to matter that much which way the matrix
is indexed for the first, i.e. a(x1, :) vs a(:, x1).
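The two timings line up because Octave stores matrices column-major: for
a 300x300 matrix, columns 1..299 occupy linear elements 1..89700 and
columns 2..300 occupy elements 301..90000. A minimal check of that
correspondence (an illustration added here, not part of the original
message):

```octave
a = reshape(1:9, 3, 3);                     % columns are [1;2;3], [4;5;6], [7;8;9]
isequal(a(:, 1:2), reshape(a(1:6), 3, 2))   % columns 1..2 are linear elements 1..6
```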
Jaroslav - I updated to hg head and it was faster (cut time in half for the test example), but on real code not quite as much as I'd hoped.
Nicholas


On Wed, Jan 14, 2009 at 01:36, Jaroslav Hajek <[hidden email]> wrote:
On Wed, Jan 14, 2009 at 5:00 AM, Nicholas Tung < [hidden email]> wrote:
> [...]
Hi Nicholas,
if you can identify bottlenecks and post them as reasonably small
self-contained examples, we can surely discuss them and possible
improvements.
regards

RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Thanks so much. I uploaded a real example to
http://gatoatigrado.arvixe.com/files/temp/octave_indexing_perf.tgz
The script is 60 lines long; it should perform a two-step Haar
transformation (actually only for the wavelet coefficients) lifting-method
decomposition. Run it with the following:

octave encode_image_standalone.m attachment.jpg
I looped it 450 times to simulate the amount of processing that I need. On
my machine, the base time is 26.6 seconds; if you remove the matrix-scalar
indexing (original file lines 36, 37 and 43, 44), it still takes 17 seconds
just to run through the interpreter. Replacing both of the lines above with

gamma_ += alpha_arr(shift_idx) * gamma_;
lambda_ += alpha_arr(shift_idx) * lambda_;

respectively, it takes about 20 seconds, which is much closer to the
interpreter overhead (which I might be able to cut down by reducing the
number of lines; I'm not sure).
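For readers following along, a lifting step of the kind being timed has
this general shape. A generic one-dimensional Haar lifting step (a sketch
with made-up variable names, not the code from the linked script):

```octave
x = rand(1, 300);               % one row of samples
even = x(1:2:end);              % even-indexed samples
odd  = x(2:2:end);              % odd-indexed samples
detail = odd - even;            % predict step: Haar detail coefficients
approx = even + detail / 2;     % update step: approx == (even + odd) / 2
```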
Also, Francesco, I forgot to do the obvious thing and test your examples
with 3.1.51. At least on my machine, it looks like the performance has been
reversed by the latest dense indexing improvements.

octave:1> kk=rand(156,222,1,44,8);
octave:2> t=cputime; for ii=1:44, for jj=1:8, mm=kk(:,:,:,ii,jj); endfor, endfor, cputime-t
ans = 0.072004
octave:3> cp=[1,cumprod(size(kk)(1:end-1))]; len=156*222; t=cputime; for ii=1:44, for jj=1:8, base=sum(([1,1,1,ii,jj]-1).*cp); mm=kk(base+1:base+len); endfor, endfor, cputime-t
ans = 0.12401

Thanks again,
Nicholas


>Also, Francesco, I forgot to do the obvious thing and test your examples
>with 3.1.51. At least on my machine, it looks like the performance has been
>reversed by the latest dense indexing improvements.
>
>octave:1> kk=rand(156,222,1,44,8);
>octave:2> t=cputime; for ii=1:44, for jj=1:8, mm=kk(:,:,:,ii,jj); endfor,
>endfor, cputime-t
>ans = 0.072004
>octave:3> cp=[1,cumprod(size(kk)(1:end-1))]; len=156*222; t=cputime; for
>ii=1:44, for jj=1:8,base=sum(([1,1,1,ii,jj]-1).*cp); mm=kk(base+1:base+len);
>endfor, endfor, cputime-t
>ans = 0.12401
Wonderful! This will make my code more readable as soon as I upgrade :)
Thank you Jaroslav!

Francesco Potortì (ricercatore) Voice: +39 050 315 3058 (op.2111)
ISTI - Area della ricerca CNR Fax: +39 050 315 2040
via G. Moruzzi 1, I-56124 Pisa Email: [hidden email]
(entrance 20, 1st floor, room C71) Web: http://fly.isti.cnr.it/


On Wed, Jan 14, 2009 at 5:36 PM, Francesco Potortì < [hidden email]> wrote:
>>Also, Francesco, I forgot to do the obvious thing and test your examples
>>with 3.1.51. At least on my machine, it looks like the performance has been
>>reversed by the latest dense indexing improvements.
>>
>>octave:1> kk=rand(156,222,1,44,8);
>>octave:2> t=cputime; for ii=1:44, for jj=1:8, mm=kk(:,:,:,ii,jj); endfor,
>>endfor, cputime-t
>>ans = 0.072004
>>octave:3> cp=[1,cumprod(size(kk)(1:end-1))]; len=156*222; t=cputime; for
>>ii=1:44, for jj=1:8,base=sum(([1,1,1,ii,jj]-1).*cp); mm=kk(base+1:base+len);
>>endfor, endfor, cputime-t
>>ans = 0.12401
>
> Wonderful! This will make my code more readable as soon as I upgrade :)
>
> Thank you Jaroslav!
>
You're welcome. After all, your original report has been the starting
impulse for that work, so part of the credit goes to you... it's hard
to work on performance improvements without such rational starting
impulses (by rational I mean not the sort of "hey guys, Octave sucks,
'cause my code is slow on it, do something 'bout it"), because there's
an awful lot of places in Octave that are amenable to some
optimizations, but of course not everything is equally painful. So,
working simplistic benchmarks that grasp the core of some performance
problem are precious stuff.
Of course, they're best done with development sources, but you know that :)
You may also be interested in the most recent indexing improvements:
http://hg.savannah.gnu.org/hgweb/octave/rev/ad3afaaa19c1
http://www.nabble.com/lazy-contiguous-subrange-indexing-td21458811.html
This will cheat your first loop to run in almost zero time...
cheers

RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


I had written:
>>Also, Francesco, I forgot to do the obvious thing and test your examples
>>with 3.1.51. At least on my machine, it looks like the performance has been
>>reversed by the latest dense indexing improvements.
>>
>>octave:1> kk=rand(156,222,1,44,8);
>>octave:2> t=cputime; for ii=1:44, for jj=1:8, mm=kk(:,:,:,ii,jj); endfor,
>>endfor, cputime-t
>>ans = 0.072004
>>octave:3> cp=[1,cumprod(size(kk)(1:end-1))]; len=156*222; t=cputime; for
>>ii=1:44, for jj=1:8,base=sum(([1,1,1,ii,jj]-1).*cp); mm=kk(base+1:base+len);
>>endfor, endfor, cputime-t
>>ans = 0.12401
>
>Wonderful! This will make my code more readable as soon as I upgrade :)
Hm. I installed Octave 3.1.51 from Debian's "experimental" (I never got
round to finding the time to build Octave myself). I use an amd64 box.
Unfortunately, I see no big difference with 3.0.1 :(
The timings of the first test versus the second on my box are:
- 5.1s versus 0.30s with 3.0.1
- 5.2s versus 0.27s with 3.1.51
So apparently I am not benefitting from the 3.1.51 dense indexing improvements.

Francesco Potortì (ricercatore) Voice: +39 050 315 3058 (op.2111)
ISTI - Area della ricerca CNR Fax: +39 050 315 2040
via G. Moruzzi 1, I-56124 Pisa Email: [hidden email]
(entrance 20, 1st floor, room C71) Web: http://fly.isti.cnr.it/


On Thu, Jan 15, 2009 at 1:48 PM, Francesco Potortì < [hidden email]> wrote:
> [...]
>
> Hm. I installed Octave 3.1.51 from Debian's "experimental" (I never got
> round to finding the time to build Octave myself). I use an amd64 box.
> Unfortunately, I see no big difference with 3.0.1 :(
>
> The timings of the first test versus the second on my box are:
> - 5.1s versus 0.30s with 3.0.1
> - 5.2s versus 0.27s with 3.1.51
>
> So apparently I am not benefitting from the 3.1.51 dense indexing improvements.
>
The dense indexing improvements started with
http://hg.savannah.gnu.org/hgweb/octave/rev/7cbe01c21986
dated 20th October and lots of bugfixes afterwards. I don't think this
was in the 3.1.51 snapshot which I think was done in late July. There
was no 3.1.52 so far. So, if you want these, development sources are
your only resort.
regards

RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


On Thu, Jan 15, 2009 at 08:17, Jaroslav Hajek <[hidden email]> wrote:
On Wed, Jan 14, 2009 at 5:24 PM, Nicholas Tung < [hidden email]> wrote:
> [...]
> 20 seconds are certainly not the interpreter overhead; I daresay that
> is almost negligible here. Apart from the lines you've picked up, most
> time is spent extracting gamma and lambda from the matrix and putting
> it back, and, more importantly, transposing it twice. Transposition is
> a relatively costly operation for large matrices because it is not
> cache friendly (even though David Bateman optimized it somewhat to
> reduce the number of cache misses, so it's faster in development
> sources).

It's not so large, 72x128. With only extraction [and transformation],
i.e. commenting lines 26 to 47, it only takes 0.71 seconds. Tell me if
you get different results. I'm running this on a Core Duo [not Core 2
Duo] U2400; it has 2 MB of cache.

> I didn't check very closely but it appears to me you're really doing a
> row & column convolution. Perhaps conv2 can do the job for you? That
> would also avoid the transpose, so it should make your code ultimately
> faster.
>
> conv2 is a compiled function and so should be fast, though I guess
> (from a quick scan) there's room to improve performance even there.

That's right! Thanks so much! That cuts it down to about 9.5 seconds.
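The conv2 suggestion can be sketched concretely. This is an illustrative
fragment only (the 72x128 size is the one mentioned above; the two-tap
kernel is assumed for illustration, not taken from the actual script): a
shift-and-add such as a(:, 1:end-1) += a(:, 2:end) is, up to boundary
handling, a convolution with the kernel [1 1], and conv2 performs the row
and column passes in compiled code with no explicit transpose:

```octave
a = rand(72, 128);          % an image block of the size discussed above
k = [1 1];                  % assumed two-tap lifting kernel, for illustration
b = conv2(a, k, 'same');    % horizontal pass: each element plus a neighbour
c = conv2(b, k', 'same');   % vertical pass on the result; no transpose needed
```

The 'same' shape argument keeps the output the size of the input; the
boundary columns differ from the shift-and-add version because conv2
zero-pads at the edges.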


On Thursday, 15.01.2009, at 13:57 +0100, Jaroslav Hajek wrote:
> On Thu, Jan 15, 2009 at 1:48 PM, Francesco Potortì < [hidden email]> wrote:
> > [...]
>
> The dense indexing improvements started with
> http://hg.savannah.gnu.org/hgweb/octave/rev/7cbe01c21986
> dated 20th October and lots of bugfixes afterwards. I don't think this
> was in the 3.1.51 snapshot which I think was done in late July. There
> was no 3.1.52 so far. So, if you want these, development sources are
> your only resort.
I'd vote for a new snapshot. Or, if you can give me a mercurial id where
sources are in a reasonable[1] shape, we can use that for a new snapshot
package in Debian.
[1] Reasonable by whatever metric you choose.
Thomas


On Fri, Jan 16, 2009 at 9:30 AM, Thomas Weber
< [hidden email]> wrote:
> On Thursday, 15.01.2009, at 13:57 +0100, Jaroslav Hajek wrote:
>> On Thu, Jan 15, 2009 at 1:48 PM, Francesco Potortì < [hidden email]> wrote:
>> > I had written:
>> >>>Also, Francesco, I forgot to do the obvious thing and test your examples
>> >>>with 3.1.51. At least on my machine, it looks like the performance has been
>> >>>reversed by the latest dense indexing improvements.
>> >>>
>> >>>octave:1> kk=rand(156,222,1,44,8);
>> >>>octave:2> t=cputime; for ii=1:44, for jj=1:8, mm=kk(:,:,:,ii,jj); endfor,
>> >>>endfor, cputimet
>> >>>ans = 0.072004
>> >>>octave:3> cp=[1,cumprod(size(kk)(1:end1))]; len=156*222; t=cputime; for
>> >>>ii=1:44, for jj=1:8,base=sum(([1,1,1,ii,jj]1).*cp); mm=kk(base+1:base+len);
>> >>>endfor, endfor, cputimet
>> >>>ans = 0.12401
>> >>
>> >>Wonderful! This will make my code more readable as soon as I upgrade :)
>> >
>> > Hm. I installed Octave 3.1.51 from Debian's "experimental" (I never got
>> > round to finding the time to build Octave myself). I use an amd64 box.
>> > Unfortunately, I see no big difference with 3.0.1 :(
>> >
>> > The timings of the first test versus the second on my box are:
>> >  - 5.1s versus 0.30s with 3.0.1
>> >  - 5.2s versus 0.27s with 3.1.51
>> >
>> > So apparently I am not benefitting from the 3.1.51 dense indexing improvements.
>> >
>>
>> The dense indexing improvements started with
>> http://hg.savannah.gnu.org/hgweb/octave/rev/7cbe01c21986
>> dated 20th October and lots of bugfixes afterwards. I don't think this
>> was in the 3.1.51 snapshot which I think was done in late July. There
>> was no 3.1.52 so far. So, if you want these, development sources are
>> your only resort.
>
> I'd vote for a new snapshot. Or, if you can give me a mercurial id where
> sources are in a reasonable[1] shape, we can use that for a new snapshot
> package in Debian.
>
> [1] Reasonable by whatever metric you choose.
>
> Thomas
>
A very loose metric is as follows:
Check out the current tip, and wait a whole day for possible patches
named like "omission from last patch" etc.
If no such patch appears, you have a reasonable shape.
But, given that there were rumors about making a 3.1.52 testing
snapshot, perhaps you'll want to wait for that?
My opinion on the best way to proceed is to fork a 3.2.x branch at a
suitable time (maybe now) and then make testing and later stable
releases from it, but I think John has a different idea.
Btw., I see Debian has a pretty complete octave3.0 package.
How hard would it be to package the qrupdate library for Debian?
(https://sourceforge.net/projects/qrupdate)
This will be used in 3.2 to provide qrupdate, qrinsert, cholupdate
etc., and I'm not yet sure whether to keep a snapshot in libcruft or
make it a weak dependency (Octave will run fine without the functions,
but some functions will lose performance).
I am fairly ignorant about packaging in GNU/Linux distros :(.
cheers

RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
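The linear-indexing workaround quoted in the message above (cp = [1, cumprod(size(kk)(1:end-1))]; base = sum(([1,1,1,ii,jj]-1).*cp)) computes a zero-based offset from column-major (Fortran-order) strides, which is how Octave lays out its arrays. A minimal pure-Python sketch of the same arithmetic (the helper names are illustrative, not part of Octave):

```python
def cumprod_strides(shape):
    """Column-major strides: [1, d1, d1*d2, ...], the Octave cp vector."""
    strides = [1]
    for d in shape[:-1]:
        strides.append(strides[-1] * d)
    return strides

def linear_index(subscripts, shape):
    """Zero-based linear offset of a 1-based subscript tuple."""
    return sum((s - 1) * c
               for s, c in zip(subscripts, cumprod_strides(shape)))

shape = (156, 222, 1, 44, 8)
# The page kk(:,:,:,ii,jj) starts at this offset and spans 156*222 elements,
# so kk(base+1:base+len) extracts it without fancy subscripting:
base = linear_index((1, 1, 1, 3, 2), shape)
```

For mostly-contiguous accesses like the pages above, one range index over the flat array replaces a five-subscript lookup per element.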


Am Freitag, den 16.01.2009, 10:01 +0100 schrieb Jaroslav Hajek:
> >> The dense indexing improvements started with
> >> http://hg.savannah.gnu.org/hgweb/octave/rev/7cbe01c21986
> >> dated 20th October and lots of bugfixes afterwards. I don't think this
> >> was in the 3.1.51 snapshot which I think was done in late July. There
> >> was no 3.1.52 so far. So, if you want these, development sources are
> >> your only resort.
> >
> > I'd vote for a new snapshot. Or, if you can give me a mercurial id where
> > sources are in a reasonable[1] shape, we can use that for a new snapshot
> > package in Debian.
> >
> > [1] Reasonable by whatever metric you choose.
> >
> > Thomas
> >
>
> A very loose metric is as follows:
> Checkout a current tip, and wait a whole day for possible patches
> coined like "omission from last patch" etc.
> If no such patch appears, you have a reasonable shape.
>
> But, given that there were rumors about making 3.1.52 testing
> snapshot, perhaps you'll want to wait for that?
How about a snapshot _now_? I mean, it's a snapshot, it's expected to
contain bugs.
> My opinion for the best way to proceed is to fork a 3.2.x branch at a
> suitable time (maybe now) and then make testing and later stable
> releases from it, but I think John has a different idea.
>
> Btw., I see Debian has a pretty complete octave3.0 package.
> How hard would be packaging the qrupdate library for Debian?
> ( https://sourceforge.net/projects/qrupdate).
Good question. I don't have much experience with packaging libraries.
The biggest problem is incompatible changes to the library's interface,
which means a proper SONAME is needed.
It's somewhat complicated by the fact that the library uses Fortran. For
C/C++, you have knowledgeable people at every corner. How much do you
expect the library to change in the future?
Thomas


On Fri, Jan 16, 2009 at 10:47 AM, Thomas Weber
< [hidden email]> wrote:
> Am Freitag, den 16.01.2009, 10:01 +0100 schrieb Jaroslav Hajek:
>> >> The dense indexing improvements started with
>> >> http://hg.savannah.gnu.org/hgweb/octave/rev/7cbe01c21986
>> >> dated 20th October and lots of bugfixes afterwards. I don't think this
>> >> was in the 3.1.51 snapshot which I think was done in late July. There
>> >> was no 3.1.52 so far. So, if you want these, development sources are
>> >> your only resort.
>> >
>> > I'd vote for a new snapshot. Or, if you can give me a mercurial id where
>> > sources are in a reasonable[1] shape, we can use that for a new snapshot
>> > package in Debian.
>> >
>> > [1] Reasonable by whatever metric you choose.
>> >
>> > Thomas
>> >
>>
>> A very loose metric is as follows:
>> Checkout a current tip, and wait a whole day for possible patches
>> coined like "omission from last patch" etc.
>> If no such patch appears, you have a reasonable shape.
>>
>> But, given that there were rumors about making 3.1.52 testing
>> snapshot, perhaps you'll want to wait for that?
>
> How about a snapshot _now_? I mean, it's a snapshot, it's expected to
> contain bugs.
>
>> My opinion for the best way to proceed is to fork a 3.2.x branch at a
>> suitable time (maybe now) and then make testing and later stable
>> releases from it, but I think John has a different idea.
>>
>> Btw., I see Debian has a pretty complete octave3.0 package.
>> How hard would be packaging the qrupdate library for Debian?
>> ( https://sourceforge.net/projects/qrupdate).
>
> Good question. I don't have much experience with packaging libraries.
> The biggest problem are incompatible changes to the library's interface,
> which means a proper SONAME.
>
> It's somewhat complicated by the fact that the library uses Fortran. For
> C/C++, you have knowledgable people at every corner. How much do you
> expect the library to change in the future?
>
I will probably add more routines. The current set for QR & Cholesky
is fairly complete, but there are other factorizations left: QRZ, LU,
SVD... I guess adding a routine does not mean an incompatible change,
right? In that sense, I will aim for minimal changes.
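For readers unfamiliar with what these routines buy you: a rank-1 Cholesky update recomputes the factor of A + x*x' in O(n^2) instead of the O(n^3) of a fresh factorization. A pure-Python sketch of the textbook algorithm (illustrative only; this is not qrupdate's actual code):

```python
import math

def chol_update(L, x):
    """Update lower-triangular L (with A = L*L') to the factor of
    A + x*x', in place, via the classic sweep of Givens-like rotations.
    Illustrative sketch; qrupdate's Fortran routines are the real thing."""
    n = len(L)
    x = list(x)  # work on a copy
    for k in range(n):
        r = math.hypot(L[k][k], x[k])  # new diagonal entry
        c, s = r / L[k][k], x[k] / L[k][k]
        L[k][k] = r
        for i in range(k + 1, n):
            L[i][k] = (L[i][k] + s * x[i]) / c
            x[i] = c * x[i] - s * L[i][k]
    return L
```

For example, with L = [[2.0, 0.0], [1.0, 1.0]] (so A = [[4, 2], [2, 2]]) and x = [1.0, 1.0], the updated factor reproduces A + x*x' = [[5, 3], [3, 3]].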
Fortran is used for various reasons. My choices have only been C or
Fortran, because I want the lib to be easily usable from Fortran
programs - we here at VZLU Prague make fairly good use of Fortran for
in-house codes. As usual, Fortran won on its native support for
matrices, which need to be simulated by ugly macro tricks in C, or
simply coded in all-linear-arrays style. Also, Fortran doesn't tend to
be slower than C; sometimes it is even faster.
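The "ugly macro tricks" mentioned above refer to the common C idiom of hiding flat, column-major index arithmetic behind a macro such as #define A(i,j) a[((j)-1)*m + ((i)-1)]. The same idea sketched in Python (class and method names are mine, purely illustrative):

```python
class FortranMatrix:
    """Flat column-major storage with 1-based (i, j) addressing,
    mimicking what Fortran provides natively (illustrative sketch)."""

    def __init__(self, m, n, fill=0.0):
        self.m, self.n = m, n
        self.data = [fill] * (m * n)

    def _idx(self, i, j):
        # Element (i, j) lives at offset (j-1)*m + (i-1): exactly the
        # arithmetic a C macro like A(i,j) would hide.
        return (j - 1) * self.m + (i - 1)

    def get(self, i, j):
        return self.data[self._idx(i, j)]

    def set(self, i, j, value):
        self.data[self._idx(i, j)] = value
```

Fortran compilers generate this addressing themselves and can optimize across it, which is part of why the "not slower than C" claim tends to hold for dense array code.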
I think gfortran is generally regarded as mature (at least w.r.t. F90
support) since gcc 4.2, so you can say Fortran is well supported on
free platforms.
regards

RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


On Fri, Jan 16, 2009 at 10:47 AM, Thomas Weber
< [hidden email]> wrote:
> Am Freitag, den 16.01.2009, 10:01 +0100 schrieb Jaroslav Hajek:
>> >> The dense indexing improvements started with
>> >> http://hg.savannah.gnu.org/hgweb/octave/rev/7cbe01c21986
>> >> dated 20th October and lots of bugfixes afterwards. I don't think this
>> >> was in the 3.1.51 snapshot which I think was done in late July. There
>> >> was no 3.1.52 so far. So, if you want these, development sources are
>> >> your only resort.
>> >
>> > I'd vote for a new snapshot. Or, if you can give me a mercurial id where
>> > sources are in a reasonable[1] shape, we can use that for a new snapshot
>> > package in Debian.
>> >
>> > [1] Reasonable by whatever metric you choose.
>> >
>> > Thomas
>> >
>>
>> A very loose metric is as follows:
>> Checkout a current tip, and wait a whole day for possible patches
>> coined like "omission from last patch" etc.
>> If no such patch appears, you have a reasonable shape.
>>
>> But, given that there were rumors about making 3.1.52 testing
>> snapshot, perhaps you'll want to wait for that?
>
> How about a snapshot _now_? I mean, it's a snapshot, it's expected to
> contain bugs.
>
Snapshots from development sources are in JWE's hands.

RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


2009/1/16 Thomas Weber < [hidden email]>:
> Am Freitag, den 16.01.2009, 10:01 +0100 schrieb Jaroslav Hajek:
>> Btw., I see Debian has a pretty complete octave3.0 package.
>> How hard would be packaging the qrupdate library for Debian?
>> ( https://sourceforge.net/projects/qrupdate).
>
> Good question. I don't have much experience with packaging libraries.
> The biggest problem are incompatible changes to the library's interface,
> which means a proper SONAME.
>
> It's somewhat complicated by the fact that the library uses Fortran. For
> C/C++, you have knowledgable people at every corner. How much do you
> expect the library to change in the future?
Well, we have some docs... I'm sure you've found them already:
http://packages.debian.org/sid/libpkgguide
But beware
http://bugs.debian.org/libpkgguide
I didn't quite follow the discussion, so please tell me again why it's
desirable to package this library?
- Jordi G. H.


On 16-Jan-2009, Thomas Weber wrote:
| How about a snapshot _now_? I mean, it's a snapshot, it's expected to
| contain bugs.
In that case, just grab the sources from the Mercurial archive at
arbitrary times. You can do that on your own. You don't need to wait
for me. But I would prefer to make snapshots when I think things are
in a reasonable state, so that we are not flooded with a bunch of bug
reports for obvious problems from people who think we are idiots for
making "releases" that contain such stupid bugs.
At the moment, I'm trying to work through the backlog of bug reports
and submitted patches. Once I've made some more progress on that,
I'll post to the maintainers list about what I think is left for the
3.2 release and then we can start making a series of snapshots leading
up to the release.
jwe


On Fri, Jan 16, 2009 at 3:07 PM, Jordi Gutiérrez Hermoso
< [hidden email]> wrote:
> 2009/1/16 Thomas Weber < [hidden email]>:
>> Am Freitag, den 16.01.2009, 10:01 +0100 schrieb Jaroslav Hajek:
>>> Btw., I see Debian has a pretty complete octave3.0 package.
>>> How hard would be packaging the qrupdate library for Debian?
>>> ( https://sourceforge.net/projects/qrupdate).
>>
>> Good question. I don't have much experience with packaging libraries.
>> The biggest problem are incompatible changes to the library's interface,
>> which means a proper SONAME.
>>
>> It's somewhat complicated by the fact that the library uses Fortran. For
>> C/C++, you have knowledgable people at every corner. How much do you
>> expect the library to change in the future?
>
> Well, we have some docs... I'm sure you've found them already:
>
> http://packages.debian.org/sid/libpkgguide
>
> But beware
>
> http://bugs.debian.org/libpkgguide
>
> I didn't quite follow the discussion, so please tell me again why it's
> desirable to package this library?
>
Octave uses it to implement the qrupdate, qrinsert, qrdelete, qrshift,
cholupdate, cholinsert, choldelete and cholshift functions. Up to now
a snapshot of version 0 of the lib was included directly in the Octave
sources. During December I finally got to rewriting the library to get
a more robust interface and incorporate various improvements, and I
released it (it was GPLed from the beginning) on SourceForge.
It is of course possible to carry on with the inclusion in libcruft,
but my impression is that this is sort of the opposite of the direction
we normally go, so now I aim to remove it and simply check for qrupdate
during configure, disabling the relevant functionality if it is not
found (we do the same thing with other libs).
For those who build Octave from sources there is hardly a difference
(except that keeping the lib separate allows building it separately),
but it has been pointed out that in a distro, a library needs to be
packaged, and that can be problematic.
So, the question is, how much effort does it take to package a
(Fortran) library?
Will someone be willing to do it? I could probably help, if I'm told how.
I really have little knowledge here.
To make it clearer: if such a change is likely to result in a state
where all distros ship Octave without qrupdate, it may be better to
keep it in libcruft.

RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

