# My vectorized code is slower than a loop Classic List Threaded 11 messages Open this post in threaded view
|

## My vectorized code is slower than a loop

 In the process of writing a routine to compute the Qn statistic of Croux and Rousseeuw, I started from the Fortran code they published in their paper (C&R, "Time-efficient algorithms for two highly robust estimators of scale") and ran into unexpected trouble with one section. Here's a fairly literal translation of the original, where x is an array of length n, and left and right are also arrays of length n that contain integers: w = zeros(1, n); j = 1 for i = 1:n   if(left(i) <= right(i))     for jj = left(i):right(i)       w(j) = x(i) - x(n+1-jj);       j += 1;     endfor   endif endfor Before we get to this section, left and right are such that at most n items will get copied into w. Not surprisingly, this code is pretty slow. I see no way to eliminate the outside loop on i; but getting rid of the inside loop on jj is straightforward: w = zeros(1, n); j = 1 for i = 1:n   if(left(i) <= right(i))     w(j:j+right(i)-left(i)) = x(i) - x(n+1-[left(i):right(i)]);     j = j + right(i)-left(i)+1;   endif endfor Problem is, the second, vectorized, code takes twice as long as the first, non-vectorized, code. Anybody have an idea why? I'm stumped. I have attached a script that contains these two snippets, some setup, and tics and tocs, for anyone who wants to play with it. Thanks, Dave Cottingham looptest.m
Open this post in threaded view
|

## Re: My vectorized code is slower than a loop

 On Mon, Oct 17, 2016 at 3:37 PM, Dave Cottingham 2 <[hidden email]> wrote: > In the process of writing a routine to compute the Qn statistic of Croux and > Rousseeuw, I started from the Fortran code they published in their paper > (C&R, "Time-efficient algorithms for two highly robust estimators of scale") > and ran into unexpected trouble with one section. Here's a fairly literal > translation of the original, where x is an array of length n, and left and > right are also arrays of length n that contain integers: > > w = zeros(1, n); > j = 1 > for i = 1:n >   if(left(i) <= right(i)) >     for jj = left(i):right(i) >       w(j) = x(i) - x(n+1-jj); >       j += 1; >     endfor >   endif > endfor > > Before we get to this section, left and right are such that at most n items > will get copied into w. > > Not surprisingly, this code is pretty slow. I see no way to eliminate the > outside loop on i; but getting rid of the inside loop on jj is > straightforward: > > w = zeros(1, n); > j = 1 > for i = 1:n >   if(left(i) <= right(i)) >     w(j:j+right(i)-left(i)) = x(i) - x(n+1-[left(i):right(i)]); >     j = j + right(i)-left(i)+1; >   endif > endfor > > Problem is, the second, vectorized, code takes twice as long as the first, > non-vectorized, code. > > Anybody have an idea why? I'm stumped. > > I have attached a script that contains these two snippets, some setup, and > tics and tocs, for anyone who wants to play with it. > > Thanks, > Dave Cottingham Don't have time to test at the moment, but have you tried using the profiler? Octave's profiler isn't as thorough as Matlab's, but it should be able to show you where your scripts are spending the most time, and maybe why the latter is slower. profile on; *run script* profile off profshow and/or profexplore _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave
Open this post in threaded view
|

## Re: My vectorized code is slower than a loop

 On Mon, Oct 17, 2016 at 10:34 PM, Nicholas Jankowski <[hidden email]> wrote: > On Mon, Oct 17, 2016 at 3:37 PM, Dave Cottingham 2 > <[hidden email]> wrote: >> In the process of writing a routine to compute the Qn statistic of Croux and >> Rousseeuw, I started from the Fortran code they published in their paper >> (C&R, "Time-efficient algorithms for two highly robust estimators of scale") >> and ran into unexpected trouble with one section. Here's a fairly literal >> translation of the original, where x is an array of length n, and left and >> right are also arrays of length n that contain integers: >> >> w = zeros(1, n); >> j = 1 >> for i = 1:n >>   if(left(i) <= right(i)) >>     for jj = left(i):right(i) >>       w(j) = x(i) - x(n+1-jj); >>       j += 1; >>     endfor >>   endif >> endfor >> >> Before we get to this section, left and right are such that at most n items >> will get copied into w. >> >> Not surprisingly, this code is pretty slow. I see no way to eliminate the >> outside loop on i; but getting rid of the inside loop on jj is >> straightforward: >> >> w = zeros(1, n); >> j = 1 >> for i = 1:n >>   if(left(i) <= right(i)) >>     w(j:j+right(i)-left(i)) = x(i) - x(n+1-[left(i):right(i)]); >>     j = j + right(i)-left(i)+1; >>   endif >> endfor >> >> Problem is, the second, vectorized, code takes twice as long as the first, >> non-vectorized, code. >> >> Anybody have an idea why? I'm stumped. >> >> I have attached a script that contains these two snippets, some setup, and >> tics and tocs, for anyone who wants to play with it. >> >> Thanks, >> Dave Cottingham > > > Don't have time to test at the moment, but have you tried using the > profiler? Octave's profiler isn't as thorough as Matlab's, but it > should be able to show you where your scripts are spending the most > time, and maybe why the latter is slower. > > profile on; > > *run script* > > profile off > > profshow > > and/or > > profexplore > > _______________________________________________ > Help-octave mailing list > [hidden email] > https://lists.gnu.org/mailman/listinfo/help-octaveHi, If you provide code we can directly test or study it will make things easier! Can you provide a working case? _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave
Open this post in threaded view
|

## Re: My vectorized code is slower than a loop

 On Tue, Oct 18, 2016 at 11:43 AM, Juan Pablo Carbajal wrote:On Mon, Oct 17, 2016 at 10:34 PM, Nicholas Jankowski <[hidden email]> wrote: > On Mon, Oct 17, 2016 at 3:37 PM, Dave Cottingham 2 > <[hidden email]> wrote: >> In the process of writing a routine to compute the Qn statistic of Croux and >> Rousseeuw, I started from the Fortran code they published in their paper >> (C&R, "Time-efficient algorithms for two highly robust estimators of scale") >> and ran into unexpected trouble with one section. Here's a fairly literal >> translation of the original, where x is an array of length n, and left and >> right are also arrays of length n that contain integers: >> >> w = zeros(1, n); >> j = 1 >> for i = 1:n >>   if(left(i) <= right(i)) >>     for jj = left(i):right(i) >>       w(j) = x(i) - x(n+1-jj); >>       j += 1; >>     endfor >>   endif >> endfor >> >> Before we get to this section, left and right are such that at most n items >> will get copied into w. >> >> Not surprisingly, this code is pretty slow. I see no way to eliminate the >> outside loop on i; but getting rid of the inside loop on jj is >> straightforward: >> >> w = zeros(1, n); >> j = 1 >> for i = 1:n >>   if(left(i) <= right(i)) >>     w(j:j+right(i)-left(i)) = x(i) - x(n+1-[left(i):right(i)]); >>     j = j + right(i)-left(i)+1; >>   endif >> endfor >> >> Problem is, the second, vectorized, code takes twice as long as the first, >> non-vectorized, code. >> >> Anybody have an idea why? I'm stumped. >> >> I have attached a script that contains these two snippets, some setup, and >> tics and tocs, for anyone who wants to play with it. >> >> Thanks, >> Dave Cottingham > > > Don't have time to test at the moment, but have you tried using the > profiler? Octave's profiler isn't as thorough as Matlab's, but it > should be able to show you where your scripts are spending the most > time, and maybe why the latter is slower. > > profile on; > > *run script* > > profile off > > profshow > > and/or > > profexplore > > _______________________________________________ > Help-octave mailing list > [hidden email] > https://lists.gnu.org/mailman/listinfo/help-octave Hi, If you provide code we can directly test or study it will make things easier! Can you provide a working case? Was in my original post -- here it is again:http://octave.1599824.n4.nabble.com/file/n4680192/looptest.mThanks,Dave Cottingham _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave
Open this post in threaded view
|

## Re: My vectorized code is slower than a loop

 In reply to this post by Dave Cottingham 2 Am 17.10.2016 um 21:37 schrieb Dave Cottingham 2: > In the process of writing a routine to compute the Qn statistic of Croux and > Rousseeuw, I started from the Fortran code they published in their paper > (C&R, "Time-efficient algorithms for two highly robust estimators of scale") > and ran into unexpected trouble with one section. Here's a fairly literal > translation of the original, where x is an array of length n, and left and > right are also arrays of length n that contain integers: > > code snipped Your script can be simplified to w = x-x(end:-1:1); try it out! Since "if(left(i) <= right(i))" is always true. Perhaps you can provide a realistic example? -- Andy _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave
Open this post in threaded view
|

## Re: My vectorized code is slower than a loop

 On Tue, Oct 18, 2016 at 1:38 PM, Andreas Weber <[hidden email]> wrote: > Am 17.10.2016 um 21:37 schrieb Dave Cottingham 2: >> In the process of writing a routine to compute the Qn statistic of Croux and >> Rousseeuw, I started from the Fortran code they published in their paper >> (C&R, "Time-efficient algorithms for two highly robust estimators of scale") >> and ran into unexpected trouble with one section. Here's a fairly literal >> translation of the original, where x is an array of length n, and left and >> right are also arrays of length n that contain integers: >> >> code snipped > > Your script can be simplified to > > w = x-x(end:-1:1); > > try it out! > > Since "if(left(i) <= right(i))" is always true. Perhaps you can provide > a realistic example? > > -- Andy Andy's got it, but for reference here's the profiler timing for your code.  Andy's code appears to be too fast for the profiler :) Note the additional operation count in your vectorized code. vectorized Elapsed time is 5.917 seconds.    #            Function Attr     Time (s)   Time (%)        Calls ------------------------------------------------------------------    5            binary -             0.131      43.52       400000    4            binary +             0.121      40.20       400000    3           binary <=             0.049      16.28       100000    1                 tic             0.000       0.00            1    2               zeros             0.000       0.00            1    6                 toc             0.000       0.00            1    7             profile             0.000       0.00            1    8              nargin             0.000       0.00            1    9           binary !=             0.000       0.00            1   10               false             0.000       0.00            1   11 __profiler_enable__             0.000       0.00            1 non-vectorized Elapsed time is 3.594 seconds.    #            Function Attr     Time (s)   Time (%)        Calls ------------------------------------------------------------------    5            binary -             0.053      47.32       200000    4            binary +             0.030      26.79       100000    3           binary <=             0.029      25.89       100000    1                 tic             0.000       0.00            1    2               zeros             0.000       0.00            1    6                 toc             0.000       0.00            1    7             profile             0.000       0.00            1    8              nargin             0.000       0.00            1    9           binary !=             0.000       0.00            1   10               false             0.000       0.00            1   11 __profiler_enable__             0.000       0.00            1 Andy supercode Elapsed time is 0.039 seconds.    #            Function Attr     Time (s)   Time (%)        Calls ------------------------------------------------------------------ warning: division by zero warning: called from     profshow at line 85 column 17     testy at line 59 column 3    1                 tic             0.000        NaN            1 warning: division by zero    2                 end             0.000        NaN            1 warning: division by zero    3            prefix -             0.000        NaN            1 warning: division by zero    4            binary -             0.000        NaN            1 warning: division by zero    5                 toc             0.000        NaN            1 warning: division by zero    6             profile             0.000        NaN            1 warning: division by zero    7              nargin             0.000        NaN            1 warning: division by zero    8           binary !=             0.000        NaN            1 warning: division by zero    9               false             0.000        NaN            1 warning: division by zero   10 __profiler_enable__             0.000        NaN            1 >> _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave
Open this post in threaded view
|

## Re: My vectorized code is slower than a loop

 In reply to this post by Andreas Weber-6 Andreas Weber-6 wrote Your script can be simplified to w = x-x(end:-1:1); try it out! Since "if(left(i) <= right(i))" is always true. Perhaps you can provide a realistic example? -- Andy _______________________________________________ Help-octave mailing list [hidden email]https://lists.gnu.org/mailman/listinfo/help-octave I try to keep the examples short -- and the computation of a realistic example is a bit involved -- so rather than post the code to compute the example, here's an octave saveset with a realistic example, and a modified "looptest.m" that reads it and runs the timing comparison. Thanks, Dave Cottingham looptest.datlooptest.m
Open this post in threaded view
|

## Re: My vectorized code is slower than a loop

 On Wed, Oct 19, 2016 at 10:26 AM, Dave Cottingham 2 <[hidden email]> wrote: > Andreas Weber-6 wrote >> Your script can be simplified to >> >> w = x-x(end:-1:1); >> >> try it out! >> >> Since "if(left(i) <= right(i))" is always true. Perhaps you can provide >> a realistic example? >> >> -- Andy >> >> _______________________________________________ >> Help-octave mailing list > >> Help-octave@ > >> https://lists.gnu.org/mailman/listinfo/help-octave> > I try to keep the examples short -- and the computation of a realistic > example is a bit involved -- so rather than post the code to compute the > example, here's an octave saveset with a realistic example, and a modified > "looptest.m" that reads it and runs the timing comparison. > > Thanks, > Dave Cottingham > > looptest.dat > > looptest.m > Yes, so Andy's code no longer produces an equivalent w.  Also, I didn't look to see if you changed any of the code, but there is now less of a penalty for the vectorized function. Again, use the profiler. it will show you where you're code is spending time.  Your vectorized code is simply making almost twice as many binary operation calls (500,000 vs 325000), eating up any of the non-looping benefit. here's the summary: vectorized Elapsed time is 6.26725 seconds.    #            Function Attr     Time (s)   Time (%)        Calls ------------------------------------------------------------------    4            binary -             0.122      41.92       199956    3            binary +             0.111      38.15       199956    2           binary <=             0.058      19.93       100000    1                 tic             0.000       0.00            1    5                 toc             0.000       0.00            1    6             profile             0.000       0.00            1    7              nargin             0.000       0.00            1    8           binary !=             0.000       0.00            1    9               false             0.000       0.00            1   10 __profiler_enable__             0.000       0.00            1 looped j =  1 Elapsed time is 5.94019 seconds.    #            Function Attr     Time (s)   Time (%)        Calls ------------------------------------------------------------------    4            binary -             0.084      40.38       150344    2           binary <=             0.062      29.81       100000    3            binary +             0.062      29.81        75172    1                 tic             0.000       0.00            1    5                 toc             0.000       0.00            1    6             profile             0.000       0.00            1    7              nargin             0.000       0.00            1    8           binary !=             0.000       0.00            1    9               false             0.000       0.00            1   10 __profiler_enable__             0.000       0.00            1 _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave
Open this post in threaded view
|

## Re: My vectorized code is slower than a loop

 In reply to this post by Dave Cottingham 2 No profiling, no runs, just looking at the code, so I may be off. w = zeros(1, n); j = 1 for i = 1:n   if (vectorised)     if(left(i) <= right(i))       w(j:j+right(i)-left(i)) = x(i) - x(n+1-[left(i):right(i)]);       j = j + right(i)-left(i)+1;     endif   else # looping     if(left(i) <= right(i))       for jj = left(i):right(i)         w(j) = x(i) - x(n+1-jj);         j += 1;       endfor   endif endfor In the loop version, the if is useless, because the for range will be empty, so the for will not run when the if clause is false.  This may result in a speedup or not. In the vectorised version, I would write it like this, which minimises the number of operations at the expense of the number of instructions, which may result in a speedup or not:    li = left(i);    d = right(i) - li;    if (d >= 0)      dr = 0:d;      w(j+dr) = x(i) - x(n+1-li+dr);      j += d+1;    endif -- Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058 ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107 via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti (entrance 20, 1st floor, room C71)     Web:    http://fly.isti.cnr.it_______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave