Vectorizing simple loops

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Vectorizing simple loops

pawelz
Hi,

I have been profiling some code going through big datasets and find it impossible to express these simple algorithms in high-performing vectorized form instead of slow loop version which is so trivial I don't even show it here in the first case. I would appreciate your advice. My only answer at this point is to re-write these in C instead, which i would like to avoid.

Case A) Fill the blanks with last non-zero value:
Go through the input vector values one by one and output the value if not zero, or copy over the last non-zero value. Sample input and expected output vectors:
in  = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
out = [ 1 1 2 2 7 7 7 7 5 5 5 5 9 ]
I tried the merge(v==0, shift()...) but it only works for the first zero occurrence, not an arbitrary number of them.

Case B) seems a bit more difficult but in fact it is simple. Produce an output with a rule based on simple memory of the previous step. The basic trick would be to make a decision in each step, based on the previous steps output, i.e. produce out(i) based on the value of acc which was built using out(i-1). Again tried merge(shift()) to no avail...

in  = [2 2 1 -1 0 0 -2 2 0 1]
x   = 1;
top = 5;
acc = 0;
for i = 1 : length(in)
  if     (in(i) == +2)
          if (acc <      0) out(i) = -acc+x;
      elseif (acc >  top-x) out(i) = 0;
      elseif (acc >=     0) out(i) = x;
       endif
  elseif (in(i) == +1)
          if (acc >=     0) out(i) = 0;
      elseif (acc <      0) out(i) = -acc;
       endif
  elseif (in(i) ==       0) out(i) = 0;
  elseif (in(i) ==      -1)
          if (acc <=     0) out(i) = 0;
      elseif (acc >      0) out(i) = -acc;
       endif
  elseif (in(i) == -2)
          if (acc >      0) out(i) = -acc-x;
      elseif (acc < -top+x) out(i) = 0;
      elseif (acc <=     0) out(i) = -x;
       endif
  endif
  acc += out(i);
endfor
out

Cheers
Pawel
Reply | Threaded
Open this post in threaded view
|

Re: Vectorizing simple loops

nrjank


On Dec 2, 2015 6:56 AM, "pawelz" <[hidden email]> wrote:
>
> Hi,
>
> I have been profiling some code going through big datasets and find it
> impossible to express these simple algorithms in high-performing vectorized
> form instead of slow loop version which is so trivial I don't even show it
> here in the first case. I would appreciate your advice. My only answer at
> this point is to re-write these in C instead, which i would like to avoid.
>
> Case A) Fill the blanks with last non-zero value:
> Go through the input vector values one by one and output the value if not
> zero, or copy over the last non-zero value. Sample input and expected output
> vectors:
> in  = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
> out = [ 1 1 2 2 7 7 7 7 5 5 5 5 9 ]
> I tried the merge(v==0, shift()...) but it only works for the first zero
> occurrence, not an arbitrary number of them.
>
> Case B) seems a bit more difficult but in fact it is simple. Produce an
> output with a rule based on simple memory of the previous step. The basic
> trick would be to make a decision in each step, based on the previous steps
> output, i.e. produce out(i) based on the value of acc which was built using
> out(i-1). Again tried merge(shift()) to no avail...
>
> in  = [2 2 1 -1 0 0 -2 2 0 1]
> x   = 1;
> top = 5;
> acc = 0;
> for i = 1 : length(in)
>   if     (in(i) == +2)
>           if (acc <      0) out(i) = -acc+x;
>       elseif (acc >  top-x) out(i) = 0;
>       elseif (acc >=     0) out(i) = x;
>        endif
>   elseif (in(i) == +1)
>           if (acc >=     0) out(i) = 0;
>       elseif (acc <      0) out(i) = -acc;
>        endif
>   elseif (in(i) ==       0) out(i) = 0;
>   elseif (in(i) ==      -1)
>           if (acc <=     0) out(i) = 0;
>       elseif (acc >      0) out(i) = -acc;
>        endif
>   elseif (in(i) == -2)
>           if (acc >      0) out(i) = -acc-x;
>       elseif (acc < -top+x) out(i) = 0;
>       elseif (acc <=     0) out(i) = -x;
>        endif
>   endif
>   acc += out(i);
> endfor
> out
>
> Cheers
> Pawel
>

I'll take a look later today, but in case you don't get a solution here, the people over at StackExchange watching the Matlab/Octave tags (some of whom are here) are really good at popping out code snippets for things like this.

Nick J


_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Vectorizing simple loops

pawelz
Thanks Nick. I had a similar thought too, so actually I posted a bit more simplified version of this question to the StackExchange half an hour ago. There have already been some reactions.


On Wed, Dec 2, 2015 at 1:36 PM NJank [via Octave] <[hidden email]> wrote:


On Dec 2, 2015 6:56 AM, "pawelz" <[hidden email]> wrote:
>
> Hi,
>
> I have been profiling some code going through big datasets and find it
> impossible to express these simple algorithms in high-performing vectorized
> form instead of slow loop version which is so trivial I don't even show it
> here in the first case. I would appreciate your advice. My only answer at
> this point is to re-write these in C instead, which i would like to avoid.
>
> Case A) Fill the blanks with last non-zero value:
> Go through the input vector values one by one and output the value if not
> zero, or copy over the last non-zero value. Sample input and expected output
> vectors:
> in  = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
> out = [ 1 1 2 2 7 7 7 7 5 5 5 5 9 ]
> I tried the merge(v==0, shift()...) but it only works for the first zero
> occurrence, not an arbitrary number of them.
>
> Case B) seems a bit more difficult but in fact it is simple. Produce an
> output with a rule based on simple memory of the previous step. The basic
> trick would be to make a decision in each step, based on the previous steps
> output, i.e. produce out(i) based on the value of acc which was built using
> out(i-1). Again tried merge(shift()) to no avail...
>
> in  = [2 2 1 -1 0 0 -2 2 0 1]
> x   = 1;
> top = 5;
> acc = 0;
> for i = 1 : length(in)
>   if     (in(i) == +2)
>           if (acc <      0) out(i) = -acc+x;
>       elseif (acc >  top-x) out(i) = 0;
>       elseif (acc >=     0) out(i) = x;
>        endif
>   elseif (in(i) == +1)
>           if (acc >=     0) out(i) = 0;
>       elseif (acc <      0) out(i) = -acc;
>        endif
>   elseif (in(i) ==       0) out(i) = 0;
>   elseif (in(i) ==      -1)
>           if (acc <=     0) out(i) = 0;
>       elseif (acc >      0) out(i) = -acc;
>        endif
>   elseif (in(i) == -2)
>           if (acc >      0) out(i) = -acc-x;
>       elseif (acc < -top+x) out(i) = 0;
>       elseif (acc <=     0) out(i) = -x;
>        endif
>   endif
>   acc += out(i);
> endfor
> out
>
> Cheers
> Pawel
>

I'll take a look later today, but in case you don't get a solution here, the people over at StackExchange watching the Matlab/Octave tags (some of whom are here) are really good at popping out code snippets for things like this.

Nick J


_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave



If you reply to this email, your message will be added to the discussion below:
http://octave.1599824.n4.nabble.com/Vectorizing-simple-loops-tp4673742p4673756.html
To unsubscribe from Vectorizing simple loops, click here.
NAML
Reply | Threaded
Open this post in threaded view
|

Re: Vectorizing simple loops

briankaz
In reply to this post by pawelz

Paweł,

I can think of a way to reduce your case A to a for loop where the loop index is the kth uninterrupted stream of zeroes instead of the index of the original vector. So for the example you showed the loop would execute 4 times instead of 13. Would that be helpful to you?

-Brian

2 gru 2015 12:56 "pawelz" <[hidden email]> napisał(a):
Hi,

I have been profiling some code going through big datasets and find it
impossible to express these simple algorithms in high-performing vectorized
form instead of slow loop version which is so trivial I don't even show it
here in the first case. I would appreciate your advice. My only answer at
this point is to re-write these in C instead, which i would like to avoid.

Case A) Fill the blanks with last non-zero value:
Go through the input vector values one by one and output the value if not
zero, or copy over the last non-zero value. Sample input and expected output
vectors:
in  = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
out = [ 1 1 2 2 7 7 7 7 5 5 5 5 9 ]
I tried the merge(v==0, shift()...) but it only works for the first zero
occurrence, not an arbitrary number of them.

Case B) seems a bit more difficult but in fact it is simple. Produce an
output with a rule based on simple memory of the previous step. The basic
trick would be to make a decision in each step, based on the previous steps
output, i.e. produce out(i) based on the value of acc which was built using
out(i-1). Again tried merge(shift()) to no avail...

in  = [2 2 1 -1 0 0 -2 2 0 1]
x   = 1;
top = 5;
acc = 0;
for i = 1 : length(in)
  if     (in(i) == +2)
          if (acc <      0) out(i) = -acc+x;
      elseif (acc >  top-x) out(i) = 0;
      elseif (acc >=     0) out(i) = x;
       endif
  elseif (in(i) == +1)
          if (acc >=     0) out(i) = 0;
      elseif (acc <      0) out(i) = -acc;
       endif
  elseif (in(i) ==       0) out(i) = 0;
  elseif (in(i) ==      -1)
          if (acc <=     0) out(i) = 0;
      elseif (acc >      0) out(i) = -acc;
       endif
  elseif (in(i) == -2)
          if (acc >      0) out(i) = -acc-x;
      elseif (acc < -top+x) out(i) = 0;
      elseif (acc <=     0) out(i) = -x;
       endif
  endif
  acc += out(i);
endfor
out

Cheers
Pawel



--
View this message in context: http://octave.1599824.n4.nabble.com/Vectorizing-simple-loops-tp4673742.html
Sent from the Octave - General mailing list archive at Nabble.com.

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Vectorizing simple loops

Francesco Potortì
In reply to this post by pawelz
>Case A) Fill the blanks with last non-zero value:
>Go through the input vector values one by one and output the value if not
>zero, or copy over the last non-zero value. Sample input and expected output
>vectors:
>in  = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
>out = [ 1 1 2 2 7 7 7 7 5 5 5 5 9 ]

Just an idea, not a complete imnplementation.

If your 0 sequences are short, you can work it out by looping as many
times as the max length of your 0 sequences.  This finds the indices of
the first 0 in each sequence:

>> a = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ];
>> zeros = find(a==0)
zeros =
    2    4    8   10   11   12
>> isfirst = diff([0 zeros])>1
isfirst =
   1   1   1   1   0   0
>> burstheads = zeros(isfirst)
burstheads =
    2    4    8   10

Now you can set each burst head to the previous value and start again
until no more bursts are there.  This is easy and reasonably efficient
if the burst lengths are small (if the max burst len is 10, you loop ten
times).

If the burst are few and long, then you can proceed in the opposite way:
find the burst lengths and loop over the bursts.  The data you need to
start with are:

>> a = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ];
>> zeros = find(a==0)
zeros =
    2    4    8   10   11   12
>> burstlengths = diff([0 zeros])
burstlengths =
   2   2   4   2   1   1
>> bl = diff([0 zeros])
bl =
   2   2   4   2   1   1
>> isfirst = diff([0 zeros])>1
isfirst =
   1   1   1   1   0   0
>> burstheads = zeros(isfirst)
burstheads =
    2    4    8   10
>> >> burstlengths = bl(isfirst)-1
burstlengths =
   1   1   3   1


If the burst are many, and some can be very long, then either you find a
way to vectorise everything based on the above, or you can loop over
burst lengths until a given burst length, then loop over remaining
bursts, depending ont he burst length distribution.

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(entrance 20, 1st floor, room C71)     Web:    http://fly.isti.cnr.it


_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Vectorizing simple loops

pawelz
In reply to this post by pawelz
Brian, Francesco,
thank you very much for the answers. I posted these two cases to the StackExchange and already got an excellent, fast and clean solution for the first case:
http://stackoverflow.com/questions/34041614/trivial-impossible-algorithm-challenge-in-octave-matlab

The second one still stays unresolved:
http://stackoverflow.com/questions/34046161/trivial-impossible-algorithm-challenge-in-octave-matlab-iterations-memory
Reply | Threaded
Open this post in threaded view
|

Re: Vectorizing simple loops

briankaz
Thanks Paweł for sharing the stackoverflow link with us.  The last solution is indeed extremely elegant and useful!  I can use the same methods to speed up a bunch of my own code!

-Brian

2015-12-02 16:34 GMT+01:00 pawelz <[hidden email]>:
Brian, Francesco,
thank you very much for the answers. I posted these two cases to the
StackExchange and already got an excellent, fast and clean solution for the
first case:
http://stackoverflow.com/questions/34041614/trivial-impossible-algorithm-challenge-in-octave-matlab

The second one still stays unresolved:
http://stackoverflow.com/questions/34046161/trivial-impossible-algorithm-challenge-in-octave-matlab-iterations-memory



--
View this message in context: http://octave.1599824.n4.nabble.com/Vectorizing-simple-loops-tp4673742p4673763.html
Sent from the Octave - General mailing list archive at Nabble.com.

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave


_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Vectorizing simple loops

pawelz
Brian - sure, you are welcome. I learned a lot to reuse later too. Indeed the solution is very elegant, I never suspected such would be possible. I am pasting it here too, for reference:

in = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
out = nonzeros(in).'(cumsum(in~=0))

Btw - this second challenge is still unanswered. Given that I thought the same about the first one, I don't trust my judgment entirely, but to me the second challenge seems more difficult - if able to be vectorized at all...
Reply | Threaded
Open this post in threaded view
|

Re: Vectorizing simple loops

Francesco Potortì
>Brian - sure, you are welcome. I learned a lot to reuse later too. Indeed the
>solution is very elegant, I never suspected such would be possible. I am
>pasting it here too, for reference:
>
>in = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
>out = nonzeros(in).'(cumsum(in~=0))

Since this is a help list, here is a hint for those that are not
proficient in Octave.

The .' operator in this case (real numbers) is the same as the '
(transpose) operator, and is only needed to obtain a horizontal vector.

So this is equivalent to:

n = nonzeros(in)'
out = n((cumsum(in!=0))

--
Francesco Potortì (ricercatore)        Voice:  +39.050.621.3058
ISTI - Area della ricerca CNR          Mobile: +39.348.8283.107
via G. Moruzzi 1, I-56124 Pisa         Skype:  wnlabisti
(entrance 20, 1st floor, room C71)     Web:    http://fly.isti.cnr.it


_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

RE: Vectorizing simple loops

Jose Marcos Ferraro
In reply to this post by pawelz

-----Original Message-----
From: help-octave-bounces+jose.ferraro=[hidden email] [mailto:help-octave-bounces+jose.ferraro=[hidden email]] On Behalf Of pawelz
Sent: quinta-feira, 3 de dezembro de 2015 10:04
To: [hidden email]
Subject: Re: Vectorizing simple loops

Brian - sure, you are welcome. I learned a lot to reuse later too. Indeed the solution is very elegant, I never suspected such would be possible. I am pasting it here too, for reference:

in = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
out = nonzeros(in).'(cumsum(in~=0))

Btw -  this second challenge <http://stackoverflow.com/questions/34046161>
is still unanswered. Given that I thought the same about the first one, I don't trust my judgment entirely, but to me the second challenge seems more difficult - if able to be vectorized at all...



--
View this message in context: http://octave.1599824.n4.nabble.com/Vectorizing-simple-loops-tp4673742p4673781.html
Sent from the Octave - General mailing list archive at Nabble.com.

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave



I believe the second problem can also be vectorized, but I may have misunderstood what was being asked.
 Define three functions propagate.m , fsm.m and fsmf.m and fill them with

function outz  = propagate(inz)

z = find(inz, 1);
if(length(z)==0)
        outz = inz;
        return
endif
in = inz(z:end);
n = nonzeros(in)';
out = n((cumsum(in!=0)));
outz = [zeros(1,z-1) out];

endfunction


function out  = fsm(in )


#trim leading 0s

sinal = propagate(sign(in));
step = abs(in) == 2;
stepup = step .* (sinal>0);
stepdown = step .* (sinal<0);
rampup = cumsum(stepup);
rampdown =  cumsum(stepdown);
plateauup = propagate(rampup .* (sinal<0));
plateaudown = propagate(rampdown .* (sinal>0));
rampup_centered = rampup - plateauup;
rampdown_centered = rampdown - plateaudown;
ramp = -rampdown_centered+rampup_centered;
out = ramp;

endfunction

function out  = fsmf(in , step , top )


z = find(in, 1);
if(length(z)==0)
        out = in;
        return
endif
in2 = in(z:end);

sim = fsm(in2);
sim2 = sim * step;
sim3 = min(sim2 , top);
sim4 = max(sim3 , -top);

out = diff([0 sim4]);

endfunction

Then you will have

octave:120> fsmf([2 2 2 2 0 1 0 -1 0 -2 -2 -1 0 1] , 3 , 9)
ans =

   3   3   3   0   0   0   0  -9   0  -3  -3   0   0   6

Jose Marcos Ferraro
www.LOGITeng.com

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

RE: Vectorizing simple loops

pawelz
Jose, Thank you very much for working on it. I just tested your solution and its results match perfectly with the original function even on big data sets. The performance is over 200 times better, so it is exactly what I needed.
Now, would you be so kind to explain the train of thought in this algorithm? I'd very much like to understand it better.
Reply | Threaded
Open this post in threaded view
|

RE: Vectorizing simple loops

Jose Marcos Ferraro


-----Original Message-----
From: help-octave-bounces+jose.ferraro=[hidden email] [mailto:help-octave-bounces+jose.ferraro=[hidden email]] On Behalf Of pawelz
Sent: domingo, 6 de dezembro de 2015 21:03
To: [hidden email]
Subject: RE: Vectorizing simple loops

Jose, Thank you very much for working on it. I just tested your solution and its results match perfectly with the original function even on big data sets. The performance is over 200 times better, so it is exactly what I needed.
Now, would you be so kind to explain the train of thought in this algorithm?
I'd very much like to understand it better.
       



--
View this message in context: http://octave.1599824.n4.nabble.com/Vectorizing-simple-loops-tp4673742p4673860.html
Sent from the Octave - General mailing list archive at Nabble.com.

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave

Pawel,
your finite state machine is very peculiar, to the point that I though I had misunderstood it.
If the accumulator is positive and the input is -1 or -2, it forgets the old value and start again from zero. The same thing happens if the acc is negative and input is +1 or +2.
So we can separate in two parts , a positive and a negative one. That's why the intermediate variables have a positive"up" and negative "down" version.
Each of this two parts is independent and can be calculated separetedly.
Finally, inside each part there is a certain number of "runs" inside which the accumulator simply goes up till it reaches top.
When it reaches top it will not change anymore till the end of the run, so one can simply ignore it and later clean the values that go higher (or lower) then allowed.
So inside each run it behaves as a simple accumulator increasing at 2 and doing nothing at 0 or 1.
If you are really concerned about performance, it can most likely be improved. This is simply the first version that I wrote and that I believe is simpler to understand.
The code can certainly also be factored, as someone pointed on this list and as the evident analogy between the up and down variables show the duplication of code.
Jose Marcos Ferraro
www.LOGITeng.com

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave