On Wed, Dec 19, 2018 at 11:09:51AM -0800, Rik wrote:

> Recently, sliding windows of statistics functions (movXXX) were added to

> Octave. All of these functions depend on a base function movfun. As part

> of implementing movfun I used the profiler to examine the performance and I

> found a hot spot responsible for 70% of the run time. Unfortunately, I

> couldn't see any way to improve things without jumping out of the Octave

> language in to C++. I'm hoping someone else might look at this and have an

> idea.

>

> The code is below

>

> --- Code Start ---

> ## Apply "shrink" boundary conditions

> ## Function is not applied to any window elements outside the original data.

> function y = shrink_bc (fcn, x, idxp, win, wlen, odim)

> N = length (x);

> idx = idxp + win;

> tf = (idx > 0) & (idx <= N); # idx inside boundaries

>

> n = length (idxp);

> y = zeros (n, odim);

> ## FIXME: This nested for loop accounts for 70% of running time.

> ## Given that "shrink" is the default Endpoint value this

> ## code needs to be reworked.

> for i = 1:n

> k = idx(tf(:,i),i);

> y(i,:) = fcn (x(k));

> endfor

> endfunction

> --- Code End ---

>

> In my case, the following conditions are a suitable example

>

> +verbatim+

> fcn = @mean;

> x = randi (1e4, [1000, 1]);

> idxp = 1:25;

> win = (-25:25).';

> wlen = [51, 51];

> odim = 1;

> -verbatim-

>

> Indexing in Octave is insanely fast, but this is a weird case because it is

> not simply linear indexing [x(5)], nor is it logical indexing [x(x < 1)].

> It is a hybrid where the indices are linear (as would be returned from

> find()) but only some of them are valid so they need to be masked with a

> logical value.

>

> When I run this code repeatedly in a benchmark it requires ~3.0

> milliseconds, but in the overall code it is called 2000 times so the

> function is taking ~6 seconds which feels long.

>

> In this example, the value of k for i = 1 is 1:26, for i = 2, 1:27, etc., etc.

>

> I tried recoding using cellfun

>

> idx = cellfun (@(n) (1:n), num2cell ((wlen(1) - idxp(end)):(wlen(end)-1)),

> 'uniformoutput', false);

> y = cellfun (@(i) fcn (x(i)), idx);

>

> but this takes ~4.5 milliseconds to run.

At my system:

--- Code Start ---

k = 1:30;

x_k = x(k).';

inp = num2cell (1:25); # 1 : length (idxp)

tic;

cellfun (@ (id) mean (x_k), inp, "UniformOutput", false);

toc

--- Code End ---

takes roughly the same time as calling the original 'shrink_bc'

function with:

--- Code Start ---

tic;

shrink_bc (fcn, x, idxp, win, wlen, odim);

toc

--- Code End ---

(both times 32--33 ms, 5 times slower than at your system).

So neither indexing nor for-loops seem to be the cause, just the

repeated calling of the function 'mean'...

Olaf

--

public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net