The nanflag parameter

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

The nanflag parameter

Ștefan-Gabriel Mirea
Hello,

My name is Stefan Mirea. I am a fourth year student of Automatic
Control and Computer Science at the "Politehnica" University of
Bucharest. As Octave has been one of my favourite tools during the
faculty years, I wish to contribute to its development.

I chose to start with bug #50007. Unfortunately, I don't think there
is any elegant fix that does not imply changing the Octave API. Since
my solution would involve modifying multiple function prototypes, I
decided to describe it below and ask for your opinion before working
on a patch.

According to the MATLAB documentation[1], the functions which accept a
nanflag argument are:
* min
* max
* cummin
* cummax
* sum
* cumsum
* mean
* median
* var
* std
* cov
* medfilt1

together with some recently introduced descriptive statistics
functions, which are not currently implemented in Octave: movsum,
movmean, movmedian, movmax, movmin, movstd, movvar.

For the one-array-input syntaxes of min and max, I would:

* modify the signature of the do_minmax_red_op() function in max.cc,
in order to receive the boolean nan-flag from do_minmax_body().
do_minmax_red_op() is never called from elsewhere. Unhappily, the two
template specializations, for charNDArray and boolNDArray
respectively, would need to be updated as well, even though these
types don't support NaN values.

* add a nan-flag parameter to the min() and max() methods of all the
classes with which do_minmax_red_op() is instantiated (except
boolNDArray): SparseMatrix, NDArray, SparseComplexMatrix,
ComplexNDArray, FloatNDArray, FloatComplexNDArray, charNDArray and
intNDArray<*> (yet charNDArray methods could be left unchanged as well
because of the specialization). Although these min/max methods are
part of the Octave API, using a default argument would ensure that no
external code would be affected. Inside the Octave core, they are only
called from max.cc. Also, I believe that uniformity towards the min()
and max() methods of other classes would not be a problem, as the
classes above are already the only ones whose min/max methods appear
in this form (receiving the dimension along which the reductions will
be performed; on the other hand, ColumnVector::min() has no parameters
for example).

* add a nan-flag parameter to mx_inline_min() and mx_inline_max().
These functions are used only in the min()/max() methods of the Array
(not Sparse) classes above. The overloads defined with OP_MINMAX_FCNN
would just pass the flag unchanged when calling the OP_MINMAX_FCN or
OP_MINMAX_FCN2 version.

* update do_mx_minmax_op() to receive the nan-flag from the caller
min()/max() method and send it to mx_minmax_op() (which now accepts
it). Again, do_mx_minmax_op() is used only in the min()/max() methods
of the Array classes above.

* update SparseMatrix::min()/max() as well as the versions of
mx_inline_min() and mx_inline_max() defined with OP_MINMAX_FCN and
OP_MINMAX_FCN2 to take the received NaN policy into account.

For the two-array-input syntax of min and max, I would:

* send the nan-flag from do_minmax_body() to do_minmax_bin_op()
similarly (even if it is not needed by the charNDArray
specialization).

* either find a general mechanism to tell the binary operations if NaN
values must be propagated (by altering do_sm_binary_op() /
do_ms_binary_op() / do_mm_binary_op()) or create alternative
mx_inline_xmin() / mx_inline_xmax() functions for the "includenan"
case (e.g. mx_inline_xmin_includenan). Since other mx_inline_*()
functions would not use this feature given the current MATLAB syntax,
I believe that the second alternative is much better.

* either:
  a) add the nan-flag parameter to all the scalar-matrix /
matrix-scalar / matrix-matrix min() and max() functions. While, again,
this implies changing the Octave API, using a default argument can
keep external code functional. Inside the Octave core, these functions
are also called only from max.cc. Unfortunately, the Matrix,
FloatMatrix, ComplexMatrix and FloatComplexMatrix versions of min()
and max() should be also changed for consistency, although they would
never be called with the flag set (when calling min or max from the
interpreter). For NDArray, ComplexNDArray, FloatNDArray,
FloatComplexNDArray, intNDArray<*> and charNDArray, the min() / max()
functions (usually defined with MINMAX_FCNS, except in charNDArray)
would just check whether to pass mx_inline_x##FCN or
mx_inline_x##FCN##_includenan to do_*_binary_op() (assuming the
creation of the mx_inline_*_includenan functions). For SparseMatrix
and SparseComplexMatrix, the algorithm in min() and max() would be
updated to respect the flag.

  or:
  b) call do_sm/ms/mm_binary_op() directly from do_minmax_bin_op(),
avoiding the min() / max() functions. In addition to losing
modularity, SparseMatrix and SparseComplexMatrix would probably need
additional "min_includenan()" and "max_includenan()" functions.

The situation for cummin and cummax is very similar to the
one-array-input min/max case (actually, I think that SparseMatrix and
SparseComplexMatrix should also have their own cummin() and cummax()
methods, because in MATLAB cummin and cummax return a sparse matrix
when called with a sparse argument).

Regarding the other functions, they are different in that the
currently implemented behaviour is equivalent to "includenan". Most of
them seem to have "omitnan" counterparts in the nan package, but, of
course, not in a manner compatible with MATLAB.

I would be thankful if you could give me some feedback on the approach
above.

Regards,
Stefan

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: The nanflag parameter

Rik-4
On 01/14/2017 09:00 AM, [hidden email] wrote:
> Subject: > The nanflag parameter > From: > Ștefan-Gabriel Mirea [hidden email] > Date: > 01/13/2017 06:31 PM > To: > [hidden email] > List-Post: > [hidden email] > Precedence: > list > MIME-Version: > 1.0 > Message-ID: > [hidden email] > Content-Type: > text/plain; charset=UTF-8 > Message: > 2 > > Hello, > > My name is Stefan Mirea. I am a fourth year student of Automatic > Control and Computer Science at the "Politehnica" University of > Bucharest. As Octave has been one of my favourite tools during the > faculty years, I wish to contribute to its development. > > I chose to start with bug #50007. Unfortunately, I don't think there > is any elegant fix that does not imply changing the Octave API. Since > my solution would involve modifying multiple function prototypes, I > decided to describe it below and ask for your opinion before working > on a patch.

As long as changes are applied to the development branch and don't disrupt existing users, changing the API is okay.

> > > According to the MATLAB documentation[1], the functions which accept a > nanflag argument are: > * min > * max > * cummin > * cummax > * sum > * cumsum > * mean > * median > * var > * std > * cov > > * medfilt1

I don't see 'medfilt1' in the list of core Matlab functions.  I think you can skip it for now.

> > together with some recently introduced descriptive statistics > functions, which are not currently implemented in Octave: movsum, > movmean, movmedian, movmax, movmin, movstd, movvar. > > For the one-array-input syntaxes of min and max, I would: > > * modify the signature of the do_minmax_red_op() function in max.cc, > in order to receive the boolean nan-flag from do_minmax_body(). > do_minmax_red_op() is never called from elsewhere. Unhappily, the two > template specializations, for charNDArray and boolNDArray > respectively, would need to be updated as well, even though these > types don't support NaN values. > > * add a nan-flag parameter to the min() and max() methods of all the > classes with which do_minmax_red_op() is instantiated (except > boolNDArray): SparseMatrix, NDArray, SparseComplexMatrix, > ComplexNDArray, FloatNDArray, FloatComplexNDArray, charNDArray and > intNDArray<*> (yet charNDArray methods could be left unchanged as well > because of the specialization). Although these min/max methods are > part of the Octave API, using a default argument would ensure that no > external code would be affected. Inside the Octave core, they are only > called from max.cc. Also, I believe that uniformity towards the min() > and max() methods of other classes would not be a problem, as the > classes above are already the only ones whose min/max methods appear > in this form (receiving the dimension along which the reductions will > be performed; on the other hand, ColumnVector::min() has no parameters > for example). > > * add a nan-flag parameter to mx_inline_min() and mx_inline_max(). > These functions are used only in the min()/max() methods of the Array > (not Sparse) classes above. The overloads defined with OP_MINMAX_FCNN > would just pass the flag unchanged when calling the OP_MINMAX_FCN or > OP_MINMAX_FCN2 version. > > * update do_mx_minmax_op() to receive the nan-flag from the caller > min()/max() method and send it to mx_minmax_op() (which now accepts > it). Again, do_mx_minmax_op() is used only in the min()/max() methods > of the Array classes above. > > * update SparseMatrix::min()/max() as well as the versions of > mx_inline_min() and mx_inline_max() defined with OP_MINMAX_FCN and > OP_MINMAX_FCN2 to take the received NaN policy into account. > I would be thankful if you could give me some feedback on the approach > above.

I agree that this seems like a workable approach.  However, I wonder if
it is actually necessary?  The question that occurs to me is whether the
 nanflag parameter is something that users programming in C++ would
want.  If it is, then the change needs to be in liboctave and the
interpreter can piggyback on those modifications.  However, if this is
really something that only users of the Octave interpreter need then it
would be easier to just make this change in the m-files themselves or in
 libinterp.

For example, the existing max/min functions behave as if "omitnan" were specified, and that is the most common way to use the functions.  If a programmer wants the 'includenan' behavior they are really saying that they want a test for NaN, and there is already the function isnan() for that.  Pseudo-code for a new max.m function would be:

tmp = __max__ (x, [], dim);   # Call old max function
includenan = strcmpi (nanflag, "includenan");
if (includenan)
  idx = any (isnan (x), dim);
  tmp(idx) = NaN;
endif

Just for fun I tried,

x = rand (1000, 1000);
x(1000,1000) = NaN;
tmp = max (x);
tic; idx = any (isnan (x)); tmp(idx) = NaN; toc
Elapsed time is 0.00314093 seconds.

3 milliseconds is a pretty small price to pay, and it would only be done when using the non-default behavior.

What I coded as an m-file could also be done in C++ if you want to keep with the existing implementation of max/min in libinterp.

So, my fundamental question for Octave-Maintainers is whether we need a nanflag in liboctave, or only in the Octave interpreter?

--Rik

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: The nanflag parameter

Ștefan-Gabriel Mirea
Hi Rik,

Thank you for your reply.

On Mon, Jan 16, 2017 at 2:54 AM, Rik <[hidden email]> wrote:
> So, my fundamental question for Octave-Maintainers is whether we need a
> nanflag in liboctave, or only in the Octave interpreter?

If the performance deficit turns out to be acceptable, is there a way
to also compute the index array (the second output) using high level
functions? Unluckily, the find function cannot operate along a given
dimension. This question also arises when calling with one output if
the argument is a complex array, as NaN complex values can have
regular imaginary parts and then min/max must return the first one
met. For example, in MATLAB:

>> max([1, NaN + 2i, NaN + 3i], [], 'includenan')

ans =

      NaN + 2.0000i

Otherwise, as far as I can see I'd have to manually go through the
array content. For NDArrays, I'd have to use an index tuple (maybe
somehow simulating that the loop along dim is the innermost, in order
to be able to break it at the first NaN found, unless a full but
cache-friendly loop was preferably), while for Sparse(Complex)Matrix
I'd operate on the three underlying vectors. Is this correct?

As for the two-array-input form, there is always only one output, so a
method similar to your pseudocode is enough for non-complex inputs
including SparseMatrix (by using "or" instead of "any", which
fortunately knows to perform broadcasting).

For two complex input arrays, the current default behaviour is
surprisingly equivalent to "includenan" (because of octave::math::max
with complex parameters, which propagates NaN's). The following
pseudocode, although tricky, is the best I could think of for the
"omitnan" case of two complex inputs:

tmp = __max__(x, y);  # Call old max function
if (! includenan)
  x_extended = x + zeros(size(y));
  y_extended = y + zeros(size(x));

  x_extended_isnan = isnan(x_extended); # or isnan(x) | zeros(size(y));
  non_NaN_operands = ifelse(x_extended_isnan, y_extended, x_extended);

  idx = xor(isnan(x), isnan(y));
  tmp(idx) = non_NaN_operands(idx);
endif

Of course, suppose the equivalent C++ code. It also works on
SparseComplexMatrices.

Stefan

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: The nanflag parameter

Ștefan-Gabriel Mirea
I attached patches[1] for min/max and cummin/cummax in both
alternatives (liboctave and libinterp).

Stefan

https://savannah.gnu.org/bugs/?50007

Loading...