Hello,
My name is Stefan Mirea. I am a fourth year student of Automatic Control and Computer Science at the "Politehnica" University of Bucharest. As Octave has been one of my favourite tools during the faculty years, I wish to contribute to its development. I chose to start with bug #50007. Unfortunately, I don't think there is any elegant fix that does not imply changing the Octave API. Since my solution would involve modifying multiple function prototypes, I decided to describe it below and ask for your opinion before working on a patch. According to the MATLAB documentation[1], the functions which accept a nanflag argument are: * min * max * cummin * cummax * sum * cumsum * mean * median * var * std * cov * medfilt1 together with some recently introduced descriptive statistics functions, which are not currently implemented in Octave: movsum, movmean, movmedian, movmax, movmin, movstd, movvar. For the one-array-input syntaxes of min and max, I would: * modify the signature of the do_minmax_red_op() function in max.cc, in order to receive the boolean nan-flag from do_minmax_body(). do_minmax_red_op() is never called from elsewhere. Unhappily, the two template specializations, for charNDArray and boolNDArray respectively, would need to be updated as well, even though these types don't support NaN values. * add a nan-flag parameter to the min() and max() methods of all the classes with which do_minmax_red_op() is instantiated (except boolNDArray): SparseMatrix, NDArray, SparseComplexMatrix, ComplexNDArray, FloatNDArray, FloatComplexNDArray, charNDArray and intNDArray<*> (yet charNDArray methods could be left unchanged as well because of the specialization). Although these min/max methods are part of the Octave API, using a default argument would ensure that no external code would be affected. Inside the Octave core, they are only called from max.cc. Also, I believe that uniformity towards the min() and max() methods of other classes would not be a problem, as the classes above are already the only ones whose min/max methods appear in this form (receiving the dimension along which the reductions will be performed; on the other hand, ColumnVector::min() has no parameters for example). * add a nan-flag parameter to mx_inline_min() and mx_inline_max(). These functions are used only in the min()/max() methods of the Array (not Sparse) classes above. The overloads defined with OP_MINMAX_FCNN would just pass the flag unchanged when calling the OP_MINMAX_FCN or OP_MINMAX_FCN2 version. * update do_mx_minmax_op() to receive the nan-flag from the caller min()/max() method and send it to mx_minmax_op() (which now accepts it). Again, do_mx_minmax_op() is used only in the min()/max() methods of the Array classes above. * update SparseMatrix::min()/max() as well as the versions of mx_inline_min() and mx_inline_max() defined with OP_MINMAX_FCN and OP_MINMAX_FCN2 to take the received NaN policy into account. For the two-array-input syntax of min and max, I would: * send the nan-flag from do_minmax_body() to do_minmax_bin_op() similarly (even if it is not needed by the charNDArray specialization). * either find a general mechanism to tell the binary operations if NaN values must be propagated (by altering do_sm_binary_op() / do_ms_binary_op() / do_mm_binary_op()) or create alternative mx_inline_xmin() / mx_inline_xmax() functions for the "includenan" case (e.g. mx_inline_xmin_includenan). Since other mx_inline_*() functions would not use this feature given the current MATLAB syntax, I believe that the second alternative is much better. * either: a) add the nan-flag parameter to all the scalar-matrix / matrix-scalar / matrix-matrix min() and max() functions. While, again, this implies changing the Octave API, using a default argument can keep external code functional. Inside the Octave core, these functions are also called only from max.cc. Unfortunately, the Matrix, FloatMatrix, ComplexMatrix and FloatComplexMatrix versions of min() and max() should be also changed for consistency, although they would never be called with the flag set (when calling min or max from the interpreter). For NDArray, ComplexNDArray, FloatNDArray, FloatComplexNDArray, intNDArray<*> and charNDArray, the min() / max() functions (usually defined with MINMAX_FCNS, except in charNDArray) would just check whether to pass mx_inline_x##FCN or mx_inline_x##FCN##_includenan to do_*_binary_op() (assuming the creation of the mx_inline_*_includenan functions). For SparseMatrix and SparseComplexMatrix, the algorithm in min() and max() would be updated to respect the flag. or: b) call do_sm/ms/mm_binary_op() directly from do_minmax_bin_op(), avoiding the min() / max() functions. In addition to losing modularity, SparseMatrix and SparseComplexMatrix would probably need additional "min_includenan()" and "max_includenan()" functions. The situation for cummin and cummax is very similar to the one-array-input min/max case (actually, I think that SparseMatrix and SparseComplexMatrix should also have their own cummin() and cummax() methods, because in MATLAB cummin and cummax return a sparse matrix when called with a sparse argument). Regarding the other functions, they are different in that the currently implemented behaviour is equivalent to "includenan". Most of them seem to have "omitnan" counterparts in the nan package, but, of course, not in a manner compatible with MATLAB. I would be thankful if you could give me some feedback on the approach above. Regards, Stefan |
On 01/14/2017 09:00 AM, [hidden email] wrote:
> Subject: > The nanflag parameter > From: > Ștefan-Gabriel Mirea [hidden email] > Date: > 01/13/2017 06:31 PM > To: > [hidden email] > List-Post: > [hidden email] > Precedence: > list > MIME-Version: > 1.0 > Message-ID: > [hidden email] > Content-Type: > text/plain; charset=UTF-8 > Message: > 2 > > Hello, > > My name is Stefan Mirea. I am a fourth year student of Automatic > Control and Computer Science at the "Politehnica" University of > Bucharest. As Octave has been one of my favourite tools during the > faculty years, I wish to contribute to its development. > > I chose to start with bug #50007. Unfortunately, I don't think there > is any elegant fix that does not imply changing the Octave API. Since > my solution would involve modifying multiple function prototypes, I > decided to describe it below and ask for your opinion before working > on a patch. As long as changes are applied to the development branch and don't disrupt existing users, changing the API is okay. > > > According to the MATLAB documentation[1], the functions which accept a > nanflag argument are: > * min > * max > * cummin > * cummax > * sum > * cumsum > * mean > * median > * var > * std > * cov > > * medfilt1 I don't see 'medfilt1' in the list of core Matlab functions. I think you can skip it for now. > > together with some recently introduced descriptive statistics > functions, which are not currently implemented in Octave: movsum, > movmean, movmedian, movmax, movmin, movstd, movvar. > > For the one-array-input syntaxes of min and max, I would: > > * modify the signature of the do_minmax_red_op() function in max.cc, > in order to receive the boolean nan-flag from do_minmax_body(). > do_minmax_red_op() is never called from elsewhere. Unhappily, the two > template specializations, for charNDArray and boolNDArray > respectively, would need to be updated as well, even though these > types don't support NaN values. > > * add a nan-flag parameter to the min() and max() methods of all the > classes with which do_minmax_red_op() is instantiated (except > boolNDArray): SparseMatrix, NDArray, SparseComplexMatrix, > ComplexNDArray, FloatNDArray, FloatComplexNDArray, charNDArray and > intNDArray<*> (yet charNDArray methods could be left unchanged as well > because of the specialization). Although these min/max methods are > part of the Octave API, using a default argument would ensure that no > external code would be affected. Inside the Octave core, they are only > called from max.cc. Also, I believe that uniformity towards the min() > and max() methods of other classes would not be a problem, as the > classes above are already the only ones whose min/max methods appear > in this form (receiving the dimension along which the reductions will > be performed; on the other hand, ColumnVector::min() has no parameters > for example). > > * add a nan-flag parameter to mx_inline_min() and mx_inline_max(). > These functions are used only in the min()/max() methods of the Array > (not Sparse) classes above. The overloads defined with OP_MINMAX_FCNN > would just pass the flag unchanged when calling the OP_MINMAX_FCN or > OP_MINMAX_FCN2 version. > > * update do_mx_minmax_op() to receive the nan-flag from the caller > min()/max() method and send it to mx_minmax_op() (which now accepts > it). Again, do_mx_minmax_op() is used only in the min()/max() methods > of the Array classes above. > > * update SparseMatrix::min()/max() as well as the versions of > mx_inline_min() and mx_inline_max() defined with OP_MINMAX_FCN and > OP_MINMAX_FCN2 to take the received NaN policy into account. > I would be thankful if you could give me some feedback on the approach > above. I agree that this seems like a workable approach. However, I wonder if it is actually necessary? The question that occurs to me is whether the nanflag parameter is something that users programming in C++ would want. If it is, then the change needs to be in liboctave and the interpreter can piggyback on those modifications. However, if this is really something that only users of the Octave interpreter need then it would be easier to just make this change in the m-files themselves or in libinterp. For example, the existing max/min functions behave as if "omitnan" were specified, and that is the most common way to use the functions. If a programmer wants the 'includenan' behavior they are really saying that they want a test for NaN, and there is already the function isnan() for that. Pseudo-code for a new max.m function would be: tmp = __max__ (x, [], dim); # Call old max function includenan = strcmpi (nanflag, "includenan"); if (includenan) idx = any (isnan (x), dim); tmp(idx) = NaN; endif Just for fun I tried, x = rand (1000, 1000); x(1000,1000) = NaN; tmp = max (x); tic; idx = any (isnan (x)); tmp(idx) = NaN; toc Elapsed time is 0.00314093 seconds. 3 milliseconds is a pretty small price to pay, and it would only be done when using the non-default behavior. What I coded as an m-file could also be done in C++ if you want to keep with the existing implementation of max/min in libinterp. So, my fundamental question for Octave-Maintainers is whether we need a nanflag in liboctave, or only in the Octave interpreter? --Rik |
Hi Rik,
Thank you for your reply. On Mon, Jan 16, 2017 at 2:54 AM, Rik <[hidden email]> wrote: > So, my fundamental question for Octave-Maintainers is whether we need a > nanflag in liboctave, or only in the Octave interpreter? If the performance deficit turns out to be acceptable, is there a way to also compute the index array (the second output) using high level functions? Unluckily, the find function cannot operate along a given dimension. This question also arises when calling with one output if the argument is a complex array, as NaN complex values can have regular imaginary parts and then min/max must return the first one met. For example, in MATLAB: >> max([1, NaN + 2i, NaN + 3i], [], 'includenan') ans = NaN + 2.0000i Otherwise, as far as I can see I'd have to manually go through the array content. For NDArrays, I'd have to use an index tuple (maybe somehow simulating that the loop along dim is the innermost, in order to be able to break it at the first NaN found, unless a full but cache-friendly loop was preferably), while for Sparse(Complex)Matrix I'd operate on the three underlying vectors. Is this correct? As for the two-array-input form, there is always only one output, so a method similar to your pseudocode is enough for non-complex inputs including SparseMatrix (by using "or" instead of "any", which fortunately knows to perform broadcasting). For two complex input arrays, the current default behaviour is surprisingly equivalent to "includenan" (because of octave::math::max with complex parameters, which propagates NaN's). The following pseudocode, although tricky, is the best I could think of for the "omitnan" case of two complex inputs: tmp = __max__(x, y); # Call old max function if (! includenan) x_extended = x + zeros(size(y)); y_extended = y + zeros(size(x)); x_extended_isnan = isnan(x_extended); # or isnan(x) | zeros(size(y)); non_NaN_operands = ifelse(x_extended_isnan, y_extended, x_extended); idx = xor(isnan(x), isnan(y)); tmp(idx) = non_NaN_operands(idx); endif Of course, suppose the equivalent C++ code. It also works on SparseComplexMatrices. Stefan |
I attached patches[1] for min/max and cummin/cummax in both
alternatives (liboctave and libinterp). Stefan https://savannah.gnu.org/bugs/?50007 |
Free forum by Nabble | Edit this page |