restoring sizemax function

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

restoring sizemax function

I believe there are issues with the way I resolved bug #47469 in cset
2890a931e647 that require more thought.

Bug #47469 was about why it was possible to apparently create arrays where
the size exceeded the constant sizemax(); the constant sizemax being equal
to intmax (octave_idx_type) - 1.  It didn't seem that there was any use in
restricting the size of arrays to be less than that of the underlying
integer data type itself, so I made the change to deprecate sizemax.

But, it turns out there are repercussions.  With signed 32-bit integers for
octave_idx_type the maximum possible number of elements is 2^31 - 1. 
However, it is now possible to write "ones (2^32, 1)" and not have Octave
emit an error, nor does the returned array have 2^32 elements.

To give everyone a concrete example of the issue, the code for ones() or
zeros() or pi() ends up calling fill_matrix in

static octave_value
fill_matrix (const octave_value_list& args, double val, const char *fcn)
  octave_value retval;

  int nargin = args.length ();

  oct_data_conv::data_type dt = oct_data_conv::dt_double;

  dim_vector dims (1, 1);

  if (nargin > 0 && args(nargin-1).is_string ())
      std::string nm = args(nargin-1).string_value ();

      dt = oct_data_conv::string_to_data_type (nm);

  switch (nargin)
    case 0:

    case 1:
      octave::get_dimensions (args(0), fcn, dims);

        dims.resize (nargin);

        for (int i = 0; i < nargin; i++)
          dims(i) = (args(i).isempty () ? 0 : args(i).idx_type_value (true));

  dims.chop_trailing_singletons ();

  octave::check_dimensions (dims, fcn);

  // Note that automatic narrowing will handle conversion from
  // NDArray to scalar.

  switch (dt)
    case oct_data_conv::dt_single:
      retval = FloatNDArray (dims, static_cast<float> (val));

    case oct_data_conv::dt_double:
      if (dims.ndims () == 2 && dims(0) == 1 && octave::math::isfinite (val))
        retval = Range (val, 0.0, dims(1));  // Packed form
        retval = NDArray (dims, val);

      error ("%s: invalid class name", fcn);

  return retval;

The list of operations is

1) Get the dimensions of the new array (e.g, idx_type_value())
2) Consolidate (chop_trailing_singletons) and verify appropriateness
3) CallArray constructor with dim_vector and value to initialize array with.

The trouble is that the XXX_value() routines are based on a philosophical
principle that whenever possible they should attempt to complete their
operation.  It is possible to construct a 31-bit idx_type_value from 2^32,
but it requires restricting the value to 2^31-1.  No warning or error is
thrown when this happens. 
When check_dimensions looks at the resulting value it find it <= to 2^31-1
and lets the dimension pass without error.

There are various ways to tackle this.  One would be to revert the
changeset I mentioned, go back to using sizemax() when checking dimensions,
and then look through the rest of the code to make sure that all dimensions
get checked consistently.  In effect what we were doing before was using
the special value 2^31-1 to pass out-of-band information between functions
that there had been an overflow in a conversion routine.

A second approach, although I don't necessarily like it, would be to move
checking dimensions before these dimensions are converted to
octave_idx_type objects.  This seems like it would involve a lot of
overhead by converting to a wider data type, such as double, first,
checking dimensions, and only if that passed performing a second conversion
to the desired type.

A third approach would be to add an additional boolean input to
idx_type_value() that would emit an error when the range of the type was
exceeded and the boolean input were true.  The conversion code in already has an input like this (check_int) which checks that the
input was in integer.

#define INT_CONV_METHOD(T, F)                                           \
  T                                                                     \
  octave_base_value::F ## _value (bool require_int, bool frc_str_conv) const \
  {                                                                     \
    T retval = 0;                                                       \
    double d = 0.0;                                                     \
    try                                                                 \
      {                                                                 \
        d = double_value (frc_str_conv);                                \
      }                                                                 \
    catch (octave::execution_exception& e)                               \
      {                                                                 \
        err_wrong_type_arg (e, "octave_base_value::" #F "_value ()",
type_name ()); \
      }                                                                 \
    if (require_int && octave::math::x_nint (d) != d)                   \
      error_with_cfn ("conversion of %g to " #T " value failed", d);    \
    else if (d < std::numeric_limits<T>::min ())                        \
      retval = std::numeric_limits<T>::min ();                          \
    else if (d > std::numeric_limits<T>::max ())                        \
      retval = std::numeric_limits<T>::max ();                          \
    else                                                                \
      retval = static_cast<T> (octave::math::fix (d));                  \
    return retval;                                                      \

The only trouble with this is that it will require two more "else if" tests
and idx_type_value is a routine that gets called a lot, so performance is

A fourth alternative might be to rewrite idx_type_value()

octave_value::idx_type_value (bool req_int, bool frc_str_conv) const
#if defined (OCTAVE_ENABLE_64)
  return int64_value (req_int, frc_str_conv);
  return int_value (req_int, frc_str_conv);

Instead of relying on the integer conversion routines, which are really
designed for a different purpose, we could code up a conversion and
checking routine at the same time.  That might be the best option, although
probably something for Octave 7.

Any votes on an approach?