mutable considered harmful, Range edition

mutable considered harmful, Range edition

John W. Eaton
Administrator

When using --traditional, I noticed that the result of

   zeros (1, 0)

was [](0x0) instead of [](1x0).

This error is related to the use of a mutable cache value in the Range
data type.  Here's how:

Constant row vectors (like those produced by zeros and ones) are stored
as ranges with the number of elements set to the number of columns and
the increment set to zero.
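
For illustration, a constant row might be stored like this (hypothetical,
reusing the four-argument (base, limit, inc, numel) constructor that
appears in the operator code below):

   Range zeros_row (0.0, 0.0, 0.0, n);   // zeros (1, n): base 0, inc 0
   Range ones_row (1.0, 1.0, 0.0, n);    // ones (1, n): base 1, inc 0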

But using --traditional implies "disable_range (true)", so the
Range::matrix_value method is called in the octave_value constructor,
and that appears to fail.  If I understand correctly, the
matrix_value method returns the incorrect result because it is declared
const but also modifies and returns a mutable value (the cached Matrix
value).  The compiler appears to be choosing to return the cache value
before it is modified (the method is const, so normally it wouldn't
change any member variables).

Is it possible to make the mutable cache value reliable in situations
like this?  If not, then this appears to be another example of why we
need to eliminate mutable in most places in Octave.
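
For reference, here is a minimal, self-contained sketch of the
lazy-cache pattern I mean (made-up names, not our actual code):

   #include <cassert>

   class lazy_cache_example
   {
   public:
     lazy_cache_example (int value) : m_value (value) { }

     // Declared const, but writes through the mutable members.  That
     // is legal C++, but it means a "const" call is not actually
     // read-only: concurrent const calls can race on m_cache, and an
     // expression that both reads and writes the same object depends
     // on evaluation order.
     int cached_value () const
     {
       if (! m_cache_valid)
         {
           m_cache = m_value * 2;   // stand-in for an expensive conversion
           m_cache_valid = true;
         }

       return m_cache;
     }

   private:
     int m_value;
     mutable int m_cache = 0;
     mutable bool m_cache_valid = false;
   };

   int main ()
   {
     lazy_cache_example x (21);
     assert (x.cached_value () == 42);
     return 0;
   }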

Separately, I see that, in addition to the Range::matrix_value method,
we set the cache value in the operators, like this:

   Range operator + (const Range& r, double x)
   {
     Range result (r.base () + x, r.limit () + x, r.inc (), r.numel ());
     if (result.m_numel < 0)
       result.m_cache = r.matrix_value () + x;

     return result;
   }

As I recall, setting the cache in these functions (and not just the
matrix_value method) is done so that, for example, adding a constant to
a range and then converting to a matrix will produce exactly the same
result as converting a range to a matrix and then adding a constant to
the matrix (in pseudocode):

   matrix (r) + c == matrix (r + c)

Using the cache this way does avoid the cost of any repeated conversions
to a matrix value, but it also forces the cache to be created for any
operation on a range, not just the result.  So it largely defeats the
purpose of the efficient range object storage, and I'm wondering whether
it is worth having a special range data type at all?  What do we really
gain for the additional complexity?

jwe


Re: mutable considered harmful, Range edition

John W. Eaton
Administrator
On 6/9/20 12:11 AM, I wrote:

> Separately, I see that, in addition to the Range::matrix_value method,
> we set the cache value in the operators, like this:
>
>    Range operator + (const Range& r, double x)
>    {
>      Range result (r.base () + x, r.limit () + x, r.inc (), r.numel ());
>      if (result.m_numel < 0)
>        result.m_cache = r.matrix_value () + x;
>
>      return result;
>    }
>
> As I recall, setting the cache in these functions (and not just the
> matrix_value method) is done so that, for example, adding a constant to
> a range and then converting to a matrix will produce exactly the same
> result as converting a range to a matrix and then adding a constant to
> the matrix (in pseudocode):
>
>    matrix (r) + c == matrix (r + c)
>
> Using the cache this way does avoid the cost of any repeated conversions
> to a matrix value, but it also forces the cache to be created for any
> operation on a range, not just the result.  So it largely defeats the
> purpose of the efficient range object storage, and I'm wondering whether
> it is worth having a special range data type at all?  What do we really
> gain for the additional complexity?

I see now that there are limited cases where result.m_numel will be
negative, so the cache is not updated for every operation.  However, the
problems with the mutable cache remain, as do the issues with operations
on ranges not being identical to the operations on the equivalent
matrices.  Here is a simple example:

   r0 = 1:0.1:10;
   r1 = r0 + 2.3;   # range + scalar
   r2 = [r0] + 2.3; # matrix + scalar
   all (r1 == r2)   # returns false for me
   d = r1 - r2;     # show elements with differences
   idx = find (d)
   d(idx)
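
My reading of the mechanism (assuming the range recomputes element k
from its stored base and increment) is floating-point non-associativity:
r1 produces (1 + 2.3) + k*0.1 while r2 produces (1 + k*0.1) + 2.3.  A
standalone sketch of the two evaluation orders:

   #include <cstdio>

   int main ()
   {
     double base = 1.0, inc = 0.1, x = 2.3;

     for (int k = 0; k < 91; k++)   // 1:0.1:10 has 91 elements
       {
         double via_range = (base + x) + k * inc;    // r1-style evaluation
         double via_matrix = (base + k * inc) + x;   // r2-style evaluation

         if (via_range != via_matrix)
           std::printf ("element %d differs by %g\n",
                        k + 1, via_range - via_matrix);
       }

     return 0;
   }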

I understand the arguments about Octave being a numerical tool and not
expecting exact results for floating point operations, but I'm still
wondering whether the complexity of these range operations is justified.
If we do want to support operations that avoid immediate conversion to
Matrix data, maybe we should only do so when we can guarantee that

   matrix (r) OP val == matrix (r OP val)

is true?  We should be able to do this when VAL and all elements of R
are integers and will remain so after the operation.  Other cases might
be possible as well, but harder to detect.  And maybe the cache should
be eliminated and this test handled in the octave_value class hierarchy?
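
As a sketch of the kind of check I have in mind (the function names
here are made up, not existing Octave code):

   #include <cmath>

   // True if V is an integer that a double represents exactly
   // (|V| <= 2^53).
   static bool is_exact_int (double v)
   {
     return std::trunc (v) == v && std::fabs (v) <= 9007199254740992.0;
   }

   // A range plus a scalar could stay a lazy range when the base, the
   // increment, and the scalar are all exact integers; then
   // matrix (r) + val == matrix (r + val) holds element by element,
   // assuming no intermediate value exceeds 2^53.
   static bool add_preserves_exactness (double base, double inc, double val)
   {
     return is_exact_int (base) && is_exact_int (inc) && is_exact_int (val);
   }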

jwe



Improving handling of ranges (was: Re: mutable considered harmful, Range edition)

John W. Eaton
Administrator
On 6/9/20 9:43 AM, I wrote:

> [...]

Are there any thoughts on this topic?

At the very least, I would like to support integer ({u,}int{8,16,32,64})
and single-precision ranges in Octave.  If possible, I'd like to have a
unified solution for all range types, but that might be a somewhat
disruptive change.
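
One possible shape for a unified solution is a class template over the
element type (an illustration only, not a worked-out interface):

   #include <cstddef>

   template <typename T>
   class basic_range
   {
   public:
     basic_range (T base, T inc, std::ptrdiff_t numel)
       : m_base (base), m_inc (inc), m_numel (numel) { }

     std::ptrdiff_t numel () const { return m_numel; }

     // Elements are always recomputed; there is no mutable cache.
     T elem (std::ptrdiff_t k) const { return m_base + k * m_inc; }

   private:
     T m_base;
     T m_inc;
     std::ptrdiff_t m_numel;
   };

One definition would then cover double, float, and the integer element
types, though the saturating octave_int arithmetic would need some care.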

Comments would be helpful.

jwe



Re: Improving handling of ranges (was: Re: mutable considered harmful, Range edition)

Olaf Till
On Mon, Jun 29, 2020 at 02:48:21PM -0400, John W. Eaton wrote:

> On 6/9/20 9:43 AM, I wrote:
> > [...]
>
> Are there any thoughts on this topic?
>
> At the very least, I would like to support integer ({u,}int{8,16,32,64}) and
> single-precision ranges in Octave.  If possible, I'd like to have a unified
> solution for all range types, but that might be a somewhat disruptive
> change.
>
> Comments would be helpful.
I don't know if the following will be helpful, just some shared
thoughts:

From the point of view (which I share) that differences in floating-point
results obtained in numerically different ways are not critical, it could
suffice to mark the cache as invalid after each operation on a range(?).

I don't know, but maybe the above would eliminate the need for the
cache to be mutable?
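
A minimal version of what I mean, reusing the operator from upthread:
the result is constructed without touching any cache, so a fresh
(invalid) cache travels with the new object and a matrix value is
computed only if and when it is requested.

   Range operator + (const Range& r, double x)
   {
     // No cache update here; the first use of matrix_value () on the
     // result would fill its cache.
     return Range (r.base () + x, r.limit () + x, r.inc (), r.numel ());
   }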

(Don't know if it's of interest, but I think the decision to store
ones(1,n) as ranges was made mostly to facilitate the use of ones(1,n)
in repmat.m.)

Olaf

--
public key id EAFE0591, e.g. on x-hkp://pool.sks-keyservers.net
