lazy contiguous subrange indexing

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

lazy contiguous subrange indexing

Jaroslav Hajek-2
Hello,

please consider the attached two patches.

Summary (1st patch):
If a dense indexing expression can be reduced to a contiguous subrange
(like x(2:end-1), A(:,:,3:10) etc.), a shared (i.e. cheap) copy is
made and the actual extraction is postponed until (if ever) the value
is written to. The obvious benefit is greater performance of various
operations with indexed expressions, such as y = x(1:n-1) + x(2:n),
which will avoid making two temporary copies. This is done by
modifying Array<T> to allow it work with contiguous subranges of its
(possibly shared) data. There is an obvious risk of increased memory
consumption, because a big block of memory may be kept allocated while
only part of it is actually used.
This is resolved by checking prior to storing a value to a "permanent"
location (same where checks for null matrices are done), and possibly
economizing if an orphaned block is detected. The detection (see
Array<T>::maybe_economize) is fairly simple and could be improved at
the cost of performance of the operation. Since most indexing
expressions do not live longer than their parent object I suppose this
will seldom be an issue, but it is still easily possible to construct
an example where memory is wastefully kept allocated.
However, the issue is easy to avoid if you know what you do, so I
think it's OK. Comments are welcome.

To get an idea of the performance speed up, here's a simplistic test
(set n to suitable value):

n = 2e7;
x = 1/n * [1:n]; y = x.^2;
# compute 2nd-order differences
tic; d2y = diff (y, 2); toc
# compute integral
tic; int = trapz (x, y); toc

with a recent tip, I get:

Elapsed time is 0.697262 seconds.
Elapsed time is 0.960276 seconds.

with the new patch, I got:

Elapsed time is 0.213054 seconds.
Elapsed time is 0.497459 seconds.

which means speedups by 228% and 93%.
P.S. I picked some benefitted functions because measuring directly the
performance speed-up of the affected indexing expressions is
meaningless.


Summary (2nd patch):
This is a long-postponed cleanup of Array<T>, DiagArray2<T> and
PermMatrix, to allow the private members of Array<T> to become really
private,
and reimplement DiagArray and PermMatrix without the dangerous abuse
of Array<T>::dimensions (that has caused me quite a pain with the
previous patch).
This patch should probably be applied in any case, I'm putting it here
because right now it's created on top of the 1st patch and I've only
ran make check on the result.

comments? OK to push?

cheers to Octave

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

lazy.diff (26K) Download Attachment
arclp.diff (28K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

lazy contiguous subrange indexing

John W. Eaton
Administrator
On 14-Jan-2009, Jaroslav Hajek wrote:

| Summary (1st patch):
| If a dense indexing expression can be reduced to a contiguous subrange
| (like x(2:end-1), A(:,:,3:10) etc.), a shared (i.e. cheap) copy is
| made and the actual extraction is postponed until (if ever) the value
| is written to. The obvious benefit is greater performance of various
| operations with indexed expressions, such as y = x(1:n-1) + x(2:n),
| which will avoid making two temporary copies. This is done by
| modifying Array<T> to allow it work with contiguous subranges of its
| (possibly shared) data. There is an obvious risk of increased memory
| consumption, because a big block of memory may be kept allocated while
| only part of it is actually used.
| This is resolved by checking prior to storing a value to a "permanent"
| location (same where checks for null matrices are done), and possibly
| economizing if an orphaned block is detected. The detection (see
| Array<T>::maybe_economize) is fairly simple and could be improved at
| the cost of performance of the operation. Since most indexing
| expressions do not live longer than their parent object I suppose this
| will seldom be an issue, but it is still easily possible to construct
| an example where memory is wastefully kept allocated.
| However, the issue is easy to avoid if you know what you do, so I
| think it's OK. Comments are welcome.
|
| To get an idea of the performance speed up, here's a simplistic test
| (set n to suitable value):
|
| n = 2e7;
| x = 1/n * [1:n]; y = x.^2;
| # compute 2nd-order differences
| tic; d2y = diff (y, 2); toc
| # compute integral
| tic; int = trapz (x, y); toc
|
| with a recent tip, I get:
|
| Elapsed time is 0.697262 seconds.
| Elapsed time is 0.960276 seconds.
|
| with the new patch, I got:
|
| Elapsed time is 0.213054 seconds.
| Elapsed time is 0.497459 seconds.
|
| which means speedups by 228% and 93%.
| P.S. I picked some benefitted functions because measuring directly the
| performance speed-up of the affected indexing expressions is
| meaningless.
|
|
| Summary (2nd patch):
| This is a long-postponed cleanup of Array<T>, DiagArray2<T> and
| PermMatrix, to allow the private members of Array<T> to become really
| private,
| and reimplement DiagArray and PermMatrix without the dangerous abuse
| of Array<T>::dimensions (that has caused me quite a pain with the
| previous patch).
| This patch should probably be applied in any case, I'm putting it here
| because right now it's created on top of the 1st patch and I've only
| ran make check on the result.
|
| comments? OK to push?

I think these changes are fine.  I see that you already pushed them,
which is OK, but maybe in the future when you ask for comments you
could wait a few days?  :-)

I noticed the following incomplete comment in ov.cc:

  // FIXME: This is a good place for pre-storage hooks, but the functions should
  // probably be named differently. These functions will be called

Can you please explain (in a comment somewhere) what is meant by
"storable_value"?

Also, it might be good to add some explanation in Array.h about how
slice_data is supposed to be used.  I'd like to try to avoid confusion
for people who might want to make modifications to the Array class in
the future.

Thanks,

jwe
Reply | Threaded
Open this post in threaded view
|

lazy contiguous subrange indexing

John W. Eaton
Administrator
In reply to this post by Jaroslav Hajek-2
On 14-Jan-2009, Jaroslav Hajek wrote:

| with a recent tip, I get:
|
| Elapsed time is 0.697262 seconds.
| Elapsed time is 0.960276 seconds.
|
| with the new patch, I got:
|
| Elapsed time is 0.213054 seconds.
| Elapsed time is 0.497459 seconds.
|
| which means speedups by 228% and 93%.
| P.S. I picked some benefitted functions because measuring directly the
| performance speed-up of the affected indexing expressions is
| meaningless.

Oh, and I should mention that it is great to have the speedup.

Thanks!

jwe
Reply | Threaded
Open this post in threaded view
|

Re: lazy contiguous subrange indexing

Jaroslav Hajek-2
In reply to this post by John W. Eaton
On Fri, Jan 16, 2009 at 10:24 PM, John W. Eaton <[hidden email]> wrote:

> On 14-Jan-2009, Jaroslav Hajek wrote:
>
> | Summary (1st patch):
> | If a dense indexing expression can be reduced to a contiguous subrange
> | (like x(2:end-1), A(:,:,3:10) etc.), a shared (i.e. cheap) copy is
> | made and the actual extraction is postponed until (if ever) the value
> | is written to. The obvious benefit is greater performance of various
> | operations with indexed expressions, such as y = x(1:n-1) + x(2:n),
> | which will avoid making two temporary copies. This is done by
> | modifying Array<T> to allow it work with contiguous subranges of its
> | (possibly shared) data. There is an obvious risk of increased memory
> | consumption, because a big block of memory may be kept allocated while
> | only part of it is actually used.
> | This is resolved by checking prior to storing a value to a "permanent"
> | location (same where checks for null matrices are done), and possibly
> | economizing if an orphaned block is detected. The detection (see
> | Array<T>::maybe_economize) is fairly simple and could be improved at
> | the cost of performance of the operation. Since most indexing
> | expressions do not live longer than their parent object I suppose this
> | will seldom be an issue, but it is still easily possible to construct
> | an example where memory is wastefully kept allocated.
> | However, the issue is easy to avoid if you know what you do, so I
> | think it's OK. Comments are welcome.
> |
> | To get an idea of the performance speed up, here's a simplistic test
> | (set n to suitable value):
> |
> | n = 2e7;
> | x = 1/n * [1:n]; y = x.^2;
> | # compute 2nd-order differences
> | tic; d2y = diff (y, 2); toc
> | # compute integral
> | tic; int = trapz (x, y); toc
> |
> | with a recent tip, I get:
> |
> | Elapsed time is 0.697262 seconds.
> | Elapsed time is 0.960276 seconds.
> |
> | with the new patch, I got:
> |
> | Elapsed time is 0.213054 seconds.
> | Elapsed time is 0.497459 seconds.
> |
> | which means speedups by 228% and 93%.
> | P.S. I picked some benefitted functions because measuring directly the
> | performance speed-up of the affected indexing expressions is
> | meaningless.
> |
> |
> | Summary (2nd patch):
> | This is a long-postponed cleanup of Array<T>, DiagArray2<T> and
> | PermMatrix, to allow the private members of Array<T> to become really
> | private,
> | and reimplement DiagArray and PermMatrix without the dangerous abuse
> | of Array<T>::dimensions (that has caused me quite a pain with the
> | previous patch).
> | This patch should probably be applied in any case, I'm putting it here
> | because right now it's created on top of the 1st patch and I've only
> | ran make check on the result.
> |
> | comments? OK to push?
>
> I think these changes are fine.  I see that you already pushed them,
> which is OK, but maybe in the future when you ask for comments you
> could wait a few days?  :-)
>

Sorry. Basically the impulse to push it was that I wanted to refer to
the first changeset in a discussion (on this ML). A stupid reason, of
course.
My apologies, once more.

> I noticed the following incomplete comment in ov.cc:
>
>  // FIXME: This is a good place for pre-storage hooks, but the functions should
>  // probably be named differently. These functions will be called
>
> Can you please explain (in a comment somewhere) what is meant by
> "storable_value"?
>

storable_value and make_storable_value is a hook that is called on an
octave_value
prior to storing it to a "permanent" location, like a named variable,
a cell or a struct component, or a return value of a function.
Previously, this was named non_null_value because all it did was to
canonicalize the special null matrices (for element deletion). I
realized that the proper places to check for maybe_economize are the
same, so I renamed the hook function to reflect
*where it's being called* rather than what it does. If, later, some
other "lazy" mechanism is employed in Octave, then checks for possible
"unlazifying" will probably also go in the same place.
A suggestion for a better name or better scheme is welcome. Ill put in
an explaining comment.


> Also, it might be good to add some explanation in Array.h about how
> slice_data is supposed to be used.  I'd like to try to avoid confusion
> for people who might want to make modifications to the Array class in
> the future.

OK, I'll do it.

cheers

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
Reply | Threaded
Open this post in threaded view
|

Re: lazy contiguous subrange indexing

John W. Eaton
Administrator
On 17-Jan-2009, Jaroslav Hajek wrote:

| My apologies, once more.

There's no need to apologize.  Sometimes its difficult to tell whether
people are slow to comment or just not planning to comment...

| other "lazy" mechanism is employed in Octave, then checks for possible
| "unlazifying" will probably also go in the same place.
| A suggestion for a better name or better scheme is welcome. Ill put in
| an explaining comment.

OK.  I don't have a better name at the moment.

Thanks,

jwe