Really fix indexing in orderfields.m

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Really fix indexing in orderfields.m

Jason Riedy
# HG changeset patch
# User Jason Riedy <[hidden email]>
# Date 1233089444 18000
# Node ID 202bd681f367fad0c3966576569483be4deb0350
# Parent  827f0285a2016fd6ce72d2906e08faea48fb205a
Really fix indexing in orderfields.m

My previous fix did not work, and the tests were insufficient
to notice.  There was one original problem; the assignment to
a non-existent t(:) triggered an error about querying t's size
before it exists.  I replaced that error with a non-loop that
happened to pick only one name out of the struct and spread it
across all the fields.  The syntax struct.(name, name, ...)
doesn't work, although it'd be nice.

So the final fix is actually to add another loop to index
t(1), t(2), etc.  Not high-performance, but neither are
structs.

Sorry.  Struct arrays still hurt my head, so hopefully this is
a bit more correct.

diff -r 827f0285a201 -r 202bd681f367 scripts/ChangeLog
--- a/scripts/ChangeLog Tue Jan 27 12:48:34 2009 -0500
+++ b/scripts/ChangeLog Tue Jan 27 15:50:44 2009 -0500
@@ -1,3 +1,8 @@
+2009-01-27  Jason Riedy  <[hidden email]>
+
+ * miscellaneous/orderfields.m: Really fix the indexing for struct
+ arrays.
+
 2009-01-27  Carlo de Falco  <[hidden email]>
 
  * polynomial/spline.m: Doc fix.
diff -r 827f0285a201 -r 202bd681f367 scripts/miscellaneous/orderfields.m
--- a/scripts/miscellaneous/orderfields.m Tue Jan 27 12:48:34 2009 -0500
+++ b/scripts/miscellaneous/orderfields.m Tue Jan 27 15:50:44 2009 -0500
@@ -87,26 +87,49 @@
   endif
 
   ## Permute the names in the structure.
-  args = cell (1, 2 * numel (names));
-  args(1:2:end) = names;
   if (numel (s1) == 0)
+    args = cell (1, 2 * numel (names));
+    args(1:2:end) = names;
     args(2:2:end) = {[]};
+    t = struct (args{:});
   else
-    args(2:2:end) = {s1.(names)};
+    for i = 1:numel (names)
+      el = names(i);
+      for k = 1:length (s1)
+ t(k).(el) = s1(k).(el);
+      endfor
+    endfor
   endif
-  t = struct (args{:});
 
 endfunction
 
-%!shared a, b
+%!shared a, b, c
 %! a = struct ("foo", {1, 2}, "bar", {3, 4});
 %! b = struct ("bar", 6, "foo", 5);
+%! c = struct ("bar", {7, 8}, "foo", 9);
 %!test
 %! a(2) = orderfields (b, a);
 %! assert (a(2).foo, 5)
+%! assert (a(2).bar, 6)
 %!test
 %! a(2) = orderfields (b, [2 1]);
 %! assert (a(2).foo, 5)
+%! assert (a(2).bar, 6)
 %!test
 %! a(2) = orderfields (b, fieldnames (a));
 %! assert (a(2).foo, 5)
+%! assert (a(2).bar, 6)
+%!test
+%! a(1:2) = orderfields (c, fieldnames (a));
+%! assert (a(2).foo, 9)
+%! assert (a(2).bar, 8)
+
+%!test
+%! aa.x = {1, 2};
+%! aa.y = 3;
+%! aa(2).x = {4, 5};
+%! bb.y = {6, 7};
+%! bb.x = 8;
+%! aa(2) = orderfields (bb, aa);
+%! assert (aa(2).x, 8);
+%! assert (aa(2).y{1}, 6);
Reply | Threaded
Open this post in threaded view
|

Really fix indexing in orderfields.m

John W. Eaton
Administrator
On 27-Jan-2009, Jason Riedy wrote:

| # HG changeset patch
| # User Jason Riedy <[hidden email]>
| # Date 1233089444 18000
| # Node ID 202bd681f367fad0c3966576569483be4deb0350
| # Parent  827f0285a2016fd6ce72d2906e08faea48fb205a
| Really fix indexing in orderfields.m

I applied this changeset.

Thanks,

jwe
Reply | Threaded
Open this post in threaded view
|

Re: Really fix indexing in orderfields.m

Jaroslav Hajek-2
In reply to this post by Jason Riedy
On Tue, Jan 27, 2009 at 10:07 PM, Jason Riedy <[hidden email]> wrote:

> # HG changeset patch
> # User Jason Riedy <[hidden email]>
> # Date 1233089444 18000
> # Node ID 202bd681f367fad0c3966576569483be4deb0350
> # Parent  827f0285a2016fd6ce72d2906e08faea48fb205a
> Really fix indexing in orderfields.m
>
> My previous fix did not work, and the tests were insufficient
> to notice.  There was one original problem; the assignment to
> a non-existent t(:) triggered an error about querying t's size
> before it exists.  I replaced that error with a non-loop that
> happened to pick only one name out of the struct and spread it
> across all the fields.  The syntax struct.(name, name, ...)
> doesn't work, although it'd be nice.
>
> So the final fix is actually to add another loop to index
> t(1), t(2), etc.  Not high-performance, but neither are
> structs.
>
> Sorry.  Struct arrays still hurt my head, so hopefully this is
> a bit more correct.
>

Okay, so please note this patch that replaces the loop by a proper
index expression and also fixes the function to work with
multdimensional structs (maybe you were not aware structs can have
multiple dimensions?):
http://hg.savannah.gnu.org/hgweb/octave/rev/20d23d65cc84

After my recent work at optimizing struct and cell indexing I finally
understand the quirks of cells, structs and cs-lists myself, so I may
give you some explanations:

I see you were surprised that the assigment [t(:).x] = s(:).x doesn't
work if t does not exist. The reason is mostly technical: The number
of elements on the LHS should be known before rhs is evaluated, to be
able to pass proper "nargout" to a function. That's why
colon-subscripting of an empty or non-existent cell or struct is
forbidden (with arrays, an analogical expression attempts to inquire
the shape from the RHS).
Here, there is no function call, so it would be technically possible,
but complicated. Not that Matlab 2007a actually allows the expression,
but treats it as [t(1).x] and silently ignores the rest of rhs, which
I find very confusing (or a bug). That's why I decided to forbid it -
the workaround is easy, just figure out the proper length beforehand,
as in my patch above.
Note that the assignment
n = numel (s); [t(1:n).x] = s(:).x;
will (in development version) be *very* fast because it will actually
only make a *shallow copy* of s.x, i.e. there will be now a single
Cell object referenced by both s.x and t.x. The same would apply if
you substituted one or both of the sides with cells. This makes cells,
structs and cs-lists interchangeable very cheaply if you avoid
looping.
I hope this is going to contribute to performance of some complicated codes.

regards


--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
Reply | Threaded
Open this post in threaded view
|

Re: Really fix indexing in orderfields.m

Jason Riedy
And Jaroslav Hajek writes:
> Okay, so please note this patch that replaces the loop by a proper
> index expression and also fixes the function to work with
> multdimensional structs (maybe you were not aware structs can have
> multiple dimensions?):

Thank you! I know about the multi-dims (I've written a few
partial matfile readers), but I was hung up on the field part.

> I see you were surprised that the assigment [t(:).x] = s(:).x doesn't
> work if t does not exist.

No, that was the obvious part. My disappointment was that t.("f1",
"f2") doesn't work to slice by fields like R's df$c("f1", "f2").
If it did, then t = s.(names) would be the easiest (and likely
fastest) method.

The surprise here is that t.("f1", "f2") returned *only* t.f1.
Right now, I can't spend the time on making this work, so chalk it
up to a future patch.

> This makes cells, structs and cs-lists interchangeable very cheaply if
> you avoid looping.  I hope this is going to contribute to performance
> of some complicated codes.

Thank you for all the indexing work! It very well might help for my
data analysis side.  Although now I seem to be hitting a memory
leak / gc issue. sigh. Have to work down to a test case to see if
it's my code's problem or not.

Jason
Reply | Threaded
Open this post in threaded view
|

Re: Really fix indexing in orderfields.m

Jaroslav Hajek-2
On Wed, Jan 28, 2009 at 4:00 PM, Jason Riedy <[hidden email]> wrote:

> And Jaroslav Hajek writes:
>> Okay, so please note this patch that replaces the loop by a proper
>> index expression and also fixes the function to work with
>> multdimensional structs (maybe you were not aware structs can have
>> multiple dimensions?):
>
> Thank you! I know about the multi-dims (I've written a few
> partial matfile readers), but I was hung up on the field part.
>
>> I see you were surprised that the assigment [t(:).x] = s(:).x doesn't
>> work if t does not exist.
>
> No, that was the obvious part. My disappointment was that t.("f1",
> "f2") doesn't work to slice by fields like R's df$c("f1", "f2").
> If it did, then t = s.(names) would be the easiest (and likely
> fastest) method.
>

That's an interesting point. I don't think Matlab supports the syntax,
however. And it seems a little weird (though the logic is obvious).
Maybe we could just extend "getfield" to support this, i.e. "t =
getfield (s, names)".
But anyway I think that structs with more than, say, a dozen of fields
are very rare, so I don't see optimizations here as necessary
(currently getfield is m-file so it would loop anyway).
Instead of a struct with lots of fields, one may be better off using a
list of keys and values separately, and use lookup + indexing (lookup
now works with strings and does a binary search).

> The surprise here is that t.("f1", "f2") returned *only* t.f1.
> Right now, I can't spend the time on making this work, so chalk it
> up to a future patch.
>

I don't think this is supposed to work like that - it should give you
a parse error. Do you have a counterexample?

>> This makes cells, structs and cs-lists interchangeable very cheaply if
>> you avoid looping.  I hope this is going to contribute to performance
>> of some complicated codes.
>
> Thank you for all the indexing work! It very well might help for my
> data analysis side.  Although now I seem to be hitting a memory
> leak / gc issue. sigh. Have to work down to a test case to see if
> it's my code's problem or not.
>

You're welcome. Bug reports are also welcome, of course.

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
Reply | Threaded
Open this post in threaded view
|

Re: Really fix indexing in orderfields.m

Jason Riedy
And Jaroslav Hajek writes:
> But anyway I think that structs with more than, say, a dozen of fields
> are very rare, so I don't see optimizations here as necessary
> (currently getfield is m-file so it would loop anyway).

In Octave, definitely. If you're using structs as an imitation of R's
data frames, then having dozens of fields is not uncommon.  Or if you're
using it to slice a database[1] without knowing much about the data
structure.

> Instead of a struct with lots of fields, one may be better off using a
> list of keys and values separately, and use lookup + indexing (lookup
> now works with strings and does a binary search).

Well, yes, and you're also better off performance-wise writing a C++
extension. ;)

> > The surprise here is that t.("f1", "f2") returned *only* t.f1.
[...]
> I don't think this is supposed to work like that - it should give you
> a parse error. Do you have a counterexample?

That probably is a parse error, but put a cell array in there:
  t = struct ("foo", {1, 2}, "bar", {3, 4});
  nm = fieldnames (t);
  t.(nm)
will return the first field. It probably should issue an error,
or a warning if existing code assumes that behavior. So there's
a cellstr -> string conversion happening somewhere...

t.(nm{:}) errors out, as currently is appropriate.

Jason

Footnotes:
[1]  I have a toy, *simple* DBI adapter for SQLite.  I need to rework
the matrix stuffing before it's ready for wide release, but it's at
  http://jriedy.users.sonic.net/cgi/jriedy/cgit/cgit.cgi/osdbi/
if you want a laugh.  I haven't tested it against a development version
of Octave yet.  Handy for throwing plot-generating data between Octave,
R, and other tools, although definitely not the right choice for most
uses.
Reply | Threaded
Open this post in threaded view
|

Re: Really fix indexing in orderfields.m

Jaroslav Hajek-2
On Wed, Jan 28, 2009 at 7:55 PM, Jason Riedy <[hidden email]> wrote:

> And Jaroslav Hajek writes:
>> But anyway I think that structs with more than, say, a dozen of fields
>> are very rare, so I don't see optimizations here as necessary
>> (currently getfield is m-file so it would loop anyway).
>
> In Octave, definitely. If you're using structs as an imitation of R's
> data frames, then having dozens of fields is not uncommon.  Or if you're
> using it to slice a database[1] without knowing much about the data
> structure.
>
>> Instead of a struct with lots of fields, one may be better off using a
>> list of keys and values separately, and use lookup + indexing (lookup
>> now works with strings and does a binary search).
>
> Well, yes, and you're also better off performance-wise writing a C++
> extension. ;)
>
>> > The surprise here is that t.("f1", "f2") returned *only* t.f1.
> [...]
>> I don't think this is supposed to work like that - it should give you
>> a parse error. Do you have a counterexample?
>
> That probably is a parse error, but put a cell array in there:
>  t = struct ("foo", {1, 2}, "bar", {3, 4});
>  nm = fieldnames (t);
>  t.(nm)
> will return the first field. It probably should issue an error,
> or a warning if existing code assumes that behavior. So there's
> a cellstr -> string conversion happening somewhere...
>
> t.(nm{:}) errors out, as currently is appropriate.
>
> Jason
>
> Footnotes:
> [1]  I have a toy, *simple* DBI adapter for SQLite.  I need to rework
> the matrix stuffing before it's ready for wide release, but it's at
>  http://jriedy.users.sonic.net/cgi/jriedy/cgit/cgit.cgi/osdbi/
> if you want a laugh.  I haven't tested it against a development version
> of Octave yet.  Handy for throwing plot-generating data between Octave,
> R, and other tools, although definitely not the right choice for most
> uses.
>

This seems interesting. Do you intend to include this in OctaveForge?
There's a database package, but I think it provides only raw sql
queries (maybe I'm wrong).

Anyway, if in the future you will want the fast slice-by-names
operation, I think the best way is to rewrite getfield as a compiled
func and optimize that.

cheers

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz