Re: Cell arrays of strings in "sort" and "unique"

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

David Bateman-3
Moving this discusson to maintainers as it seems to go a bit beyond
help :-)

> Although Octave doesn't support classes in the same way as Matlab, I
> think we should probably fix sort to handle cell arrays in a
> compatible way, even if that means we have to add another case to
> Fsort.

I thought this was a bit crufty as a solution. However, given that the
basic sorting code is written as a template it would be relatively
painless to add sorting of cell arrays of strings as well. Should
probably clean up sort.cc in any case since the NDArray, complexNDArray
and charNDArray stuff might also be written as a template.

In any case, looking at

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/sort.html

it seems that matlab has also added a "mode" flag that can define
ascending or descending sorted order. Which again is relatively easy
to add, but again more work :-(

However, if we are going to this level of compatiability, maybe we
should also revisit the complex sorting code and the decision to do
the sorting only on the absolute value rather than in a matlab
compatiable way. Then again maybe not as sorting non ordinate values
doesn't make much sense in any case.

>What about structs?  Does Matlab also have a special sort method for
>those?

Matlab doesn't seem to sort structs...

D.

--
David Bateman                                [hidden email]
Motorola CRM                                 +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

John W. Eaton-6
On 14-Sep-2004, David Bateman <[hidden email]> wrote:

| I thought this was a bit crufty as a solution. However, given that the
| basic sorting code is written as a template it would be relatively
| painless to add sorting of cell arrays of strings as well. Should
| probably clean up sort.cc in any case since the NDArray, complexNDArray
| and charNDArray stuff might also be written as a template.

Yes, I noticed that there is now a lot of duplicated code in sort.cc.

| http://www.mathworks.com/access/helpdesk/help/techdoc/ref/sort.html
|
| it seems that matlab has also added a "mode" flag that can define
| ascending or descending sorted order. Which again is relatively easy
| to add, but again more work :-(

The price of aiming at a moving target.

| However, if we are going to this level of compatiability, maybe we
| should also revisit the complex sorting code and the decision to do
| the sorting only on the absolute value rather than in a matlab
| compatiable way. Then again maybe not as sorting non ordinate values
| doesn't make much sense in any case.

We might as well be compatible if it is not too difficult.  In the old
days, I think they only said that complex elements X were sorted by
ABS(X), so I believe Octave was doing the compatible thing at some
point.  But now they say that matches are further sorted by ANGLE(X).
I would think that it should not be too hard to add that to the
complex comparison function.

jwe


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

David Bateman-3
According to John W. Eaton <[hidden email]> (on 09/14/04):
> On 14-Sep-2004, David Bateman <[hidden email]> wrote:
>
> | I thought this was a bit crufty as a solution. However, given that the
> | basic sorting code is written as a template it would be relatively
> | painless to add sorting of cell arrays of strings as well. Should
> | probably clean up sort.cc in any case since the NDArray, complexNDArray
> | and charNDArray stuff might also be written as a template.
>
> Yes, I noticed that there is now a lot of duplicated code in sort.cc.


Well I've converted it all to use templates, and included sorting of
cell arrays of strings. However, ...

> | http://www.mathworks.com/access/helpdesk/help/techdoc/ref/sort.html
> |
> | it seems that matlab has also added a "mode" flag that can define
> | ascending or descending sorted order. Which again is relatively easy
> | to add, but again more work :-(
>
> The price of aiming at a moving target.

I have a problem in implementing the "descend" flag. Where are the
NaN's sorted? As this was introduced in R14, which I don't have access
to.. So I need some one to run the example

x = [Inf, NaN, -Inf, 3, 2, 1];
[a, ai] = sort (x, "ascend")
[a, ai] = sort (x, "descend")
x = [Inf, NaN, -Inf, 3, 2, 1I];
[a, ai] = sort (x, "ascend")
[a, ai] = sort (x, "descend")

> | However, if we are going to this level of compatiability, maybe we
> | should also revisit the complex sorting code and the decision to do
> | the sorting only on the absolute value rather than in a matlab
> | compatiable way. Then again maybe not as sorting non ordinate values
> | doesn't make much sense in any case.
>
> We might as well be compatible if it is not too difficult.  In the old
> days, I think they only said that complex elements X were sorted by
> ABS(X), so I believe Octave was doing the compatible thing at some
> point.  But now they say that matches are further sorted by ANGLE(X).
> I would think that it should not be too hard to add that to the
> complex comparison function.

Ok, as I have most of this stuff implemented, I'll do this at the same
time. At least I will when I get feedback on the above question

D.


--
David Bateman                                [hidden email]
Motorola CRM                                 +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

Quentin Spencer
David Bateman wrote:

> I have a problem in implementing the "descend" flag. Where are the
>
>NaN's sorted? As this was introduced in R14, which I don't have access
>to.. So I need some one to run the example
>
>x = [Inf, NaN, -Inf, 3, 2, 1];
>[a, ai] = sort (x, "ascend")
>[a, ai] = sort (x, "descend")
>x = [Inf, NaN, -Inf, 3, 2, 1I];
>[a, ai] = sort (x, "ascend")
>[a, ai] = sort (x, "descend")
>  
>

Output from MATLAB R14:



 >> x = [Inf, NaN, -Inf, 3, 2, 1];
 >> [a, ai] = sort (x, 'ascend')

a =

  -Inf     1     2     3   Inf   NaN


ai =

     3     6     5     4     1     2

 >> [a, ai] = sort (x, 'descend')

a =

   NaN   Inf     3     2     1  -Inf


ai =

     2     1     4     5     6     3
/home/qspencer> mat
Warning: Could not query OpenGL.
Warning: OpenGL appears to be installed incorrectly.

                              < M A T L A B >
                  Copyright 1984-2004 The MathWorks, Inc.
                         Version 7.0.0.19901 (R14)
                               May 06, 2004


  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.

 >> x = [Inf, NaN, -Inf, 3, 2, 1];
 >> [a, ai] = sort (x, 'ascend')

a =

  -Inf     1     2     3   Inf   NaN


ai =

     3     6     5     4     1     2

 >> [a, ai] = sort (x, 'descend')

a =

   NaN   Inf     3     2     1  -Inf


ai =

     2     1     4     5     6     3

 >> x = [Inf, NaN, -Inf, 3, 2, 1i];
 >> [a, ai] = sort (x, 'ascend')

a =

  Columns 1 through 4

        0 + 1.0000i   2.0000             3.0000                Inf

  Columns 5 through 6

     -Inf                NaN


ai =

     6     5     4     1     3     2

 >> [a, ai] = sort (x, 'descend')

a =

  Columns 1 through 4

      NaN               -Inf                Inf             3.0000

  Columns 5 through 6

   2.0000                  0 + 1.0000i


ai =

     2     3     1     4     5     6


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

David Bateman-3
According to Quentin Spencer <[hidden email]> (on 09/15/04):

>
> Output from MATLAB R14:
>
>
>
> >> x = [Inf, NaN, -Inf, 3, 2, 1];
> >> [a, ai] = sort (x, 'ascend')
>
> a =
>
>  -Inf     1     2     3   Inf   NaN
>
>
> ai =
>
>     3     6     5     4     1     2
>
> >> [a, ai] = sort (x, 'descend')
>
> a =
>
>   NaN   Inf     3     2     1  -Inf
>
>
> ai =
>
>     2     1     4     5     6     3


Now this isn't what I would have expected. I'd assumed that the NaN's
were always sorted to the end. It seems that matlab always does an
ascending sort and then a flip(lr|ud) or whatever... It might in fact
be faster this way, but the advantage of always sorting the NaN's
to the end would be that it is easier to drop them...

For compatiability we shoudl probably do the same thing...

Thanks
D.


--
David Bateman                                [hidden email]
Motorola CRM                                 +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

Alois Schlögl-2


David Bateman wrote:

>According to Quentin Spencer <[hidden email]> (on 09/15/04):
>  
>
>>Output from MATLAB R14:
>>
>>
>>
>>    
>>
>>>>x = [Inf, NaN, -Inf, 3, 2, 1];
>>>>[a, ai] = sort (x, 'ascend')
>>>>        
>>>>
>>a =
>>
>> -Inf     1     2     3   Inf   NaN
>>
>>
>>ai =
>>
>>    3     6     5     4     1     2
>>
>>    
>>
>>>>[a, ai] = sort (x, 'descend')
>>>>        
>>>>
>>a =
>>
>>  NaN   Inf     3     2     1  -Inf
>>
>>
>>ai =
>>
>>    2     1     4     5     6     3
>>    
>>
>
>
>Now this isn't what I would have expected. I'd assumed that the NaN's
>were always sorted to the end. It seems that matlab always does an
>ascending sort and then a flip(lr|ud) or whatever... It might in fact
>be faster this way, but the advantage of always sorting the NaN's
>to the end would be that it is easier to drop them...
>
>For compatiability we shoudl probably do the same thing...
>  
>

In this case, compatibility is not a "must". The result might differ
already in the next version of matlab.
If the position of NaN is important, its recommended doing explicit
checks for NaN's - and if only for the sake of readibility of the code.

- just my 2c -


  Alois



Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

John W. Eaton-6
On 15-Sep-2004, Alois Schloegl <[hidden email]> wrote:

| In this case, compatibility is not a "must". The result might differ
| already in the next version of matlab.

That could be said for any Matlab feature.  Why is this one different
from any of the others?

jwe


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

David Bateman-3
In reply to this post by David Bateman-3
> If the position of NaN is important, its recommended doing explicit
> checks for NaN's - and if only for the sake of readibility of the code.

You've not read sort.cc lately :-) ... If floating format is IEEE754,
doubles are cast as "long long unsigned int" and with a bit of magic
the NaN's are sorted correctly, where the operator < alone won't do
it. This little trick gains a speed factor of 2.5 is sort, but readable
it is not.

Cheers
David

--
David Bateman                                [hidden email]
Motorola CRM                                 +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

David Bateman-3
In reply to this post by John W. Eaton-6
Ok, then how about this patch to sort to that uses templates where
possible, sorts on cell arrays of strings, makes sorting of
complex values matlab compatiable, and adds a mode flag to define
the direction of the sort.

Patch is against the CVS (or rather against an older CVS with my
previous sort patch applied), so it should apply cleanly.

Cheers
David

--
David Bateman                                [hidden email]
Motorola CRM                                 +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 1 69 35 77 01 (Fax)
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary

changelog-save-2 (1K) Download Attachment
patch.sort-2 (33K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

John W. Eaton-6
On 15-Sep-2004, David Bateman <[hidden email]> wrote:

| Ok, then how about this patch to sort to that uses templates where
| possible, sorts on cell arrays of strings, makes sorting of
| complex values matlab compatiable, and adds a mode flag to define
| the direction of the sort.
|
| Patch is against the CVS (or rather against an older CVS with my
| previous sort patch applied), so it should apply cleanly.

I applied this set of changes.

Thanks,

jwe


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

Alois Schlögl-2
In reply to this post by John W. Eaton-6


John W. Eaton wrote:

>On 15-Sep-2004, Alois Schloegl <[hidden email]> wrote:
>
>| In this case, compatibility is not a "must". The result might differ
>| already in the next version of matlab.
>
>That could be said for any Matlab feature.  Why is this one different
>from any of the others?
>
>jwe
>

Ok, the above answer can be made more impartial using the following
critiria,
(a) does the "purpose" of a function suggest a certain solution
(b) which solution is more powerful,  i.e. which  solution  can be used
in a wider range of  applications
(c) efficiency of the implementation, which  solution provides a higher
performance ?
(d) ... <something I forgot> ?

With respect to the question whether NaNs should be sorted to which end,
criteria (a) and (b) do not suggest any preferred solution. Therefore,
only criteria (c) should be applied.

Alois





Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

Alois Schlögl-2
In reply to this post by David Bateman-3


David Bateman wrote:

>>If the position of NaN is important, its recommended doing explicit
>>checks for NaN's - and if only for the sake of readibility of the code.
>>    
>>
>
>You've not read sort.cc lately :-) ... If floating format is IEEE754,
>doubles are cast as "long long unsigned int" and with a bit of magic
>the NaN's are sorted correctly, where the operator < alone won't do
>it. This little trick gains a speed factor of 2.5 is sort, but readable
>it is not.
>
>Cheers
>David
>  
>

In the statement above, I was refering to readability in user space
m-files, not to readability within sort.cc. I'm aware that the bit of
"magic" works very well and provides an very efficient implementation of
SORT. From the implementation point of view - using the bit of "magic"
and considering IEEE754 - it is reasonable to put NaN's at the same end
of the sequence than +INF.

With the following example, I'll try to clarify the
"user-point-of-view"  (point of view in the m-domain).

    If somebody needs to sort a sequence x, like

    mode = 'ascend'; % or mode='descend';
    [y,i]=sort(x,'mode') % x contains nan's

    and if the position of NaNs in y is important (if not, who cares),
    than the user should explicitly check for NaNs using isnan(y).
    For example y(isnan(y)) = [] or alternatively y = y(~isnan(y)),
would remove all NaN's.

    The code in m-files becomes better readable, because one becomes
aware that NaNs are important and have been considered.

This means, from a users-point-of-view, SORT can put NaNs in the
beginning or at the end - it does not really matter. If you provide just
the most efficient implementation of SORT, that's fine :-)

Alois




Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

John W. Eaton-6
On 16-Sep-2004, Alois Schloegl <[hidden email]> wrote:

| This means, from a users-point-of-view, SORT can put NaNs in the
| beginning or at the end - it does not really matter.

If it does not really matter where they go, then they might as well be
placed in a way that is compatible with the other leading brand.  That
way, people can expect consistent results when they move their code to
Octave.

jwe


Reply | Threaded
Open this post in threaded view
|

Re: Cell arrays of strings in "sort" and "unique"

Alois Schlögl-2


John W. Eaton wrote:

>On 16-Sep-2004, Alois Schloegl <[hidden email]> wrote:
>
>| This means, from a users-point-of-view, SORT can put NaNs in the
>| beginning or at the end - it does not really matter.
>
>If it does not really matter where they go, then they might as well be
>placed in a way that is compatible with the other leading brand.  That
>way, people can expect consistent results when they move their code to
>Octave.
>
>jwe
>  
>


Yes, of course. I tried only to constitute a reasing that does not
require any reference to the "other leading brand".
IMHO, the outcome of such kind of reasoning should be the basis for the
decision.

Alois