new set functions

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

new set functions

Lippert, Ross A.



I have made a modification to create_set because I found myself
often wanting not just the set of unique elements but also the
counts of those elements in the original input.  Perhaps that is
just my own eccentricity.

However, if one modifies create_set in this way (returning a
second output of counts), then I believe that the runtime complexity
of intersection and complement can be reduced to n log n instead of
n^2 (which they appear to be now, but I am no expert on complexity).

Anyhow, as a quick example I tried
x = fix(20000*rand(1000,1)); y = fix(20000*rand(1000,1));

and found a big difference between the timings of
intersection(x,y), so I think I am on the right track.

I'd be sending this out in the form of a patch, but for the moment,
I'd really like feedback to tell me if I have done something stupid
so I don't waste time updating my CVS, installing the new scripts
and then diffing all to find out I missed something.



-r



set.tgz (27K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: new set functions

Paul Kienzle-5

The run-time complexity of the set functions should be as follows:

        create_set  =  O (n log(n)) since you have to look up
                each new element in the existing set before
                inserting it, and lookup is O(log n).  So the
                sort algorithm is a fine way to handle this.

        union = O(n+m) since in the worst case, you need to
                copy both sets.

        intersection, complement = O(max(n,m)) if the set
                sizes are similar, or O(m log(n)) if m
                is much smaller than n, since in the worst
                case you need to copy the whole set (m~n)
                or just look up a few elements in the
                larger set (m<<n).

Octave 2.1.31 uses an O(n log n) algorithm for create_set and union, and
an O(n) interpreted loop for intersection, complement.  The CVS archive
has an O(n log n) algorithm for intersection and complement as well,
which is faster than the O(n) algorithm only because for loops are so
slow.  If you are concerned about speed for very large sets, it is worth
recoding union, intersect, and complement in C++ using the O(n) algorithm,
otherwise get the most recent scripts from the archives.

Paul Kienzle
[hidden email]

On Wed, Nov 15, 2000 at 09:52:44AM -0500, Lippert, Ross A. wrote:

>
>
>
> I have made a modification to create_set because I found myself
> often wanting not just the set of unique elements but also the
> counts of those elements in the original input.  Perhaps that is
> just my own eccentricity.
>
> However, if one modifies create_set in this way (returning a
> second output of counts), then I believe that the runtime complexity
> of intersection and complement can be reduced to n log n instead of
> n^2 (which they appear to be now, but I am no expert on complexity).
>
> Anyhow, as a quick example I tried
> x = fix(20000*rand(1000,1)); y = fix(20000*rand(1000,1));
>
> and found a big difference between the timings of
> intersection(x,y), so I think I am on the right track.
>
> I'd be sending this out in the form of a patch, but for the moment,
> I'd really like feedback to tell me if I have done something stupid
> so I don't waste time updating my CVS, installing the new scripts
> and then diffing all to find out I missed something.
>
>
>
> -r
>
>