Shuffling elements in a dataset (fwd)

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Shuffling elements in a dataset (fwd)

Ted.Harding
( Re Message From: John W. Eaton )

>
> On 21-Feb-1997, Ted Harding <[hidden email]> wrote:
>
> | I find that something like
> |
> |    [dummy,ix] = sort(rand(1,rows(x)));  new_x = x(ix,:);
> |
> | seems pretty fast. (0.04 secs for 10000 rows, 0.05 secs for 100000 rows,
> | or for 1000000, on a 386-DX/25MHz; 0.003 secs for 10000 rows, 0.004 secs
> | for 100000 rows, or for 1000000, on Pentium-120, i.e. almost independent
> | of number of rows. However for 10000000 rows it starts swapping and takes
> | a while (48 MB RAM)). Above timings for 1 column only; reduce sizes pro
> | rata for extra columns (RAM limit).
>
> Hmm.  Here is what I get (counting down to minimize memory problems):
>
>   for n = [[8, 6, 4, 2]*1e5, 10.^[5, 4, 3, 2, 1]]
>     x = rand (n, 1);
>     t = cputime ();
>     [dummy,ix] = sort(rand(1,rows(x)));
>     new_x = x(ix,:);
>     t = cputime () - t;
>     printf ("N = %8d, t = %6.2f seconds, n*log(n) = %.2e\n", n, t, n*log(n));
>     fflush (stdout);
>   endfor
>
>   N =   500000, t =  13.96 seconds
>   N =   100000, t =   2.30 seconds
>   N =    10000, t =   0.17 seconds
>   N =     1000, t =   0.02 seconds
>   N =      100, t =   0.01 seconds
>   N =       10, t =   0.01 seconds
>
> at N == 10^6, I get into paging trouble on my 64MB Pentium system.
>
> How did you end up with almost constant time?  I took Octave's sort
> algorithm from Knuth.  It's good, but I don't think it's *that* good.
>
> jwe

Eerrrmm OOPS. Turns out I inadvertently did it on a row vector instead of
a column vector. Too late at night. Anyway, it's next day now, and here's
some typical results (Pentium 120, 48MB RAM).

> x=(1:1000)'  ; tic; [x0,ix]=sort(rand(1,rows(x))); x(ix); toc
ans = 0.049538
> x=(1:10000)' ; tic; [x0,ix]=sort(rand(1,rows(x))); x(ix); toc
ans = 0.74168
> x=(1:100000)'; tic; [x0,ix]=sort(rand(1,rows(x))); x(ix); toc
ans = 13.884

Very comparable. I still reckon it's pretty fast! (especially since
there's a lot else happening on the machine at the moment).

Best wishes,
Ted.                                    ([hidden email])