Shuffling elements in a dataset

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Shuffling elements in a dataset

John W. Eaton-6

[Sorry about posting this more than once, it got away while I was
playing around with the example code and before I was ready to send
it. --jwe]

On 21-Feb-1997, Ted Harding <[hidden email]> wrote:

| I find that something like
|
|    [dummy,ix] = sort(rand(1,rows(x)));  new_x = x(ix,:);
|
| seems pretty fast. (0.04 secs for 10000 rows, 0.05 secs for 100000 rows,
| or for 1000000, on a 386-DX/25MHz; 0.003 secs for 10000 rows, 0.004 secs
| for 100000 rows, or for 1000000, on Pentium-120, i.e. almost independent
| of number of rows. However for 10000000 rows it starts swapping and takes
| a while (48 MB RAM)). Above timings for 1 column only; reduce sizes pro
| rata for extra columns (RAM limit).

Hmm.  Here is what I get (counting down to minimize memory problems):

  for n = [5e5, 10.^[5, 4, 3, 2, 1]]
    x = rand (n, 1);
    t = cputime ();
    [dummy,ix] = sort(rand(1,rows(x)));
    new_x = x(ix,:);
    t = cputime () - t;
    printf ("N = %8d, t = %6.2f seconds\n", n, t);
    fflush (stdout);
  endfor

  N =   500000, t =  13.50 seconds
  N =   100000, t =   2.31 seconds
  N =    10000, t =   0.18 seconds
  N =     1000, t =   0.02 seconds
  N =      100, t =   0.01 seconds
  N =       10, t =   0.00 seconds

at N ~ 8e5, I get into paging trouble on my 64MB Pentium system.

How did you end up with almost constant time?  I took Octave's sort
algorithm from Knuth.  It's good, but I don't think it's *that* good.

jwe