Shuffling elements in a dataset

Joao Cardoso-8

Hi,

I need to randomize the order of (shuffle) very large
datasets. The approach I devised, random sampling with elimination, is
not very efficient. Is there a better way, using Octave's matrix
manipulation?

My way:

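    num = rows(data);
    nm = num;                        # rows still left to draw from
    new_data = zeros(size(data));    # preallocate the output

    for i = 1:num
        rn = ceil(rand * nm);        # random row among the remaining ones
        new_data(i,:) = data(rn,:);
        data(rn,:) = [];             # eliminate the sampled row
        nm = nm - 1;
    endfor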

Better way: perhaps create a vector of unique indices? But how do I
do this?

    idx = 1:rows(data);
    # now shuffle idx
    new_data = data(idx,:);

Of course, this is the same problem in one dimension...

Thanks,
Joao
--
Joao Cardoso, INESC  |  e-mail: [hidden email]
R. Jose Falcao 110   |  tel:    + 351 2 2094345
4050 Porto, Portugal |  fax:    + 351 2 2008487





Re: Shuffling elements in a dataset

Marvin Vis
> I need to randomize the order of (shuffle) very large
> datasets. The approach I devised, random sampling with elimination, is
> not very efficient. Is there a better way, using Octave's matrix
> manipulation?
>
> Better way: perhaps create a vector of unique indices? But how do I
> do this?
>
>     idx = 1:rows(data);
>     # now shuffle idx
>     new_data = data(idx,:);
>
> Of course, this is the same problem in one dimension...

I've done something along the lines of the following before:

        # x is the original data vector that you want to randomize

        n = rand(length(x),1);         # one random key per element
        [garbage, index] = sort(n);    # index is a random ordering of 1:length(x)
        x_randomized = x(index);

You should be able to extend this to more dimensions as well. sort() is
relatively quick, so I've found this to be a decent approach.

M.
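
For the row-wise case asked about in the first post, the sort-based idea above might be sketched as follows. This is only an illustration, not code from either poster: the small `data` matrix and the names `keys`, `idx`, `idx2` and `new_data2` are made up for the example. The last two lines use Octave's built-in `randperm`, which returns a random permutation directly and could replace the `rand`/`sort` pair.

```octave
# Hypothetical example data: 5 rows, 3 columns (not from the thread).
data = reshape(1:15, 5, 3);

# One random key per row; sorting the keys gives a random row order.
keys = rand(rows(data), 1);
[unused, idx] = sort(keys);

# Apply the same permutation to whole rows at once.
new_data = data(idx, :);

# Alternative: let randperm produce the shuffled index vector.
idx2 = randperm(rows(data));
new_data2 = data(idx2, :);
```

Either way, every row is moved with a single indexing operation, so no rows are deleted one at a time inside a loop.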