Split data depending on values


Matthieu Chourrout
Hi,
 
I have a set of points scattered randomly over a 2048-by-2048 grid (non-integer coordinates between 1 and 2048), stored as an N-by-2 matrix of X,Y coordinates. I would like to split these data for further analysis, starting with four 1024-by-1024 quadrants, and later subdivide them further.
I also need to be able to extract the indices, as some other data are paired with these points.

It's the same idea as the BLOCKPROC function, but I'm not working with an image at this point.

To be more precise, here is my data:

colors = "rgb";
for channel_id = 1:3
   regions.(colors(channel_id)) = regionprops(labels(:,:,2), ...
filtered_image(:,:,channel_id),'Area','SubarrayIdx','BoundingBox','WeightedCentroid','MaxIntensity','MinIntensity','PixelValues');
   centers_of_mass.(colors(channel_id)) = cat(1, regions.(colors(channel_id)).WeightedCentroid);
endfor
% centers_of_mass.r, centers_of_mass.g and centers_of_mass.b are my N-by-2 matrices

I tried sorting and then splitting at the different thresholds, but is there perhaps a more efficient way?
 

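function [new_idx, splitting_idx] = split(data_to_rearrange, splitting_values)
 % new_idx: permutation that groups the rows by quadrant.
 % splitting_idx: last row of each of the first three quadrants after permutation.
 % Note: find() returns [] when no value exceeds a threshold, so this assumes
 % that some points lie beyond each threshold in every half.
 splitting_idx = [0 0 0];

 % Sort by X and locate the last row with X <= splitting_values(1).
 [rearranged_data, new_idx] = sortrows(data_to_rearrange, [1 2]);
 splitting_idx(2) = -1 + find(rearranged_data(:,1) > splitting_values(1),1);

 % Left half: sort by Y and locate the last row with Y <= splitting_values(2).
 [rearranged_data(1:splitting_idx(2), :), tmp_idx] = sortrows(rearranged_data(1:splitting_idx(2), :), [2 1]);
 new_idx(1:splitting_idx(2)) = new_idx(1:splitting_idx(2))(tmp_idx);
 splitting_idx(1) = -1 + find(rearranged_data(1:splitting_idx(2),2) > splitting_values(2),1);

 % Right half: same procedure, offset by the size of the left half.
 [rearranged_data(splitting_idx(2)+1:end, :), tmp_idx] = sortrows(rearranged_data(splitting_idx(2)+1:end, :), [2 1]);
 new_idx(splitting_idx(2)+1:end) = new_idx(splitting_idx(2)+1:end)(tmp_idx);
 splitting_idx(3) = -1 + splitting_idx(2) + find(rearranged_data(splitting_idx(2)+1:end,2) > splitting_values(2),1);
endfunction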

 

With this function and the code below, I can do this for the quadrants, but now I want to split the data further.

 

[I4,s4] = split(centers_of_mass.g,[acquisition_parameters.camera.center.x acquisition_parameters.camera.center.y]);
 [I4,s4] = split(centers_of_mass.g,[1024 1024]);
 centers_of_mass.g = centers_of_mass.g(I4,:);
 centers_of_mass.r = centers_of_mass.r(I4,:);
 centers_of_mass.b = centers_of_mass.b(I4,:);

 splitted_centers_of_mass.mean_g = [ nanmean(centers_of_mass.g(1:s4(1),:)) ; ...
  nanmean(centers_of_mass.g(s4(1)+1:s4(2),:)) ; ...
  nanmean(centers_of_mass.g(s4(2)+1:s4(3),:)) ; ...
  nanmean(centers_of_mass.g(s4(3)+1:end,:)) ];
 splitted_centers_of_mass.mean_r = [ nanmean(centers_of_mass.r(1:s4(1),:)) ; ...
  nanmean(centers_of_mass.r(s4(1)+1:s4(2),:)) ; ...
  nanmean(centers_of_mass.r(s4(2)+1:s4(3),:)) ; ...
  nanmean(centers_of_mass.r(s4(3)+1:end,:)) ];
 splitted_centers_of_mass.mean_b = [ nanmean(centers_of_mass.b(1:s4(1),:)) ; ...
  nanmean(centers_of_mass.b(s4(1)+1:s4(2),:)) ; ...
  nanmean(centers_of_mass.b(s4(2)+1:s4(3),:)) ; ...
  nanmean(centers_of_mass.b(s4(3)+1:end,:)) ];

 
Or maybe I could use BLOCKPROC earlier in my script? The problem with this solution is that it may split some of my labeled groups, giving duplicates or corrupted data at the edges.
 
Thanks in advance,
        
Matthieu CHOURROUT

PS: I remember reading a thread that mentioned a function to export variables in a pastable form, so that you could work directly on my data, but I never managed to find it again. If you know it, could you mention it too, please?


Re: Split data depending on values

Juan Pablo Carbajal-2
Hi,
Please do not send private e-mails. Keep the conversation in the mailing list.

Observations:
1. Boolean operations broadcast, check e.g.
x = (0:10).';
[3 6] <= x
ans =

  0  0
  0  0
  0  0
  1  0
  1  0
  1  0
  1  1
  1  1
  1  1
  1  1
  1  1

2. You can meshgrid your parameters


criteria = @(X,a,b,c,d) (a <= X(:,1) & X(:,1) < b) & (c <= X(:,2) & X(:,2) < d);
M = rand (1e4, 2);
np = 60;
p = linspace (0, 1, np);

tic
[A C] = meshgrid (p(1:end-1));
[B D] = meshgrid (p(2:end));
a = A(:).'; b = B(:).'; c = C(:).'; d = D(:).';

tf = criteria(M,a, b, c, d);
# Slicing will give different sizes in general
npar = size (tf, 2);
M_selected = cell(npar, 1);
for i=1:npar
  M_selected{i} = M(tf(:,i),:);
endfor
toc

tic
M_selected2 = cell (np-1, np-1);
for i=1:np-1
  for j=1:np-1
    tf = criteria (M,p(i),p(i+1),p(j),p(j+1));
    M_selected2{j,i} = M(tf,:);
  endfor
endfor
toc

success = cellfun(@(x,y) all((x==y)(:)), M_selected, M_selected2(:));
all (success)

I get
Elapsed time is 0.35539 seconds.
Elapsed time is 0.496975 seconds.
ans = 1
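
As a small illustration of observation 1, the broadcast comparison can be summed along each row to give a bin number per value. This is only a sketch: the edge values below are made up for the example, and the upper edge of the last bin is left implicit.

```octave
% Illustrative data and lower bin edges (assumed values, not from the thread)
x = (0:10).';
lower_edges = [0 3 6];

% The comparison broadcasts to an 11-by-3 logical array; each row marks the
% lower edges that the value has already reached. Counting them per row
% gives the 1-based number of the bin that holds the value.
bin = sum (lower_edges <= x, 2);
```

Applied to each coordinate column in turn, this gives a column bin and a row bin for every point without an explicit loop over the bins.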

You can check that the two algorithms have the same complexity and the
gain is marginal, so you can stick to your loopy version. In this
particular example you are doing much of the work needed to build a
histogram of a 2D variable; check hist3 in the statistics package for
similar work.
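
For the bookkeeping asked about in the original question (per-cell means plus row indices that can be reused on paired data), one possible sketch uses the core functions lookup and accumarray instead of sorting. The five points and the edge vector are hypothetical stand-ins for one of the N-by-2 centroid matrices; the last edge is set to 2049 so that a coordinate of exactly 2048 still falls in the last bin.

```octave
% Hypothetical stand-in for one N-by-2 matrix of X,Y coordinates
X = [ 100  200;
     1500  300;
      400 1800;
     1900 1700;
      900 1000];
edges = [1 1024 2049];        % two bins per axis; add edges to split further
nb = numel (edges) - 1;

% lookup returns i such that edges(i) <= value < edges(i+1)
ix = lookup (edges, X(:,1));  % column bin of each point
iy = lookup (edges, X(:,2));  % row bin of each point
cell_id = sub2ind ([nb nb], iy, ix);   % one linear cell number per point

% Per-cell counts and mean coordinates; empty cells would give NaN (0/0)
counts = accumarray (cell_id, 1, [nb*nb 1]);
mean_x = accumarray (cell_id, X(:,1), [nb*nb 1]) ./ counts;
mean_y = accumarray (cell_id, X(:,2), [nb*nb 1]) ./ counts;

% Row indices of the points in cell 1, usable to index any paired matrix
rows_in_cell_1 = find (cell_id == 1);
```

Because cell_id keeps the original row order, the same vector can be applied to the other channels without permuting them first, and a finer split only requires a longer edges vector.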

Cheers




On Thu, Jul 12, 2018 at 3:26 PM Matthieu Chourrout
<[hidden email]> wrote:

>
> Hi,
>
> Thank you for your answer. In fact, I realized my function was too complicated.
>
> I want to achieve what you did there but with different values, and I was trying to do it without any 'for' loop. After thinking about your code, this is what I wanted to do:
>
> M = rand(500, 2);
> % M is captured when the handle is created, so it must be defined first
> criteria = @(a,b,c,d) M(:,1) >= a & M(:,2) >= c & M(:,1) < b & M(:,2) < d;
>
> A = linspace(0,1,10);
> result = cell(9, 9);
> for i = 1:9
>         for j = 1:9
>                 result{i,j} = M(criteria(A(i),A(i+1),A(j),A(j+1)), :);
>         endfor
> endfor
>
> Is there a way to get rid of the 'for' loops at this point?
>
> Best regards,
>
> Matthieu CHOURROUT
>
>
> ----- Original message -----
>
> From: "Juan Pablo Carbajal" <[hidden email]>
> To: "matthieu chourrout" <[hidden email]>
> Sent: Thursday, 12 July 2018 14:20:27
> Subject: Re: Split data depending on values
>
> Hi,
>
> I am not sure I understood your problem fully. Also your code doesn't
> help much because I can't run it. Always provide a **simple** running
> example to get good answers.
>
> In general, any N-by-k matrix can be filtered by rows or columns. You
> need to define the criterion as a function that returns a boolean array
> whose length matches the number of rows (or columns), e.g. an array of
> N booleans to filter rows.
>
> A simple example: let's say you want all the rows whose row vector has
> a Euclidean norm between a and b
>
> function tf = criteria (X, a, b)
>   Xnorm = sqrt (sumsq (X,2));
>   tf = Xnorm > a & Xnorm < b;
> endfunction
>
> M = randn (500, 2);
> tf = criteria (M, 0.5, 1);
> m = M(tf,:);
> plot(M(:,1), M(:,2), 'o;data;', m(:,1), m(:,2), 'x;selected;')
> t = linspace (0,2*pi,100).';
> Bu = [cos(t), sin(t)];
> hold on
> plot([0.5 1] .* Bu(:,1), [0.5 1] .* Bu(:,2), '-g')
> hold off
> axis tight
>
> You can generalize this to columns. If you want a 2D filter then your
> booleans should have the same shape as your matrix, e.g.
> tf = false(10,2); tf(1:3,1) = true; tf(7:10, 1) = true; tf(4:6, 2) = true;
> will select sections of the first and second column of a 10-by-2 matrix.
>
> On Wed, Jul 11, 2018 at 3:55 PM Matthieu Chourrout
> <[hidden email]> wrote:
> > [...]