How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

GK19
I have a program based on some functionality and the results are displayed on 2 .csv files ( file1 and file2)
The files consists of same rows and columns, the columns are named as condition,supply1,supply2
I want to read both the files into the program, load them and then compare the contents of supply1,supply2, based on the condition and number.
We have compare the respective contents and display the difference of supply1 and suppy2 between those 2 files.
Please help me on this.   



file2.csv (232 bytes) Download Attachment
file1.csv (232 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

nrjank
Administrator
On Tue, Jun 23, 2020 at 12:36 PM Ganesh Kini <[hidden email]> wrote:
I have a program based on some functionality and the results are displayed on 2 .csv files ( file1 and file2)
The files consists of same rows and columns, the columns are named as condition,supply1,supply2
I want to read both the files into the program, load them and then compare the contents of supply1,supply2, based on the condition and number.
We have compare the respective contents and display the difference of supply1 and suppy2 between those 2 files.

since you have both numbers and text, you can use textscan to read in each file into a Cell array.  (if they were all numbers, you could just use csvread. but it will turn all text into zeros, and you said you need to retain the 'condition' data.)

>> file1 = fopen('file1.csv');
>> file2 = fopen('file2.csv');  
>> data1 = textscan(file1, "%s,%f,%f,%d\n", "HeaderLines", 1, "Delimiter",",")
data1 =
{
  [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc
    [7,1] = tc
    [8,1] = tc
  }

  [1,2] =

     1.070000
     0.070000
     1.170000
     1.270000
     1.370000
     1.470000
     1.470000
     1.670000

  [1,3] =

    -1.1100
    -1.2300
    -1.6700
    -1.5700
    -1.3700
    -1.0700
    -1.4700
    -3.9700

  [1,4] =

    1
    2
    3
    1
    2
    1
    2
    3

}

>> data2 = textscan(file1, "%s,%f,%f,%d\n", "HeaderLines", 1, "Delimiter",",");
data2 =
{
  [1,1] =
  {
    [1,1] = ta
    [2,1] = ta
    [3,1] = tb
    [4,1] = tb
    [5,1] = tb
    [6,1] = tc
    [7,1] = tc
    [8,1] = tc
  }

  [1,2] =

     5.010000
     0.070000
     1.120000
     3.340000
     4.770000
     4.500000
     2.400000
     1.550000

  [1,3] =

    -6.1100
    -4.2300
    -7.6700
    -8.5700
    -2.3700
    -2.0700
    -4.4700
    -9.9700

  [1,4] =

    1
    2
    1
    2
    3
    1
    2
    3

}

>> fclose(file1);fclose(file2);


You now have all of your data loaded and can do whatever math and comparisons you want, working with the cell arrays.  Note that you index cells with { }, and the arrays within each cell element as usual with [ ]



Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

GK19
I tried, but it reads only one row and not all the rows

On Tue, Jun 23, 2020 at 7:15 PM Nicholas Jankowski <[hidden email]> wrote:
On Tue, Jun 23, 2020 at 12:36 PM Ganesh Kini <[hidden email]> wrote:
I have a program based on some functionality and the results are displayed on 2 .csv files ( file1 and file2)
The files consists of same rows and columns, the columns are named as condition,supply1,supply2
I want to read both the files into the program, load them and then compare the contents of supply1,supply2, based on the condition and number.
We have compare the respective contents and display the difference of supply1 and suppy2 between those 2 files.

since you have both numbers and text, you can use textscan to read in each file into a Cell array.  (if they were all numbers, you could just use csvread. but it will turn all text into zeros, and you said you need to retain the 'condition' data.)

>> file1 = fopen('file1.csv');
>> file2 = fopen('file2.csv');  
>> data1 = textscan(file1, "%s,%f,%f,%d\n", "HeaderLines", 1, "Delimiter",",")
data1 =
{
  [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc
    [7,1] = tc
    [8,1] = tc
  }

  [1,2] =

     1.070000
     0.070000
     1.170000
     1.270000
     1.370000
     1.470000
     1.470000
     1.670000

  [1,3] =

    -1.1100
    -1.2300
    -1.6700
    -1.5700
    -1.3700
    -1.0700
    -1.4700
    -3.9700

  [1,4] =

    1
    2
    3
    1
    2
    1
    2
    3

}

>> data2 = textscan(file1, "%s,%f,%f,%d\n", "HeaderLines", 1, "Delimiter",",");
data2 =
{
  [1,1] =
  {
    [1,1] = ta
    [2,1] = ta
    [3,1] = tb
    [4,1] = tb
    [5,1] = tb
    [6,1] = tc
    [7,1] = tc
    [8,1] = tc
  }

  [1,2] =

     5.010000
     0.070000
     1.120000
     3.340000
     4.770000
     4.500000
     2.400000
     1.550000

  [1,3] =

    -6.1100
    -4.2300
    -7.6700
    -8.5700
    -2.3700
    -2.0700
    -4.4700
    -9.9700

  [1,4] =

    1
    2
    1
    2
    3
    1
    2
    3

}

>> fclose(file1);fclose(file2);


You now have all of your data loaded and can do whatever math and comparisons you want, working with the cell arrays.  Note that you index cells with { }, and the arrays within each cell element as usual with [ ]



Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

nrjank
Administrator
On Tue, Jun 23, 2020 at 6:03 PM Ganesh Kini <[hidden email]> wrote:
I tried, but it reads only one row and not all the rows


I don't think it should matter on your system, but try:

data1 = textscan(file1, "%s,%f,%f,%d\r\n", "HeaderLines", 1, "Delimiter",",") 

certain text file formats require the extra line-end character. but that's usually only windows systems being MSDOS compatible.  the inputs/outputs i showed before were from the files you attached, so I'm not sure what else could be the issue. 


Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

GK19
Yes I used 
data1 = textscan(file1, '%s,%f,%f,%d\n', 'HeaderLines', 1, 'Delimiter',',')

But again I want to get the difference between the 2nd and 3rd column based on condition and number. How do I map it and then compare the value?  

Eg: condition ta and  number 1 of the 4th row of file1 has to compare with  condition ta and  number 1 of the 1st row of file2 

How do I find the difference in this case? Please guide me 


On Wed, Jun 24, 2020 at 1:12 AM Nicholas Jankowski <[hidden email]> wrote:
On Tue, Jun 23, 2020 at 6:03 PM Ganesh Kini <[hidden email]> wrote:
I tried, but it reads only one row and not all the rows


I don't think it should matter on your system, but try:

data1 = textscan(file1, "%s,%f,%f,%d\r\n", "HeaderLines", 1, "Delimiter",",") 

certain text file formats require the extra line-end character. but that's usually only windows systems being MSDOS compatible.  the inputs/outputs i showed before were from the files you attached, so I'm not sure what else could be the issue. 


Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

nrjank
Administrator
On Tue, Jun 23, 2020 at 8:08 PM Ganesh Kini <[hidden email]> wrote:
Yes I used 
data1 = textscan(file1, '%s,%f,%f,%d\n', 'HeaderLines', 1, 'Delimiter',',')


  and did adding the \r help or make any difference? and you tested on the exact same csv files you sent? i'm not sure why the exact same thing wouldn't work. 
 
But again I want to get the difference between the 2nd and 3rd column based on condition and number. How do I map it and then compare the value?  

Eg: condition ta and  number 1 of the 4th row of file1 has to compare with  condition ta and  number 1 of the 1st row of file2 

Once you solve the problem reading in the data, you can pull subsets and do subtraction of the data.  

if you prefer to work with named entities, you can pull them out of the cells:

file1 = fopen('file1.csv');
file2 = fopen('file2.csv');
data1 = textscan(file1, "%s,%f,%f,%d\n", "HeaderLines", 1, "Delimiter",",");
data2 = textscan(file2, "%s,%f,%f,%d\n", "HeaderLines", 1, "Delimiter",",");
fclose(file1);
fclose(file2);

condition1 = cell2mat(data1{1});  ##textscan made the char array into a cell array, so cell2mat pulls it back out.
supply1_1 = data1{2};
supply2_1 = data1{3};
number1 = data1{4};

condition2 = cell2mat(data2{1});
supply1_2 = data2{2};
supply2_2 = data2{3};
number2 = data2{4};

then, you use logical indexing to pull out subsets based on the values of condition1 and 2:

>> pos_a_1 = (condition1(:,2)=='a')
pos_a_1 =

  0
  0
  0
  1
  1
  0
  0
  0

>> pos_a_2 = (condition2(:,2)=='a')
pos_a_2 =

  1
  1
  0
  0
  0
  0
  0
  0

>> supply1_1(pos_a_1)
ans =

   1.2700
   1.3700

>> supply1_2(pos_a_2)
ans =

   5.010000
   0.070000

>> supply1_1(pos_a_1) - supply1_2(pos_a_2)
ans =

  -3.7400
   1.3000

repeat for tb and tc.  perform whatever math you want, and store and save the answers however you want. 


Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

GK19
results: for file 1
 [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc 
    [7,1] = tc
    [8,1] = tc  
}

condition1 = cell2mat(data1{1})

error 
error: cat: dimension mismatch
error: called from
    cell2mat at line 80 column 11
    fileoopencheck at line 27 column 12


On Wed, Jun 24, 2020 at 2:37 AM Nicholas Jankowski <[hidden email]> wrote:
On Tue, Jun 23, 2020 at 8:08 PM Ganesh Kini <[hidden email]> wrote:
Yes I used 
data1 = textscan(file1, '%s,%f,%f,%d\n', 'HeaderLines', 1, 'Delimiter',',')


  and did adding the \r help or make any difference? and you tested on the exact same csv files you sent? i'm not sure why the exact same thing wouldn't work. 
 
But again I want to get the difference between the 2nd and 3rd column based on condition and number. How do I map it and then compare the value?  

Eg: condition ta and  number 1 of the 4th row of file1 has to compare with  condition ta and  number 1 of the 1st row of file2 

Once you solve the problem reading in the data, you can pull subsets and do subtraction of the data.  

if you prefer to work with named entities, you can pull them out of the cells:

file1 = fopen('file1.csv');
file2 = fopen('file2.csv');
data1 = textscan(file1, "%s,%f,%f,%d\n", "HeaderLines", 1, "Delimiter",",");
data2 = textscan(file2, "%s,%f,%f,%d\n", "HeaderLines", 1, "Delimiter",",");
fclose(file1);
fclose(file2);

condition1 = cell2mat(data1{1});  ##textscan made the char array into a cell array, so cell2mat pulls it back out.
supply1_1 = data1{2};
supply2_1 = data1{3};
number1 = data1{4};

condition2 = cell2mat(data2{1});
supply1_2 = data2{2};
supply2_2 = data2{3};
number2 = data2{4};

then, you use logical indexing to pull out subsets based on the values of condition1 and 2:

>> pos_a_1 = (condition1(:,2)=='a')
pos_a_1 =

  0
  0
  0
  1
  1
  0
  0
  0

>> pos_a_2 = (condition2(:,2)=='a')
pos_a_2 =

  1
  1
  0
  0
  0
  0
  0
  0

>> supply1_1(pos_a_1)
ans =

   1.2700
   1.3700

>> supply1_2(pos_a_2)
ans =

   5.010000
   0.070000

>> supply1_1(pos_a_1) - supply1_2(pos_a_2)
ans =

  -3.7400
   1.3000

repeat for tb and tc.  perform whatever math you want, and store and save the answers however you want. 


Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

nrjank
Administrator
On Wed, Jun 24, 2020, 5:10 AM Ganesh Kini <[hidden email]> wrote:
results: for file 1
 [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc 
    [7,1] = tc
    [8,1] = tc  
}

condition1 = cell2mat(data1{1})

error 
error: cat: dimension mismatch
error: called from
    cell2mat at line 80 column 11
    fileoopencheck at line 27 column 12


Again, please stop top-posting your replies.

After loading the data, what is the output of:

>> data1

and

>> data1{1}



Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

GK19
Again, please stop top-posting your replies -- I click on the latest email and then click on reply all. Because I don't have access on octave to post directly. I have requested earlier but no responseimage.png

So for the answer of data1 

data1 = textscan (file1, "%s,%f,%f,%d \n" , "HeaderLines" , 1, "Delimiter" , "," )
>> data1
data1 =
{
  [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc
    [7,1] = tc
    [8,1] = tc
  }

  [1,2] =

     1.070000
     0.070000
     1.170000
     1.270000
     1.370000
     1.470000
     1.470000
     1.670000

  [1,3] =

    -1.1100
    -1.2300
    -1.6700
    -1.5700
    -1.3700
    -1.0700
    -1.4700
    -3.9700

  [1,4] =

    1
    2
    3
    1
    2
    1
    2
    3

}

On Wed, Jun 24, 2020 at 3:12 PM Nicholas Jankowski <[hidden email]> wrote:
On Wed, Jun 24, 2020, 5:10 AM Ganesh Kini <[hidden email]> wrote:
results: for file 1
 [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc 
    [7,1] = tc
    [8,1] = tc  
}

condition1 = cell2mat(data1{1})

error 
error: cat: dimension mismatch
error: called from
    cell2mat at line 80 column 11
    fileoopencheck at line 27 column 12


Again, please stop top-posting your replies.

After loading the data, what is the output of:

>> data1

and

>> data1{1}



Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

GK19
Rather than hardcore it, like   >> pos_a_1 = (condition1(:,2)=='a')
how can i make it in general? Do I need to use for loop? Please let me know how to do it ?

On Wed, Jun 24, 2020 at 3:19 PM Ganesh Kini <[hidden email]> wrote:
Again, please stop top-posting your replies -- I click on the latest email and then click on reply all. Because I don't have access on octave to post directly. I have requested earlier but no responseimage.png

So for the answer of data1 

data1 = textscan (file1, "%s,%f,%f,%d \n" , "HeaderLines" , 1, "Delimiter" , "," )
>> data1
data1 =
{
  [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc
    [7,1] = tc
    [8,1] = tc
  }

  [1,2] =

     1.070000
     0.070000
     1.170000
     1.270000
     1.370000
     1.470000
     1.470000
     1.670000

  [1,3] =

    -1.1100
    -1.2300
    -1.6700
    -1.5700
    -1.3700
    -1.0700
    -1.4700
    -3.9700

  [1,4] =

    1
    2
    3
    1
    2
    1
    2
    3

}

On Wed, Jun 24, 2020 at 3:12 PM Nicholas Jankowski <[hidden email]> wrote:
On Wed, Jun 24, 2020, 5:10 AM Ganesh Kini <[hidden email]> wrote:
results: for file 1
 [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc 
    [7,1] = tc
    [8,1] = tc  
}

condition1 = cell2mat(data1{1})

error 
error: cat: dimension mismatch
error: called from
    cell2mat at line 80 column 11
    fileoopencheck at line 27 column 12


Again, please stop top-posting your replies.

After loading the data, what is the output of:

>> data1

and

>> data1{1}



Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

nrjank
Administrator
In reply to this post by GK19
On Wed, Jun 24, 2020 at 9:20 AM Ganesh Kini <[hidden email]> wrote:
Again, please stop top-posting your replies -- I click on the latest email and then click on reply all.

In Gmail, when you Reply All, you will see 'three dots'. Click those to see the email reply chain. then you can trim and interleave/bottom post accordingly. Gmail admittedly does not make bottom posting as easy as it could.

image.png


Because I don't have access on octave to post directly. I have requested earlier but no responseimage.png


That is not the access request for posting, that's the access request for administering the mailing list.  There is no 'access on octave to post directly', although some of the archives like Nabble will emulate a direct post by sending an email to the list on your behalf.  Best to just learn to use Gmail.

 
So for the answer of data1 

data1 = textscan (file1, "%s,%f,%f,%d \n" , "HeaderLines" , 1, "Delimiter" , "," )
>> data1
data1 =
{
  [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc
    [7,1] = tc
    [8,1] = tc
  }

  [1,2] =

     1.070000
     0.070000
     1.170000
     1.270000
     1.370000
     1.470000
     1.470000
     1.670000

  [1,3] =

    -1.1100
    -1.2300
    -1.6700
    -1.5700
    -1.3700
    -1.0700
    -1.4700
    -3.9700

  [1,4] =

    1
    2
    3
    1
    2
    1
    2
    3

}

that's the exact output I get for data1.  

What about for data1{1}?  

What do you get from class(data1{1})? 

condition1 = cell2mat(data1{1})

gives me:
>> condition1 = cell2mat(data1{1})
condition1 =

tb
tb
tb
ta
ta
tc
tc
tc



Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

nrjank
Administrator
In reply to this post by GK19
On Wed, Jun 24, 2020 at 11:09 AM Ganesh Kini <[hidden email]> wrote:
>
> Rather than hardcore it, like   >> pos_a_1 = (condition1(:,2)=='a')
> how can i make it in general? Do I need to use for loop? Please let me know how to do it ?
>>>
>>>

what do you mean 'in general'? you mean for the 'a' part?  'a' is just a string.  you can put any variable in there. do you mean where I saved that result in the pos_a_1 variable? I just did that for convenience. You can index the data directly without assigning the index array to a variable.  e.g.,:

current_condition = 'a';

output = supply1_1(condition1(:,2)==current_condition) -    supply1_2(condition2(:,2)==current_condition)

you could do it in a for loop. But only you know what you want to do with the data after you've done the subtraction, so I don't know how you want to build your output. but something like the following stores the info in a cell array (since the number of data points for ta, tb, tc are different):

ouput_position = 1;
for current_condition = ['a', 'b', 'c']
  output{output_position}  = supply1_1(condition1(:,2)==current_condition) -    supply1_2(condition2(:,2)==current_condition) ;
  output_position++; 
endfor

>> output

output =
{
  [1,1] =

    -3.7400
     1.3000

  [1,2] =

    -0.050000
    -3.270000
    -3.600000

  [1,3] =

    -3.03000
    -0.93000
     0.12000

}



Reply | Threaded
Open this post in threaded view
|

Re: How to compare the colums ( 2nd and 3rd) of 2 .csv files based on a condition

GK19
In reply to this post by nrjank
Hi Nicholas,
>> data1{1}
ans =
{
  [1,1] = tb
  [2,1] = tb
  [3,1] = tb
  [4,1] = ta
  [5,1] = ta
  [6,1] = tc
  [7,1] = tc
  [8,1] = tc
}

>> class(data1{1})
ans = cell

 condition1 = cell2mat(data1{1})

Now I am getting the same output



On Wed, Jun 24, 2020 at 5:21 PM Nicholas Jankowski <[hidden email]> wrote:
On Wed, Jun 24, 2020 at 9:20 AM Ganesh Kini <[hidden email]> wrote:
Again, please stop top-posting your replies -- I click on the latest email and then click on reply all.

In Gmail, when you Reply All, you will see 'three dots'. Click those to see the email reply chain. then you can trim and interleave/bottom post accordingly. Gmail admittedly does not make bottom posting as easy as it could.

image.png


Because I don't have access on octave to post directly. I have requested earlier but no responseimage.png


That is not the access request for posting, that's the access request for administering the mailing list.  There is no 'access on octave to post directly', although some of the archives like Nabble will emulate a direct post by sending an email to the list on your behalf.  Best to just learn to use Gmail.

 
So for the answer of data1 

data1 = textscan (file1, "%s,%f,%f,%d \n" , "HeaderLines" , 1, "Delimiter" , "," )
>> data1
data1 =
{
  [1,1] =
  {
    [1,1] = tb
    [2,1] = tb
    [3,1] = tb
    [4,1] = ta
    [5,1] = ta
    [6,1] = tc
    [7,1] = tc
    [8,1] = tc
  }

  [1,2] =

     1.070000
     0.070000
     1.170000
     1.270000
     1.370000
     1.470000
     1.470000
     1.670000

  [1,3] =

    -1.1100
    -1.2300
    -1.6700
    -1.5700
    -1.3700
    -1.0700
    -1.4700
    -3.9700

  [1,4] =

    1
    2
    3
    1
    2
    1
    2
    3

}

that's the exact output I get for data1.  

What about for data1{1}?  

What do you get from class(data1{1})? 

condition1 = cell2mat(data1{1})

gives me:
>> condition1 = cell2mat(data1{1})
condition1 =

tb
tb
tb
ta
ta
tc
tc
tc