reading data from ascii files.

Veloci
Hi everyone,

I know that this topic will sound familiar to many of you, so I apologize for any annoyance it may cause.
My problem with data input is not so much "which functions to use" as "how to make it faster".
The starting point is a couple of data files in ASCII format with a specified number of header rows (data on the experiment and so on), followed by an unspecified number of data rows (depending on the measurement time and the sampling rate I use).
What I want to do: implement an automated import of these data files into Octave matrices and calculate results from them (as most people would).
What I want to use for that: the most promising function seems to be "dlmread", as my data is organised in rows with a fixed number of columns and a fixed delimiter.
My problem: as the number of rows is not specified, I can't give dlmread a range vector with the coordinates of the last row. This is not a fatal problem for dlmread, because it still works with only the starting row and column, but it is rather time-consuming given the huge amount of data it has to read. It would be much faster if I could allocate the needed memory as a zero matrix with the size of the data. For this I need to know the number of rows in my ASCII file. You could say, "Open the file in a good editor and look for the last row". That would be a solution for manual data processing, but not for an automated procedure.
The solution I seek: I've searched the net for a solution to my problem, but I couldn't find anything that would be possible in Octave. In MATLAB there was a possibility using memmapfile to determine the number of rows in a file, but memmapfile is not implemented in Octave. So I tried to find something similar, but couldn't find any example. I could imagine using the end-of-file position, but I don't know how to calculate the last row from that position.

If there is any solution that you could suggest, it would be much appreciated.

Greetings,
Veloci

PS: Please don't suggest calling any Linux functions, because I am limited to WinXP as my OS. Sorry!
Re: reading data from ascii files.

Siep Kroonenberg
On Fri, Jul 16, 2010 at 02:58:43AM -0700, Veloci wrote:

> If there is any solution that you could suggest, it would be much
> appreciated.

Couldn't you just read and count all lines without parsing?

rnum = 0;
while (! feof(fid))
  rnum = rnum + 1;
  l = fgetl (fid);
end
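
A minimal, self-contained sketch of the same counting idea, assuming a small demo file written to a temporary path (the file name and its contents are made up for illustration). It tests the return value of fgetl instead of calling feof, since fgetl returns -1 rather than a string once nothing is left to read, so the count does not depend on when feof becomes true:

```octave
% Hypothetical demo file: 2 header lines + 3 data lines, ending with a newline.
fname = [tempname() ".txt"];
fid = fopen (fname, "w");
fprintf (fid, "header 1\nheader 2\n1\t2\n3\t4\n5\t6\n");
fclose (fid);

% Count the lines without parsing them.
fid = fopen (fname, "r");
if (fid < 0)
  error ("could not open %s", fname);
end
rnum = 0;
while (ischar (fgetl (fid)))   % fgetl returns -1 (not a string) at end of file
  rnum = rnum + 1;
end
fclose (fid);
delete (fname);

printf ("%d lines\n", rnum);
```

For the demo file this counts 5 lines. The number of data rows is then rnum minus the known number of header rows.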

--
Siep Kroonenberg
_______________________________________________
Help-octave mailing list
[hidden email]
https://www-old.cae.wisc.edu/mailman/listinfo/help-octave
Re: reading data from ascii files.

MPD
I have a very similar problem, though slightly more complicated, I would think. I have 7 text files, which I read sequentially; each file has about 400,000 rows. I had written a script with nested loops, and it took over 3 days to complete. I quickly changed the scope of my script and also decided to avoid nested loops, since I read somewhere on this forum that nested loops can be a problem. Yet processing is still very slow for these 3 million rows.

Goal: loop over the 7 text files; in each file, read one row at a time (while (fgets) and dlmread), decide whether the row is useful, and append one of the column values to one of the 12 matrices I have. After reading all rows, draw histograms, one for each matrix.

Problem: no prior knowledge of the size of the matrices or of the number of rows in the files.
And yes, I am doing all this on RH7 Linux.

Is there a way to speed up reading and appending to a matrix whose size can't be determined in advance?

Thanks,
Manoj

Re: reading data from ascii files.

Jaroslav Hajek-2
On Fri, Jul 16, 2010 at 5:32 PM, Manoj Deshpande
<[hidden email]> wrote:

> Is there a way to quicken the read and append to matrix, whose sizes cant be
> pre-determined ?

Note that appending to a vector usually requires reallocating storage
and copying the whole vector to the new storage.
If you append a single element to a vector repeatedly using
v(end+1) = x;
then Octave 3.2+ will detect this stack-like usage pattern and attempt to
reduce the copying by over-allocating storage in advance, much like
std::vector in C++ does. This doesn't work for rows or columns,
however, so in that case the best approach is to collect the rows into
a cell array and concatenate them afterwards.
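
A minimal sketch of the cell-array approach, assuming each parsed row is a 1x4 vector. The loop body and the mod test are only stand-ins for real parsing and for the "row is useful" check; none of these names come from the original script:

```octave
% Hypothetical input: rows arrive one at a time, total count unknown up front.
n = 1000;
rows_kept = {};                  % cell array, one 1x4 row vector per cell
for k = 1:n
  row = [k, 2*k, 3*k, 4*k];      % stand-in for one parsed row
  if (mod (k, 2) == 0)           % stand-in for "row is useful"
    rows_kept{end+1} = row;      % appending a cell does not copy the stored rows' data
  end
end

% One concatenation at the end instead of growing a matrix row by row.
data = vertcat (rows_kept{:});
disp (size (data));
```

Growing the cell array only adds one more cell; the numeric data is copied once, in the final vertcat call.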

Show some code if you want more detailed advice.

--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: reading data from ascii files.

Claudio Belotti-3
In reply to this post by Siep Kroonenberg
On Fri, Jul 16, 2010 at 13:17, Siep Kroonenberg <[hidden email]> wrote:
> Couldn't you just read and count all lines without parsing?

or you can call
wc -l filename
via system() and parse the output
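
A short sketch of that call, assuming a wc executable is on the PATH; the demo file is made up for illustration. Note that wc -l counts newline characters, so a last line without a trailing newline is not counted:

```octave
% Hypothetical demo file with three lines.
fname = [tempname() ".txt"];
fid = fopen (fname, "w");
fprintf (fid, "a\nb\nc\n");
fclose (fid);

% Run wc and capture its output, e.g. "3 /tmp/oct-abc.txt".
[status, out] = system (sprintf ('wc -l "%s"', fname));
if (status ~= 0)
  error ("wc failed or is not available");
end
rnum = sscanf (out, "%d", 1);    % the first number in the output is the line count
delete (fname);

printf ("%d lines\n", rnum);
```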
Re: reading data from ascii files.

andy buckle
On Fri, Jul 16, 2010 at 5:15 PM, Claudio Belotti
<[hidden email]> wrote:
> On Fri, Jul 16, 2010 at 13:17, Siep Kroonenberg <[hidden email]> wrote:
>> Couldn't you just read and count all lines without parsing?
>
> or you can call
> wc -l filename
> via system() and parse the output

Quote from first email...

"
PS: Please don't consider calling any linux functions, because i am limited
to a WinXP as OS. Sorry!
"

--
/* andy buckle */
Re: reading data from ascii files.

Ron.Simonson
Andy Buckle wrote:

> On Fri, Jul 16, 2010 at 5:15 PM, Claudio Belotti
> <[hidden email]> wrote:
>> On Fri, Jul 16, 2010 at 13:17, Siep Kroonenberg <[hidden email]> wrote:
>>> Couldn't you just read and count all lines without parsing?
>> or you can call
>> wc -l filename
>> via system() and parse the output
>
> Quote from first email...
>
> "
> PS: Please don't consider calling any linux functions, because i am limited
> to a WinXP as OS. Sorry!
> "
>

wc and many other unix/linux commands are available for WinXP.
Many are available for immediate download on the web if needed.

RE: reading data from ascii files.

Allen.Windhorn-2
In reply to this post by andy buckle
Andy,

-----Original Message-----
From: Andy Buckle [mailto:[hidden email]]

On Fri, Jul 16, 2010 at 5:15 PM, Claudio Belotti
<[hidden email]> wrote:
>> On Fri, Jul 16, 2010 Siep Kroonenberg <[hidden email]> wrote:
>>> Couldn't you just read and count all lines without parsing?
>>
>> or you can call
>> wc -l filename
>> via system() and parse the output

>Quote from first email...
>
> "PS: Please don't consider calling any linux functions, because
> i am limited to a WinXP as OS. Sorry!"

wc.exe is available for Windows systems.  Unfortunately the site
that normally has it <unxutils.sourceforge.net> seems to be down.
I can send you my copy (1996) if you want.

Regards,
Allen

Re: reading data from ascii files.

martin_helm
On Friday, 16 July 2010 18:52:53, [hidden email] wrote:

> wc.exe is available for Windows systems.  Unfortunately the site
> that normally has it <unxutils.sourceforge.net> seems to be down.
> I can send you my copy (1996) if you want.

Hi,

I looked into the Windows installation of Octave that I have on a virtual
machine and can see that MSYS, which is installed in the Octave folder,
already contains the wc command:

C:\Octave\3.2.3_gcc-4.4.0\MSYS\bin

You should check whether your Octave installation folder contains it as
well.

Hope that helps.

- mh
Re: reading data from ascii files.

Veloci
In reply to this post by Siep Kroonenberg
Thanks for your fast replies!

As I am currently processing the data, I've implemented some of your ideas in code:

function [val_max,cnt_gr_as,val_min,cnt_lw_as,val_mean,val_std,val_count,data] = read_cal_data(filename,gr_as,lw_as)

%count the rows of the file without parsing them
fid = fopen(filename);
rnum = 0;
while (feof(fid)~= 1)
  rnum = rnum + 1;
  l = fgetl (fid);
end;
fclose(fid);   %close the file again, it is only needed for counting

%allocating memory
data =zeros(rnum,4);

%range vector-[first_row,first_column,last_row,last_column]
r= [9 1 rnum-1 4];

%delimited read of the data
data = dlmread(filename,'\t',r);

%statistics of the 4th column
val_max = max(data(:,4));
val_min = min(data(:,4));
val_mean = mean(data(:,4));
val_std = std(data(:,4));
val_count = size(data,1);   %number of rows read

%calculate border breaks: values below lw_as
cnt_lw_as = 0;
for i = 1:val_count
  if data(i,4)< lw_as
    cnt_lw_as = cnt_lw_as +1;
  end
end
%values above gr_as
cnt_gr_as = 0;
for i = 1:val_count
  if data(i,4)>gr_as
    cnt_gr_as = cnt_gr_as +1;
  end
end

With that I can read the data and process it further. The thing is, dlmread takes ages to read the data. If I reduce the amount by adjusting the range vector (for example: r = [9 1 1000 4]), the result appears in an instant. But for my files (about 300,000 rows and more) there is no end in sight.
I will try to build a loop around the dlmread call, pushing the data in smaller pieces into my preallocated data matrix. I hope that will speed up the process.
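
As a side note on the counting loops in the function above, the two limit checks can be written without loops by summing logical masks. A small sketch with made-up numbers standing in for column 4 of the data (the values and limits are assumptions for illustration only):

```octave
% Hypothetical values and limits standing in for data(:,4), lw_as and gr_as.
x = [0.5; 1.2; 3.4; 2.0; 0.1];
lw_as = 1.0;                     % lower limit
gr_as = 2.5;                     % upper limit

cnt_lw_as = sum (x < lw_as);     % number of values below the lower limit
cnt_gr_as = sum (x > gr_as);     % number of values above the upper limit
val_count = size (x, 1);         % number of rows

printf ("%d below, %d above, %d total\n", cnt_lw_as, cnt_gr_as, val_count);
```

Also worth keeping in mind: assigning the result of dlmread to data replaces the matrix created by zeros, so the preallocation only pays off when blocks are written into the existing matrix by index, as in data(be:en,:) = ... .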

Greetings,
Veloci.  
Re: reading data from ascii files.

Veloci
OK, after some trial and error, I've got a working version.

As I wrote, the dlmread function is really slow if you try to read big chunks. So I changed the whole thing to read the file piece by piece:

function [val_max,cnt_gr_as,val_min,cnt_lw_as,val_mean,val_std,val_count,data] = read_cal_data_oct(filename,gr_as,lw_as)

nr_of_header_rows = 9;
nr_of_rows2read = 1000;

fid = fopen(filename);

rnum = 0;
while (feof(fid)~= 1)
  rnum = rnum + 1;
  l = fgetl (fid);
end;

%allocating memory
data =zeros((rnum-1)-(nr_of_header_rows-1),4);  %rnum-1 because last row in my files is empty
disp(size(data));

%delimited read of the data

%calculate number of fitting range vectors
%range vector = [first_row,first_column,last_row,last_column]

nr_rv_cnt = floor((rnum-1-nr_of_header_rows)/nr_of_rows2read);
disp(nr_rv_cnt );

for j = nr_of_header_rows:nr_of_rows2read:((nr_rv_cnt-1)*nr_of_rows2read+nr_of_header_rows)
  r = [j 1 (j+nr_of_rows2read-1) 4];
  disp(r);
  disp(j);
  be= j- (nr_of_header_rows-1);
  en= be+ nr_of_rows2read-1;
  disp(be);
  disp(en);
  data(be:en,:) = dlmread(filename,'\t',r);   %fill one block of the preallocated matrix
end;

%calculate last range vectors

j= j+nr_of_rows2read;
r = [j 1 (rnum-1) 4];
disp(r);

be= en + 1;
en= be +  (rnum-1-j);
disp(be);
disp(en);
data(be:en,:) = dlmread(filename,'\t',r);

fclose(fid);

%data processing
...


With these changes I can process my files in a matter of seconds to minutes (the bigger "nr_of_rows2read", the slower it gets).
Maybe with some adjustment it can be helpful to some of you.
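
A different technique from the chunked dlmread calls above, offered only as a sketch: if the data rows contain nothing but numbers separated by tabs or spaces, fscanf can read them all in one call once the header lines have been skipped with fgetl, and no row count is needed beforehand. The demo file, its two header lines and the four columns are assumptions for illustration:

```octave
% Hypothetical demo file: 2 header lines, then tab-separated rows of 4 numbers.
fname = [tempname() ".txt"];
fid = fopen (fname, "w");
fprintf (fid, "experiment: demo\nrate: 1 kHz\n");
fprintf (fid, "%g\t%g\t%g\t%g\n", [1 2 3 4; 5 6 7 8; 9 10 11 12]');
fclose (fid);

nr_of_header_rows = 2;
ncols = 4;

fid = fopen (fname, "r");
for k = 1:nr_of_header_rows
  fgetl (fid);                   % read and discard one header line
end
% Read all remaining numbers; fscanf fills the result column by column,
% so ask for ncols rows and transpose to get one file row per matrix row.
data = fscanf (fid, "%f", [ncols, Inf])';
fclose (fid);
delete (fname);

disp (data);
```

Whether this is faster than dlmread on a given Octave version would have to be measured on the real files.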

Greetings,
Veloci.

PS: The disp() calls are not needed. They were just added for debugging reasons.
Re: reading data from ascii files.

Jaroslav Hajek-2
On Sat, Jul 17, 2010 at 12:32 AM, Veloci <[hidden email]> wrote:

> With these changes i can process my files in a matter of some
> seconds/minutes (the bigger "nr_of_rows2read", the slower it gets).
> Maybe with some adjustment it can be helpful to some of you.
>

This shows that dlmread probably has some serious performance
problems. It would be nice if you submitted a bug report.

--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: reading data from ascii files.

Massimiliano Minella
In reply to this post by Veloci
Since you already read the whole file to get the number of rows, why don't you simply take the values from the strings you read with fgetl?

If you replace this loop

Veloci wrote
rnum = 0;
while (feof(fid)~= 1)
  rnum = rnum + 1;
  l = fgetl (fid);
end;
with something like this

rnum = 0;
while (feof(fid)~= 1)
  rnum = rnum + 1;
  l = fgetl (fid);
  val{rnum} = sscanf(l, "%f %f %f %f");
end;

and after the loop use the command "cell2mat" to concatenate the cell array into a matrix, you may get a small improvement (in my case it did). Each cell then holds the four values of one row as a column vector, so the matrix returned by cell2mat needs to be transposed.
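
A self-contained sketch of this single-pass idea, assuming one header line and four numeric columns (the demo file and both numbers are made up for illustration). Header lines are skipped because they do not yield four numbers, and each row is stored as a 1x4 vector so that cell2mat gives an N-by-4 matrix directly:

```octave
% Hypothetical demo file: 1 header line, then 3 tab-separated data rows.
fname = [tempname() ".txt"];
fid = fopen (fname, "w");
fprintf (fid, "t\ta\tb\tc\n");
fprintf (fid, "%g\t%g\t%g\t%g\n", [0 1 2 3; 1 4 5 6; 2 7 8 9]');
fclose (fid);

nr_of_header_rows = 1;
val = {};
rnum = 0;

fid = fopen (fname, "r");
while (true)
  l = fgetl (fid);
  if (~ischar (l))               % fgetl returns -1 at end of file
    break;
  end
  rnum = rnum + 1;
  if (rnum <= nr_of_header_rows)
    continue;                    % skip header lines
  end
  v = sscanf (l, "%f");          % column vector of the numbers on this line
  if (numel (v) == 4)            % ignore empty or incomplete lines
    val{end+1} = v';             % store as a 1x4 row
  end
end
fclose (fid);
delete (fname);

data = cell2mat (val');          % N-by-1 cell of 1x4 rows -> N-by-4 matrix
disp (data);
```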

Max

Re: reading data from ascii files.

Federica
Hi,

I'm new to this forum and, above all, new to Octave. In this forum I found some useful information on how Octave works, but I'm having a lot of problems trying to create a script with Octave 3.2.4 on Windows, so I decided to ask for your help. I'd like to solve a system of differential equations whose coefficients must be taken from a txt or csv file. Each row of the file gives the 8 coefficients needed by the two equations. I was thinking that dlmread could be useful, but a first problem appears: the file cannot be opened and I don't understand why. I tried with

filename= "prova.txt"
fid=fopen(filename, "r")

but fid is -1, so the file cannot be opened. Why? The txt file contains numbers with 3 decimal places.
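
One way to narrow this down, sketched under the assumption that the file is meant to be in the current working directory: fopen can return an error message as a second output, and pwd and exist show where Octave is actually looking.

```octave
filename = "prova.txt";
[fid, msg] = fopen (filename, "r");
if (fid < 0)
  % msg holds the reason reported by the system, e.g. that no such file exists
  printf ("could not open %s: %s\n", filename, msg);
  printf ("current directory: %s\n", pwd ());
  % exist returns 2 if the name is a file in the current directory or on the load path
  printf ("file visible to Octave: %d\n", exist (filename, "file") == 2);
else
  printf ("opened %s successfully\n", filename);
  fclose (fid);
end
```

A common cause of fid = -1 is that the current directory is not the folder containing the file; in that case either change directory with cd or pass the full path to fopen.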

Could you help me, please? Is it maybe the format of the numbers? Did I miss some package while downloading Octave?

Thank you so much,


Federica