textscan preserving all whitespace

rmales
I apologize if this has been answered before. I did search, and while there are clearly many questions about textscan, I did not see this particular one (that may be my ignorance).

I am using the experimental version of Octave for Windows 8 (v 3.8.2) with the GUI, and it works very well.

I have inherited some Matlab scripts that are used to create and parse various ASCII files, and have been able to convert them to Octave quite easily. However, I have one script that reads an entire ASCII file into an array and then does a lot of parsing of that array that depends upon preserving the whitespace, i.e. the exact number of spaces between fields. The input is read with:

fid = fopen('ODOC');
tot = textscan(fid,'%s','delimiter','\n');

An example of the input that needs to be handled is:
      3600.0     8.0000     2.1500     0.0000     0.2500     0.0000
      7200.0     8.0000     2.2500     0.0000     0.6830     0.0000
    10800.0     8.0000     2.3500     0.0000     0.9330     0.0000
    14400.0     8.0000     2.4500     0.0000     0.9330     0.0000
    18000.0     8.0000     2.5500     0.0000     0.6830     0.0000

When textscan processes this, the whitespace is collapsed so that only a single whitespace character remains between fields. This causes problems for the rest of the script, which expects fields to be in pre-defined columns. At present I cannot change the format of the input ASCII file; I am stuck with it.

Any suggestions on how to modify the textscan command, or on an alternative way of reading the file that preserves the spaces, would be much appreciated. I am aware that Matlab's textscan is not identical to Octave's textscan, and was not expecting it to be.

Thank you all for this truly superior piece of software.

R. Males
Cincinnati, Ohio
USA

Re: textscan preserving all whitespace

Philip Nienhuis
Thanks for your kind words about Octave :-)

(Hopefully I understand you properly.)
AFAIK Matlab's textscan behaves exactly the same. The "fix", or rather the trick, is easy and is borrowed from the Matlab scripts we use at work (where we hit the same issue):
Try adding the option pair 'whitespace', '' (i.e. an empty string as the whitespace set) to textscan's parameters:

tot = textscan (fid, '%s', 'delimiter', '\n', 'whitespace', '');  ## Not sure if delimiter param is still required

It'll give you a 1x1 cell holding a cellstr array of lines (the exact lines from your data file). AFAIU that is what you want to achieve, no?
If you want to go further, you'd have to split the data yourself along the lines of:

tot = cell2mat (cell2mat (tot));    # gets you a character array, one row per line
                                    # (needs all lines to have the same length)
data1 = tot(:,  1:11);
data2 = tot(:, 12:22);
# ... and so on for the remaining columns
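
To see the effect of the empty 'whitespace' option in isolation, something like the following can be run as a quick check. It is only a sketch: the sample lines, the temporary file from tempname () and the variable names are made up for illustration, and the behaviour is worth confirming on your own Octave version.

```octave
% Sketch: check that an empty 'whitespace' set keeps leading and inner spaces.
% The sample lines and the temporary file are illustrative only.
sample = {'     3600.0     8.0000     2.1500', ...
          '    10800.0     8.0000     2.3500'};
fname = tempname ();
fid = fopen (fname, 'w');
fprintf (fid, '%s\n', sample{:});
fclose (fid);

% Read the file back, one line per cell, without skipping any blanks.
fid = fopen (fname, 'r');
tot = textscan (fid, '%s', 'delimiter', '\n', 'whitespace', '');
fclose (fid);
delete (fname);

tot = tot{1};                        % unwrap the 1x1 cell into a cellstr
same = strcmp (tot{1}, sample{1});   % true if the spacing survived the round trip
```

If same comes out false, the spacing was altered on the way in and the option is not doing what is described above on that installation.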

Philip

Re: textscan preserving all whitespace

bpabbott
Administrator
In reply to the post by rmales of Sep 11, 2014, 6:08 PM:

You just want to read the lines into a cell array of strings?

        lines = strsplit ((fileread ('ODOC')), "\n");

To convert to doubles ...

        data = str2num (strcat('[',strcat(lines(1:end),';'){:},']'));

Or if you only need the numeric data ...

        data = load ('ODOC')
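
With the fileread/strsplit route, three details seem worth guarding against when the layout matters: strsplit collapses consecutive delimiters by default (so blank lines disappear), a final newline leaves an empty last element, and files written on Windows may carry a carriage return at the end of each line. A rough sketch, with an in-memory string standing in for fileread ('ODOC') (the sample text and variable names are made up):

```octave
% Sketch: split text into lines while keeping every space and blank line.
% 'txt' stands in for the result of fileread ('ODOC').
txt = sprintf ('  3600.0     8.0000\r\n\r\n 10800.0     8.0000\r\n');

txt = strrep (txt, sprintf ('\r'), '');      % drop carriage returns, if any
rows = strsplit (txt, sprintf ('\n'), ...
                 'CollapseDelimiters', false);  % keep blank lines as empty cells
if (isempty (rows{end}))
  rows(end) = [];                            % the final newline leaves an empty cell
end
```

The variable is called rows rather than lines here only to avoid shadowing the built-in lines colormap function.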

Ben




_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave

Re: textscan preserving all whitespace

rmales
Thanks to both of you for the prompt replies. I tried both suggestions as is, and neither gave the needed results. However, following Philip Nienhuis's suggestion to add the 'whitespace' option to textscan, the code below worked perfectly:

fid = fopen('ODOC');
tot = textscan (fid, '%s', 'delimiter', '\n', 'whitespace', '');
tot = tot{:};
fclose(fid);


I neglected to make clear in the original post that the file I am reading is mixed text and numeric, basically legacy from an old Fortran program.  

Here is what more of the file looks like. The old Matlab script I am converting pulls information out of this and relies on the individual field positions in each line. After I get all this done, I will certainly be proposing drastic changes to modernize the approach.

COMPUTATION OPTION ILINE=  1
Alongshore gradient IQYDY=  0
ILINE cross-shore lines are computed together

COMPUTATION OPTION IPROFL =  1
Profile evolution is computed from Time = 0.0
to Time =       18000.0  for NTIME =    5

NO ROLLER is included in computation

NO wave and current interaction included

WAVE OVERTOPPING, OVERFLOW AND SEEPAGE
Runup wire height (m)                   RWH=    0.020
Initial crest location for L=1           JCREST=   301
Initial crest height (m) for L=1        RCREST=    8.000
Swash velocity parameter              AWD=    1.600
Output exceedance probability        EWD=    0.015

This is in addition to segments of the file that have the numeric formatting from my previous post.

Dick

Re: textscan preserving all whitespace

Philip Nienhuis
Yes, this sort of file (chaotic header followed by neatly lined-up data columns) is exactly what we use the textscan script for.

Yet I think Ben's first suggestion, followed by cell2mat(), could be faster than textscan.
What you want in the end is a character array, as that allows easy isolation of individual data columns (if needed followed by str2double).
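
As a rough illustration of that last step, here is a sketch that turns a cellstr of lines into a character array and slices fixed-width columns out of it. The sample rows and the 11-character field width are assumptions for the example, not taken from the real file; char () is used instead of cell2mat () because it pads shorter lines with trailing blanks rather than failing.

```octave
% Sketch: fixed-width column extraction from a character array.
% Sample rows and the 11-character field width are assumed for illustration.
rows = {'     3600.0     8.0000     2.1500'; ...
        '    10800.0     8.0000     2.3500'};

txt = char (rows);                              % one row per line, blank-padded
col1 = str2double (cellstr (txt(:,  1:11)));    % first field of every line
col2 = str2double (cellstr (txt(:, 12:22)));    % second field
col3 = str2double (cellstr (txt(:, 23:33)));    % third field
```

str2double ignores the leading blanks in each slice, so the slices do not need trimming first.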

If you were to change some procedures, I'd look at the FORTRAN program first ;-)
Although...  it looks like the file header can easily be parsed using regexp - relying on "individual field positions" often turns out to be quite fragile.
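
A minimal sketch of what such regexp-based header parsing might look like is below. The pattern is a guess based on the header sample posted in this thread, not a full parser: it only picks up the last NAME= value pair on a line and skips lines without one.

```octave
% Sketch: pull trailing "NAME= value" pairs out of header lines with regexp.
% The pattern and the 'params' struct are illustrative assumptions.
hdr = {'COMPUTATION OPTION ILINE=  1'; ...
       'Runup wire height (m)                   RWH=    0.020'; ...
       'NO ROLLER is included in computation'};

params = struct ();
for i = 1:numel (hdr)
  % Token 1: the name before '=', token 2: the number that ends the line.
  tok = regexp (hdr{i}, '(\w+)\s*=\s*([-+.\d]+)\s*$', 'tokens', 'once');
  if (~isempty (tok))
    params.(tok{1}) = str2double (tok{2});
  end
end
```

Because the match is keyed on the name rather than on a column position, extra or missing spaces in the header no longer matter.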

Philip