Textscan and csv fitness data problem


Textscan and csv fitness data problem

Thomas Esbensen

Challenge: Convert attached csv-file into vectors for timestamps, power, and heart_rate data. The file originates from a FIT-file (from a fitness device) converted via FitSDKRelease into a CSV-file.

Problem: I am not able to generate an output via textscan that corresponds to the number of lines (rows) in the csv-file. E.g. the variable DataMat generated via examples below holds values from various rows in the first line (DataMat(1,:)). It somehow wraps around, and the function in Octave is not consistent.

Hint: In OCTAVE dimensions are off. In MATLAB dimensions follow the file.
See: https://se.mathworks.com/matlabcentral/answers/375102-textscan-and-csv-fitness-data-problem




_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave

Attachment: 2017-08-11-143641-ELEMNT BOLT 2779-2-0.csv (1M)

Re: Textscan and csv fitness data problem

nrjank


On Jan 1, 2018 9:06 AM, "Thomas Esbensen" <[hidden email]> wrote:

Challenge: Convert attached csv-file into vectors for timestamps, power, and heart_rate data. The file originates from a FIT-file (from a fitness device) converted via FitSDKRelease into a CSV-file.

Problem: I am not able to generate an output via textscan that corresponds to the number of lines (rows) in the csv-file.  


Just to add onto the conversation, I had a friend trying to run a data parsing script built for textscan in MATLAB and it wouldn't work in octave. The data file was a mess so I never got around to paring it down to a minimal, sharable example to post to the group. 

The symptom was that a portion of the text file would cause Octave to advance farther than MATLAB. I don't know if the culprit was textscan or fgetl, but Octave would hit EOF before MATLAB and throw a parsing error. Some code stepping showed that after some part of the file, Octave's file position counter would jump to a much higher value than in MATLAB.

I don't have either program in front of me to check your data, but does this at all match what you're seeing?

I know there have been a few bug tracker reports about textscan. Can anyone comment about likely paths to explore there? I also remember seeing something about issues with double counting EOL characters, maybe specifically on Windows?

Sorry I can't be more helpful.

Happy New Year!
Nick J


Re: Textscan and csv fitness data problem

PhilipNienhuis

The file has many rows of different number of data; little wonder it is hard
to read.
But csv2cell in the io package nowadays can do it in a breeze.
I used (after a bit of fiddling with the range, see below):

  Data = csv2cell ("2017-08-11-143641-ELEMNT BOLT 2779-2-0.csv", "A1:JZ2784");

which gave me a 2784x62 array.
However the results point to bugs in csv2cell, it should be able to read the
data with range "A1:DV2784"; in addition some clearly numerical values aren't
converted to a double value. I'll look into those bugs.

As to textscan, Dan did a lot of good work lately, I think the bugs you
implied have been fixed in the development branch.

Philip




Re: Textscan and csv fitness data problem

nrjank


On Jan 1, 2018 12:13 PM, "PhilipNienhuis" <[hidden email]> wrote:

> As to textscan, Dan did a lot of good work lately, I think the bugs you
> implied have been fixed in the development branch.

Yeah, I noticed that. Would those make it into a 4.2.2 or not until 4.4.0? Been keeping my fingers crossed that it would suddenly "just work" and I wouldn't have to dive into his data again.


Re: Textscan and csv fitness data problem

Stuart Edwards
In reply to this post by Thomas Esbensen
Hi -

While this may not be the Octave solution requested, it is a practical solution. Install R and RStudio (both freely available).  Read your csv directly and inspect it in a nice tabular format.  The first few lines are a little confusing, but the data mostly look good. Converting to a set of individual vectors should be no problem with some minor clean-up.

Stu 


Re: Textscan and csv fitness data problem

PhilipNienhuis
In reply to this post by nrjank
NJank wrote
> Yeah, I noticed that. Would those make it into a 4.2.2 or not until 4.4.0?
> Been keeping my fingers crossed that it would suddenly "just work" and I
> wouldn't have to dive into his data again.

Have a look in the log: http://hg.savannah.gnu.org/hgweb/octave
Bugs 52116 and 52479 have been fixed on stable; the last one (bug 52550)
has not. If you want, you can ask in the latter bug report to have it
backported to stable.

As to csv2cell's erroneous column conversion, I've fixed that stupid bug and
pushed it. To use it, get csv2cell.cc from here:

http://hg.code.sf.net/p/octave/io/file/31b7ff5ee040/src/csv2cell.cc

and then do

mkoctfile csv2cell.cc

to build a fixed version. Swap it into place, using
"pkg load io; which csv2cell"
to find out where it should live, followed by
"pkg unload io; clear -f"
to clear the way for copying (otherwise csv2cell.oct is locked), and then
copy csv2cell.oct into place.

Philip





Re: Textscan and csv fitness data problem

PhilipNienhuis
In reply to this post by Stuart Edwards
Stuart Edwards wrote
> Hi -
>
> While this may not be the Octave solution requested, it is a practical
> solution. Install R and RStudio (both freely available).  Read your csv
> directly and inspect it in a nice tabular format.  The first few lines are
> a little confusing, but the data mostly look good. Converting to a set of
> individual vectors should be no problem with some minor clean-up.

Thanks for the hint but there's no need for that.
Octave (that is, csv2cell in the io package) can do it easily and very fast,
it reads the whole file properly into a mixed cell array.
See my other posts in this thread.

Philip




--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Textscan and csv fitness data problem

bpabbott
Administrator
In reply to this post by PhilipNienhuis
> On Jan 1, 2018, at 11:51 AM, PhilipNienhuis <[hidden email]> wrote:
>
> Have a look in the log: http://hg.savannah.gnu.org/hgweb/octave
> bugs 52116 and 52479 have been fixed on stable, the last one (bug 52550)
> not. If you want you can ask in the latter bug report to backport it to
> stable.
>
> As to csv2cell's erroneous column conversion, I've fixed that stupid bug and
> pushed it. To use it, get csv2cell.cc from here:
>
> http://hg.code.sf.net/p/octave/io/file/31b7ff5ee040/src/csv2cell.cc
>
> and then do
>
> mkoctfile csv2cell.cc
>
> to build a fixed version. Swap it into place, using
> "pkg load io; which csv2cell"
> to find out where it should live, followed by
> "pkg unload io; clear -f"
> to clear the way for copying (otherwise csv2cell.oct is locked), and then
> copy csv2cell.oct into place.
>
> Philip

With 4.2.1 I get errors.

csv2cell.cc:228:40: error: no member named 'isnumeric' in 'octave_value'
  long hlines = (nargin > 1 && args(1).isnumeric () ? args(1).long_value() : 0);
                               ~~~~~~~ ^
csv2cell.cc:229:39: error: no member named 'isnumeric' in 'octave_value'
  int arg_sh = (nargin > 1 && args(1).isnumeric () ? 1 : 0);
                              ~~~~~~~ ^

Does the patched version only work on the default branch?

Ben



Re: Textscan and csv fitness data problem

PhilipNienhuis
Ben Abbott wrote:
>> On Jan 1, 2018, at 11:51 AM, PhilipNienhuis <[hidden email]> wrote:
:
<snip>
:

> With 4.2.1 I get errors.
>
> csv2cell.cc:228:40: error: no member named 'isnumeric' in 'octave_value'
>   long hlines = (nargin > 1 && args(1).isnumeric () ? args(1).long_value() : 0);
>                                ~~~~~~~ ^
> csv2cell.cc:229:39: error: no member named 'isnumeric' in 'octave_value'
>   int arg_sh = (nargin > 1 && args(1).isnumeric () ? 1 : 0);
>                               ~~~~~~~ ^
>
> Does the patched version only work on the default branch?

Ben,

Yes it does. I noted the deprecated ("is_numeric_type") warnings and
quite thoughtlessly decided to clear those. But that won't help the
4.2.x users, apologies.
I'll revert that and push a new cset. For some time we'll have to make
do with compile time warnings then.

Note that the original csv2cell in io-2.4.8 just works, but the
translation from column header ("AX") to column number is off due to a
stupid bug (base 10 rather than base 26 calculations).
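
For illustration, the base-26 translation from a column label to a column number can be sketched in plain Octave. The function name col2num is hypothetical; this is not the code in csv2cell.cc, only the arithmetic that the description above implies.

```octave
1;  % script-file marker so the function below can be defined inline

% Hypothetical helper (not part of the io package): convert a
% spreadsheet column label such as "AX" to its 1-based column number.
% Each letter is one base-26 "digit" with A = 1 ... Z = 26.
function n = col2num (label)
  n = 0;
  for ch = upper (label)
    digit = double (ch) - double ("A") + 1;
    n = n * 26 + digit;
  endfor
endfunction

ax = col2num ("AX");   % 1*26 + 24
dv = col2num ("DV");   % 4*26 + 22
jz = col2num ("JZ");   % 10*26 + 26
printf ("AX=%d DV=%d JZ=%d\n", ax, dv, jz);
```

Multiplying by 10 instead of 26 in the loop would give much smaller column numbers for any two-letter label, which is the kind of error a base-10 slip produces.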

Thanks for noticing,

Philip


Re: Textscan and csv fitness data problem

PhilipNienhuis
Philip Nienhuis wrote:

> I'll fix it back and push a new cset. For some time we'll have to make
> do with compile time warnings then.

Done.

P.



Re: Textscan and csv fitness data problem

bpabbott
Administrator
In reply to this post by PhilipNienhuis

The original file has lines with a varied number of columns.  As a result …

error: csv2cell: incorrect CSV file, line 2 too short

The first row holds the column labels (127 of them), and the 2nd row has only 19 columns (18 commas). There are other rows deep in the file with 127 columns too.

Ben



Re: Textscan and csv fitness data problem

PhilipNienhuis
bpabbott wrote
> The original file has lines with a varied number of columns.  As a result
> …
>
> error: csv2cell: incorrect CSV file, line 2 too short
>
> The first row are the column labels (127 of them), and 2nd row only has 19
> columns (18 commas). There are other rows deep in the file with 127
> columns too.

Sure, but if you call csv2cell with a spreadsheet-style range as its 2nd
argument, it'll read .csv files with a varying number of fields per row just
fine; see my first answer in this thread.
If you want to read all of the file, just supply a sufficiently large range;
it'll fill empty fields beyond the current line length with "" (empty string).
See "help csv2cell".

The only practical limit is the max line length of 4096 chars (a #DEFINEd
setting; changing that is easy as csv2cell() is just an .oct file).

(of course, as usual I can only vouch for csv2cell() to work fine on the 4
boxes I have access to: my 2 multiboot Linux/Win7/Win10 boxes + 2 Win7 boxes
at work.)

Philip





Re: Textscan and csv fitness data problem

bpabbott
Administrator
> On Jan 3, 2018, at 12:19 AM, PhilipNienhuis <[hidden email]> wrote:
>
> Sure but if you try csv2cell with as 2nd argument a spreadsheet-style range,
> it'll read .csv files with a varying nr. of data per row just fine, see my
> first answer in this thread.
> If you want to read all of the file, just supply a sufficiently large range;
> it'll fill empty fields beyond current line length with "" (empty string).
> See "help csv2cell"

Ok. I’m not familiar with the history behind the default behavior. I was expecting the default behavior to load the full csv file. Given the current design, I don’t know how to determine the actual size of the file. That means that when a range of rows/cols is specified, there is no way to be sure all the information is included.
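
One possible way to find the extent of a file before choosing a range is to scan it once and count lines and fields. The sketch below uses only core Octave; csv_extent and num2col are hypothetical helper names, not io package functions, and commas inside quoted fields are not handled, so the column count is only a rough upper bound.

```octave
1;  % script-file marker so the functions below can be defined inline

% Hypothetical helper: 1-based column number -> spreadsheet label.
function label = num2col (n)
  label = "";
  while (n > 0)
    r = mod (n - 1, 26);
    c = char (double ("A") + r);
    label = [c, label];
    n = (n - 1 - r) / 26;
  endwhile
endfunction

% Hypothetical helper: number of lines and the largest number of
% comma-separated fields on any line. Quoted commas are NOT handled.
function [nrows, maxcols] = csv_extent (fname)
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("csv_extent: cannot open %s", fname);
  endif
  nrows = 0;
  maxcols = 0;
  ln = fgetl (fid);
  while (ischar (ln))
    nrows = nrows + 1;
    maxcols = max (maxcols, sum (ln == ",") + 1);
    ln = fgetl (fid);
  endwhile
  fclose (fid);
endfunction

% Demo on a small temporary file with rows of different length.
fname = tempname ();
fid = fopen (fname, "w");
fputs (fid, "a,b,c,d\n1,2\n3,4,5\n");
fclose (fid);
[nr, nc] = csv_extent (fname);
rng = sprintf ("A1:%s%d", num2col (nc), nr);
delete (fname);
disp (rng);
```

The resulting string could then be passed as the 2nd argument to csv2cell (after "pkg load io"), at the cost of reading the file twice.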

If the behavior were changed such that "error: csv2cell: incorrect CSV file, line 2 too short" was replaced by "warning: csv2cell: line 2 has fewer columns than the prior lines" and the entire file were to be read, would there be an adverse impact on compatibility?

Ben

Re: Textscan and csv fitness data problem

PhilipNienhuis
Ben Abbott wrote:

> Ok. I’m not familiar with the history behind the default behavior. I was expecting the default behavior to load the full csv file. Given the current design, I don’t know how to determine the actual size of the file. That means that when a range of rows/cols is specified, there is no way to be sure all the information is included.

Sure, I sympathize with your (and probably anyone else's) expectation.
But csv2cell isn't so flexible yet.

I usually inspect csv files with e.g., notepad++ before feeding them to
Octave.
For csv2cell one can always specify a range that is sufficiently "wide"
to contain all possible columns (max 4096, see [*] below). Afterwards
one can invoke parsecell.m in the io package to separate text and
numerical info; the resulting arrays are stripped of enveloping empty
columns/rows. Alternatively, strip the empty outer columns by hand (e.g., by
re-using the code in parsecell.m).
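
Stripping by hand might look roughly like the following. This is a sketch in core Octave, not the code of parsecell.m; the cell array raw and its "time"/"power" labels are made-up stand-ins for what a wide range read could return.

```octave
% Stand-in data: a padded cell array with two empty trailing columns.
raw = {"time", "power", "", ""; ...
       1,      250,     "", ""; ...
       2,      260,     "", ""};

% A column counts as empty if every cell in it is empty ("" or []).
empty_col = all (cellfun (@isempty, raw), 1);
last_col = find (~empty_col, 1, "last");
trimmed = raw(:, 1:last_col);

% Separate numeric from text cells; text positions become NaN in num
% and numeric positions become "" in txt.
is_num = cellfun (@isnumeric, trimmed);
num = NaN (size (trimmed));
num(is_num) = cell2mat (trimmed(is_num));
txt = trimmed;
txt(is_num) = {""};

disp (size (trimmed));
disp (num);
```

Only trailing empty columns are removed here; a fully empty input would need an extra check before indexing with last_col.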

csv files can also be read into Octave via LibreOffice (or Excel) using
xlsread or odsread.

> If the behavior were changed such that "error: csv2cell: incorrect CSV file, line 2 too short” was replaced by “warning: csv2cell: line 2 has fewer columns than the prior lines” and the entire file were to be read would there be an adverse impact on compatibility?

Compatibility? You mean with the competition? Matlab doesn't have
csv2cell or my other "easy" function to read mixed-type delimited files.

When implementing csv2cell's "range" option some io package releases ago,
I changed the actual reading part so that variable numbers of fields
per line are now easily coped with. The crux is efficiently finding the
required number of columns so that the output array can also be
preallocated efficiently.
For a small file an initial 4096 columns and resizing afterwards could
be fine, but I've read up to GB size files with csv2cell and cutting
down on the initial output array size gets vital then.
(10^6 lines (= not extreme) times 4096 columns needs 64-bit indexing.)
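
The arithmetic behind that remark, as a quick check in Octave (variable names are illustrative only):

```octave
nlines = 1e6;                         % number of lines in the file
ncols  = 4096;                        % initial column count
nelem  = nlines * ncols;              % cells in the preallocated array

% Largest index a signed 32-bit integer can hold.
max32 = double (intmax ("int32"));

fits_int32 = (nelem <= max32);
printf ("%.0f cells, int32 limit %.0f, fits: %d\n", nelem, max32, fits_int32);
```

With these numbers the preallocated array would hold about 4.1e9 cells, roughly twice the signed 32-bit limit, so linear indexing into it only works in a build with 64-bit indices.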

I'm open to suggestions, but implementation will be in a future
io-2.4.10 release as I think this needs careful thought (FYI,
yesterday I opened a ticket for an io-2.4.9 release).

Philip

[*] The line buffer currently is 4096 characters. A line can contain
just consecutive separators separating 4096 empty fields (or is it 4097
empty fields?). So there's the default max nr. of columns to take into
account.


Re: Textscan and csv fitness data problem

bpabbott
Administrator
On Jan 3, 2018, at 2:47 PM, Philip Nienhuis <[hidden email]> wrote:

Sure, I sympathize with your (and probably anyone else's) expectation. But csv2cell isn't so flexible yet.

I don’t know if it breaks something, but I commented out the two if/error blocks that check for columns that are too short or too long, and was able to read the file.

But, perhaps a better solution is to assume the 1st row contains all intended columns.

Ben


Re: Textscan and csv fitness data problem

PhilipNienhuis
Ben Abbott wrote:
>> On Jan 3, 2018, at 2:47 PM, Philip Nienhuis <[hidden email]> wrote:
>>
>> Sure, I sympathize with your (and probably anyone else's) expectation.
>> But csv2cell isn't so flexible yet.
>
> I don’t know if it breaks something, but I commented out the two
> if/error blocks that check for columns that are too short or too long,
> and was able to read the file.

Looking at the code I think you run little risk there.

To be honest, I or rather the io package inherited csv2cell and its
siblings cell2csv etc. from the old "general" (or was it the
"miscellaneous" package) and I never bothered much about it save for
some obvious and annoying bugs.
Only recently, looking better at it, I realize a few -in hindsight-
suboptimal choices have been implemented by the original author (who
seems to have vanished from the face of the earth).

csv2cell is a lot more flexible than csvread and dlmread and only a
little bit slower, so IMO it deserves more attention.

> But, perhaps a better solution is to assume the 1st row contains all
> intended columns.

Sure but I have several csv files where that isn't the case.
Yet this solution is probably the most practical one. If no range is
supplied csv2cell could supply a warning (instead of an error) if one or
more lines contain more fields than the first line, along the lines you
suggested in an earlier post. I think that would be easy to implement
and would also allow some code cleanup.
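
The behavior proposed here (take the width from the first line, pad shorter lines, warn on longer ones) can be prototyped in core Octave. This is only a sketch under simplifying assumptions: read_ragged_csv is a hypothetical name, quoted fields are not handled, and all fields stay strings, unlike csv2cell, which also converts numbers.

```octave
1;  % script-file marker so the function below can be defined inline

% Hypothetical prototype: read a comma-separated file whose lines may
% have different numbers of fields. Width is taken from line 1.
function c = read_ragged_csv (fname)
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("read_ragged_csv: cannot open %s", fname);
  endif
  rows = {};
  ln = fgetl (fid);
  while (ischar (ln))
    % Keep empty fields between consecutive commas.
    rows{end+1} = strsplit (ln, ",", "collapsedelimiters", false);
    ln = fgetl (fid);
  endwhile
  fclose (fid);
  if (isempty (rows))
    c = {};
    return;
  endif
  ncols = numel (rows{1});
  c = repmat ({""}, numel (rows), ncols);   % pad with empty strings
  for i = 1:numel (rows)
    f = rows{i};
    if (numel (f) > ncols)
      warning ("read_ragged_csv: line %d has more fields than line 1; extra fields dropped", i);
      f = f(1:ncols);
    endif
    c(i, 1:numel (f)) = f;
  endfor
endfunction

% Demo: line 2 is shorter than the header line.
fname = tempname ();
fid = fopen (fname, "w");
fputs (fid, "a,b,c\n1,2\n3,,5\n");
fclose (fid);
data = read_ragged_csv (fname);
delete (fname);
disp (size (data));
```

Storing the split lines first and filling the output afterwards avoids growing the output array line by line, though it does keep the whole file in memory twice for a short time.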

Philip
