Speed of odsread

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Speed of odsread

hjborsje
I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
Octave and MATLAB.  The results are so vastly different that I wonder if I'm
doing it correctly.  A spreadsheet with 1900 rows, 23 columns takes 135
seconds in Octave. In Matlab 5.5 seconds.   A larger spreadsheet of 60,000 x
23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes.   I
run Octave 4.2.1 with all the latest packages on Windows 10.  I enter:
[s,~,~,~] = odsload ('MySpreadsheet.ods',1);  Adding a third and fourth
parameters makes no significant difference. RData_01b.ods
<http://octave.1599824.n4.nabble.com/file/t372538/RData_01b.ods>  



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Speed of odsread

siko1056
hjborsje wrote

> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
> Octave and MATLAB.  The results are so vastly different that I wonder if
> I'm
> doing it correctly.  A spreadsheet with 1900 rows, 23 columns takes 135
> seconds in Octave. In Matlab 5.5 seconds.   A larger spreadsheet of 60,000
> x
> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes.  
> I
> run Octave 4.2.1 with all the latest packages on Windows 10.  I enter:
> [s,~,~,~] = odsload ('MySpreadsheet.ods',1);  Adding a third and fourth
> parameters makes no significant difference. RData_01b.ods
> &lt;http://octave.1599824.n4.nabble.com/file/t372538/RData_01b.ods&gt; 

Are you using this code [1] for your benchmark?  Java is not as native to
Octave as it is to Matlab (the whole GUI is Java), so any attempt to use
Java is overhead.  I tried the odsread function from the io package [2].
The results are far better than yours but still have factor 2 with Matlab.
To use the io package, just type:

>> pkg install -forge io
>> pkg load io
>> javaaddpath ("jOpenDocument-1.3.jar")          # Available from [3]
>> javaaddpath ("xerces-2_11_0/xercesImpl.jar") # Available from [4]

>> N = 10; t = 0; for i = 1:N, tic; X = odsread ('RData_01b.ods'); t = t +
>> toc; end, fprintf('csvread: avg. %f seconds\n', t/N)

csvread: avg. 10.013385 seconds

As documented in [2], it is a bit more time saving, when having multiple
reads to open the document once and only invoke the reading, but not that
much better:

>> ods = odsopen ('RData_01b.ods'); N = 10; t = 0; for i = 1:N, tic; X =
>> ods2oct (ods); t = t + toc; end, fprintf('csvread: avg. %f seconds\n',
>> t/N), ods = odsclose (ods);

csvread: avg. 7.611418 seconds


Anyway, when dealing with a huge amount of data, I think that an Excel- or
ODS-spreadsheet is like running a marathon in knight's armor, it is an
unnecessary overhead. Simply save your data from your application of choice
as comma separated values (CSV) and see the magic that works even far beyond
60,000 data rows:

>> N = 10; t = 0; for i = 1:N, tic; X = csvread ('RData_01b.csv'); t = t +
>> toc; end, fprintf('csvread: avg. %f seconds\n', t/N)

Matlab R2017a: csvread: avg. 0.014881 seconds
Octave 4.2.1: csvread: avg. 0.020552 seconds

Kai.

[1]:
https://www.mathworks.com/matlabcentral/fileexchange/28411-read-and-write-open-document-format--odf--spreadsheet---ods-
[2]: https://octave.sourceforge.io/io/function/odsread.html
[3]: http://www.jopendocument.org/downloads.html
[4]: https://xerces.apache.org/mirrors.cgi#binary

RData_01b.csv
<http://octave.1599824.n4.nabble.com/file/t370282/RData_01b.csv>  



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Speed of odsread

siko1056
In reply to this post by hjborsje
hjborsje wrote

> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
> Octave and MATLAB.  The results are so vastly different that I wonder if
> I'm
> doing it correctly.  A spreadsheet with 1900 rows, 23 columns takes 135
> seconds in Octave. In Matlab 5.5 seconds.   A larger spreadsheet of 60,000
> x
> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes.  
> I
> run Octave 4.2.1 with all the latest packages on Windows 10.  I enter:
> [s,~,~,~] = odsload ('MySpreadsheet.ods',1);  Adding a third and fourth
> parameters makes no significant difference. RData_01b.ods
> &lt;http://octave.1599824.n4.nabble.com/file/t372538/RData_01b.ods&gt; 

I assume you are using this code [1]? Java is not as native to Octave as it
is to Matlab (the whole GUI is Java), so any attempt to use Java is
overhead. I tried  the odsread function from the io package [2]. But the
results are not promising as well. To use, just type:

>> pkg install -forge io
>> pkg load io
>> javaaddpath ("jOpenDocument-1.3.jar")          # Available from [3]
>> javaaddpath ("xerces-2_11_0/xercesImpl.jar") # Available from [4]

>> N = 10; t = 0; for i = 1:N, tic; X = odsread ('RData_01b.ods'); t = t +
>> toc; end, fprintf('csvread: avg. %f seconds\n', t/N)

csvread: avg. 10.013385 seconds

It is a bit time saving, when having multiple reads to open the document
once and only invoke the reading, but not that much better:

>> ods = odsopen ('RData_01b.ods'); N = 10; t = 0; for i = 1:N, tic; X =
>> ods2oct (ods); t = t + toc; end, fprintf('csvread: avg. %f seconds\n',
>> t/N), ods = odsclose (ods);

csvread: avg. 7.611418 seconds


Anyway, when dealing with a huge amount of data, I think that an Excel- or
ODS-spreadsheet is like running a marathon in knight's armor, it is an
unnecessary overhead. Simply save your data from your application of choice
as comma separated values (CSV) and see the magic that works even far beyond
60,000 data rows:

>> N = 10; t = 0; for i = 1:N, tic; X = csvread ('RData_01b.csv'); t = t +
>> toc; end, fprintf('csvread: avg. %f seconds\n', t/N)

Matlab R2017a: csvread: avg. 0.014881 seconds
Octave 4.2.1: csvread: avg. 0.020552 seconds

Kai.

[1]:
https://www.mathworks.com/matlabcentral/fileexchange/28411-read-and-write-open-document-format--odf--spreadsheet---ods-
[2]: https://octave.sourceforge.io/io/function/odsread.html
[3]: http://www.jopendocument.org/downloads.html
[4]: https://xerces.apache.org/mirrors.cgi#binary

RData_01b.csv
<http://octave.1599824.n4.nabble.com/file/t370282/RData_01b.csv>  



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Speed of odsread

PhilipNienhuis
In reply to this post by hjborsje
(Hey I did send a reply though Nabble bit it didn't arrive. Attempt #2)


hjborsje wrote

> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
> Octave and MATLAB.  The results are so vastly different that I wonder if
> I'm
> doing it correctly.  A spreadsheet with 1900 rows, 23 columns takes 135
> seconds in Octave. In Matlab 5.5 seconds.   A larger spreadsheet of 60,000
> x
> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes.  
> I
> run Octave 4.2.1 with all the latest packages on Windows 10.  I enter:
> [s,~,~,~] = odsload ('MySpreadsheet.ods',1);  Adding a third and fourth
> parameters makes no significant difference.

What spreadsheet I/O "interface" did you use? and windows 32 or 64 bit?

The "native" spreadsheet I/O interface ("OCT") for .ods is know to be
pathetic as regards speed. In addition .ods is an almost equally
pathetically complex file format.

The fastest interface for .ods we've got on offer is jOpendocument, but it
is Java-based.
Until Octave has better functions for morphing heterogeneous Java data
structures into Octave cell arrays (maybe, sometime) we cannot beat matlab
with its Excel backend.

Philip




--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Speed of odsread

PhilipNienhuis
PhilipNienhuis wrote

> (Hey I did send a reply though Nabble bit it didn't arrive. Attempt #2)
>
>
> hjborsje wrote
>> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
>> Octave and MATLAB.  The results are so vastly different that I wonder if
>> I'm
>> doing it correctly.  A spreadsheet with 1900 rows, 23 columns takes 135
>> seconds in Octave. In Matlab 5.5 seconds.   A larger spreadsheet of
>> 60,000
>> x
>> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes.  
>> I
>> run Octave 4.2.1 with all the latest packages on Windows 10.  I enter:
>> [s,~,~,~] = odsload ('MySpreadsheet.ods',1);  Adding a third and fourth
>> parameters makes no significant difference.
>
> What spreadsheet I/O "interface" did you use? and windows 32 or 64 bit?
>
> The "native" spreadsheet I/O interface ("OCT") for .ods is know to be
> pathetic as regards speed. In addition .ods is an almost equally
> pathetically complex file format.
>
> The fastest interface for .ods we've got on offer is jOpendocument, but it
> is Java-based.
> Until Octave has better functions for morphing heterogeneous Java data
> structures into Octave cell arrays (maybe, sometime) we cannot beat matlab
> with its Excel backend.

FYI, just some benchmarks with the file you uploaded:

tic; [tic; [~, ~, raw] = odsread ("RData_01b.ods", 1, "", <intf>); toc

gives when specifying the following interfaces for <intf>:

"otk": ~18 s.
"jod": ~10 s.
"oct": ~96 s.
"uno": ~35 s.     ## Invokes LibreOffice
... and ...
tic; [~, ~, raw] = xlsread ("RData_01b.ods", 1, "", "com"); toc   ## Invokes
Excel
==> about 6 s.

Philip




--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Speed of odsread

hjborsje
In reply to this post by siko1056
 
I assume you are using this code [1]? Java is not as native to Octave as it
is to Matlab (the whole GUI is Java), so any attempt to use Java is
overhead. I tried  the odsread function from the io package [2]. But the
results are not promising as well. To use, just type:

>> pkg install -forge io
>> pkg load io
>> javaaddpath ("jOpenDocument-1.3.jar")          # Available from [3]
>> javaaddpath ("xerces-2_11_0/xercesImpl.jar") # Available from [4]

>> N = 10; t = 0; for i = 1:N, tic; X = odsread ('RData_01b.ods'); t = t +
>> toc; end, fprintf('csvread: avg. %f seconds\n', t/N)

csvread: avg. 10.013385 seconds

It is a bit time saving, when having multiple reads to open the document
once and only invoke the reading, but not that much better:

>> ods = odsopen ('RData_01b.ods'); N = 10; t = 0; for i = 1:N, tic; X =
>> ods2oct (ods); t = t + toc; end, fprintf('csvread: avg. %f seconds\n',
>> t/N), ods = odsclose (ods);

csvread: avg. 7.611418 seconds


Anyway, when dealing with a huge amount of data, I think that an Excel- or
ODS-spreadsheet is like running a marathon in knight's armor, it is an
unnecessary overhead. Simply save your data from your application of choice
as comma separated values (CSV) and see the magic that works even far beyond
60,000 data rows:

>> N = 10; t = 0; for i = 1:N, tic; X = csvread ('RData_01b.csv'); t = t +
>> toc; end, fprintf('csvread: avg. %f seconds\n', t/N)

Matlab R2017a: csvread: avg. 0.014881 seconds
Octave 4.2.1: csvread: avg. 0.020552 seconds

Kai.

[1]:
https://www.mathworks.com/matlabcentral/fileexchange/28411-read-and-write-open-document-format--odf--spreadsheet---ods-
[2]: https://octave.sourceforge.io/io/function/odsread.html
[3]: http://www.jopendocument.org/downloads.html
[4]: https://xerces.apache.org/mirrors.cgi#binary



I was using [2], with similar results as you do.  I read somewhere that
starting with Octave 4.0 all Java was native in Octave.  I guess that line
could use some clarification, I'm no Java user. After installing the two
Java jars I also get the faster times.

csv is clearly the way to go, but sometimes it's easier to pick a few
numbers from specific cells in a spreadsheet.

Henk




--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Speed of odsread

hjborsje
In reply to this post by PhilipNienhuis
PhilipNienhuis wrote

> PhilipNienhuis wrote
>> (Hey I did send a reply though Nabble bit it didn't arrive. Attempt #2)
>>
>>
>> hjborsje wrote
>>> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
>>> Octave and MATLAB.  The results are so vastly different that I wonder if
>>> I'm
>>> doing it correctly.  A spreadsheet with 1900 rows, 23 columns takes 135
>>> seconds in Octave. In Matlab 5.5 seconds.   A larger spreadsheet of
>>> 60,000
>>> x
>>> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes.  
>>> I
>>> run Octave 4.2.1 with all the latest packages on Windows 10.  I enter:
>>> [s,~,~,~] = odsload ('MySpreadsheet.ods',1);  Adding a third and fourth
>>> parameters makes no significant difference.
>>
>> What spreadsheet I/O "interface" did you use? and windows 32 or 64 bit?
>>
>> The "native" spreadsheet I/O interface ("OCT") for .ods is know to be
>> pathetic as regards speed. In addition .ods is an almost equally
>> pathetically complex file format.
>>
>> The fastest interface for .ods we've got on offer is jOpendocument, but
>> it
>> is Java-based.
>> Until Octave has better functions for morphing heterogeneous Java data
>> structures into Octave cell arrays (maybe, sometime) we cannot beat
>> matlab
>> with its Excel backend.
>
> FYI, just some benchmarks with the file you uploaded:
>
> tic; [tic; [~, ~, raw] = odsread ("RData_01b.ods", 1, "",
> <intf>
> ); toc
>
> gives when specifying the following interfaces for
> <intf>
> :
>
> "otk": ~18 s.
> "jod": ~10 s.
> "oct": ~96 s.
> "uno": ~35 s.     ## Invokes LibreOffice
> ... and ...
> tic; [~, ~, raw] = xlsread ("RData_01b.ods", 1, "", "com"); toc   ##
> Invokes
> Excel
> ==> about 6 s.
>
> Philip
>
>
>
>
> --
> Sent from:
> http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html
>
> _______________________________________________
> Help-octave mailing list

> Help-octave@

> https://lists.gnu.org/mailman/listinfo/help-octave

My Octave installation must not be complete, I get 'Unsupported' for "otk"
and "uno".  Also "com" fails.
I do have LibreOffice 5.4 and Excel installed on my system and MATLAB reads
xls (as well as .ods) without a hitch.
Any ideas what I missed during the Octave installation?  (Win10 home
64-bits)

Henk



--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
Reply | Threaded
Open this post in threaded view
|

Re: Speed of odsread

PhilipNienhuis
hjborsje wrote
> I was using [2], with similar results as you do.  I read somewhere that
> starting with Octave 4.0 all Java was native in Octave.  I guess that line
> could use some clarification, I'm no Java user. After installing the two
> Java jars I also get the faster times.

Some of that info is available in "help odsread"  (but it could be
improved).


hjborsje wrote

> PhilipNienhuis wrote
>> PhilipNienhuis wrote
>>> (Hey I did send a reply though Nabble bit it didn't arrive. Attempt #2)
>>>
>>>
>>> hjborsje wrote
>>>> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
>>>> Octave and MATLAB.  The results are so vastly different that I wonder
>>>> if
>>>> I'm
>>>> doing it correctly.  A spreadsheet with 1900 rows, 23 columns takes 135
>>>> seconds in Octave. In Matlab 5.5 seconds.   A larger spreadsheet of
>>>> 60,000
>>>> x
>>>> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10
>>>> minutes.  
>>>> I
>>>> run Octave 4.2.1 with all the latest packages on Windows 10.  I enter:
>>>> [s,~,~,~] = odsload ('MySpreadsheet.ods',1);  Adding a third and fourth
>>>> parameters makes no significant difference.
>>>
>>> What spreadsheet I/O "interface" did you use? and windows 32 or 64 bit?
>>>
>>> The "native" spreadsheet I/O interface ("OCT") for .ods is know to be
>>> pathetic as regards speed. In addition .ods is an almost equally
>>> pathetically complex file format.
>>>
>>> The fastest interface for .ods we've got on offer is jOpendocument, but
>>> it
>>> is Java-based.
>>> Until Octave has better functions for morphing heterogeneous Java data
>>> structures into Octave cell arrays (maybe, sometime) we cannot beat
>>> matlab
>>> with its Excel backend.
>>
>> FYI, just some benchmarks with the file you uploaded:
>>
>> tic; [tic; [~, ~, raw] = odsread ("RData_01b.ods", 1, "",
>>
> <intf>
>> ); toc
>>
>> gives when specifying the following interfaces for
>>
> <intf>
>> :
>>
>> "otk": ~18 s.
>> "jod": ~10 s.
>> "oct": ~96 s.
>> "uno": ~35 s.     ## Invokes LibreOffice
>> ... and ...
>> tic; [~, ~, raw] = xlsread ("RData_01b.ods", 1, "", "com"); toc   ##
>> Invokes
>> Excel
>> ==> about 6 s.
> <snip>
> My Octave installation must not be complete, I get 'Unsupported' for "otk"
> and "uno".  Also "com" fails.
> I do have LibreOffice 5.4 and Excel installed on my system and MATLAB
> reads
> xls (as well as .ods) without a hitch.
> Any ideas what I missed during the Octave installation?  (Win10 home
> 64-bits)

I have no reason to doubt that your Octave installation is as complete as
you can get it. But Octave doesn't ship external dependencies like the java
.jar files required for spreadsheet I/O. Basic file I/O (using the -for
.ods- slow OCT interface) is included.
The io package offers a function "chk_spreadsheet_support" that can set up a
lot of stuff automatically.
In fact it is silently called the moment the io package is loaded.
Make sure the relevant Java .jar files are in a subdir "java" of your home
dir.

LibreOffice is more awkward since on Linux its location isn't standardized
and the javaclasspath needs the location of some subdirs as well, not just
.jar files.
Yet no matter what operating system,

chk_spreadsheet_support ("/full/path/to/java/jars", 3,
"/full/path/to/Libreoffice/installation")

would set the javaclasspath up for you, depending on available dependencies,
and echo a lot of debug info.
Note: when loading the io package several locations are automatically
tested, the first argument would be ~/java (on Linux) or %HOMEPATH% on
Windows. Better use forward slashes as path separators.

To invoke COM you'd need to first load the OF windows package (only on
Windows).

BTW have you tried how fast Matlab can read those files if Excel isn't
installed? I'm a bit curious.

BTW a somewhat involved explanation of spreadsheet I/O is on the Octave
wiki,
http://wiki.octave.org/IO_package

Philip




--
Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave