I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
Octave and MATLAB. The results are so vastly different that I wonder if I'm doing it correctly. A spreadsheet with 1900 rows, 23 columns takes 135 seconds in Octave. In Matlab 5.5 seconds. A larger spreadsheet of 60,000 x 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes. I run Octave 4.2.1 with all the latest packages on Windows 10. I enter: [s,~,~,~] = odsload ('MySpreadsheet.ods',1); Adding a third and fourth parameters makes no significant difference. RData_01b.ods <http://octave.1599824.n4.nabble.com/file/t372538/RData_01b.ods> -- Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave |
hjborsje wrote
> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in > Octave and MATLAB. The results are so vastly different that I wonder if > I'm > doing it correctly. A spreadsheet with 1900 rows, 23 columns takes 135 > seconds in Octave. In Matlab 5.5 seconds. A larger spreadsheet of 60,000 > x > 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes. > I > run Octave 4.2.1 with all the latest packages on Windows 10. I enter: > [s,~,~,~] = odsload ('MySpreadsheet.ods',1); Adding a third and fourth > parameters makes no significant difference. RData_01b.ods > <http://octave.1599824.n4.nabble.com/file/t372538/RData_01b.ods> Are you using this code [1] for your benchmark? Java is not as native to Octave as it is to Matlab (the whole GUI is Java), so any attempt to use Java is overhead. I tried the odsread function from the io package [2]. The results are far better than yours but still have factor 2 with Matlab. To use the io package, just type: >> pkg install -forge io >> pkg load io >> javaaddpath ("jOpenDocument-1.3.jar") # Available from [3] >> javaaddpath ("xerces-2_11_0/xercesImpl.jar") # Available from [4] >> N = 10; t = 0; for i = 1:N, tic; X = odsread ('RData_01b.ods'); t = t + >> toc; end, fprintf('csvread: avg. %f seconds\n', t/N) csvread: avg. 10.013385 seconds As documented in [2], it is a bit more time saving, when having multiple reads to open the document once and only invoke the reading, but not that much better: >> ods = odsopen ('RData_01b.ods'); N = 10; t = 0; for i = 1:N, tic; X = >> ods2oct (ods); t = t + toc; end, fprintf('csvread: avg. %f seconds\n', >> t/N), ods = odsclose (ods); csvread: avg. 7.611418 seconds Anyway, when dealing with a huge amount of data, I think that an Excel- or ODS-spreadsheet is like running a marathon in knight's armor, it is an unnecessary overhead. Simply save your data from your application of choice as comma separated values (CSV) and see the magic that works even far beyond 60,000 data rows: >> N = 10; t = 0; for i = 1:N, tic; X = csvread ('RData_01b.csv'); t = t + >> toc; end, fprintf('csvread: avg. %f seconds\n', t/N) Matlab R2017a: csvread: avg. 0.014881 seconds Octave 4.2.1: csvread: avg. 0.020552 seconds Kai. [1]: https://www.mathworks.com/matlabcentral/fileexchange/28411-read-and-write-open-document-format--odf--spreadsheet---ods- [2]: https://octave.sourceforge.io/io/function/odsread.html [3]: http://www.jopendocument.org/downloads.html [4]: https://xerces.apache.org/mirrors.cgi#binary RData_01b.csv <http://octave.1599824.n4.nabble.com/file/t370282/RData_01b.csv> -- Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave |
In reply to this post by hjborsje
hjborsje wrote
> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in > Octave and MATLAB. The results are so vastly different that I wonder if > I'm > doing it correctly. A spreadsheet with 1900 rows, 23 columns takes 135 > seconds in Octave. In Matlab 5.5 seconds. A larger spreadsheet of 60,000 > x > 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes. > I > run Octave 4.2.1 with all the latest packages on Windows 10. I enter: > [s,~,~,~] = odsload ('MySpreadsheet.ods',1); Adding a third and fourth > parameters makes no significant difference. RData_01b.ods > <http://octave.1599824.n4.nabble.com/file/t372538/RData_01b.ods> I assume you are using this code [1]? Java is not as native to Octave as it is to Matlab (the whole GUI is Java), so any attempt to use Java is overhead. I tried the odsread function from the io package [2]. But the results are not promising as well. To use, just type: >> pkg install -forge io >> pkg load io >> javaaddpath ("jOpenDocument-1.3.jar") # Available from [3] >> javaaddpath ("xerces-2_11_0/xercesImpl.jar") # Available from [4] >> N = 10; t = 0; for i = 1:N, tic; X = odsread ('RData_01b.ods'); t = t + >> toc; end, fprintf('csvread: avg. %f seconds\n', t/N) csvread: avg. 10.013385 seconds It is a bit time saving, when having multiple reads to open the document once and only invoke the reading, but not that much better: >> ods = odsopen ('RData_01b.ods'); N = 10; t = 0; for i = 1:N, tic; X = >> ods2oct (ods); t = t + toc; end, fprintf('csvread: avg. %f seconds\n', >> t/N), ods = odsclose (ods); csvread: avg. 7.611418 seconds Anyway, when dealing with a huge amount of data, I think that an Excel- or ODS-spreadsheet is like running a marathon in knight's armor, it is an unnecessary overhead. Simply save your data from your application of choice as comma separated values (CSV) and see the magic that works even far beyond 60,000 data rows: >> N = 10; t = 0; for i = 1:N, tic; X = csvread ('RData_01b.csv'); t = t + >> toc; end, fprintf('csvread: avg. %f seconds\n', t/N) Matlab R2017a: csvread: avg. 0.014881 seconds Octave 4.2.1: csvread: avg. 0.020552 seconds Kai. [1]: https://www.mathworks.com/matlabcentral/fileexchange/28411-read-and-write-open-document-format--odf--spreadsheet---ods- [2]: https://octave.sourceforge.io/io/function/odsread.html [3]: http://www.jopendocument.org/downloads.html [4]: https://xerces.apache.org/mirrors.cgi#binary RData_01b.csv <http://octave.1599824.n4.nabble.com/file/t370282/RData_01b.csv> -- Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave |
In reply to this post by hjborsje
(Hey I did send a reply though Nabble bit it didn't arrive. Attempt #2)
hjborsje wrote > I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in > Octave and MATLAB. The results are so vastly different that I wonder if > I'm > doing it correctly. A spreadsheet with 1900 rows, 23 columns takes 135 > seconds in Octave. In Matlab 5.5 seconds. A larger spreadsheet of 60,000 > x > 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes. > I > run Octave 4.2.1 with all the latest packages on Windows 10. I enter: > [s,~,~,~] = odsload ('MySpreadsheet.ods',1); Adding a third and fourth > parameters makes no significant difference. What spreadsheet I/O "interface" did you use? and windows 32 or 64 bit? The "native" spreadsheet I/O interface ("OCT") for .ods is know to be pathetic as regards speed. In addition .ods is an almost equally pathetically complex file format. The fastest interface for .ods we've got on offer is jOpendocument, but it is Java-based. Until Octave has better functions for morphing heterogeneous Java data structures into Octave cell arrays (maybe, sometime) we cannot beat matlab with its Excel backend. Philip -- Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave |
PhilipNienhuis wrote
> (Hey I did send a reply though Nabble bit it didn't arrive. Attempt #2) > > > hjborsje wrote >> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in >> Octave and MATLAB. The results are so vastly different that I wonder if >> I'm >> doing it correctly. A spreadsheet with 1900 rows, 23 columns takes 135 >> seconds in Octave. In Matlab 5.5 seconds. A larger spreadsheet of >> 60,000 >> x >> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes. >> I >> run Octave 4.2.1 with all the latest packages on Windows 10. I enter: >> [s,~,~,~] = odsload ('MySpreadsheet.ods',1); Adding a third and fourth >> parameters makes no significant difference. > > What spreadsheet I/O "interface" did you use? and windows 32 or 64 bit? > > The "native" spreadsheet I/O interface ("OCT") for .ods is know to be > pathetic as regards speed. In addition .ods is an almost equally > pathetically complex file format. > > The fastest interface for .ods we've got on offer is jOpendocument, but it > is Java-based. > Until Octave has better functions for morphing heterogeneous Java data > structures into Octave cell arrays (maybe, sometime) we cannot beat matlab > with its Excel backend. FYI, just some benchmarks with the file you uploaded: tic; [tic; [~, ~, raw] = odsread ("RData_01b.ods", 1, "", <intf>); toc gives when specifying the following interfaces for <intf>: "otk": ~18 s. "jod": ~10 s. "oct": ~96 s. "uno": ~35 s. ## Invokes LibreOffice ... and ... tic; [~, ~, raw] = xlsread ("RData_01b.ods", 1, "", "com"); toc ## Invokes Excel ==> about 6 s. Philip -- Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave |
In reply to this post by siko1056
I assume you are using this code [1]? Java is not as native to Octave as it is to Matlab (the whole GUI is Java), so any attempt to use Java is overhead. I tried the odsread function from the io package [2]. But the results are not promising as well. To use, just type: >> pkg install -forge io >> pkg load io >> javaaddpath ("jOpenDocument-1.3.jar") # Available from [3] >> javaaddpath ("xerces-2_11_0/xercesImpl.jar") # Available from [4] >> N = 10; t = 0; for i = 1:N, tic; X = odsread ('RData_01b.ods'); t = t + >> toc; end, fprintf('csvread: avg. %f seconds\n', t/N) csvread: avg. 10.013385 seconds It is a bit time saving, when having multiple reads to open the document once and only invoke the reading, but not that much better: >> ods = odsopen ('RData_01b.ods'); N = 10; t = 0; for i = 1:N, tic; X = >> ods2oct (ods); t = t + toc; end, fprintf('csvread: avg. %f seconds\n', >> t/N), ods = odsclose (ods); csvread: avg. 7.611418 seconds Anyway, when dealing with a huge amount of data, I think that an Excel- or ODS-spreadsheet is like running a marathon in knight's armor, it is an unnecessary overhead. Simply save your data from your application of choice as comma separated values (CSV) and see the magic that works even far beyond 60,000 data rows: >> N = 10; t = 0; for i = 1:N, tic; X = csvread ('RData_01b.csv'); t = t + >> toc; end, fprintf('csvread: avg. %f seconds\n', t/N) Matlab R2017a: csvread: avg. 0.014881 seconds Octave 4.2.1: csvread: avg. 0.020552 seconds Kai. [1]: https://www.mathworks.com/matlabcentral/fileexchange/28411-read-and-write-open-document-format--odf--spreadsheet---ods- [2]: https://octave.sourceforge.io/io/function/odsread.html [3]: http://www.jopendocument.org/downloads.html [4]: https://xerces.apache.org/mirrors.cgi#binary I was using [2], with similar results as you do. I read somewhere that starting with Octave 4.0 all Java was native in Octave. I guess that line could use some clarification, I'm no Java user. After installing the two Java jars I also get the faster times. csv is clearly the way to go, but sometimes it's easier to pick a few numbers from specific cells in a spreadsheet. Henk -- Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave |
In reply to this post by PhilipNienhuis
PhilipNienhuis wrote
> PhilipNienhuis wrote >> (Hey I did send a reply though Nabble bit it didn't arrive. Attempt #2) >> >> >> hjborsje wrote >>> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in >>> Octave and MATLAB. The results are so vastly different that I wonder if >>> I'm >>> doing it correctly. A spreadsheet with 1900 rows, 23 columns takes 135 >>> seconds in Octave. In Matlab 5.5 seconds. A larger spreadsheet of >>> 60,000 >>> x >>> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 minutes. >>> I >>> run Octave 4.2.1 with all the latest packages on Windows 10. I enter: >>> [s,~,~,~] = odsload ('MySpreadsheet.ods',1); Adding a third and fourth >>> parameters makes no significant difference. >> >> What spreadsheet I/O "interface" did you use? and windows 32 or 64 bit? >> >> The "native" spreadsheet I/O interface ("OCT") for .ods is know to be >> pathetic as regards speed. In addition .ods is an almost equally >> pathetically complex file format. >> >> The fastest interface for .ods we've got on offer is jOpendocument, but >> it >> is Java-based. >> Until Octave has better functions for morphing heterogeneous Java data >> structures into Octave cell arrays (maybe, sometime) we cannot beat >> matlab >> with its Excel backend. > > FYI, just some benchmarks with the file you uploaded: > > tic; [tic; [~, ~, raw] = odsread ("RData_01b.ods", 1, "", > <intf> > ); toc > > gives when specifying the following interfaces for > <intf> > : > > "otk": ~18 s. > "jod": ~10 s. > "oct": ~96 s. > "uno": ~35 s. ## Invokes LibreOffice > ... and ... > tic; [~, ~, raw] = xlsread ("RData_01b.ods", 1, "", "com"); toc ## > Invokes > Excel > ==> about 6 s. > > Philip > > > > > -- > Sent from: > http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html > > _______________________________________________ > Help-octave mailing list > Help-octave@ > https://lists.gnu.org/mailman/listinfo/help-octave My Octave installation must not be complete, I get 'Unsupported' for "otk" and "uno". Also "com" fails. I do have LibreOffice 5.4 and Excel installed on my system and MATLAB reads xls (as well as .ods) without a hitch. Any ideas what I missed during the Octave installation? (Win10 home 64-bits) Henk -- Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave |
hjborsje wrote
> I was using [2], with similar results as you do. I read somewhere that > starting with Octave 4.0 all Java was native in Octave. I guess that line > could use some clarification, I'm no Java user. After installing the two > Java jars I also get the faster times. Some of that info is available in "help odsread" (but it could be improved). hjborsje wrote > PhilipNienhuis wrote >> PhilipNienhuis wrote >>> (Hey I did send a reply though Nabble bit it didn't arrive. Attempt #2) >>> >>> >>> hjborsje wrote >>>> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in >>>> Octave and MATLAB. The results are so vastly different that I wonder >>>> if >>>> I'm >>>> doing it correctly. A spreadsheet with 1900 rows, 23 columns takes 135 >>>> seconds in Octave. In Matlab 5.5 seconds. A larger spreadsheet of >>>> 60,000 >>>> x >>>> 23 takes 10.5 seconds in Matlab. Octave did not finish after 10 >>>> minutes. >>>> I >>>> run Octave 4.2.1 with all the latest packages on Windows 10. I enter: >>>> [s,~,~,~] = odsload ('MySpreadsheet.ods',1); Adding a third and fourth >>>> parameters makes no significant difference. >>> >>> What spreadsheet I/O "interface" did you use? and windows 32 or 64 bit? >>> >>> The "native" spreadsheet I/O interface ("OCT") for .ods is know to be >>> pathetic as regards speed. In addition .ods is an almost equally >>> pathetically complex file format. >>> >>> The fastest interface for .ods we've got on offer is jOpendocument, but >>> it >>> is Java-based. >>> Until Octave has better functions for morphing heterogeneous Java data >>> structures into Octave cell arrays (maybe, sometime) we cannot beat >>> matlab >>> with its Excel backend. >> >> FYI, just some benchmarks with the file you uploaded: >> >> tic; [tic; [~, ~, raw] = odsread ("RData_01b.ods", 1, "", >> > <intf> >> ); toc >> >> gives when specifying the following interfaces for >> > <intf> >> : >> >> "otk": ~18 s. >> "jod": ~10 s. >> "oct": ~96 s. >> "uno": ~35 s. ## Invokes LibreOffice >> ... and ... >> tic; [~, ~, raw] = xlsread ("RData_01b.ods", 1, "", "com"); toc ## >> Invokes >> Excel >> ==> about 6 s. > <snip> > My Octave installation must not be complete, I get 'Unsupported' for "otk" > and "uno". Also "com" fails. > I do have LibreOffice 5.4 and Excel installed on my system and MATLAB > reads > xls (as well as .ods) without a hitch. > Any ideas what I missed during the Octave installation? (Win10 home > 64-bits) I have no reason to doubt that your Octave installation is as complete as you can get it. But Octave doesn't ship external dependencies like the java .jar files required for spreadsheet I/O. Basic file I/O (using the -for .ods- slow OCT interface) is included. The io package offers a function "chk_spreadsheet_support" that can set up a lot of stuff automatically. In fact it is silently called the moment the io package is loaded. Make sure the relevant Java .jar files are in a subdir "java" of your home dir. LibreOffice is more awkward since on Linux its location isn't standardized and the javaclasspath needs the location of some subdirs as well, not just .jar files. Yet no matter what operating system, chk_spreadsheet_support ("/full/path/to/java/jars", 3, "/full/path/to/Libreoffice/installation") would set the javaclasspath up for you, depending on available dependencies, and echo a lot of debug info. Note: when loading the io package several locations are automatically tested, the first argument would be ~/java (on Linux) or %HOMEPATH% on Windows. Better use forward slashes as path separators. To invoke COM you'd need to first load the OF windows package (only on Windows). BTW have you tried how fast Matlab can read those files if Excel isn't installed? I'm a bit curious. BTW a somewhat involved explanation of spreadsheet I/O is on the Octave wiki, http://wiki.octave.org/IO_package Philip -- Sent from: http://octave.1599824.n4.nabble.com/Octave-General-f1599825.html _______________________________________________ Help-octave mailing list [hidden email] https://lists.gnu.org/mailman/listinfo/help-octave |
Free forum by Nabble | Edit this page |