Spreadsheet translation

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Spreadsheet translation

Windhorn, Allen E [ACIM/LSA/MKT]
I have a very large, messy, and poorly commented spreadsheet, and I want
to translate it to Octave.  Any chance there's an automated way to do that?
Or even a way to copy all the formulae into a list?

Thanks for any advice.

Regards,
Allen
--
Allen Windhorn P.E. (Mn), CEng| Senior Principal Engineer
Leroy-Somer Americas | Kato Engineering, Inc.
2075 Howard Dr. West | North Mankato, MN 56003 | USA
T +1 507-345-2782 | F +1 507-345-2798
[hidden email] | [hidden email]



Reply | Threaded
Open this post in threaded view
|

Re: Spreadsheet translation

Doug Stewart-4


On Wed, Jan 29, 2020 at 9:18 AM Windhorn, Allen E [ACIM/LSA/MKT] <[hidden email]> wrote:
I have a very large, messy, and poorly commented spreadsheet, and I want
to translate it to Octave.  Any chance there's an automated way to do that?
Or even a way to copy all the formulae into a list?

Thanks for any advice.

Regards,
Allen
--
Allen Windhorn P.E. (Mn), CEng| Senior Principal Engineer
Leroy-Somer Americas | Kato Engineering, Inc.
2075 Howard Dr. West | North Mankato, MN 56003 | USA
T +1 507-345-2782 | F +1 507-345-2798
[hidden email] | [hidden email]



Does this help?


Reply | Threaded
Open this post in threaded view
|

RE: Spreadsheet translation

Windhorn, Allen E [ACIM/LSA/MKT]
Doug,

From: Doug Stewart <[hidden email]>

On Wed, Jan 29, 2020 at 9:18 AM [hidden email] wrote:
>> I have a very large, messy, and poorly commented spreadsheet, and I want
>> to translate it to Octave.  Any chance there's an automated way to do that?
>> Or even a way to copy all the formulae into a list?

> Does this help?
> https://wiki.octave.org/IO_package#Spreadsheet_formula_support

Yes, sort of (it means I can extract the formulae, but I still need to work).
Now I just need to (a) make sure all the formulae are correct Octave, and
(b) sort them so that all dependencies are satisfied.  Any hints on (b)?  I'm
not even sure that has to be possible.

I wrote the following code and appended it to the code from the wiki,
where txtarr1 has all the text cells in it:

clear txtarr2;
count = 1;
% Now extract all the formulae
stxt = size(txtarr1)
alph = ['A':'Z'];  % Index of the alphabet
for ii = 1:stxt(1)     % Loop over rows
  for jj = 1:stxt(2)   % Loop over columns (maybe a way to do without loops?)
    if length(txtarr1{ii,jj})>0    % If there is text, and it starts with "=",
      if substr(txtarr1{ii,jj}, 1, 1)=='=' % ... it's a formula
        cola = '';     % Column letter(s)
        ltr2 = alph(mod(jj,26)+1);   % Second letter is...
        if (jj>26) % Is there a first letter?
          cola = alph(fix(jj/26));     % If so, prepend it
        endif
        cola = [cola ltr2];    % Build one- or two-letter combo for column
        txtarr2{count++} = sprintf('%s%d %s', cola, ii, txtarr1{ii,jj});
        % append row #, then prepend to equation
      endif
    endif
  endfor
endfor

Any suggestions for improvement are welcome.  This works pretty well, but
many of the resultant formula cells have stuff like the following:

'Y24 =<v>0.23319999999999999</v></c><c r="Z24" s="6"><v>3.1E-2</v></c><c r="AA24" s="7"><v>0.13100000000000001</v>
</c><c r="AB24" s="6"><v>0</v></c><c r="AC24" s="6"><f t="shared" si="1"/><v>0.16200000000000001</v></c><c r="AM24" s="
6"><f>AM22'

where what shows up in the spreadsheet formula is '=W24+X24' and the
result is shown as 0.233.  What's up with that?  The original txtarr1 cell has
the same thing, so that's how Octave received it from the parsecell
function.

BTW, there are 3800 formulae in the spreadsheet, so converting them by
hand is not a pleasant prospect.

Regards,
Allen


Reply | Threaded
Open this post in threaded view
|

RE: Spreadsheet translation

PhilipNienhuis
Windhorn, Allen E [ACIM/LSA/MKT] wrote
> Doug,
>
> From: Doug Stewart &lt;

> doug.dastew@

> &gt;
>
> On Wed, Jan 29, 2020 at 9:18 AM

> Allen.Windhorn@.nidec

>  wrote:
>>> I have a very large, messy, and poorly commented spreadsheet, and I want
>>> to translate it to Octave.  Any chance there's an automated way to do
>>> that?
>>> Or even a way to copy all the formulae into a list?
>
>> Does this help?
>> https://wiki.octave.org/IO_package#Spreadsheet_formula_support
>
> Yes, sort of (it means I can extract the formulae, but I still need to
> work).
> Now I just need to (a) make sure all the formulae are correct Octave, and
> (b) sort them so that all dependencies are satisfied.  Any hints on (b)?
> I'm
> not even sure that has to be possible.
>
> I wrote the following code and appended it to the code from the wiki,
> where txtarr1 has all the text cells in it:
>
> clear txtarr2;
> count = 1;
> % Now extract all the formulae
> stxt = size(txtarr1)
> alph = ['A':'Z'];  % Index of the alphabet
> for ii = 1:stxt(1)     % Loop over rows
>   for jj = 1:stxt(2)   % Loop over columns (maybe a way to do without
> loops?)
>     if length(txtarr1{ii,jj})>0    % If there is text, and it starts with
> "=",
>       if substr(txtarr1{ii,jj}, 1, 1)=='=' % ... it's a formula
>         cola = '';     % Column letter(s)
>         ltr2 = alph(mod(jj,26)+1);   % Second letter is...
>         if (jj>26) % Is there a first letter?
>           cola = alph(fix(jj/26));     % If so, prepend it
>         endif
>         cola = [cola ltr2];    % Build one- or two-letter combo for column
>         txtarr2{count++} = sprintf('%s%d %s', cola, ii, txtarr1{ii,jj});
>         % append row #, then prepend to equation
>       endif
>     endif
>   endfor
> endfor
>
> Any suggestions for improvement are welcome.  This works pretty well, but
> many of the resultant formula cells have stuff like the following:
>
> 'Y24 =
> <v>
> 0.23319999999999999
> </v>
> </c>
> <c r="Z24" s="6">
> <v>
> 3.1E-2
> </v>
> </c>
> <c r="AA24" s="7">
> <v>
> 0.13100000000000001
> </v>
> </c>
> <c r="AB24" s="6">
> <v>
> 0
> </v>
> </c>
> <c r="AC24" s="6">
> <f t="shared" si="1"/>
> <v>
> 0.16200000000000001
> </v>
> </c>
> <c r="AM24" s="
> 6">
> <f>
> AM22'

Did you read the spreadsheet with option 'struct ("formulas_as_text", 1)' ?

If yes, please try another spreadsheet I/O "interface". It looks to me like
you have no additional SW installed so you fall back to the OCT interface.
That hasn't received much testing with formula output, so I'm not very
surprised that large parts of the spreadsheet cell XML are left in place.

Another thing I see is that there's a reference to a shared formula (i.e.,
the formula is not contained in an individual cell but rather in the
SharedStrings XML file in the .xlsx archive and the spreadsheet cell merely
holds a reference to that plus basic cell addresses).
That's a known bug for the OCT interface
(https://savannah.gnu.org/bugs/?52875) and a challenging one at that; its
priority has been lowered for a reason, and you're only the second person to
date to report a problem with it. Looking at the cell contents excerpt you
copied over I suppose it contains the contents of several spreadsheet cells
(each cell is what is between <c> .... </c> tags).

Chances are that with additional spreadsheet I/O SW like Apache POI, or, if
you are on Windows, just the OF windows package and MS-Excel, you'd able to
extract the formulae in a format more suitable for what you want to achieve.

Philip




--
Sent from: https://octave.1599824.n4.nabble.com/Octave-General-f1599825.html


Reply | Threaded
Open this post in threaded view
|

RE: Spreadsheet translation

Windhorn, Allen E [ACIM/LSA/MKT]
Philip,

> -----Original Message-----
> From: Help-octave On Behalf Of  PhilipNienhuis
>
> Did you read the spreadsheet with option 'struct ("formulas_as_text", 1)' ?

Yes.

> If yes, please try another spreadsheet I/O "interface". It looks to me like
> you have no additional SW installed so you fall back to the OCT interface.
> That hasn't received much testing with formula output, so I'm not very
> surprised that large parts of the spreadsheet cell XML are left in place.

Yes, I believe I am using the OCT interface, it was supposed to be best for
Windows.  The file has .xlsm ending but I don't actually know if there are
any macros in it.

> ...Chances are that with additional spreadsheet I/O SW like Apache POI, or,
> if you are on Windows, just the OF windows package and MS-Excel, you'd
> able to extract the formulae in a format more suitable for what you want to
> achieve.

I tried the COM interface:

options.formulas_as_text = 1;
xlh = xlsopen('GENDH_Dissection01.xlsm', 0, 'COM'); % Open read only

Octave complains that Java is installed but unusable.  Java 32 bit JVM is
installed at C:\Program Files (x86)\Java\jre7\bin\client.  I don't have
admin rights, so can't install any other Java (anyway, if we change the Java
installation it does horrible things to Oracle).  I am on Octave 5.1.0 64 bit
"x86_64-w64-mingw32" on Windows 10.

... OK, it complained, but when I looked, it still produced the correct
formulas, so that's sorted for the time being... I would like to know how
to make Java work though, in case I need it for something else.

I don't know what the "OF windows package" is -- is it different from COM?
And how would I find out more about the other Java packages (assuming I
could get any to work with our Java installation)?

Thanks again.

Regards,
Allen

Reply | Threaded
Open this post in threaded view
|

Re: Spreadsheet translation

PhilipNienhuis
OK Allen,

Windhorn, Allen E [ACIM/LSA/MKT] wrote:

> Philip,
>
>> -----Original Message-----
>> From: Help-octave On Behalf Of  PhilipNienhuis
>>
>> Did you read the spreadsheet with option 'struct ("formulas_as_text", 1)' ?
>
> Yes.
>
>> If yes, please try another spreadsheet I/O "interface". It looks to me like
>> you have no additional SW installed so you fall back to the OCT interface.
>> That hasn't received much testing with formula output, so I'm not very
>> surprised that large parts of the spreadsheet cell XML are left in place.
>
> Yes, I believe I am using the OCT interface, it was supposed to be best for
> Windows.

? who says so?
OCT is good enough, but Excel itself ("COM" interface) is by far the
fastest. But in case of COM errors you're left with zombie Excel
invocations that may keep a hold on you spreadsheet file (kill them off
in Task Manager), in case of OCT errors you may merely have leaked file
handles.

>> ...Chances are that with additional spreadsheet I/O SW like Apache POI, or,
>> if you are on Windows, just the OF windows package and MS-Excel, you'd
>> able to extract the formulae in a format more suitable for what you want to
>> achieve.
>
> I tried the COM interface:
>
> options.formulas_as_text = 1;

... then you need to convey "options" as an argument to xls2oct. But
apparently you did.

> xlh = xlsopen('GENDH_Dissection01.xlsm', 0, 'COM'); % Open read only
> Octave complains that Java is installed but unusable

Yeah that's a known one. Do you have 64-bit Octave installed (I guess
so)? Octave & Java bit widths should match.

> ... OK, it complained, but when I looked, it still produced the correct
> formulas, so that's sorted for the time being... I would like to know how
> to make Java work though, in case I need it for something else.
>
> I don't know what the "OF windows package" is -- is it different from COM?

It's the same (Octave-forge windows package supplies ActiveX / COM
functionality).

> And how would I find out more about the other Java packages (assuming I
> could get any to work with our Java installation)?

E.g., on the wiki (https://wiki.octave.org/IO_package).

There's a READ-XLS.htm file in the doc/ subdirectory of the io package
but I think the wiki is a bit clearer.
If you have the required Java .jar files together in some subdirectory,
you can do:

chk_spreadsheet_support ('/full/path/to/that/subdirectory')
   ## followed by
javaclasspath
   ## to see which Java class libs are loaded

and it that path happens to be in your %USERPROFILE% (e.g.,
C:\user\<your_login_name>\Java)
that will even happen automagically when loading the io package.

Philip


Reply | Threaded
Open this post in threaded view
|

RE: Spreadsheet translation

Windhorn, Allen E [ACIM/LSA/MKT]
Philip et al,

> -----Original Message-----
> From: Philip Nienhuis <[hidden email]>

>> Yes, I believe I am using the OCT interface, it was supposed to be best for
>> Windows.

> ? who says so?

The Wiki: "OCT offers read support for OOXML files (.xlsx) only, but it is by far
the fastest read option; faster than Excel itself."  and  "This is the new
impressive and fast ... interface which presently supports .xlsx, .ods and
.gnumeric files."

I see now that it doesn't list the OCT interface as being able to read the
formulas -- it did read about 90% of them though, and probably would have
done better on a cleaner spreadsheet.

> > Octave complains that Java is installed but unusable
>
> Yeah that's a known one. Do you have 64-bit Octave installed (I guess
> so)? Octave & Java bit widths should match.

Yes, 64-bit Octave, and 32-bit Java, so presumably incompatible, though
the Wiki implies it doesn't matter: "Generally, if you use a Java-based
interface for spreadsheet I/O, it doesn't matter much whether you use
Octave 32-bit or Octave 64-bit."   But I don't think I can install 64-bit Java
without admin rights.

> > And how would I find out more about the other Java packages (assuming I
> > could get any to work with our Java installation)?
>
> E.g., on the wiki (https://wiki.octave.org/IO_package).

It doesn't say where to obtain this code, or how to install it.  I will try Google
again.  Thanks for your help -- I will post code somewhere if I get it working.

Regards,
Allen

Reply | Threaded
Open this post in threaded view
|

Re: Spreadsheet translation

PhilipNienhuis
Windhorn, Allen E [ACIM/LSA/MKT] wrote:

> Philip et al,
>
>> -----Original Message-----
>> From: Philip Nienhuis <[hidden email]>
>
>>> Yes, I believe I am using the OCT interface, it was supposed to be best for
>>> Windows.
>
>> ? who says so?
>
> The Wiki: "OCT offers read support for OOXML files (.xlsx) only, but it is by far
> the fastest read option; faster than Excel itself."  and  "This is the new
> impressive and fast ... interface which presently supports .xlsx, .ods and
> .gnumeric files."

OK, so that needs an update.
I think I remember where that came from, but those times are long gone :-)

> I see now that it doesn't list the OCT interface as being able to read the
> formulas -- it did read about 90% of them though, and probably would have
> done better on a cleaner spreadsheet.

Well, OCT can (I wrote the code for it) but .xlsx is a very very
complicated spreadsheet format (basically it's a compressed nested
directory tree full of XML files), and we read it using regular
expressions while XML had better be read using XML parsers and such.
In fact, all the Java based spreadsheet I/O classes are based on XML
parsers/validators etc. But regexps are much faster than XML (although I
often think OTOH they're much more fragile).
(If it may comfort some, .ods is just a tiny little bit less complicated.)

If you read up on the bug report in my earlier post you'll see that the
existence of shared formulas was a bit of a surprise for us.
I'm sure OCT can be made to read (and write ...) the shared formulas as
well, but someone will have to adapt and extend the code for it, and as
I wrote earlier you're just the 2nd one in several years hitting this so
there was little demand, and there are several alternatives in the form
of other spreadsheet I/O interfaces.

>>> Octave complains that Java is installed but unusable
>>
>> Yeah that's a known one. Do you have 64-bit Octave installed (I guess
>> so)? Octave & Java bit widths should match.
>
> Yes, 64-bit Octave, and 32-bit Java, so presumably incompatible, though
> the Wiki implies it doesn't matter: "Generally, if you use a Java-based
> interface for spreadsheet I/O, it doesn't matter much whether you use
> Octave 32-bit or Octave 64-bit."

That's plain wrong and also in need of an update.
I'll try to update the wiki one of these days.

Thanks for your investigations (I don't quite visit the wiki on a daily
basis) :-)

Philip


Reply | Threaded
Open this post in threaded view
|

Re: Spreadsheet translation

mmuetzel
Am 31. Januar 2020 um 10:31 Uhr schrieb "Philip Nienhuis":
> Well, OCT can (I wrote the code for it) but .xlsx is a very very
> complicated spreadsheet format (basically it's a compressed nested
> directory tree full of XML files), and we read it using regular
> expressions while XML had better be read using XML parsers and such.
> In fact, all the Java based spreadsheet I/O classes are based on XML
> parsers/validators etc. But regexps are much faster than XML (although I
> often think OTOH they're much more fragile).

This legendary stackoverflow answer comes to mind ;-)
https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Markus


Reply | Threaded
Open this post in threaded view
|

Re: Spreadsheet translation

PhilipNienhuis
"Markus Mützel" wrote:

> Am 31. Januar 2020 um 10:31 Uhr schrieb "Philip Nienhuis":
>> Well, OCT can (I wrote the code for it) but .xlsx is a very very
>> complicated spreadsheet format (basically it's a compressed nested
>> directory tree full of XML files), and we read it using regular
>> expressions while XML had better be read using XML parsers and such.
>> In fact, all the Java based spreadsheet I/O classes are based on XML
>> parsers/validators etc. But regexps are much faster than XML (although I
>> often think OTOH they're much more fragile).
>
> This legendary stackoverflow answer comes to mind ;-)
> https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
>
> Markus

Thanks Markus, I'll add that link to the wiki.
An amusing read - further down the arguments pro and con from the strict
vs. loose/practical developer types makes for more interesting reading
moments :-)

As to our issue, it's especially the order of XML tags in the cell nodes
that makes our regexps fragile. Luckily, both Excel and LibreOffice
write those tags in similar order (~alphabetical, AFAICT). But maybe
some Java based spreadsheet I/O class or embedded SW or so will do
otherwise, and then we might have a bit of trouble. I'm unsure if there
are rules for the order of appearance of tags in XML.
OTOH reading and writing just simple data, which Octave does, isn't so
terribly complicated and demanding.

Philip



Reply | Threaded
Open this post in threaded view
|

RE: Spreadsheet translation

Windhorn, Allen E [ACIM/LSA/MKT]
In reply to this post by PhilipNienhuis
Philip et al,

> -----Original Message-----
> From: Philip Nienhuis <[hidden email]>
> >>
> >> Did you read the spreadsheet with option 'struct ("formulas_as_text", 1)'
> >
> >> If yes, please try another spreadsheet I/O "interface". It looks to me like
> >> you have no additional SW installed so you fall back to the OCT interface.
> >> That hasn't received much testing with formula output, so I'm not very
> >> surprised that large parts of the spreadsheet cell XML are left in place.
> >
> > xlh = xlsopen('GENDH_Dissection01.xlsm', 0, 'COM'); % Open read only
> > Octave ... complained, but when I looked, it still produced the correct
> > formulas...

Yesterday, that is.  Today it says "COM is not supported".  @#$%^&*()!!!
Fortunately, I saved the result to a text file.

Regards,
Allen

Reply | Threaded
Open this post in threaded view
|

RE: Spreadsheet translation

PhilipNienhuis
Windhorn, Allen E [ACIM/LSA/MKT] wrote
> Philip et al,
>
>> -----Original Message-----
>> From: Philip Nienhuis &lt;

> pr.nienhuis@

> &gt;
>> >>
>> >> Did you read the spreadsheet with option 'struct ("formulas_as_text",
>> 1)'
>> >
>> >> If yes, please try another spreadsheet I/O "interface". It looks to me
>> like
>> >> you have no additional SW installed so you fall back to the OCT
>> interface.
>> >> That hasn't received much testing with formula output, so I'm not very
>> >> surprised that large parts of the spreadsheet cell XML are left in
>> place.
>> >
>> > xlh = xlsopen('GENDH_Dissection01.xlsm', 0, 'COM'); % Open read only
>> > Octave ... complained, but when I looked, it still produced the correct
>> > formulas...
>
> Yesterday, that is.  Today it says "COM is not supported".  @#$%^&*()!!!

Did you load the widows package?

pkg load windows

P.



--
Sent from: https://octave.1599824.n4.nabble.com/Octave-General-f1599825.html


Reply | Threaded
Open this post in threaded view
|

RE: Spreadsheet translation

Windhorn, Allen E [ACIM/LSA/MKT]
Philip,

> -----Original Message-----
> From: Help-octave <help-octave- bounces On Behalf Of PhilipNienhuis
>
> Did you load the widows package?

Thank you.  Going to find someplace quiet for a while...

Allen