Writing 'help' functions as m-files


Søren Hauberg
Yet Another mail from the "Brain Farting Department"... (YABF?)
  A couple of days ago I started wondering why the help system is
written in C++ rather than as a set of m-files. Since the help system is
only used interactively (i.e. not part of any "real" calculations), I
doubt that speed is the issue.
  I would think that if the following function were available it would
be fairly simple to write the help system as m-files:
 
"get_help_text": given a function name, return the corresponding (raw)
help text as a string.

This function would have to be written in C++ since it has to deal with
built-in functions and oct-files. But couldn't everything be written as
m-files if this function was available? And couldn't this simplify the
implementation a bit?
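
A minimal sketch of the layering proposed above, written in Python rather than Octave purely for illustration. The table of help texts and the names help_text and lookfor are hypothetical stand-ins; only the idea of a single "get_help_text" primitive comes from the proposal.

```python
# Hypothetical stand-in for the proposed C++ primitive: maps a function
# name to its raw help text. The entries are made-up illustration data.
_RAW_HELP = {
    "sin": "Compute the sine of each element of X.",
    "cos": "Compute the cosine of each element of X.",
}


def get_help_text(name):
    """Return the raw help text for `name`, or None if it is unknown."""
    return _RAW_HELP.get(name)


def help_text(name):
    """Format the help message for one function, using only get_help_text."""
    text = get_help_text(name)
    if text is None:
        return "help: '%s' not found" % name
    return "'%s' is a function\n\n%s" % (name, text)


def lookfor(keyword, names):
    """Return the names whose raw help text mentions `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [n for n in names if keyword in (get_help_text(n) or "").lower()]
```

Everything above the primitive is ordinary string handling, which is the part that could plausibly move out of C++.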

Let me explain why I started to think about this. To build the
Octave-Forge web pages we have a bunch of scripts written in various
languages. One of the many things these scripts do is to extract help
texts from each function, such that we can generate a corresponding web
page. The fact that we have our own implementation of this extraction is
IMHO a bad thing, since it duplicates the implementation in the help
system. Things would be simpler if we had the above-mentioned
"get_help_text" function in Octave. Also, doesn't the 'munge-texi'
program in 'doc/interpreter' reimplement this functionality as well?

Søren



Re: Writing 'help' functions as m-files

David Bateman
Søren Hauberg wrote:
> Yet Another mail from the "Brain Farting Department"... (YABF?)
>   A couple of days ago I started wondering why the help system is
> written in C++ rather then as a set of m-files. Since the help system is
> only used interactively (i.e. not part of any "real" calculations), I
> doubt that speed is the issue.
>  
To support mex-files, Octave can already have the help text in an m-file and
the function implementation in C or C++.

>   I would think that if the following function were available it would
> be fairly simple to write the help system as m-files:
>  
> "get_help_text": given a function name, return the corresponding (raw)
> help text as a string.
>  
This function already exists in Octave (cf. src/help.cc), and I wrote a
C++ DEFUN_DLD to get the help text a few years back. See

http://sourceforge.net/mailarchive/message.php?msg_id=20040603104208.GC30468%40zfr01-2089.crm.mot.com

However, it doesn't preserve the Texinfo markup but uses the parsed stream, so
this might not be the best option. Otherwise, the help code was reorganised
when the lookfor function was added, so if the function

std::string raw_help (const std::string& nm, bool &symbol_found);

in help.cc were also declared in help.h, you could use it to extract the
raw help string. In any case, as this function is not static, you could
include the above declaration and just use it in an oct-file without
any modification to Octave (except on Windows, where it lacks the OCTAVE_API
macro and so isn't exported).

> Let me just tell why I started to think about this. To build the
> Octave-Forge web pages we have a bunch of scripts written in various
> languages. One of the many things these scripts do is to extract help
> texts from each function, such that we can generate a corresponding web
> page. The fact that we have our own implementation of this extraction is
> IMHO a bad thing, since it duplicates the implementation in the help
> system. Things would be more simple if we had the above-mentioned
> "get_help_string" function in Octave. Also, doesn't the 'munge-texi'
> program in 'doc/interpreter' reimplement this functionality as well?
>  
Parsing the DOCSTRINGS files will be faster than getting Octave to give
us the help text, so I'm not sure this is a bad thing. The build of the
Octave-Forge web pages takes a significant amount of time, and anything
that slows it down would be a bad thing.

D.


--
David Bateman                                [hidden email]
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax)

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary


Re: Writing 'help' functions as m-files

John W. Eaton
I have no objection to writing the help functions as .m files.

I'd like to consider the following change as well.

The DEFUN macro in Octave is patterned after the one used in Emacs.
Originally, Emacs also had the doc string as an argument to the
macro.  The Emacs build process extracted the doc strings and placed
them in a separate file that was indexed for quick lookup.  This also
avoided storing relatively large amounts of infrequently used data in
the Emacs binary itself (I know it may be hard to believe that there
was a concern for space efficiency, in Emacs, but there it is!).  More
recently, the Emacs sources have been changed to look like this:

  DEFUN ("setq", Fsetq, Ssetq, 0, UNEVALLED, 0,
         doc: /* Set each SYM to the value of its VAL.
  The symbols SYM are variables; they are literal (not evaluated).
  The values VAL are expressions; they are evaluated.
  Thus, (setq x (1+ y)) sets `x' to the value of `(1+ y)'.
  The second VAL is not computed until after the first SYM is set, and so on;
  each VAL can use the new value of variables set earlier in the `setq'.
  The return value of the `setq' form is the value of the last VAL.
  usage: (setq [SYM VAL]...)  */)
       (args)
       Lisp_Object args;
  {

so that the doc string is just a comment.  It is still extracted by the
build process and stored in a separate file that is indexed for quick
lookup.  Having the doc string as a comment has several advantages. It
would no longer be necessary to insert \n\ or other
quoted characters.  Since there would be no C character escapes, we
wouldn't need to use the mkgendoc script to generate the C++ program
that we compile to strip them out.  And we would also avoid any
character string length limits imposed by a C++ compiler (though that
may be rare these days).  The only disadvantage I can see (other than
the work needed for the implementation) is that inserting "*/" in a
doc string would require special treatment, but I don't think we need
that.
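
To make the extraction step concrete, here is a rough sketch in Python of pulling Emacs-style "doc: /* ... */" comments out of a source file into a name-to-text table. The regular expression and the name extract_docstrings are illustrative assumptions, not the actual Emacs or Octave build tooling, and the sketch assumes every DEFUN carries a doc comment.

```python
import re

# Matches DEFUN ("name", ... doc: /* text */ in the Emacs style quoted above.
# Assumption: the first string literal after "DEFUN (" is the function name.
_DOC_RE = re.compile(
    r'DEFUN\s*\(\s*"(?P<name>[^"]+)".*?doc:\s*/\*(?P<doc>.*?)\*/',
    re.DOTALL,
)


def extract_docstrings(source):
    """Return {function name: doc string} for every DEFUN found in `source`."""
    return {
        m.group("name"): m.group("doc").strip()
        for m in _DOC_RE.finditer(source)
    }


example = '''
DEFUN ("setq", Fsetq, Ssetq, 0, UNEVALLED, 0,
       doc: /* Set each SYM to the value of its VAL.
usage: (setq [SYM VAL]...)  */)
     (args)
'''
docs = extract_docstrings(example)
```

The doc text is taken up to the first "*/", which is exactly why a literal "*/" inside a doc string would need the special treatment mentioned above.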

Should we make a change like this?

Should the doc strings for the built-in functions be placed in a
single indexed file?

Should the doc strings for functions defined in .oct files be
installed in individual .m files alongside the .oct files?

Additionally, the list of items for 3.1 includes "handle block
comments" (for the scripting language).  Given that, we could also
write doc strings in .m files using

#{
-*- texinfo -*-
@deftypefn {Function File} {@var{c} =} convn (@var{a}, @var{b}, @var{shape})
@math{N}-dimensional convolution of matrices @var{a} and @var{b}.

The size of the output is determined by the @var{shape} argument.
This can be any of the following character strings:

@table @asis
@item "full"
The full convolution result is returned. The size out of the output is
@code{size (@var{a}) + size (@var{b})-1}. This is the default behaviour.
@item "same"
The central part of the convolution result is returned. The size out of the
output is the same as @var{a}.
@item "valid"
The valid part of the convolution is returned. The size of the result is
@code{max (size (@var{a}) - size (@var{b})+1, 0)}.
@end table

@seealso{conv, conv2}
@end deftypefn
#}

and avoid the need for prefixing every line with "##".  Would that be
helpful when writing the doc strings?  I suppose that with a decent
editor it is easy enough to strip the comment characters out, edit the
doc string, and then replace the comment characters, but using block
comments would still simplify the process.
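
For comparison, a small Python sketch of how a help extractor might treat the two comment styles the same way. It assumes the help text is the first comment in the file and that the block markers stand alone on their lines; extract_help is a hypothetical name, not an Octave function.

```python
def extract_help(source):
    """Return the leading comment of an m-file source as plain help text.

    Handles a '#{' ... '#}' block comment or a run of '#'/'%' line comments.
    Assumption: the help text is the first comment in the file.
    """
    lines = source.splitlines()
    # Drop leading blank lines.
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines:
        return ""
    if lines[0].strip() in ("#{", "%{"):
        # Block comment: keep every line up to the closing marker verbatim.
        body = []
        for line in lines[1:]:
            if line.strip() in ("#}", "%}"):
                break
            body.append(line)
        return "\n".join(body)
    body = []
    for line in lines:
        stripped = line.lstrip()
        if not stripped.startswith(("#", "%")):
            break
        # Drop the comment characters and one following space, if present.
        text = stripped.lstrip("#%")
        body.append(text[1:] if text.startswith(" ") else text)
    return "\n".join(body)
```

With this approach the Texinfo inside a block comment needs no per-line stripping at all, which is the convenience being argued for.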

jwe

Re: Writing 'help' functions as m-files

David Bateman
John W. Eaton wrote:
> The only disadvantage I can see (other than
> the work needed for the implementation) is that inserting "*/" in a
> doc string would require special treatment, but I don't think we need
> that.
>
> Should we make a change Like this?
>  
This imposes a lot of work adapting the existing code and might make
oct-files that aren't part of Octave incompatible, so even if this
change is made we should support the old way of doing it as well.
Therefore I don't think this is really the way to go. Taking the help from
an m-file when the help string in the oct-file is empty makes sense, though
(does this work already?).

> Additionally, the list of items for 3.1 includes "handle block
> comments" (for the scripting language).  Given that, we could also
> write doc strings in .m files using
>
> #{
> -*- texinfo -*-
> @deftypefn {Function File} {@var{c} =} convn (@var{a}, @var{b}, @var{shape})
> @math{N}-dimensional convolution of matrices @var{a} and @var{b}.
>
> The size of the output is determined by the @var{shape} argument.
> This can be any of the following character strings:
>
> @table @asis
> @item "full"
> The full convolution result is returned. The size out of the output is
> @code{size (@var{a}) + size (@var{b})-1}. This is the default behaviour.
> @item "same"
> The central part of the convolution result is returned. The size out of the
> output is the same as @var{a}.
> @item "valid"
> The valid part of the convolution is returned. The size of the result is
> @code{max (size (@var{a}) - size (@var{b})+1, 0)}.
> @end table
>
> @seealso{conv, conv2}
> @end deftypefn
> #}
>
> and avoid the need for prefixing every line with "##".  Would that be
> helpful when writing the doc strings?  I suppose that with a decent
> editor it is easy enough to strip the comment characters out, edit the
> doc string, and then replace the comment characters, but using block
> comments would still simplify the process.
>  
A comment in a file is a comment, so I'd have thought that support
for block comments would make such a feature available automatically.
Whether this syntax should be encouraged is another matter.

Regards
D.


Re: Writing 'help' functions as m-files

John W. Eaton
On 27-Mar-2008, David Bateman wrote:

| This imposes a lot of work adapting the existing code and might make
| oct-files that aren't part of Octave incompatible, so even if this
| change is made we should support the old way of doing it as well.

Yes, we would still handle doc strings defined as they are now.  But
allowing them to be written as comments would be nice.  For doc
strings defined this way in .oct files we would need a way to generate files
files, we would need a way to gene

| Therefore I rather think this is not really the way to go.. Help from an
| m-file if the help string in the oct-file is empty makes sense though
| (does this work already?)

Yes.  If the doc string for a .oct file is empty, Octave will look for
a .m file of the same name.  The way it works now is possibly
confusing (but apparently it is compatible behavior).  If foo.oct with
no doc string appears somewhere in the path before foo.m with a doc
string, Octave will display the documentation from the .m file even
though it will find and execute foo.oct.  It might be better for it to
simply find the function it would execute, then if no doc string is
defined for it, only look for the corresponding .m file rather than
searching the path again.  It's hard to imagine that being
incompatible in this way would be a bad thing...
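
The difference between the two lookup rules can be modelled in a few lines of Python. The load path, file names and function names here are made up for illustration, and the preference for .oct over .m within a directory is an assumption of the sketch.

```python
# Toy load path: an ordered list of (directory, {file name: doc string or None}).
# None means the file exists but carries no doc string. All names are made up.
LOAD_PATH = [
    ("dir1", {"foo.oct": None}),
    ("dir2", {"foo.m": "Help text from dir2/foo.m"}),
]


def find_executed(name, path):
    """Return (directory, file name) of the first .oct or .m file for `name`."""
    for directory, files in path:
        for ext in (".oct", ".m"):
            if name + ext in files:
                return directory, name + ext
    return None


def doc_current(name, path):
    """Behaviour described above: search the whole path for any doc string."""
    for _directory, files in path:
        for ext in (".oct", ".m"):
            doc = files.get(name + ext)
            if doc:
                return doc
    return None


def doc_proposed(name, path):
    """Proposed behaviour: only consult the .m file next to the executed file."""
    found = find_executed(name, path)
    if found is None:
        return None
    directory, file_name = found
    files = dict(path)[directory]
    return files.get(file_name) or files.get(name + ".m")
```

With LOAD_PATH as given, doc_current returns the text from dir2/foo.m even though dir1/foo.oct is what would run, while doc_proposed returns nothing because no foo.m sits beside foo.oct.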

jwe

Re: Writing 'help' functions as m-files

David Bateman
John W. Eaton wrote:

> Yes.  If the doc string for a .oct file is empty, Octave will look for
> a .m file of the same name.  The way it works now is possibly
> confusing (but apparently it is compatible behavior).  If foo.oct with
> no doc string appears somewhere in the path before foo.m with a doc
> string, Octave will display the documentation from the .m file even
> though it will find and execute foo.oct.  It might be better for it to
> simply find the function it would execute, then if no doc string is
> defined for it, only look for the corresponding .m file rather than
> searching the path again.  It's hard to imagine that being
> incompatible in this way would be a bad thing...
>
>  

As Matlab doesn't support the Octave API and mex-files can't have
help strings, in what way would this be incompatible?

D.




Re: Writing 'help' functions as m-files

John W. Eaton
In reply to this post by John W. Eaton
On 27-Mar-2008, John W. Eaton wrote:

| On 27-Mar-2008, David Bateman wrote:
|
| | This imposes a lot of work adapting the existing code and might make
| | oct-files that aren't part of Octave incompatible, so even if this
| | change is made we should support the old way of doing it as well.
|
| Yes, we would still handle doc strings defined as they are now.  But
| allowing them to be written as comments would be nice.  For doc
| strings defined this way in .oct files we would need a way to generate files
| files, we would need a way to gene

Sorry, I sent this before I finished editing it.  I meant to say:

For doc strings defined this way in .oct files, we would need a way to
generate external doc files.  Probably the simplest thing would be to
generate .m files with the same base name, or perhaps to use a new
extension for files that contain doc strings.  I suppose we could
modify the mkoctfile script to do the job.

But yes, you are right that this is a fair amount of work.  Is it
worth it to avoid having to deal with the escape sequences when
writing doc strings in C++ files?

As for the size savings, the doc strings currently add a total of
approximately 337kb of text to the Octave executable and core .oct
files (this number comes from looking at the size of the generated
src/DOCSTRINGS file that is used for creating the Octave manual .texi
files).  All together, the Octave libraries and .oct files are
approximately 25MB on my system (and that's the stripped size).
Someone will have to put a lot more detail into the doc strings before
the data size savings would really be significant.

jwe

Re: Writing 'help' functions as m-files

John W. Eaton
In reply to this post by David Bateman
On 27-Mar-2008, David Bateman wrote:

| As Matlab doesn't support the Octave API and mex-files can't have
| help-string, in what way would this be incompatible?

It works the same way for mex files.  So if foo.mex is in dir1 and
foo.m with a doc string is in dir2, with dir2 appearing after dir1 in
the path, Matlab finds the doc string in dir2/foo.m even though it
will call dir1/foo.mex.  This could lead to confusion as the two files
might be completely separate.  At least if they appear side-by-side in
the same directory, it would seem more likely that the documentation
corresponds to the mex file.

jwe

Re: Writing 'help' functions as m-files

Søren Hauberg
In reply to this post by John W. Eaton
On Thu, 27 Mar 2008 at 11:00 -0400, John W. Eaton wrote:
> I have no objection to writing the help functions as .m files.
Okay, I'm attaching some code that illustrates how this could be done.
Not everything is working, but it should be functional enough to decide
if this approach should be used.
  I like this approach for two reasons:

1) It gives me access to the function 'get_help_text' which I have
needed in the past.
2) C++ code is moved to Octave code. This should make it easier for more
people to hack on the help system. As an example, Matlab now allows
HTML-formatted help texts. If these ever catch on, I think they would be
simpler to support this way (the attached code provides a *very* simple
implementation).

On the downside: the current code works well. And if it ain't broken,
don't fix it. Also, an Octave implementation will be slower, but I don't
think that matters.

Søren

Attachments: get_help_text.cc (2K), makeinfo.m (1K), myhelp.m (4K)

Re: Writing 'help' functions as m-files

John W. Eaton
On 27-Mar-2008, Søren Hauberg wrote:

| On Thu, 27 Mar 2008 at 11:00 -0400, John W. Eaton wrote:
| > I have no objection to writing the help functions as .m files.
| Okay, I'm attaching some code that illustrates how this could be done.
| Not everything is working, but it should be functional enough to decide

What's the current status of this patch?  Did you post a Mercurial
changeset?  I can't seem to find one.

jwe


Re: Writing 'help' functions as m-files

John W. Eaton
On 22-Jan-2009, John W. Eaton wrote:

| On 27-Mar-2008, Søren Hauberg wrote:
|
| | On Thu, 27 Mar 2008 at 11:00 -0400, John W. Eaton wrote:
| | > I have no objection to writing the help functions as .m files.
| | Okay, I'm attaching some code that illustrates how this could be done.
| | Not everything is working, but it should be functional enough to decide
|
| What's the current status of this patch?  Did you post a Mercurial
| changeset?  I can't seem to find one.

Never mind.  I found a patch you posted on October 31.  Is that the
latest version?  Have you made any changes since then?

Thanks,

jwe


Re: Writing 'help' functions as m-files

Søren Hauberg
On Thu, 22 Jan 2009 at 13:56 -0500, John W. Eaton wrote:

> On 22-Jan-2009, John W. Eaton wrote:
>
> | On 27-Mar-2008, Søren Hauberg wrote:
> |
> | | On Thu, 27 Mar 2008 at 11:00 -0400, John W. Eaton wrote:
> | | > I have no objection to writing the help functions as .m files.
> | | Okay, I'm attaching some code that illustrates how this could be done.
> | | Not everything is working, but it should be functional enough to decide
> |
> | What's the current status of this patch?  Did you post a Mercurial
> | changeset?  I can't seem to find one.
>
> Never mind.  I found a patch you posted on October 31.  Is that the
> latest version?  Have you made any changes since then?

I haven't worked on this since then as I was unsure if the change (which
is fairly large) would be accepted at all.

Søren


Re: Writing 'help' functions as m-files

John W. Eaton
On 22-Jan-2009, Søren Hauberg wrote:

| I haven't worked on this since then as I was unsure if the change (which
| is fairly large) would be accepted at all.

OK, sorry for the long delay.  I made a few additional changes:

  move the which command to an m-file
  eliminate the private directory for now
  display location of file in help message as before
  append additional help info to print_usage message
  misc style changes
  more complete changelog entry

and checked it in.  The changeset is here:

  http://hg.savannah.gnu.org/hgweb/octave/rev/f134925a1cfa

One thing I noticed is that lookfor prints syntax error messages if
there are syntax errors in functions in the load-path.  I think we
need to find some way to avoid doing that.  Can it be done with a
try/catch block or do we need something else?

Is there more to do?  I remember some discussion about cache files and
needing to generate those when Octave is installed.  Am I remembering
correctly?  Can you do that, or explain what needs to be done?

Thanks,

jwe



Re: Writing 'help' functions as m-files

Søren Hauberg
On Thu, 22 Jan 2009 at 18:30 -0500, John W. Eaton wrote:
> One thing I noticed is that lookfor prints syntax error message if
> there are syntax errors in functions in the load-path.  I think we
> need to find some way to avoid doing that.  Can it be done with a
> try/catch block or do we need something else?

Could you provide an example?

> Is there more to do?  I remember some discussion about cache files and
> needing to generate those when Octave is installed.  Am I remembering
> correctly?  Can you do that, or explain what needs to be done?

Basically, we need to call 'gen_doc_cache ()' at some point. This
function traverses the 'path ()' and generates a cache in each
directory. A cache is just a simple '.mat' file that contains the help
texts of the files in the directory. I am not sure when it is best to do
this:

  * when Octave is built, or
  * when Octave is installed.

I'm guessing both will work, and that the easiest solution is to run
'octave --eval "gen_doc_cache ()"' at the end of the installation.
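
A rough Python sketch of the caching step described above, for readers who want to see its shape. It writes a JSON file instead of a '.mat' file, takes the directories as an argument rather than reading 'path ()', and treats the first run of comment lines as the help text; the file name doc-cache.json and the helper first_comment are assumptions of the sketch, not what the changeset does.

```python
import json
import os
import tempfile


def first_comment(source):
    """Return the leading run of '#'/'%' comment lines, stripped of markers."""
    body = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped.startswith(("#", "%")):
            break
        body.append(stripped.lstrip("#%").strip())
    return "\n".join(body)


def gen_doc_cache(directories, cache_name="doc-cache.json"):
    """Write one cache file per directory mapping function name to help text."""
    written = []
    for directory in directories:
        cache = {}
        for entry in sorted(os.listdir(directory)):
            if entry.endswith(".m"):
                with open(os.path.join(directory, entry), encoding="utf-8") as fh:
                    cache[entry[:-2]] = first_comment(fh.read())
        cache_path = os.path.join(directory, cache_name)
        with open(cache_path, "w", encoding="utf-8") as fh:
            json.dump(cache, fh)
        written.append(cache_path)
    return written


# Usage: build a throwaway directory with one m-file and cache its help text.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "foo.m"), "w", encoding="utf-8") as fh:
        fh.write("## Compute foo.\nfunction foo ()\nendfunction\n")
    (cache_file,) = gen_doc_cache([tmp])
    with open(cache_file, encoding="utf-8") as fh:
        demo_cache = json.load(fh)
```

Because the cache lives next to the files it describes, it can be regenerated per directory, which fits either the build-time or the install-time option.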

Søren


sed (was: Re: Writing 'help' functions as m-files)

Søren Hauberg
In reply to this post by John W. Eaton
On Thu, 22 Jan 2009 at 18:30 -0500, John W. Eaton wrote:

> On 22-Jan-2009, Søren Hauberg wrote:
>
> | I haven't worked on this since then as I was unsure if the change (which
> | is fairly large) would be accepted at all.
>
> OK, sorry for the long delay.  Made a few additional changes:
>
>   move the which command to an m-file
>   eliminate the private directory for now
>   display location of file in help message as before
>   append additional help info to print_usage message
>   misc style changes
>   more complete changelog entry
>
> and checked it in.

One thing I've forgotten to mention is that with this change the help
code no longer requires 'sed' to function. I remember that the 'sed'
binary gave Windows users some problems as it triggered some virus
warnings. Anyway, this binary no longer needs to be distributed as part
of the Windows build (assuming it was only used by the help system).

Søren


Problem with test_prefer.m

bpabbott
In reply to this post by Søren Hauberg
Søren,

I encountered a test failure related to the "help functions as m-files".

octave:25> test test_prefer.m
   ***** test
  ped = print_empty_dimensions ();
  print_empty_dimensions (0);
  a = cell (1, 1);
  b = type -q a;
  assert(!isempty(findstr(b,"[]")));
  assert(isempty(findstr(b,"[](0x0)")));
  print_empty_dimensions (ped);
!!!!! test failed
type: `a' undefined
octave:26> a = cell(1,1)
a =

{
   [1,1] = []
}

octave:27> b = type -q a
error: type: `a' undefined

Does this test pass for you?

Ben

Re: Problem with test_prefer.m

Jaroslav Hajek
On Fri, Jan 23, 2009 at 1:32 PM, Ben Abbott <[hidden email]> wrote:

> Søren,
>
> I encountered a test failure related to the "help functions as m-files".
>
> octave:25> test test_prefer.m
>  ***** test
>  ped = print_empty_dimensions ();
>  print_empty_dimensions (0);
>  a = cell (1, 1);
>  b = type -q a;
>  assert(!isempty(findstr(b,"[]")));
>  assert(isempty(findstr(b,"[](0x0)")));
>  print_empty_dimensions (ped);
> !!!!! test failed
> type: `a' undefinedoctave:26>
> octave:26>
> octave:26>
> octave:26> a = cell(1,1)
> a =
>
> {
>  [1,1] = []
> }
>
> octave:27> b = type -q a
> error: type: `a' undefined
>
> Does this test pass for you?
>
> Ben
>

 It doesn't for me (though I get a different failure - a parse error).

regards

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


Re: Problem with test_prefer.m

Søren Hauberg
In reply to this post by bpabbott
On Fri, 23 Jan 2009 at 07:32 -0500, Ben Abbott wrote:
> Does this test pass for you?

No, it does not -- this is a bug. I'll send a changeset once I've made a
build of the latest upstream sources.

Søren


Re: Problem with test_prefer.m

Søren Hauberg
On Fri, 23 Jan 2009 at 14:07 +0100, Søren Hauberg wrote:
> On Fri, 23 Jan 2009 at 07:32 -0500, Ben Abbott wrote:
> > Does this test pass for you?
>
> No it does not -- this is a bug. I'll send a changeset once I've made a
> build of latest upstream sources.

Okay, attached is a changeset that fixes the problem. The change is
fairly large, but it mostly consists of moving the content of the
'do_type' function into the body of the 'type' function. This was needed
for 'evalin ("caller", ...)' to work.

However, the test still fails. The problem is that the old
implementation of 'type' behaved, IMHO, a bit weirdly when you asked for an
output argument. If you did

  text = type ('fun1', 'fun2');

you would get the code of both functions in one string. To me it makes
more sense to put the code of these two functions in a cell array, which
is what the current code does. This is, however, not what the test
expects.
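
The two return conventions can be contrasted with a toy model in Python. The function sources and the names type_old and type_new are made up for illustration; a Python list stands in for an Octave cell array.

```python
# Made-up function sources standing in for what 'type' would print.
SOURCES = {
    "fun1": "function fun1 ()\nendfunction\n",
    "fun2": "function fun2 ()\nendfunction\n",
}


def type_old(*names):
    """Old behaviour as described: all listings joined into one string."""
    return "".join(SOURCES[n] for n in names)


def type_new(*names):
    """Current behaviour as described: one listing per requested name."""
    return [SOURCES[n] for n in names]
```

Under the new convention a caller that wants a single string has to index into the result first, which is presumably why a test that passes the output straight to a string search no longer works unchanged.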

So, should we revert to the old behaviour or should we adapt the test?

Søren

P.S. It should be noted that Matlab does not support this extension, so
from a compatibility point of view both options are fine.

Attachment: type.changeset (4K)