Writing 'help' functions as m-files

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Problem with test_perfer.m

bpabbott
Administrator
On Friday, January 23, 2009, at 12:18PM, "Søren Hauberg" <[hidden email]> wrote:

>fre, 23 01 2009 kl. 14:07 +0100, skrev Søren Hauberg:
>> fre, 23 01 2009 kl. 07:32 -0500, skrev Ben Abbott:
>> > Does this test pass for you?
>>
>> No it does not -- this is a bug. I'll send a changeset once I've made a
>> build of latest upstream sources.
>
>Okay, attached is a changeset that fixes the problem. The change is
>fairly large, but it mostly consist of moving the content of the
>'do_type' function into the body of the 'type' function. This was needed
>for 'evalin ("caller", ...)' to work.
>
>However, the test still fails. The problem is that the old
>implementation of 'type' behaved IMHO a bit weird when you asked for an
>output argument. If you did
>
>  text = type ('fun1', 'fun2');
>
>you would get the code of both functions in one string. To me it makes
>more sense to put the code of these two functions in a cell array, which
>is what the current code does.

fwiw, I prefer what the current code does.

Ben

Reply | Threaded
Open this post in threaded view
|

Re: Problem with test_perfer.m

Søren Hauberg
fre, 23 01 2009 kl. 12:35 -0500, skrev Ben Abbott:
> fwiw, I prefer what the current code does.

I'm attaching a changeset that adapts the test to the change in API. If
the new API is accepted, then this changeset should be applied.

Søren

test_prefer.changeset (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem with test_perfer.m

John W. Eaton
Administrator
In reply to this post by Søren Hauberg
On 23-Jan-2009, Søren Hauberg wrote:

| fre, 23 01 2009 kl. 14:07 +0100, skrev Søren Hauberg:
| > fre, 23 01 2009 kl. 07:32 -0500, skrev Ben Abbott:
| > > Does this test pass for you?
| >
| > No it does not -- this is a bug. I'll send a changeset once I've made a
| > build of latest upstream sources.
|
| Okay, attached is a changeset that fixes the problem. The change is
| fairly large, but it mostly consist of moving the content of the
| 'do_type' function into the body of the 'type' function. This was needed
| for 'evalin ("caller", ...)' to work.

OK, I applied this changeset.

Thanks,

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Problem with test_perfer.m

John W. Eaton
Administrator
In reply to this post by Søren Hauberg
On 23-Jan-2009, Søren Hauberg wrote:

| fre, 23 01 2009 kl. 12:35 -0500, skrev Ben Abbott:
| > fwiw, I prefer what the current code does.
|
| I'm attaching a changeset that adapts the test to the change in API. If
| the new API is accepted, then this changeset should be applied.

I don't have any strong feelings about it one way or the other.  I
applied the changeset.

Thanks,

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

John W. Eaton
Administrator
In reply to this post by Søren Hauberg
On 23-Jan-2009, Søren Hauberg wrote:

| tor, 22 01 2009 kl. 18:30 -0500, skrev John W. Eaton:
| > One thing I noticed is that lookfor prints syntax error message if
| > there are syntax errors in functions in the load-path.  I think we
| > need to find some way to avoid doing that.  Can it be done with a
| > try/catch block or do we need something else?
|
| Could you provide an example?
|
| > Is there more to do?  I remember some discussion about cache files and
| > needing to generate those when Octave is installed.  Am I remembering
| > correctly?  Can you do that, or explain what needs to be done?
|
| Basically, we need to call 'gen_doc_cache ()' at some point. This
| function traverses the 'path ()' and generates a cache in each
| directory. A cache is just a simple '.mat' file that contains the help
| texts of the files in the directory. I am not sure when it is best to do
| this
|
|   * when Octave is built, or
|   * when Octave is installed.
|
| I'm guessing both will work. I'm guessing that the easiest solution is
| to run 'octave --eval gen_doc_cache ()' at the end of the installation.

I think it should be done at build time so that running Octave in
place will work.  Then the generated files can be installed.

jwe

Reply | Threaded
Open this post in threaded view
|

sed (was: Re: Writing 'help' functions as m-files)

John W. Eaton
Administrator
In reply to this post by Søren Hauberg
On 23-Jan-2009, Søren Hauberg wrote:

| One thing I've forgotten to mention is that with this change the help
| code no longer requires 'sed' to function. I remember that the 'sed'
| binary gave Windows users some problems as it triggered some virus
| warnings. Anyway, this binary no longer needs to be distributed as part
| of the Windows build (assuming it was only used by the help system).

OK, that's a nice bonus side effect.

Thanks,

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

Søren Hauberg
In reply to this post by John W. Eaton
tir, 27 01 2009 kl. 11:48 -0500, skrev John W. Eaton:

> On 23-Jan-2009, Søren Hauberg wrote:
> | > Is there more to do?  I remember some discussion about cache files and
> | > needing to generate those when Octave is installed.  Am I remembering
> | > correctly?  Can you do that, or explain what needs to be done?
> |
> | Basically, we need to call 'gen_doc_cache ()' at some point. This
> | function traverses the 'path ()' and generates a cache in each
> | directory. A cache is just a simple '.mat' file that contains the help
> | texts of the files in the directory. I am not sure when it is best to do
> | this
> |
> |   * when Octave is built, or
> |   * when Octave is installed.
> |
> | I'm guessing both will work. I'm guessing that the easiest solution is
> | to run 'octave --eval gen_doc_cache ()' at the end of the installation.
>
> I think it should be done at build time so that running Octave in
> place will work.  Then the generated files can be installed.

First of all, sorry about the late reply...

So, I'd like to run 'gen_doc_cache' when Octave is built, which I guess
boils down to

  ./run-octave --eval gen_doc_cache

But I fear this won't work. The 'gen_doc_cache' function simply
traverses the current path and creates a cache for each directory in the
path. The problem is that the path seems to be different when running
'./run-octave', than when running an installed octave. Specifically, in
'./run-octave' we have

  /home/sh/Dokumenter/octave/octave-hg/upstream/src
  /home/sh/Dokumenter/octave/octave-hg/upstream/src/OPERATORS
  /home/sh/Dokumenter/octave/octave-hg/upstream/src/TEMPLATE-INST
  /home/sh/Dokumenter/octave/octave-hg/upstream/src/DLD-FUNCTIONS
  /home/sh/Dokumenter/octave/octave-hg/upstream/src/pic

in the path, whereas in an installed Octave we have

  /home/sh/Programmer//libexec/octave/3.1.51+/site/oct/i686-pc-linux-gnu
  /home/sh/Programmer//libexec/octave/site/oct/api-v33
+/i686-pc-linux-gnu
  /home/sh/Programmer//libexec/octave/site/oct/i686-pc-linux-gnu
  /home/sh/Programmer//share/octave/3.1.51+/site/m
  /home/sh/Programmer//share/octave/site/api-v33+/m

in the path. So, if I run 'gen_doc_cache' via './run-octave' the path
I'll be traversing won't correspond to the path that's available when
Octave is installed.

Any thoughts or comments?

Søren


Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

John W. Eaton
Administrator
On  4-Feb-2009, Søren Hauberg wrote:

| tir, 27 01 2009 kl. 11:48 -0500, skrev John W. Eaton:
| > On 23-Jan-2009, Søren Hauberg wrote:
| > | > Is there more to do?  I remember some discussion about cache files and
| > | > needing to generate those when Octave is installed.  Am I remembering
| > | > correctly?  Can you do that, or explain what needs to be done?
| > |
| > | Basically, we need to call 'gen_doc_cache ()' at some point. This
| > | function traverses the 'path ()' and generates a cache in each
| > | directory. A cache is just a simple '.mat' file that contains the help
| > | texts of the files in the directory. I am not sure when it is best to do
| > | this
| > |
| > |   * when Octave is built, or
| > |   * when Octave is installed.
| > |
| > | I'm guessing both will work. I'm guessing that the easiest solution is
| > | to run 'octave --eval gen_doc_cache ()' at the end of the installation.
| >
| > I think it should be done at build time so that running Octave in
| > place will work.  Then the generated files can be installed.
|
| First of all, sorry about the late reply...
|
| So, I'd like to run 'gen_doc_cache' when Octave is built, which I guess
| boils down to
|
|   ./run-octave --eval gen_doc_cache
|
| But I fear this won't work. The 'gen_doc_cache' function simply
| traverses the current path and creates a cache for each directory in the
| path. The problem is that the path seems to be different when running
| './run-octave', than when running an installed octave. Specifically, in
| './run-octave' we have
|
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src/OPERATORS
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src/TEMPLATE-INST
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src/DLD-FUNCTIONS
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src/pic
|
| in the path, whereas in an installed Octave we have
|
|   /home/sh/Programmer//libexec/octave/3.1.51+/site/oct/i686-pc-linux-gnu
|   /home/sh/Programmer//libexec/octave/site/oct/api-v33
| +/i686-pc-linux-gnu
|   /home/sh/Programmer//libexec/octave/site/oct/i686-pc-linux-gnu
|   /home/sh/Programmer//share/octave/3.1.51+/site/m
|   /home/sh/Programmer//share/octave/site/api-v33+/m
|
| in the path. So, if I run 'gen_doc_cache' via './run-octave' the path
| I'll be traversing won't correspond to the path that's available when
| Octave is installed.
|
| Any thoughts or comments?

If the files that need to be generated for Octave to run in place are
different from the ones that are needed when Octave is installed, then
we will have to generate the files when building and then again when
installing.

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

John W. Eaton
Administrator
In reply to this post by Søren Hauberg
On  4-Feb-2009, Søren Hauberg wrote:

| tir, 27 01 2009 kl. 11:48 -0500, skrev John W. Eaton:
| > On 23-Jan-2009, Søren Hauberg wrote:
| > | > Is there more to do?  I remember some discussion about cache files and
| > | > needing to generate those when Octave is installed.  Am I remembering
| > | > correctly?  Can you do that, or explain what needs to be done?
| > |
| > | Basically, we need to call 'gen_doc_cache ()' at some point. This
| > | function traverses the 'path ()' and generates a cache in each
| > | directory. A cache is just a simple '.mat' file that contains the help
| > | texts of the files in the directory. I am not sure when it is best to do
| > | this
| > |
| > |   * when Octave is built, or
| > |   * when Octave is installed.
| > |
| > | I'm guessing both will work. I'm guessing that the easiest solution is
| > | to run 'octave --eval gen_doc_cache ()' at the end of the installation.
| >
| > I think it should be done at build time so that running Octave in
| > place will work.  Then the generated files can be installed.
|
| First of all, sorry about the late reply...
|
| So, I'd like to run 'gen_doc_cache' when Octave is built, which I guess
| boils down to
|
|   ./run-octave --eval gen_doc_cache
|
| But I fear this won't work. The 'gen_doc_cache' function simply
| traverses the current path and creates a cache for each directory in the
| path. The problem is that the path seems to be different when running
| './run-octave', than when running an installed octave. Specifically, in
| './run-octave' we have
|
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src/OPERATORS
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src/TEMPLATE-INST
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src/DLD-FUNCTIONS
|   /home/sh/Dokumenter/octave/octave-hg/upstream/src/pic
|
| in the path, whereas in an installed Octave we have
|
|   /home/sh/Programmer//libexec/octave/3.1.51+/site/oct/i686-pc-linux-gnu
|   /home/sh/Programmer//libexec/octave/site/oct/api-v33
| +/i686-pc-linux-gnu
|   /home/sh/Programmer//libexec/octave/site/oct/i686-pc-linux-gnu
|   /home/sh/Programmer//share/octave/3.1.51+/site/m
|   /home/sh/Programmer//share/octave/site/api-v33+/m
|
| in the path. So, if I run 'gen_doc_cache' via './run-octave' the path
| I'll be traversing won't correspond to the path that's available when
| Octave is installed.
|
| Any thoughts or comments?

OK, I looked at the current gen_doc_cache and I have a few comments

  * I noticed that

      gen_doc_cache (".")

    doesn't work because

      idx = find (p == pathsep ());

    returns an empty matrix if P has only one element.

  * Since all the data are character strings, using a binary format
    doesn't save much space.  Compression would help, so maybe using
    -text -z as options for save would be better than just using
    binary.

  * I think I'd prefer a simple name like DOC.gz in each directory
    instead of help_cache.mat.  Also, generally I think we try to
    prefer using - instead of _ for file names in Octave, unless they
    are .m files, which have to have names that are also valid
    symbol names in the scripting language.

  * Running the funtion takes some time, so I think it would be best
    to run it at build time.  It looks like most of the files will not
    need to change when Octave is built, so it probably also makes
    sense to distribute these files with the tar.gz files.

  * What is the slow part?  Running makeinfo?  If so, then I don't see
    much we can do about that.  Or is it extracting the first sentence
    of the doc string?

I propose making gen_doc_cache take two arguments.  The first argument
names the output file.  The second names the directory to work on.  If
only one argument is given, generate the DOC.gz file for for keywords,
operators, etc.

By doing that, it will be easy to write a rule for Make that will
create the DOC file if it is out of date compared to any of the
corresponding source files.

Write Makefile rules to generate a DOC.gz file in each subdirectory of
scripts the src directory.

Add the DOC.gz files to the list of files that we distribute.

Fix the install rules to install them.

Since it takes some time to run, make the function verbose, so that
there is some indication of progress.

Does this sound OK?  If you agree, I can do most of this work.

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

Søren Hauberg
man, 09 02 2009 kl. 22:58 -0500, skrev John W. Eaton:
> OK, I looked at the current gen_doc_cache and I have a few comments

Thanks for taking the time to look into this. I apologize for not being
more active here.

>   * I noticed that
>
>       gen_doc_cache (".")
>
>     doesn't work because
>
>       idx = find (p == pathsep ());
>
>     returns an empty matrix if P has only one element.

See below.

>   * Since all the data are character strings, using a binary format
>     doesn't save much space.  Compression would help, so maybe using
>     -text -z as options for save would be better than just using
>     binary.

I've changed this. I didn't do this at first, since I wasn't sure what
would happen if Octave was built without support for compression. Will
the call to 'save' fail, or will it simply skip the compression?

>   * I think I'd prefer a simple name like DOC.gz in each directory
>     instead of help_cache.mat.  Also, generally I think we try to
>     prefer using - instead of _ for file names in Octave, unless they
>     are .m files, which have to have names that are also valid
>     symbol names in the scripting language.

OK.

>   * Running the funtion takes some time, so I think it would be best
>     to run it at build time.  It looks like most of the files will not
>     need to change when Octave is built, so it probably also makes
>     sense to distribute these files with the tar.gz files.

I agree. The only problem I see, is what if we distribute compressed
caches, and the user doesn't link to to 'zlib' (or whatever we use).

>   * What is the slow part?  Running makeinfo?  If so, then I don't see
>     much we can do about that.  Or is it extracting the first sentence
>     of the doc string?

I haven't done any profiling, but I'm guessing it's the many calls to
'makeinfo'. The algorithm for generating caches are in many ways similar
to the 'lookfor' code. When I moved 'lookfor' from C++ to an m-file, I
didn't really see any significant speed changes, so from that I
concluded (a bit hasty) that the slow part was 'makeinfo'.

> I propose making gen_doc_cache take two arguments.  The first argument
> names the output file.  The second names the directory to work on.  If
> only one argument is given, generate the DOC.gz file for for keywords,
> operators, etc.

The attached changeset does this. The change also fixes the bug you
mentioned when doing 'gen_doc_cache (".")'.

[snip]
> Does this sound OK?  If you agree, I can do most of this work.

I think your suggestion makes sense.

One thing to consider is where to put the cache for builtin stuff
(operators, keywords, ...). With the attached patch this is stored in
the file

  fullfile (octave_config_info.datadir, "DOC-builtin.gz");

(see 'lookfor.m' at line 58). Is that the right position/name?

Søren

gen_doc_cache.changeset (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

John W. Eaton
Administrator
On 10-Feb-2009, Søren Hauberg wrote:

| man, 09 02 2009 kl. 22:58 -0500, skrev John W. Eaton:
|
| >   * Since all the data are character strings, using a binary format
| >     doesn't save much space.  Compression would help, so maybe using
| >     -text -z as options for save would be better than just using
| >     binary.
|
| I've changed this. I didn't do this at first, since I wasn't sure what
| would happen if Octave was built without support for compression. Will
| the call to 'save' fail, or will it simply skip the compression?

I expect it fails.

| >   * Running the funtion takes some time, so I think it would be best
| >     to run it at build time.  It looks like most of the files will not
| >     need to change when Octave is built, so it probably also makes
| >     sense to distribute these files with the tar.gz files.
|
| I agree. The only problem I see, is what if we distribute compressed
| caches, and the user doesn't link to to 'zlib' (or whatever we use).

Yes, that is a problem, but it doesn't seem critical to me.  If you
want the full functionality, install the prerequisites.

| >   * What is the slow part?  Running makeinfo?  If so, then I don't see
| >     much we can do about that.  Or is it extracting the first sentence
| >     of the doc string?
|
| I haven't done any profiling, but I'm guessing it's the many calls to
| 'makeinfo'. The algorithm for generating caches are in many ways similar
| to the 'lookfor' code. When I moved 'lookfor' from C++ to an m-file, I
| didn't really see any significant speed changes, so from that I
| concluded (a bit hasty) that the slow part was 'makeinfo'.

I suspect there is also some time spent parsing the files to extract
the docstrings.  Instead of just looking for the docstrings, your
function is parsing them.

Also, we are currently finding the functions in the path, and doing

  if (!dir_in_path)
    addpath (directory);
  endif

  ...

  if (!dir_in_path)
    rmpath (directory);
  endif

which has some problems.  First, there's no unwind_protect here, so
you can end up modifying the global path if this function is
interrupted.  Second, the current directory is still searched before
the added path, so you could end up witn the wrong docstring in the
DOC.gz file.  I think it would be better to temporarily cd to the
directory, do the work, then cd back (using unwind_protect to ensure
the change is temporary).

It occurred to me that in the process of building Octave, we already
generate DOCSTRING files that contain the raw Texinfo docstrings.  I
think we should be taking advantage of that.  The new function could
still be useful for generating a DOC.gz file later, but I'm not sure
it is the best thing to use during the build process.

How about making the following changes to the build process:

  * generate a DOCSTRINGS file for each directory in the source tree
    that contains .m files or C++ files with DEFUNs.  We'll read these
    files to generate the .texi files for the manual, same as we do
    now for the two DOCSTRINGS files that we currently generate.  I
    can do this part.

  * use the DOCSTRINGS files to generate the DOC.gz files.  It should
    be much faster to read the DOCSTRINGS files than it is to parse
    all the .m files in Octave.  We need to pass each docstring in a
    DOCSTRINGS file through makeinfo and then extract the first
    sentence.

| > I propose making gen_doc_cache take two arguments.  The first argument
| > names the output file.  The second names the directory to work on.  If
| > only one argument is given, generate the DOC.gz file for for keywords,
| > operators, etc.
|
| The attached changeset does this. The change also fixes the bug you
| mentioned when doing 'gen_doc_cache (".")'.

OK.

| One thing to consider is where to put the cache for builtin stuff
| (operators, keywords, ...). With the attached patch this is stored in
| the file
|
|   fullfile (octave_config_info.datadir, "DOC-builtin.gz");
|
| (see 'lookfor.m' at line 58). Is that the right position/name?

My Debian-packaged copy of Octave sets datadir to /usr/share, so
that's probably not the best place.  Maybe /usr/share/octave/VERSION?
But there is no variable for just this directory yet, so we would need
to add one.

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

Søren Hauberg
tir, 10 02 2009 kl. 11:34 -0500, skrev John W. Eaton:

> | >   * What is the slow part?  Running makeinfo?  If so, then I don't see
> | >     much we can do about that.  Or is it extracting the first sentence
> | >     of the doc string?
> |
> | I haven't done any profiling, but I'm guessing it's the many calls to
> | 'makeinfo'. The algorithm for generating caches are in many ways similar
> | to the 'lookfor' code. When I moved 'lookfor' from C++ to an m-file, I
> | didn't really see any significant speed changes, so from that I
> | concluded (a bit hasty) that the slow part was 'makeinfo'.
>
> I suspect there is also some time spent parsing the files to extract
> the docstrings.  Instead of just looking for the docstrings, your
> function is parsing them.

Yeah, I guess that adds some more work.

> Also, we are currently finding the functions in the path, and doing
>
>   if (!dir_in_path)
>     addpath (directory);
>   endif
>
>   ...
>
>   if (!dir_in_path)
>     rmpath (directory);
>   endif
>
> which has some problems.  First, there's no unwind_protect here, so
> you can end up modifying the global path if this function is
> interrupted.  Second, the current directory is still searched before
> the added path, so you could end up witn the wrong docstring in the
> DOC.gz file.  I think it would be better to temporarily cd to the
> directory, do the work, then cd back (using unwind_protect to ensure
> the change is temporary).

Yeah, that is better. Something similar is also needed for 'help.m',
when handling 'Contents.m' files.

> It occurred to me that in the process of building Octave, we already
> generate DOCSTRING files that contain the raw Texinfo docstrings.  I
> think we should be taking advantage of that.  The new function could
> still be useful for generating a DOC.gz file later, but I'm not sure
> it is the best thing to use during the build process.
>
> How about making the following changes to the build process:
>
>   * generate a DOCSTRINGS file for each directory in the source tree
>     that contains .m files or C++ files with DEFUNs.  We'll read these
>     files to generate the .texi files for the manual, same as we do
>     now for the two DOCSTRINGS files that we currently generate.  I
>     can do this part.
>
>   * use the DOCSTRINGS files to generate the DOC.gz files.  It should
>     be much faster to read the DOCSTRINGS files than it is to parse
>     all the .m files in Octave.  We need to pass each docstring in a
>     DOCSTRINGS file through makeinfo and then extract the first
>     sentence.

This sounds like a good idea to me. Just one comment, though: would it
be better to only have one cache instead of one per directory? We would
need a function to determine if a given function was part of Octave, or
somehow provided by the user. I guess this would make 'lookfor' even
faster as it then doesn't have to traverse all directories that comes
with Octave, to search a bunch of caches. It also might make cache
generation more simple. We should still support caches in individual
directories as that would be useful for packages.

Søren

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

John W. Eaton
Administrator
On 10-Feb-2009, Søren Hauberg wrote:

| Yeah, that is better. Something similar is also needed for 'help.m',
| when handling 'Contents.m' files.

Since the contents.m files are supposed to just contain comments with
the help text, shouldn't we just be able to easily grab the docstring
from that file without having to parse it like a function or script?

| This sounds like a good idea to me. Just one comment, though: would it
| be better to only have one cache instead of one per directory?

Sure, we can do that.  I was thinking that by splitting up the
DOCSTRINGS files we could make rebuilding Octave a little faster if
just one .m file changed.  But splitting them up also introduces some
other complications, so for now I think it would be OK to leave the
DOCSTRINGS creation as it is now, and then generate a single DOC
from the src/DOCSTRINGS and scripts/DOCSTRINGS files.

OK, I thought about this some more and checked in the following
changeset:

  http://hg.savannah.gnu.org/hgweb/octave/rev/80910b37d855

I decided to just go with an uncompressed text file for
now.  It is less than 1MB.  It takes about 2 minutes to generate the
DOC file from the src/DOCSTRINGS and scripts/DOCSTRINGS file on my
system (~2GHz AMD64).  Maybe it would be faster if we didn't have to
start makeinfo for each docstring, but instead processed them all at
once, then split them up again.  I'll try to make that work and see if
it helps.

| We would
| need a function to determine if a given function was part of Octave, or
| somehow provided by the user.

Why?  Is the DOC file used for anything other than lookfor?

| We should still support caches in individual
| directories as that would be useful for packages.

Yes.

Some other thoughts:

  * The name of the makeinfo function might lead to some confusion
    since it doesn't run makeinfo, but performs a specialized task
    just for the help functions in Octave.  It might be better to
    rename it to __makeinfo__.m  and consider it an internal
    function.

  * It looks like your gen_doc_cache function is running makeinfo
    twice.  First to generate the formatted help text, then again from
    the get_first_help_sentence function.  I think it would be better
    to simply pass the formatted help tect to
    get_first_help_sentence.  Maybe that should also be an internal
    function.

  * I noticed that help function now prints a list of all functions
    together instead of grouping them by directory.  Is that
    intentional?  I thought the previous output format was more
    helpful.  What do others think?

Thanks,

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

Søren Hauberg
tir, 10 02 2009 kl. 17:16 -0500, skrev John W. Eaton:
> On 10-Feb-2009, Søren Hauberg wrote:
>
> | Yeah, that is better. Something similar is also needed for 'help.m',
> | when handling 'Contents.m' files.
>
> Since the contents.m files are supposed to just contain comments with
> the help text, shouldn't we just be able to easily grab the docstring
> from that file without having to parse it like a function or script?

Yeah, I guess that makes sense.

> | This sounds like a good idea to me. Just one comment, though: would it
> | be better to only have one cache instead of one per directory?
>
> Sure, we can do that.  I was thinking that by splitting up the
> DOCSTRINGS files we could make rebuilding Octave a little faster if
> just one .m file changed.  But splitting them up also introduces some
> other complications, so for now I think it would be OK to leave the
> DOCSTRINGS creation as it is now, and then generate a single DOC
> from the src/DOCSTRINGS and scripts/DOCSTRINGS files.
>
> OK, I thought about this some more and checked in the following
> changeset:
>
>   http://hg.savannah.gnu.org/hgweb/octave/rev/80910b37d855
>
> I decided to just go with an uncompressed text file for
> now.  It is less than 1MB.  It takes about 2 minutes to generate the
> DOC file from the src/DOCSTRINGS and scripts/DOCSTRINGS file on my
> system (~2GHz AMD64).  Maybe it would be faster if we didn't have to
> start makeinfo for each docstring, but instead processed them all at
> once, then split them up again.  I'll try to make that work and see if
> it helps.

Yeah, I think that would probably speed up building. It's fairly easy to
just insert

  #########################

between individual functions, and then split things apart after running
makeinfo.

> | We would
> | need a function to determine if a given function was part of Octave, or
> | somehow provided by the user.
>
> Why?  Is the DOC file used for anything other than lookfor?

'lookfor' should also search files that aren't cached. So you would need
some way of determining if we should search all functions in a given
directory.

> Some other thoughts:
>
>   * The name of the makeinfo function might lead to some confusion
>     since it doesn't run makeinfo, but performs a specialized task
>     just for the help functions in Octave.  It might be better to
>     rename it to __makeinfo__.m  and consider it an internal
>     function.

Well, it does run the makeinfo program... But you're right that it does
more than this. I don't mind renaming it, but I don't think it should be
a "hidden" function (or whatever we call functions that begin with
"__"). I use the function quite a bit for generating web pages for
Octave-Forge, so I'd at least appreciate if its API was fairly stable.

>   * It looks like your gen_doc_cache function is running makeinfo
>     twice.  First to generate the formatted help text, then again from
>     the get_first_help_sentence function.  I think it would be better
>     to simply pass the formatted help tect to
>     get_first_help_sentence.  Maybe that should also be an internal
>     function.

The problem is that it is not trivial to extract the first sentence. If
you run 'makeinfo' on the entire help message you get something like

   -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE)
   -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE,
            SEE_ALSO)
       Run `makeinfo' on a given text.

which you then have to parse. The old code in 'lookfor' did this, and it
was very complicated, yet I don't think it handled all corner-cases.
What the current code does is to first remove all lines from the texinfo
code that begins with '@def', since that removes '@deftypefn',
'@defvar', etc., and then run 'makeinfo' on the text, which gives you
something that is quite simple to parse. The downside is that you have
to call 'makeinfo' twice, but the resulting code is much much more
simple, and in general seems to work better. So, I wouldn't change this,
unless you can come up with some clever trick.

>   * I noticed that help function now prints a list of all functions
>     together instead of grouping them by directory.  Is that
>     intentional?  I thought the previous output format was more
>     helpful.  What do others think?

Hmm, I thought I actually fixed that, but apparently I didn't.
Personally, I would actually prefer neither of the two options. I think
just typing 'help' provides waay too much output. I think it would be
better to simply print something like (very much written off the top of
my head):

  This is the GNU Octave prompt.

  To run a script saved in the file 'myscript.m', type

    myscript

  and press ENTER.

  To get help with individual commands and functions type

    help name

  where you should replace 'name' with the name of the command or
  function you would like to learn more about.

  For a more detailed introduction to GNU Octave, please consult the
  manual. To read this from the prompt type

    doc

  Enjoy GNU Octave! If you have questions, feel free to join the GNU
  Octave community. See http://www.octave.org for details.

Or something like that.

Søren

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

John W. Eaton
Administrator
On 10-Feb-2009, Søren Hauberg wrote:

| tir, 10 02 2009 kl. 17:16 -0500, skrev John W. Eaton:
|
| > start makeinfo for each docstring, but instead processed them all at
| > once, then split them up again.  I'll try to make that work and see if
| > it helps.
|
| Yeah, I think that would probably speed up building. It's fairly easy to
| just insert
|
|   #########################
|
| between individual functions, and then split things apart after running
| makeinfo.

The DOCSTRINGS files already use ascii 31 for this, and it passes
through makeinfo fine, so I just used that.

My latest version is checked in.  It

  reads the contents of all the DOCSTRINGS files into memory

  replaces @seealso{fcn-list} with "See also: fcn-list."

  strips the "-*- texinfo -*-" tags

  runs makeinfo on the result (using system, not the Octave makeinfo
  function, since the extra processing of that function is not needed
  now)

  finds the separator characters in the output from makeinfo to work
  on each docstring

  splits the function name and doc string (the function name appears
  just after the separator character, same as in the DOCSTRINGS files)

  finds the first sentence, stripping extra spaces and newline characters

  stores the symbol name, docstring, and first sentence in a cell
  array

  saves the cell array

I parsed the output by noting that each formatted docstring should
begin with

 " -- Function File: ..."

or similar. The key is that the " -- " at the beginning of a line
should only appear as the tag of a header like this, so I just look
for the end of the last line containing one of those, and assume the
first sentence begins just after that.

| > | We would
| > | need a function to determine if a given function was part of Octave, or
| > | somehow provided by the user.
| >
| > Why?  Is the DOC file used for anything other than lookfor?
|
| 'lookfor' should also search files that aren't cached. So you would need
| some way of determining if we should search all functions in a given
| directory.

OK.

| > Some other thoughts:
| >
| >   * The name of the makeinfo function might lead to some confusion
| >     since it doesn't run makeinfo, but performs a specialized task
| >     just for the help functions in Octave.  It might be better to
| >     rename it to __makeinfo__.m  and consider it an internal
| >     function.
|
| Well, it does run the makeinfo program... But you're right that it does
| more than this. I don't mind renaming it, but I don't think it should be
| a "hidden" function (or whatever we call functions that begin with
| "__"). I use the function quite a bit for generating web pages for
| Octave-Forge, so I'd at least appreciate if its API was fairly stable.

I don't think the API would change much. It's just that it seems to me
that it is the wrong name for such a specialized function, and I don't
think very many users will need it.

| >   * It looks like your gen_doc_cache function is running makeinfo
| >     twice.  First to generate the formatted help text, then again from
| >     the get_first_help_sentence function.  I think it would be better
| >     to simply pass the formatted help tect to
| >     get_first_help_sentence.  Maybe that should also be an internal
| >     function.
|
| The problem is that it is not trivial to extract the first sentence. If
| you run 'makeinfo' on the entire help message you get something like
|
|    -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE)
|    -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE,
|             SEE_ALSO)
|        Run `makeinfo' on a given text.
|
| which you then have to parse. The old code in 'lookfor' did this, and it
| was very complicated, yet I don't think it handled all corner-cases.

See above.  Does it really need to be complicated?

Next, I'll try to fix the Makefile to install the DOC file somewhere.

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

Søren Hauberg
tir, 10 02 2009 kl. 22:04 -0500, skrev John W. Eaton:
> I parsed the output by noting that each formatted docstring should
> begin with
>
>  " -- Function File: ..."
>
> or similar. The key is that the " -- " at the beginning of a line
> should only appear as the tag of a header like this, so I just look
> for the end of the last line containing one of those, and assume the
> first sentence begins just after that.

Does that work when the header span several lines? Typing 'help
makeinfo'  (as an example) gives me

 -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE)
 -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE,
          SEE_ALSO)
     Run `makeinfo' on a given text.

plus some more. From what you described I would assume that the first
help sentence would be

          SEE_ALSO)
     Run `makeinfo' on a given text.

Which isn't really what we want.

> | Well, it does run the makeinfo program... But you're right that it does
> | more than this. I don't mind renaming it, but I don't think it should be
> | a "hidden" function (or whatever we call functions that begin with
> | "__"). I use the function quite a bit for generating web pages for
> | Octave-Forge, so I'd at least appreciate if its API was fairly stable.
>
> I don't think the API would change much. It's just that it seems to me
> that it is the wrong name for such a specialized function, and I don't
> think very many users will need it.

OK. I guess you're right. Let's just call it __makeinfo__

Søren

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

John W. Eaton
Administrator
On 11-Feb-2009, Søren Hauberg wrote:

| Does that work when the header span several lines? Typing 'help
| makeinfo'  (as an example) gives me
|
|  -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE)
|  -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE,
|           SEE_ALSO)
|      Run `makeinfo' on a given text.
|
| plus some more. From what you described I would assume that the first
| help sentence would be
|
|           SEE_ALSO)
|      Run `makeinfo' on a given text.
|
| Which isn't really what we want.

How about using --fill-column=NUM with a large value of NUM?  We don't
need the result to fit in any particular width screen.  Something like
1024 for NUM should be big enough.  I checked in a change.

jwe

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

Søren Hauberg
ons, 11 02 2009 kl. 01:48 -0500, skrev John W. Eaton:

> On 11-Feb-2009, Søren Hauberg wrote:
>
> | Does that work when the header span several lines? Typing 'help
> | makeinfo'  (as an example) gives me
> |
> |  -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE)
> |  -- Function File: [RETVAL, STATUS] = makeinfo (TEXT, OUTPUT_TYPE,
> |           SEE_ALSO)
> |      Run `makeinfo' on a given text.
> |
> | plus some more. From what you described I would assume that the first
> | help sentence would be
> |
> |           SEE_ALSO)
> |      Run `makeinfo' on a given text.
> |
> | Which isn't really what we want.
>
> How about using --fill-column=NUM with a large value of NUM?  We don't
> need the result to fit in any particular width screen.  Something like
> 1024 for NUM should be big enough.  I checked in a change.

This works for generating caches as we don't care about the width of the
text. I seem to remember more odd corner-cases, but I can't think of any
right now. I'll speak up if I remember any.

Søren

Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

Søren Hauberg
In reply to this post by John W. Eaton
tir, 10 02 2009 kl. 22:04 -0500, skrev John W. Eaton:
> My latest version is checked in.

Attached is a changeset that updates 'lookfor' to use the new cache
scheme.

Søren

lookfor.changeset (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Writing 'help' functions as m-files

Søren Hauberg
In reply to this post by John W. Eaton
With a recent checkout I see that the bottom of 'makeinfo.m' says

    unwind_protect_cleanup
  #    if (exist (name, "file"))
  #      delete (name);
  #    endif
    end_unwind_protect

Why has the deletion of the temporary file 'name' been removed? The
result is that temporary files are written to the current directory, but
never removed.

Søren

123