Patching Octave-MPI

Patching Octave-MPI

JD Cole-2
   In the process of patching Andy's MPI support into octave, I have
come across the following dilemma: Should I add additional make targets
to config files such as Makeconf.in, like this:
+ mpi%.o : mpi%.cc
+     $(CXX) -c $(CPPFLAGS) $(MPI_DEFS) $(MPI_CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@

%.o : %.cc
    $(CXX) -c $(CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@

or

keep with the original:

%.o : %.cc
    $(CXX) -c $(CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@

and add MPI_CPPFLAGS to ALL_CXXFLAGS somewhere in between configure.in
and Makeconf.in (I'm wondering if appending to XTRA_CXXFLAGS would be
appropriate for this?)

I am partial to the latter solution because it reduces the amount of
code duplication in the makefiles. As far as I can see, adding the MPI
includes to the global includes with which octave is compiled shouldn't
affect the end result. The situation may be slightly different if I were
to add MPI_LDFLAGS to ALL_LDFLAGS.

Any thoughts?

Thanks,
JD



Patching Octave-MPI

John W. Eaton-6
On 20-Nov-2002, JD Cole <[hidden email]> wrote:

|    In the process of patching Andy's MPI support into octave, I have
| come across the following dilemma: Should I add additional make targets
| to config files such as Makeconf.in, like this:
| + mpi%.o : mpi%.cc
| +     $(CXX) -c $(CPPFLAGS) $(MPI_DEFS) $(MPI_CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@
|
| %.o : %.cc
|     $(CXX) -c $(CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@
|
| or
|
| keep with the original:
|
| %.o : %.cc
|     $(CXX) -c $(CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@
|
| and add MPI_CPPFLAGS to ALL_CXXFLAGS somewhere in between configure.in
| and Makeconf.in (I'm wondering if appending to XTRA_CXXFLAGS would be
| appropriate for this?)

What are these MPI_* variables defined to be?  Are special things
needed because of the way the MPI headers and libraries are installed
on your system, or is it something that is always needed?

| I am partial to the latter solution because it reduces the amount of
| code duplication in the makefiles. As far as I can see, adding the MPI
| includes to the global includes with which octave is compiled shouldn't
| affect the end result. The situation may be slightly different if I were
| to add MPI_LDFLAGS to ALL_LDFLAGS.

What would go in MPI_LDFLAGS?

| Any thoughts?

I've been assuming that we would add a configure option to enable
MPI.  That would add whatever is necessary to the normal INCFLAGS,
etc.

jwe



Re: Patching Octave-MPI

Paul Kienzle-2
There are a variety of approaches to parallelism that have been
implemented in Matlab m-files and mex-files.  E.g.,

        http://supertech.lcs.mit.edu/~cly/survey.html

I'm a little surprised that you would need to patch octave itself to
do the same.

If you want fast parallel matrix operations, wouldn't you be better off
having an interface to scalapack or something similar?  Or are you
designing new parallel algorithms?  

Paul Kienzle
[hidden email]

On Wed, Nov 20, 2002 at 09:12:07PM -0600, John W. Eaton wrote:

> On 20-Nov-2002, JD Cole <[hidden email]> wrote:
>
> |    In the process of patching Andy's MPI support into octave, I have
> | come across the following dilemma: Should I add additional make targets
> | to config files such as Makeconf.in, like this:
> | + mpi%.o : mpi%.cc
> | +     $(CXX) -c $(CPPFLAGS) $(MPI_DEFS) $(MPI_CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@
> |
> | %.o : %.cc
> |     $(CXX) -c $(CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@
> |
> | or
> |
> | keep with the original:
> |
> | %.o : %.cc
> |     $(CXX) -c $(CPPFLAGS) $(ALL_CXXFLAGS) $< -o $@
> |
> | and add MPI_CPPFLAGS to ALL_CXXFLAGS somewhere in between configure.in
> | and Makeconf.in (I'm wondering if appending to XTRA_CXXFLAGS would be
> | appropriate for this?)
>
> What are these MPI_* variables defined to be?  Are special things
> needed because of the way the MPI headers and libraries are installed
> on your system, or is it something that is always needed?
>
> | I am partial to the latter solution because it reduces the amount of
> | code duplication in the makefiles. As far as I can see, adding the MPI
> | includes to the global includes with which octave is compiled shouldn't
> | affect the end result. The situation may be slightly different if I were
> | to add MPI_LDFLAGS to ALL_LDFLAGS.
>
> What would go in MPI_LDFLAGS?
>
> | Any thoughts?
>
> I've been assuming that we would add a configure option to enable
> MPI.  That would add whatever is necessary to the normal INCFLAGS,
> etc.
>
> jwe
>



Re: Patching Octave-MPI

JD Cole-2
Paul Kienzle wrote:

>There are a variety of approaches to parallelism which have been done
>in matlab m-files and mex-files.  E.g.,
>
> http://supertech.lcs.mit.edu/~cly/survey.html
>
>I'm a little surprised that you would need to patch octave itself to
>do the same.
>
Yes, it is quite possible that an octave wrapper API for MPI could be
implemented via .oct files (I am currently looking at this solution),
and thank you for the interesting Matlab links.  My current effort is to
bring Andy Jacobson's work (found at
http://corto.icg.to.infn.it/andy/octave-mpi/) to the most recent version
of octave -- not necessarily to integrate it with the octave
distribution, but to make it available to those who may want to use it.
(This also gives me a chance to better understand the proper way to add
functionality to octave in the future.)

Thanks,
  JD



Re: Patching Octave-MPI

John W. Eaton-6
In reply to this post by Paul Kienzle-2
On 21-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| If you want fast parallel matrix operations, wouldn't you be better off
| having an interface to scalapack or something similar?

Speaking of that, do you think it would be a good time to make it
possible for Octave to easily link to whatever version of the
blas/lapack the user wants at run time?  So instead of

  F77_XFCN (dgemm, DGEMM) (...);

in the code and

  g++ ... -llapack -lblas ...

or whatever, we would use

  octave_blas::dgemm_ptr (...);

or similar, and not link directly to lapack or blas, but load them and
initialize the set of pointers at startup time.  Then you could tell
Octave which lapack/blas combination you want using an environment
variable or command-line option.  This all assumes the right set
of configure options; otherwise, we could still link to the normal
libraries and not offer the run-time option.
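
For what it's worth, here is a minimal sketch of how such a pointer
table might be filled in at startup, assuming POSIX dlopen()/dlsym();
the octave_blas namespace and the OCTAVE_BLAS_LIBRARY environment
variable are made-up names for illustration, not an existing Octave
interface:

  // Hypothetical sketch only: resolve dgemm from a user-selected shared
  // library at startup instead of linking against -lblas directly.
  #include <dlfcn.h>
  #include <cstdlib>
  #include <stdexcept>

  namespace octave_blas
  {
    // Fortran dgemm, assuming the common trailing-underscore name
    // mangling; hidden character-length arguments are omitted here,
    // which a real implementation would have to handle per compiler.
    typedef void (*dgemm_fcn) (const char *transa, const char *transb,
                               const int *m, const int *n, const int *k,
                               const double *alpha, const double *a,
                               const int *lda, const double *b,
                               const int *ldb, const double *beta,
                               double *c, const int *ldc);

    dgemm_fcn dgemm_ptr = 0;

    void
    init (void)
    {
      // Library chosen at run time via an environment variable (a
      // command-line option would work the same way).
      const char *lib = std::getenv ("OCTAVE_BLAS_LIBRARY");
      if (! lib)
        lib = "libblas.so";

      void *handle = dlopen (lib, RTLD_NOW | RTLD_GLOBAL);
      if (! handle)
        throw std::runtime_error (dlerror ());

      // Casting the void* from dlsym to a function pointer is the
      // usual POSIX idiom, though not strictly portable C++.
      dgemm_ptr = reinterpret_cast<dgemm_fcn> (dlsym (handle, "dgemm_"));
      if (! dgemm_ptr)
        throw std::runtime_error ("unable to resolve dgemm_");
    }
  }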

Does that allow us to support scalapack, or does scalapack not have
the same interface as lapack?  (After doing a quick check, it looks like
it does not.)

Even if this does not allow simple quick support for scalapack, would
it be a useful addition?

jwe



Re: Patching Octave-MPI

Andy Jacobson-2-2
In reply to this post by Paul Kienzle-2
>>>>> "Paul" == Paul Kienzle <[hidden email]> writes:

    Paul> There are a variety of approaches to parallelism which have
    Paul> been done in matlab m-files and mex-files.  E.g.,

Alex Verstak raised an interesting point last year: he said that the
MPI standard requires that MPI_Init() be fed the *original* argv and
argc.  (Yes, it is arguably a silly requirement.)  With MPICH, at least,
MPI_Init() apparently would not work from a .oct file.  See:

http://www.octave.org/mailing-lists/help-octave/2001/141

Later however, Alex apparently did get it running:

http://www.octave.org/mailing-lists/help-octave/2001/218

So perhaps a .oct solution will work.
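
For reference, a faked-argv wrapper along those lines might look
roughly like the following.  This is an untested sketch, not Alex's
actual code; whether a given MPI implementation accepts the synthetic
argv is exactly the open question here.

  // mpi_init.cc -- sketch of MPI_Init wrapped as a dynamically loaded
  // Octave function, using a synthetic argv since the original one
  // from main() is not visible inside a .oct file.
  #include <octave/oct.h>
  #include <mpi.h>

  DEFUN_DLD (mpi_init, args, nargout,
             "Initialize MPI using a synthetic argc/argv pair.")
  {
    int initialized = 0;
    MPI_Initialized (&initialized);

    if (! initialized)
      {
        // Fake the command line; whether this satisfies a particular
        // MPI implementation is implementation-dependent.
        static char prog[] = "octave";
        static char *fake_argv[] = { prog, 0 };
        int fake_argc = 1;
        char **argv_ptr = fake_argv;

        if (MPI_Init (&fake_argc, &argv_ptr) != MPI_SUCCESS)
          error ("mpi_init: MPI_Init failed");
      }

    return octave_value_list ();
  }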

Note, however, that the parallelized Matlab solutions are all guided by
two facts: (1) they cannot modify the source code, so the
parallelization has to work via .mex/.m files; and (2) due to the cost
of multiple Matlab licenses, there may be implicit assumptions about
how many instances are running in a given MPI application.  There may
be hidden performance issues.

It should be a pretty trivial matter to determine whether or not the user
wants a parallel or a traditional application when she starts octave.
Perhaps something as simple as "octave --mpi-server" or "octave
--mpi-client".  Then we call MPI_Init() (and later MPI_Finalize()) as
appropriate.  That is just about all that is required from the main
code; the rest could naturally be moved to .oct.

        Regards,

               Andy

--
Andy Jacobson

[hidden email]

Program in Atmospheric and Oceanic Sciences
Sayre Hall, Forrestal Campus
Princeton University
PO Box CN710 Princeton, NJ 08544-0710 USA

Tel: 609/258-5260  Fax: 609/258-2850



Re: Patching Octave-MPI

Paul Kienzle-2
In reply to this post by John W. Eaton-6
On Thu, Nov 21, 2002 at 10:08:53AM -0600, John W. Eaton wrote:

> On 21-Nov-2002, Paul Kienzle <[hidden email]> wrote:
>
> | If you want fast parallel matrix operations, wouldn't you be better off
> | having an interface to scalapack or something similar?
>
> Speaking of that, do you think it would be a good time to make it
> possible for Octave to easily link to whatever version of the
> blas/lapack the user wants at run time?  So instead of
>
>   F77_XFCN (dgemm, DGEMM) (...);
>
> in the code and
>
>   g++ ... -llapack -lblas ...
>
> or whatever, we would use
>
>   octave_blas::dgemm_ptr (...);
>
> or similar, and not link directly to lapack or blas, but load them and
> initialize the set of pointers at startup time.  Then you could tell
> Octave which lapack/blas combination you want using an environment
> variable or command-line option.  This all assumes the right set
> of configure options, otherwise we could still link to the normal
> libraries and not offer the run-time option.
>
> Does that allow us to support scalapack, or does scalapack not have
> the same interface as lapack (after doing a quick check, it looks like
> it does not).
>
> Even if this does not allow simple quick support for scalapack, would
> it be a useful addition?

I don't know anything about the details of distributed matrix libraries.
In particular, I don't know how they decide if and when to distribute
data.  

Since octave is a high-level language, I would like it to do the right thing.

If that means calling the local blas for small matrices and the parallel
blas for large matrices, then Octave would have to be linked against
both simultaneously.  If, on the other hand, the parallel blas are
halfway intelligent and don't distribute small computations across the
net, there is no reason to link to the local blas.  If the user is
really concerned, then link two different executables: poctave against
the parallel blas and octave against the serial blas.



Re: Patching Octave-MPI

Paul Kienzle-2
In reply to this post by Andy Jacobson-2-2
So maybe we need the equivalent of gcc's -Wl,"blah", so that octave
m-files/octave-files can receive parameters from the command line
that octave ignores?  From there we can build a pretend original
argc/argv pair to hand to MPI.
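
Something like the sketch below, perhaps.  The "--mpi-args" marker is
made up purely for illustration; it is not an existing octave option.

  // Sketch: collect arguments that octave itself would ignore
  // (everything after a hypothetical "--mpi-args" marker) and rebuild
  // an argv-style array that can be handed to MPI_Init().
  #include <cstring>
  #include <string>
  #include <vector>

  static std::vector<char *>
  build_mpi_argv (int argc, char **argv,
                  const std::string& marker = "--mpi-args")
  {
    std::vector<char *> mpi_argv;
    mpi_argv.push_back (strdup (argv[0]));

    bool passing = false;
    for (int i = 1; i < argc; i++)
      {
        if (passing)
          mpi_argv.push_back (strdup (argv[i]));
        else if (marker == argv[i])
          passing = true;
      }

    mpi_argv.push_back (0);   // MPI expects argv[argc] to be NULL

    return mpi_argv;
  }

  // Usage sketch:
  //   std::vector<char *> v = build_mpi_argv (argc, argv);
  //   int mpi_argc = v.size () - 1;
  //   char **mpi_argv = &v[0];
  //   MPI_Init (&mpi_argc, &mpi_argv);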

Paul Kienzle
[hidden email]

On Thu, Nov 21, 2002 at 11:51:19AM -0500, Andy Jacobson wrote:

> >>>>> "Paul" == Paul Kienzle <[hidden email]> writes:
>
>     Paul> There are a variety of approaches to parallelism which have
>     Paul> been done in matlab m-files and mex-files.  E.g.,
>
> Alex Verstak raised an interesting point last year: he said that the
> MPI standard requires that MPI_Init() be fed the *original* argv and
> argc.  (Yes, it is arguably a silly requirement.)  With at least MPICH
> apparently, MPI_Init() would not work from a .oct file.  See:
>
> http://www.octave.org/mailing-lists/help-octave/2001/141
>
> Later however, Alex apparently did get it running:
>
> http://www.octave.org/mailing-lists/help-octave/2001/218
>
> So perhaps a .oct solution will work.
>
> Note however that the parallelized Matlab solutions all are guided by
> two facts: (1) they cannot modify the source code so the
> parallelization has to work via .mex/.m files, and (2) due to the cost
> of multiple Matlab licenses, there may be implicit assumptions about
> how many instances are running in a given MPI application.  There may
> be hidden performance issues.
>
> It should be a pretty trivial matter to determine whether or not the user
> wants a parallel or traditional application when she starts octave.
> Perhaps something as simple as "octave --mpi-server" or "octave
> --mpi-client".  Then we call MPI_Init() (and later MPI_Finalize()) as
> appropriate.  That is just about all that is required from the main
> code; the rest could naturally be moved to .oct.
>
>         Regards,
>
>                Andy
>
> --
> Andy Jacobson
>
> [hidden email]
>
> Program in Atmospheric and Oceanic Sciences
> Sayre Hall, Forrestal Campus
> Princeton University
> PO Box CN710 Princeton, NJ 08544-0710 USA
>
> Tel: 609/258-5260  Fax: 609/258-2850
>



Re: Patching Octave-MPI

John W. Eaton-6
In reply to this post by Andy Jacobson-2-2
On 21-Nov-2002, Andy Jacobson <[hidden email]> wrote:

| Alex Verstak raised an interesting point last year: he said that the
| MPI standard requires that MPI_Init() be fed the *original* argv and
| argc.

Yes, I think that's what he claimed.

But after looking at the MPI docs, I think it wants pointers to them,
so it can strip out its special arguments first, modifying the
argv/argc pair which the original application would then use.  The
argument I remember seeing (from Alex?) was that there is no standard
for the argument names, so the application could not know which args
to avoid.  But I think this is a weak argument, because, as Paul points
out, you could mark them specially.  So unless there is some *other*
reason that the MPI_Init function needs pointers to the actual
argv/argc of the main program (say, to get an idea of where the stack
starts for GC or something -- another bad idea, I think!), then I don't
see any reason that you couldn't just fake your own argv/argc for MPI.

There is also a statement in the MPI docs that I saw saying something
about being sure to initialize MPI before doing much of anything (I
think I/O and opening files were mentioned).  Does anyone know the
rationale for that?  What difference would that make?  If it really
does make a difference, then there is another reason to put the call
to MPI_Init in the core Octave.

| It should a pretty trivial matter to determine whether or not the user
| wants a parallel or traditional application when she starts octave.
| Perhaps something as simple as "octave --mpi-server" or "octave
| --mpi-client".  Then we call MPI_Init() (and later MPI_Finalize()) as
| appropriate.  That is just about all that is required from the main
| code; the rest could naturally be moved to .oct.

This is what I was thinking also, that we would

  * Compile support for MPI into Octave based on configure flags (so
    you could omit it completely and still build a working Octave).

  * Even for Octave built with MPI support, only enable MPI
    functionality at run time if Octave is given the appropriate
    command-line option.  (A rough sketch of this two-layer arrangement
    follows below.)
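
Roughly, the two layers might fit together like this.  HAVE_MPI, the
mpi_requested flag, and the "--mpi" spelling are placeholders, not
settled choices:

  // Sketch of the two-layer scheme: MPI support is compiled in only if
  // configure defined HAVE_MPI, and activated only if the user asked
  // for it on the command line.
  #include <iostream>

  #if defined (HAVE_MPI)
  #include <mpi.h>
  #endif

  // Set while parsing the command line ("--mpi" or whatever is chosen).
  static bool mpi_requested = false;

  static void
  maybe_init_mpi (int *argc, char ***argv)
  {
  #if defined (HAVE_MPI)
    if (mpi_requested)
      MPI_Init (argc, argv);
  #else
    if (mpi_requested)
      std::cerr << "octave: this copy of Octave was built without MPI support\n";
  #endif
  }

  static void
  maybe_finalize_mpi (void)
  {
  #if defined (HAVE_MPI)
    if (mpi_requested)
      MPI_Finalize ();
  #endif
  }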

jwe



Re: Patching Octave-MPI

John W. Eaton-6
In reply to this post by Paul Kienzle-2
On 21-Nov-2002, Paul Kienzle <[hidden email]> wrote:

| I don't know anything about the details of distributed matrix libraries.
| In particular, I don't know how they decide if and when to distribute
| data.  
|
| As a high level language, I would like octave to do the right thing.
|
| If that means calling the local blas for small matrices and the parallel
| blas for large matrices, then Octave would have to be linked against
| both simultaneously.  If on the other hand the parallel blas are half
| way intelligent and don't distribute small computations across the net,
| there is no reason to link to the local blas.  If the user is really
| concerned, then link two different executables: poctave against the
| parallel blas and octave against the serial blas.

If the distributed matrix libraries don't do the right thing, then I
suppose one could always write a wrapper that does, though it would be
uglier if the distributed library uses the same names as the standard
one.  But in any case, what I was proposing would make it somewhat
easier to experiment, and also somewhat easier to move an Octave
binary from one machine to another when using ATLAS or other tuned
blas/lapack libraries, because you wouldn't have to rely on dynamic
linker tricks.  You could just set an option saying what version of
the libraries you want to use.  This also makes it easier to
demonstrate the speedup for a given version of the libraries.  So I
think it is worth doing for these reasons, but I wanted to know if
anyone sees something that I'm missing.  Are there any reasons that it
would not be worth the effort?

jwe



Re: Patching Octave-MPI

JD Cole-2
In reply to this post by Andy Jacobson-2-2
Andy Jacobson wrote:

>>>>>>"Paul" == Paul Kienzle <[hidden email]> writes:
>>>>>>            
>>>>>>
>
>    Paul> There are a variety of approaches to parallelism which have
>    Paul> been done in matlab m-files and mex-files.  E.g.,
>
>Alex Verstak raised an interesting point last year: he said that the
>MPI standard requires that MPI_Init() be fed the *original* argv and
>argc.  (Yes, it is arguably a silly requirement.)  With at least MPICH
>apparently, MPI_Init() would not work from a .oct file.  See:
>
>http://www.octave.org/mailing-lists/help-octave/2001/141
>
>Later however, Alex apparently did get it running:
>
>http://www.octave.org/mailing-lists/help-octave/2001/218
>
>So perhaps a .oct solution will work.
>
>Note however that the parallelized Matlab solutions all are guided by
>two facts: (1) they cannot modify the source code so the
>parallelization has to work via .mex/.m files, and (2) due to the cost
>of multiple Matlab licenses, there may be implicit assumptions about
>how many instances are running in a given MPI application.  There may
>be hidden performance issues.
>
>It should be a pretty trivial matter to determine whether or not the user
>wants a parallel or traditional application when she starts octave.
>Perhaps something as simple as "octave --mpi-server" or "octave
>--mpi-client".  Then we call MPI_Init() (and later MPI_Finalize()) as
>appropriate.  That is just about all that is required from the main
>code; the rest could naturally be moved to .oct.
>
>        Regards,
>
>               Andy
>
>  
>
I just created mpi_init.oct and mpi_finalize.oct.  Under LAM, mpi_init
seems to initialize fine; however, when I call the second function,
mpi_finalize, I get a failure message telling me that a function has
been called without a call to MPI_Init.  This result is consistent with
what Andy described above.  However, the LAM man page for MPI_Init
reads:

    MPI mandates that the same thread must call MPI_Init or
    MPI_Init_thread and MPI_Finalize.

This leads me to the question: do all octave DLD functions execute in
the same thread?
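
One way to narrow that down might be a small diagnostic .oct file that
reports MPI_Initialized() and the calling thread id from both contexts.
A sketch follows; mpi_state is a made-up name, and the pthread_self()
cast assumes a Linux/glibc-style integral pthread_t:

  // mpi_state.cc -- diagnostic sketch: report whether MPI believes it
  // has been initialized in this context, and which thread is asking.
  #include <octave/oct.h>
  #include <mpi.h>
  #include <pthread.h>

  DEFUN_DLD (mpi_state, args, nargout,
             "Print MPI_Initialized () and the calling thread id.")
  {
    int flag = 0;
    MPI_Initialized (&flag);

    octave_stdout << "MPI_Initialized: " << flag
                  << ", thread id: " << (unsigned long) pthread_self ()
                  << "\n";

    return octave_value_list ();
  }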

JD




Re: Patching Octave-MPI

JD Cole-2
> I just created mpi_init.oct and mpi_finalize.oct, under LAM, mpi_init
> seems to initialize fine, however when I call the second function,
> mpi_finalize, I get a failure message telling me that a function has
> been called without a call to MPI_Init. This result is consistent with
> what Andy described above, however, in the LAM man page for MPI_Init
> it reads:
>
> MPI mandates that the same thread must call MPI_Init or
> MPI_Init_thread and MPI_Finalize
>
> This leads me to the question: Do all octave DLD functions execute in
> the same thread?
>
> JD

By the way, I tried NULL arguments in addition to faking it (i.e., argc=1,
argv[0]="octave"); both successfully initialized, but neither affected the
mpi_finalize result.
JD





Re: Patching Octave-MPI

Andy Jacobson-2-2
In reply to this post by John W. Eaton-6
>>>>> "JWE" == John W Eaton <[hidden email]> writes:


    JWE> There is also a statement in the MPI docs that I saw saying
    JWE> something about being sure to initialize MPI before doing
    JWE> much of anything (I think I/O and opening files were
    JWE> mentioned).  Does anyone know the rationale for that?  What
    JWE> difference would that make?  If it really does make a
    JWE> difference, then there is another reason to put the call to
    JWE> MPI_Init in the core Octave.

I believe that all I/O is implementation-specific.  It is omitted from
the MPI-1 standard
(http://www.mpi-forum.org/docs/mpi-11-html/node6.html#Node6).

I'm only familiar with the LAM implementation of MPI, and in that case
one I/O issue involves determining whether stdout and stderr (a) are
attached to a terminal, (b) need to be forwarded to a remote terminal,
or (c) should be dumped to /dev/null.  It's my understanding that these
decisions are implementation-dependent, so I expect that MPICH handles
them differently.

    JWE> This is what I was thinking also, that we would

    JWE>   * Compile support for MPI into Octave based on configure
    JWE> flags (so you could omit it completely and still build a
    JWE> working Octave).

    JWE>   * Even for Octave built with MPI support, only enable MPI
    JWE> functionality at run time if Octave is given the appropriate
    JWE> command-line option.

This seems like a perfectly reasonable solution to me.  However, I
would like to point out a couple of human interface issues.

MPI apps are usually started by some external code.  In LAM's case you
first have to start the LAM daemon on all the computers (via the
"lamboot" program).  Then you call "mpirun", which takes
responsibility for executing the member programs which make up the MPI
application (e.g. loading N instances of octave on M computers).
After the application is finished, you generally have to shut down the
LAM daemons manually (via "wipe" or "lamhalt").  Users will have to
know the details of starting an MPI application according to their MPI
implementation (LAM, MPICH, etc) in order to get octave to run in
parallel.

In addition, users need to know the compilation and linker flags
necessary to get an MPI-aware octave to compile successfully.  Once
more, this is implementation-specific.  I don't know whether autoconf
can be made smart enough to figure out how to do this.  LAM very
unfortunately hides these flags from the user by wrapping the compiler
in a script.  While it is possible to determine the proper flags from
this wrapper script, this scheme feels like a needless obfuscation to
me.

Finally, for developers, debugging parallel applications can be quite
challenging.  Enough said, I hope.

Empirically, I have found that these are fairly high barriers to the
use of MPI codes by non-alpha-geeks.  I think MPI-enabled octave,
especially if linked to ScaLAPACK or some such, would be very useful
to some people, and raise a lot of eyebrows--but it is unlikely to
ever be easy to use.  

I suggested one particular balance between access to low-level
functionality and high-level convenience with the MPI patches to
octave-2.1.31.  I thought the best part was that you could send
arbitrary octave_value objects between nodes (via a minor hack to
load/save).  Alex Verstak struck a different balance by providing a
raw interface to the MPI send/receive functions.  This is very
appealing just as a means to learn how parallel applications work.  I
think it would be great if both levels were available in the final
product.
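
For illustration, a raw-level send wrapper of the second flavor might
look something like the sketch below.  The name mpi_send_matrix and the
simple dimensions-then-data protocol are invented for this example;
this is not taken from either of the existing patch sets.

  // mpi_send_matrix.cc -- illustrative sketch of a raw-level wrapper:
  // send a real matrix (dimensions first, then data) to another rank.
  #include <octave/oct.h>
  #include <mpi.h>

  DEFUN_DLD (mpi_send_matrix, args, nargout,
             "mpi_send_matrix (M, DEST, TAG): send matrix M to rank DEST.")
  {
    octave_value_list retval;

    if (args.length () != 3)
      {
        error ("mpi_send_matrix: expecting three arguments");
        return retval;
      }

    Matrix m = args(0).matrix_value ();
    int dest = args(1).int_value ();
    int tag = args(2).int_value ();

    int dims[2];
    dims[0] = m.rows ();
    dims[1] = m.columns ();

    // Send the dimensions first, then the column-major data.
    MPI_Send (dims, 2, MPI_INT, dest, tag, MPI_COMM_WORLD);
    MPI_Send (m.fortran_vec (), dims[0] * dims[1], MPI_DOUBLE,
              dest, tag, MPI_COMM_WORLD);

    return retval;
  }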

        Regards,
                Andy

--
Andy Jacobson

[hidden email]

Program in Atmospheric and Oceanic Sciences
Sayre Hall, Forrestal Campus
Princeton University
PO Box CN710 Princeton, NJ 08544-0710 USA

Tel: 609/258-5260  Fax: 609/258-2850