Octave Interpreter

Octave Interpreter

mk
Hi All, 
I was wondering whether Octave has an equivalent that works on large-scale distributed machines?
If not, would such a project be of any importance to the Octave community? I would like to build an
"Octave-like" front end on top of a distributed setup, for large-scale ML algorithms to be implemented
by ML practitioners - as opposed to how it's currently done in projects like GraphLab.
For this I would need some directions/pointers as to how I can build a compiler for such a language.
Does anyone here have any suggestions?

Thanks,
Manu

Re: Octave Interpreter

Bipin Mathew
Hello Manu,

    I was actually thinking of working on something along the same lines. Googling around, I see a few efforts along these lines:

http://octave.sourceforge.net/parallel/
http://octave.sourceforge.net/mpi/index.html

While I have not used these personally, looking at the examples, it appears one needs to be quite aware of MPI / parallel programming to make effective use of these packages. It would be nice if all these things could be made more transparent to the end user.

Many years ago, I actually worked with a team at MIT Lincoln Laboratory, designing this sort of thing for Matlab:

http://www.ll.mit.edu/mission/cybersec/softwaretools/pmatlab/pmatlab.html

The idea was to hide as much of the parallelism as possible. This was done by using a special "dmatrix" class to represent your matrix, which was, in reality, spread across several servers. I don't know if this is still an active project there (any of my old Lincoln friends can comment? :-) ).

Of course technology has moved on since then, and there is probably a lot of new technology we can leverage to make something similar that is cleaner/faster/smarter (i.e. more awesome). For example, the metadata saying where the data was located used to be stored in a static file back in the day. I reckon we can just query a distributed file system for the same information these days, and get all the benefits of replication / performance / resilience on top.
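
To give a flavour of the programming model pMatlab aimed for, here is an illustrative sketch. Treat it as pseudocode: the names (map, local, put_local, agg, and the map constructor's arguments) are reproduced from pMatlab's documentation as best I recall, so check them against the package before relying on them.

  Np = 4;                          % number of cooperating processes
  m  = map ([1 Np], {}, 0:Np-1);   % split the columns across Np processes
  X  = zeros (200, 10, m);         % a 200x10 matrix, distributed by column
  Xl = local (X);                  % each process sees only its own columns
  Xl = fft (Xl);                   % column-wise FFTs are purely local here
  X  = put_local (X, Xl);          % write the block back into the dmatrix
  Y  = agg (X);                    % gather the full result on the leader

The point of the column-wise split is that an FFT down each column never needs data owned by another process, so no communication happens at all.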

All this being said, perhaps someone more familiar with the source code could comment on how easy / impossible such a thing would be given the current state of Octave? In terms of your direct question of how you can build a compiler for Octave, I believe a good place to start is to look at the lex/yacc grammar files:

octave/libinterp/parse-tree/lex.ll
octave/libinterp/parse-tree/oct-parse.yy

Regards,

Bipin

   



Re: Octave Interpreter

Stefan Seefeld
On 2014-10-01 21:16, Bipin Mathew wrote:

> Hello Manu,
>
>     I was actually thinking of working on something along the same
> lines. Googling around, I see a few efforts along these lines:
>
> http://octave.sourceforge.net/parallel/
> http://octave.sourceforge.net/mpi/index.html
>
> While I have not used these personally, looking at the examples, it
> appears one needs to be quite aware of MPI / parallel programming to
> make effective use of these packages. It would be nice if all these
> things could be made more transparent to the end user.
>
> Many years ago, I actually worked with a team at MIT Lincoln
> Laboratory, designing this sort of  thing for Matlab:
>
> http://www.ll.mit.edu/mission/cybersec/softwaretools/pmatlab/pmatlab.html
>
> The idea was to hide as much of the parallelism as possible. This was
> done by using a special "dmatrix" class to represent your matrix,
> which was, in reality, spread across several servers. I don't know if
> this is still an active project there ( Any of my old Lincoln friends
> can comment? :-) ).

Interesting that you bring this project up, as I was considering
proposing something similar. While I haven't worked at the MIT/LL, I
have collaborated with them.

> Of course technology has moved on since then, and there is probably
> a lot of new technology we can leverage to make something similar that
> is cleaner/faster/smarter (i.e. more awesome). For example, the
> metadata saying where the data was located used to be stored in a
> static file back in the day. I reckon we can just query a distributed
> file system for the same information these days, and get all the
> benefits of replication / performance / resilience on top.

I believe Jeremy Kepner's original implementation was deliberately using
very simple means to implement the transport protocol (all file-based),
to make it easy for users to get off the ground.

In collaboration with the MIT/LL et al., I have been working on a C++ API
specification (http://portals.omg.org/hpec/content/specifications) as
well as a library implementation (http://openvsip.org/) incorporating many
of the concepts of pMatlab's distributed arrays (i.e., using "maps" to
define the distribution of blocks of data across many computing nodes).

I was considering working on scripting frontends for that implementation
in Python and Octave, and I have been watching the recent work on
classdef support in Octave with interest, as that could be used to
define the high-level API for Maps, Blocks, and Views.
(In fact, I presented the idea to add Python and MATLAB "frontends" to
VSIPL++ backends for a seamless integration into a high-performance
computing platform at an IEEE conference two years ago:
http://ieee-hpec.org/2012/index_htm_files/Seefeld.pdf, and the ideas
were received quite enthusiastically.)

>  All this being said, perhaps someone more familiar with the source
> code could comment on how easy / impossible such a thing would be
> given the current state of Octave? In terms of your direct question of
> how you can build a compiler for Octave, I believe a good place to
> start is to look at the lex/yacc grammar files:
>
> octave/libinterp/parse-tree/lex.ll
> octave/libinterp/parse-tree/oct-parse.yy

I'm honestly not entirely sure I understand the question: what
additional language support is needed to provide a
"distributed Octave"? What I was expecting (though I haven't
investigated much of it yet) is infrastructure work akin to
http://ipython.scipy.org/talks/0903_siamcse09_ipython_dist_bgranger.pdf,
to allow an Octave "frontend" to dispatch computation to multiple "engines".

Would any of this require changes or additions to the language /
interpreter, rather than just additional functions?

I think a good path towards support for pMatlab in Octave would be

* infrastructure work that allows multiple "octave engines" to be
controlled from one "octave controller" via some scheduler, akin to
ipython's implementation
* an Octave package that adds an MPI-like API on top of the above
* a new API defining distributed arrays in terms of "maps" (a rough
sketch follows below)
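
As a strawman for that third item, the user-facing side could be a small classdef hierarchy. Everything below is hypothetical (the class name, the properties, the dmat constructor) and exists only to make the discussion concrete:

  %% In Map.m (Octave classdef requires one file per class):
  classdef Map
    properties
      grid   % e.g. [1 4]: split dimension 2 into 4 blocks
      ranks  % engine ranks owning the blocks, e.g. 0:3
    end
    methods
      function obj = Map (grid, ranks)
        obj.grid  = grid;
        obj.ranks = ranks;
      end
    end
  end

  %% Intended usage: X = dmat (200, 10, Map ([1 4], 0:3));
  %% where a dmat class would allocate each block on the engine owning it.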

I'd be happy to collaborate on this, in particular as there are
discussions in the OMG HPEC Working Group (http://portals.omg.org/hpec/)
to add new language bindings for Python and MATLAB/Octave to the VSIP
specifications.

    Stefan

--

      ...ich hab' noch einen Koffer in Berlin...



Re: Octave Interpreter

Richard Crozier


On 02/10/14 13:34, Stefan Seefeld wrote:
> On 2014-10-01 21:16, Bipin Mathew wrote:

<snip>

>
>>   All this being said, perhaps someone more familiar with the source
>> code could comment on how easy / impossible such a thing would be
>> given the current state of Octave? In terms of your direct question of
>> how you can build a compiler for Octave, I believe a good place to
>> start is to look at the lex/yacc grammar files:
>>
>> octave/libinterp/parse-tree/lex.ll
>> octave/libinterp/parse-tree/oct-parse.yy
>
> I'm honestly not entirely sure I understand the question: What
> additional support in terms of language support is needed to provide a
> "distributed Octave" ?

>      Stefan
>

Making 'parfor' loops work with actual parallel pools might be what is
meant:

http://www.mathworks.co.uk/help/distcomp/parfor.html
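
For reference, the syntax looks like the following; Octave (at the time of writing) parses the parfor keyword but, with no parallel pool behind it, runs the loop serially. do_work is a placeholder for the user's computation:

  y = zeros (1, 100);
  parfor i = 1:100
    % iterations must be independent of one another
    y(i) = do_work (i);
  end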

Richard

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Octave Interpreter

Stefan Seefeld
Hi Richard,

On 2014-10-05 13:29, Richard Crozier wrote:

>
>
> On 02/10/14 13:34, Stefan Seefeld wrote:
>> I'm honestly not entirely sure I understand the question: What
>> additional support in terms of language support is needed to provide a
>> "distributed Octave" ?
>
> Making 'parfor' loops work with actual parallel pools might be what is
> meant:
>
> http://www.mathworks.co.uk/help/distcomp/parfor.html

I see. Still, it seems the frontend (language) changes are only minimal,
while quite a lot needs to be done in terms of backend work to support this.
Could anyone familiar with the Octave internals comment on what it
would take to support parallel execution, in the sense of multiple
octave instances executing tasks dispatched from a single frontend
session (akin to the architecture described in
http://ipython.scipy.org/talks/0903_siamcse09_ipython_dist_bgranger.pdf,
slides 9-10)? In other words, what does it take for Octave to support
MPI-style parallelism?

        Stefan

--

      ...ich hab' noch einen Koffer in Berlin...



Re: Octave Interpreter

c.-2

On 6 Oct 2014, at 01:27, Stefan Seefeld <[hidden email]> wrote:

> what does it take for Octave to support
> MPI-style parallelism ?

Octave language bindings for MPI are already implemented in the MPI
package on Octave forge.
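
For anyone who hasn't tried them, the bindings follow the standard MPI function names. A minimal SPMD sketch is below; I am assuming MPI_Comm_Load is the package's way to obtain a communicator for MPI_COMM_WORLD, and the exact names and argument orders should be checked against the package documentation:

  MPI_Init ();
  comm = MPI_Comm_Load ("NEWORLD");   % handle to MPI_COMM_WORLD (assumed)
  rank = MPI_Comm_rank (comm);
  nprc = MPI_Comm_size (comm);
  if (rank != 0)
    MPI_Send (rank^2, 0, 7, comm);    % (value, destination, tag, comm)
  else
    for src = 1:nprc-1
      sq = MPI_Recv (src, 7, comm);   % (source, tag, comm)
      printf ("rank %d sent %d\n", src, sq);
    endfor
  endif
  MPI_Finalize ();

Launched with something like "mpirun -np 4 octave-cli hello_mpi.m", every rank runs the same script and they rendezvous through the communicator.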

What you seem to be looking at is adding the possibility
to spawn instances of Octave from an interactive session (i.e. an Octave
version of mpiexec), isn't it?

>        Stefan
c.


Re: Octave Interpreter

Stefan Seefeld
On 2014-10-06 03:23, c. wrote:
> On 6 Oct 2014, at 01:27, Stefan Seefeld <[hidden email]> wrote:
>
>> what does it take for Octave to support
>> MPI-style parallelism ?
> Octave language bindings for MPI are already implemented in the MPI
> package on Octave forge.

Oh, good, I didn't know that.

> What you seem to be looking at is adding the possibility
> to spawn instances of Octave from an interactive session (i.e. an Octave
> version of mpiexec), isn't it?

Right. I think pure data parallelism as done with MPI isn't very
practical for an interpreter, as lots of commands typed interactively
aren't meant to be executed in parallel. On the other hand, spawning
other processes for a single parallel operation may not be practical
either. Perhaps something like a thread-pool pattern would work, where
processes are spawned upfront, but are kept in stand-by mode until a
parallel region is entered where data (and computation) is allocated to
them.

    Stefan

--

      ...ich hab' noch einen Koffer in Berlin...



Re: Octave Interpreter

Stefan Seefeld
Hi Bipin,

[I'm moving this back to the list; I assume you just accidentally hit
'reply' instead of 'reply-all']


On 2014-10-06 00:07, Bipin Mathew wrote:

> Hello Richard, Stefan, et al.,
>
>     Just to get us thinking about what implementing a distributed
> computing engine for Octave would take, let's consider a simple example
> and discuss what might be required. Just to get it rolling, let's look at
>
> A = fft(X)
>
> where X is an instance of a dmat class (i.e. a "distributed matrix"),
> in the sense that its contents are distributed across several servers.
> For the moment, I suppose we can consider fft to be an overloaded
> function specific to type dmat. Moreover, just for concreteness and
> simplicity let's think of X as a 2d matrix of size 200 rows by 10 columns.

OK.

>
> For a distributed implementation of fft, I reckon the first thing the
> top level fft function would need to know is "Where is X?". This could
> be gotten from a distributed data store of some kind, but suppose in
> the end we get something like this.
>
> X = { num_dimensions = 2
>     [1,100]  , [1,10] , {server1/path_a1; server2/path_b1; ... serverN/path_z1}
>     [101,200], [1,10] , {server1/path_a2; server2/path_b2; ... serverN/path_z2}
>     [1,100]  , [11,20], {server1/path_a3; server2/path_b3; ... serverN/path_z3}
>     [101,200], [11,20], {server1/path_a4; server2/path_b4; ... serverN/path_z4}
> }

OK.

>
> The first row, for example, says: all the elements within the
> 1st-100th positions of the first dimension and the 1st-10th positions
> of the second dimension can be found at the server/path locations
> {server1/path_a1; server2/path_b1; ... serverN/path_z1}
> (NFS?). I also imagine that instead of NFS paths, we could have
> identifiers for locations in the memory of a slave process spawned on
> those servers.

Yes. I really think we should use standard APIs for this, rather than
inventing our own (API, transport protocol, etc.).
For example, it seems the existing MPI bindings I was just pointed to
would be a great starting point.
(Note that the pMatlab package itself is layered over Matlab MPI
bindings.)

>
> The top level fft function would then spawn slave octave processes on
> the corresponding servers (using a job handler?). The slaves would
> then load from disk and read into memory the file they were
> assigned (the file would be local to that slave; we should move
> computation, not data) and do a localized computation. In this case
> each slave does an FFT along each of the columns that are local to it
> and maintains that "sub"-fft in its memory.

Where these additional processes are spawned is an interesting question.
I doubt that an operation such as fft() would be the right place to do
it, though. For a proof-of-concept, I think we may as well consider the
MPI bindings, and just spawn octave itself via mpirun. Once this
mechanism is working, we could think of ways to delay the sub-process
spawning, to let users only run the normal single-node octave (acting as
a "controller"), and then spawn "engine" processes later from a new
"parallel_init()" function.

>
> The top-level fft function would then ask each slave process for its
> "sub"-fft and do the necessary computation locally to get the FFT of
> the complete vector, OR push the necessary multiplier matrix down to
> the slaves and have each slave multiply its sub-fft matrix by the
> multiplier matrix provided by the top-level fft function, thereby
> producing a distributed matrix that represents the answer to the original FFT.

Yes, though, I think reinventing such an API is not a good idea. Rather,
we should use existing know-how, such as MPI.
In particular, the pMatlab package
(http://www.ll.mit.edu/mission/cybersec/softwaretools/pmatlab/pmatlab.html)
is layered over a Matlab MPI API
(http://www.ll.mit.edu/mission/cybersec/softwaretools/matlabmpi/matlabmpi.html),
so if we can manage to support an MPI API similar to that with the
Octave MPI package, we may get pMatlab support for free.

In other words, I see two main areas that need work:

* Assess the state of the Octave MPI bindings (and add any additional
functionality required that's not yet covered)

* Consider ways to spawn sub-processes and set up MPI communicators
interactively


    Stefan

--

      ...ich hab' noch einen Koffer in Berlin...



Re: Octave Interpreter

c.-2

On 6 Oct 2014, at 14:07, Stefan Seefeld <[hidden email]> wrote:

> Right. I think pure data parallelism as done with MPI isn't very
> practical for an interpreter, as lots of commands typed interactively
> aren't meant to be executed in parallel. On the other hand, spawning
> other processes for a single parallel operation may not be practical
> either. Perhaps something like a thread-pool pattern would work, where
> processes are spawned upfront, but are kept in stand-by mode until a
> parallel region is entered where data (and computation) is allocated to
> them.

I have been interested in this project myself;
what I thought of is something like:

  [out1, out2] = mpirun (numproc, fcn, input1, input2, ...)

to run an MPI-based Octave function from the Octave prompt.
Would that suit you as well?
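
For concreteness, here is one way such a wrapper might be sketched, ferrying inputs and outputs through temporary files. The mpi_worker helper and the file protocol are made up for illustration, not an existing API:

  function varargout = mpirun (numproc, fname, varargin)
    infile  = tempname ();
    outfile = tempname ();
    args = varargin;
    save ("-binary", infile, "fname", "args");   % hand the call to the ranks
    cmd = sprintf ("mpirun -np %d octave-cli -q --eval \"mpi_worker ('%s', '%s')\"", ...
                   numproc, infile, outfile);
    if (system (cmd) != 0)                       % blocks until all ranks exit
      error ("mpirun: worker processes failed");
    endif
    r = load (outfile);                          % rank 0 saved a cell "outs"
    varargout = r.outs(1:nargout);
  endfunction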

>    Stefan
c.



Re: Octave Interpreter

Stefan Seefeld
On 2014-10-06 08:28, c. wrote:

> On 6 Oct 2014, at 14:07, Stefan Seefeld <[hidden email]> wrote:
>
>> Right. I think pure data parallelism as done with MPI isn't very
>> practical for an interpreter, as lots of commands typed interactively
>> aren't meant to be executed in parallel. On the other hand, spawning
>> other processes for a single parallel operation may not be practical
>> either. Perhaps something like a thread-pool pattern would work, where
>> processes are spawned upfront, but are kept in stand-by mode until a
>> parallel region is entered where data (and computation) is allocated to
>> them.
> I have been interested in this project myself,
> what I thought of is something like:
>
>   [out1, out2] = mpirun (numproc, function, input1, input2, ...)
>
> to run an MPI based Octave function from the Octave prompt.
> Would that suit you as well?

That might be a good starting point, though I think ultimately we need
something more expressive and robust. I'm notably thinking of an
architecture for managing multiple "engine" processes that run octave
instances but read their input not from an interactive tty but from a
socket. This would allow multiple engines to be orchestrated via an
MPI communicator (doing data-parallel computation), or using some other
paradigm such as task-level parallelism.
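
A toy version of such an engine loop could be sketched against the Octave Forge sockets package. The calls below mirror the BSD socket API that the package wraps, but the exact signatures should be checked against its documentation, and the port number and one-command-per-message protocol are invented for illustration:

  pkg load sockets
  s = socket (AF_INET, SOCK_STREAM, 0);
  bind (s, 9000);                       % port chosen arbitrarily
  listen (s, 1);
  c = accept (s);
  while (true)
    [buf, len] = recv (c, 4096);        % one command per message
    if (len <= 0)
      break;                            % controller hung up
    endif
    cmd = char (buf(1:len));
    try
      eval (cmd);                       % execute in the engine's workspace
      send (c, "ok");
    catch err
      send (c, ["error: " err.message]);
    end_try_catch
  endwhile
  disconnect (c);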


I think the ipython project has very useful material for this:
http://ipython.org/ipython-doc/2/parallel/parallel_intro.html

    Stefan

--

      ...ich hab' noch einen Koffer in Berlin...



Re: Octave Interpreter

c.-2

On 6 Oct 2014, at 14:39, Stefan Seefeld <[hidden email]> wrote:

> That might be a good starting point, though I think ultimately we need
> something more expressive and robust. I'm notably thinking of an
> architecture for managing multiple "engine" processes that run octave
> instances but that read their input not from an interactive tty, but
> some socket. This would allow multiple engines to be orchestrated via an
> MPI Communicator (doing data-parallel computation), or using some other
> paradigm such as task-level parallelism.

I'm not really sure I understand what you are talking about.

I'd just recommend that, before you start implementing anything new, you
look at the options that are already available for parallel computing
and interprocess communication in Octave and Octave Forge, for example:

 http://octave.sourceforge.net/parallel/
 http://octave.sourceforge.net/sockets/
 http://octave.sourceforge.net/mpi

and, if those packages do not have the features you are looking for,
start by implementing the missing features that you personally need.

> I think the ipython project has very useful material for this:
> http://ipython.org/ipython-doc/2/parallel/parallel_intro.html

Looks interesting, but I'm not really sure how I would use that myself.
Personally I'd rather stick to standard APIs for parallel algorithms, like OpenMP
and MPI, if possible.

>    Stefan
c.

Re: Octave Interpreter

Bipin Mathew
Hey Guys,

   I have a few questions about MPI, and about people's ideas on the scope of this project more generally.

1.) Is it possible, using MPI, to spawn processes on specific servers? The tutorials I have seen online appear to indicate that you can specify the number of processes to spawn but not precisely where they will be spawned (aside from the mpd.hosts file). Can this be set programmatically?

2.) How is MPI's error handling and failure resilience? If a node fails mid-computation, what happens? Or even pre-computation: is there an online way of knowing which nodes are available, or will MPI just not even attempt to launch on servers it perceives as down?

3.) Are we expecting to support persistence of distributed objects? This is important since "Big Data" is seldom ephemeral. People want to load their tables / data-cubes once. Of course we should also support a mechanism for temporary distributed objects, for constructs like ifft(fft(X)) and for ad-hoc analysis / prototyping. Therefore I vote yes, that we should support persistent distributed objects, but I just wanted to get other people's views. This also motivates my thought that slave processes should launch as close to the data as possible, à la Hadoop.

4.) What are people's opinions on other transport layer technologies, like Google's protocol buffers or Thrift?

Regards,

Bipin






Re: Octave Interpreter

Stefan Seefeld
On 2014-10-06 22:34, Bipin Mathew wrote:

> Hey Guys,
>
>    I have a few questions about MPI, and about people's ideas on the
> scope of this project more generally.
>
> 1.) Is it possible, using MPI, to spawn processes on specific
> servers? The tutorials I have seen online appear to indicate that you
> can specify the number of processes to spawn but not precisely where
> they will be spawned (aside from the mpd.hosts file). Can this be
> set programmatically?

Technically, MPI doesn't itself support the spawning of processes.
Rather, a separate tool (typically called 'mpirun') is used to start N
instances of a given application, which then use the MPI API to connect
to each other and process data in parallel. Doing this spawning
programmatically from within a running "frontend" process would be very
useful, I believe, in particular in scripting contexts such as Octave or
Python, but that would have to happen outside the realm of MPI.
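
That said, the launcher itself usually gives you placement control. With Open MPI, for instance (the host names and the worker script are illustrative):

  % Pin the 4 ranks to specific machines, or defer to a host file:
  system ("mpirun -np 4 --host server1,server2,server3,server4 octave-cli worker.m");
  system ("mpirun -np 8 --hostfile my_hosts octave-cli worker.m");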

>
> 2.) How is MPI's error handling and failure resilience? If a node
> fails mid-computation, what happens? Or even pre-computation: is there
> an online way of knowing which nodes are available, or will MPI just
> not even attempt to launch on servers it perceives as down?

Again, as MPI itself isn't doing the launching, checking node status
upfront is outside its scope. Also, given all the assumptions about
symmetry that are made as part of the SPMD (single program, multiple
data) execution model, I wouldn't characterize MPI as failure
resilient. However, individual implementations may provide some means to
recover from errors, entirely outside the scope of MPI itself. In other
words, this is a Quality of Service issue, not something inherently
part of the MPI protocol. For example:
https://www.open-mpi.org/faq/?category=ft

>
> 3.) Are we expecting to support persistence of distributed objects?
> This is important since "Big Data" is seldom ephemeral. People want to
> load their tables / data-cubes once. Of course we should also support
> a mechanism for temporary distributed objects, for constructs like
> ifft(fft(X)) and for ad-hoc analysis / prototyping. Therefore I vote
> yes, that we should support persistent distributed objects, but I just
> wanted to get other people's views. This also motivates my thought that
> slave processes should launch as close to the data as possible, à la
> Hadoop.

I think at least initially we should consider persistence and
distribution orthogonal. With pMatlab (as well as OpenVSIP), a
distributed array can be accessed locally in terms of its local
sub-array (with appropriate mapping between local and global indices),
and each process can use local I/O to persist its own block.
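
For a block-row distribution, that local/global bookkeeping is only a few lines. A sketch, with made-up sizes and a random stand-in for the global array:

  nglobal = 200;  nprocs = 4;  rank = 2;   % rank is 0-based, e.g. from MPI
  X = rand (nglobal, 10);                  % stand-in for the global array
  rows = nglobal / nprocs;                 % assume the split is even
  lo = rank * rows + 1;                    % first global row owned here
  hi = (rank + 1) * rows;                  % last global row owned here
  Xloc = X(lo:hi, :);                      % this rank's local sub-array
  save ("-binary", sprintf ("X_block_%d.mat", rank), "Xloc");  % local I/O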


>
> 4.) What are people's opinions on other transport layer technologies,
> like Google's protocol buffers or Thrift?

I don't know those, but I would really hope that - if we can manage to
define distributed arrays - we may well be able to fully abstract away
the underlying transport, so the specific choice becomes less important.
Users shouldn't have to deal with MPI or similar; they just use regular
access operators, and all required data movement happens behind the
scenes (e.g. using the owner-computes rule).

For that reason, I suggest we pick an existing protocol that requires
minimal work to get off the ground (such as MPI), then focus on the
high-level API and semantics for distributed arrays. Once that's
established, other "backends" could be added if that turns out to be useful.


    Stefan


--

      ...ich hab' noch einen Koffer in Berlin...