Is fprintf slow?

Is fprintf slow?

Ted.Harding
Is fprintf in Octave intrinsically slow? I use it to create datafiles
as an interface to Plotmtv, and it seems to run very slow. For instance,
a datafile of 190214 bytes took 433 seconds to write to disk
(439 bytes/second). This strikes me as unexpectedly slow. (CPU is
a 386-DX/25MHz).

Here is the nub of the script which I use to write a matrix X with R rows
and C columns of numbers to an ASCII file FIL in format FMT
with space-separated numbers:

    S=size(X); R=S(1); C=S(2);
    fopen(FIL,"a");
    for i=1:R
      for j=1:C
        fprintf(FIL,FMT,X(i,j)), fprintf(FIL,' '), end, fprintf(FIL,'\n')
    end

FMT is a string (like "%8.5g") for a single number. Maybe there is
some point in initially programming the build-up of a more complex
format string to cope with one entire row of X at a time (fprintf
needs a format specifier for each separate number, and the size of X
is not known beforehand). But any advice or comment would be welcome.
The above timing was for a 10,000 x 2 matrix with FMT="%8.5g", so
I can't see this idea speeding things up by a factor of more than 2-3.
I'm more interested in factors of the order of 50-100!
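
A minimal sketch of the per-row format idea described above, assuming a current Octave in which fopen returns a file id (unlike the file-name-based calls in the script above). The sample matrix, the temporary file name and the variable rowfmt are illustrative only. The format for one number is repeated C times with repmat, so the size of X need not be known in advance and fprintf is called once per row rather than once per element:

```octave
% Illustrative data; X and FMT stand in for the poster's variables.
X = [1 2; 3 4; 5 6];
FMT = "%8.5g";
[R, C] = size (X);

% Repeat "FMT + space" once per column, then end the row with a newline.
rowfmt = [repmat([FMT, " "], 1, C), "\n"];

fname = tempname ();            % hypothetical scratch file
fid = fopen (fname, "w");
for i = 1:R
  fprintf (fid, rowfmt, X(i,:));  % one call per row, C conversions each
end
fclose (fid);

txt = fileread (fname);         % read back to inspect the result
delete (fname);
```

This cuts the number of fprintf calls from 2*R*C+R to R. How much time it saves depends on how much of the cost is per call rather than per conversion.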

Here is comparison with timings for "save" in the various builtin formats:
written as above: 433    secs
save -text:        48.6  secs
save -binary:       0.15 secs
save -mat-binary:   0.11 secs

Maybe I could do a binary dump and use a custom C program to write the
ASCII file. In that case where can I find out Octave's binary format?
(and/or Matlab's ... ?).
(However, this would be very inconvenient for the present purpose,
since a lot also depends on interpolating plot-control commands
into the data: this would become very tricky using an external
program to write the Plotmtv data file.)

Ted.                                     ([hidden email])

Re: Is fprintf slow?

Vinayak Dutt

#Is fprintf in Octave intrinsically slow? I use it to create datafiles
#as an interface to Plotmtv, and it seems to run very slow. For instance,
#a datafile of 190214 bytes took 433 seconds to write to disk
#(439 bytes/second). This strikes me as unexpectedly slow. (CPU is
#a 386-DX/25MHz).
#

I have seen the same thing. I think the C-style standard I/O library is pretty slow.
I have some .m files that read text files (which are not in Octave format) using fscanf(),
and reading is very slow compared with the load() function.


--vinayak-
/*
 * vinayak dutt
 * graduate student, ultrasound research
 * mayo graduate school, rochester mn
 *
 * e-mail: [hidden email]
 *         [hidden email]
 *
 */
#include "disclaimer.h"

Re: Is fprintf slow?

John Eaton-4
In reply to this post by Ted.Harding
[hidden email] (Ted Harding) wrote:

: Is fprintf in Octave intrinsically slow?

[...]

: Here is the nub of the script which I use to write a matrix X with R rows
: and C columns of numbers to an ASCII file FIL in format FMT
: with space-separated numbers:
:
:     S=size(X); R=S(1); C=S(2);

You can write this a bit more compactly as:  [R, C] = size (X);

:     fopen(FIL,"a");
:     for i=1:R
:       for j=1:C
:         fprintf(FIL,FMT,X(i,j)), fprintf(FIL,' '), end, fprintf(FIL,'\n')
:     end

I suspect that the nested for loop accounts for a significant amount
of the time.

: The above timing was for a 10,000 x 2 matrix with FMT="%8.5g", so

[...]

: I'm more interested in factors of the order of 50-100!
:
: Here is comparison with timings for "save" in the various builtin formats:
: written as above: 433    secs
: save -text:        48.6  secs
: save -binary:       0.15 secs
: save -mat-binary:   0.11 secs

Using save -binary or -mat-binary results in a single call to write()
to dump the 160k in your matrix, so that is very fast.

Using save -ascii (did you really use -text?) results in a loop like

  for (int i = 0; i < a.rows (); i++)
    {
      for (int j = 0; j < a.cols (); j++)
        os << " " << a.elem (i, j);
      os << "\n";
    }
  return os;

I am surprised that this takes 43 seconds on your system.  On my
SPARCstation 2, it only takes about 5.  It takes much less if I set
the variable save_precision to 5 instead of the default 17.

On the same machine, your for loop solution takes about 120 seconds.

I think what is really needed is a way to convert an entire matrix
using a single call to fprintf.  Matlab does this by repeatedly
applying the format string until there is no more data.  All elements
of a matrix argument are converted in column major order.  For example,

  a = rand (3, 2); b = [1, 2; 3, 4; 5, 6];
  fprintf ("%10f  %10e  %10g\n", a, b);

produces

  0.328234  6.326386e-01     0.75641
  0.991037  3.653387e-01    0.247039
  1.000000  3.000000e+00           5
  2.000000  4.000000e+00           6

Given the current implementation, it would be fairly difficult to make
Octave's fprintf work this way because it is designed to match the
format specifiers with the arguments and not with the matrix elements
that they refer to.
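
For readers on a current Octave release: the template is now recycled over all elements in the way described above for Matlab, although the version discussed in this thread did not do so. A small sketch under that assumption follows. Because elements are consumed in column-major order, the matrix is transposed so that each output line corresponds to one row. sprintf is used instead of fprintf only so that the result can be inspected, and the matrix X is illustrative:

```octave
X = [1 2; 3 4; 5 6];

% X.' is 2x3, so column-major traversal yields 1,2,3,4,5,6:
% the template "%g %g\n" is reapplied once per row of X.
by_rows = sprintf ("%g %g\n", X.');

% Without the transpose the elements arrive as 1,3,5,2,4,6.
by_cols = sprintf ("%g %g\n", X);
```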

One possibility that would be easy to implement and might cover
many uses for something like this would be to introduce some new
format specifiers that work on matrices.  For example,

  printf (" %8.5G", m)

might print all elements of the matrix m with the format " %8.5g",
separating rows with new lines.

However, this would have the limitation of not allowing you to specify
different formats for different columns in a matrix.
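
The proposed matrix-aware specifier is only a suggestion in this message. As a rough emulation of the intended behaviour, assuming a current Octave whose sprintf recycles its template, a helper can repeat a single-element format across each row. The name format_matrix is hypothetical and not part of Octave:

```octave
% Hypothetical helper: apply one element format to every element of m,
% ending each matrix row with a newline.
format_matrix = @(fmt, m) sprintf ([repmat(fmt, 1, size (m, 2)), "\n"], m.');

m = [1 2; 3 4];
s = format_matrix (" %8.5g", m);
```

Like the proposal, this uses the same format for every column.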

Thanks,

jwe


Re: Is fprintf slow?

Ted.Harding
> I am surprised that this takes 43 seconds on your system.  On my
It's a slow Joe, John!

> SPARCstation 2, it only takes about 5.  It takes much less if I set
> the variable save_precision to 5 instead of the default 17.
>
> One possibility that would be easy to implement and might cover
> many uses for something like this would be to introduce some new
> format specifiers that work on matrices.  For example,
>
>   printf (" %8.5G", m)
>
> might print all elements of the matrix m with the format " %8.5g",
> separating rows with new lines.
>
> However, this would have the limitation of not allowing you to specify
> different formats for different columns in a matrix.
>

Granted the limitation, this would be very useful for this case, which
very commonly occurs. I doubt the limitation would be a /systematic/
problem because if you intend to apply different formats to different
columns you are likely to know how many columns there are and can
write a format string for the entire row.

The most likely exception to what I just said is the case of repetition
of formats across the column (e.g. for integer, real, integer, real, ... )

Maybe it's worth thinking about a FORTRAN-style repetition of
format pattern, like

printf("  %4.0f %7.4f", M)

where the pattern would be applied across the matrix M, two columns at a
time in this case, until (as they used to say) "the DO is exhausted".
If there's a logical trap in this, perhaps a special format
character %& (say) meaning "this format string is repeatable" could be
introduced, so the above would be

printf("%&  %4.0f %7.4f", M).

Before you know what's happening, people will be asking for

printf("%2.0f %2.0f %{%&  %4.0f %7.4f%}", M)

(also FORTRANnish, if I remember right).
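
The %& and %{ ... %} notations above are proposals only. A sketch of how the same row layout (two leading integer columns, then a repeated integer/real pair) could be assembled by hand, assuming a current Octave with template recycling; the matrix M and the names npairs and rowfmt are illustrative:

```octave
% Two leading columns, then (integer, real) pairs.
M = [1 2 10 0.5   20 0.25;
     3 4 30 0.125 40 0.0625];

C = size (M, 2);
npairs = (C - 2) / 2;          % number of repeated pairs per row

% Fixed prefix, repeated group, newline: built once for the whole matrix.
rowfmt = ["%2.0f %2.0f", repmat("  %4.0f %7.4f", 1, npairs), "\n"];

s = sprintf (rowfmt, M.');     % transpose so each line is one row of M
```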

Thanks for the reply.
Ted.                                     ([hidden email])