Significant speed-up of datevec

Daniel Sebald
Attached is a patch that significantly speeds up datevec() when a string matrix or cell array of strings is passed in.  The roughly 50x speedup comes simply from deciphering the format string once instead of once for every entry in the cell/string matrix.  Here is a test function:

function DN = testfunc4()
  date1 = '2008-12-16 22:12:00';
  D = repmat(date1, 1000, 1);
  tstart = cputime();
  DN = datenum(D, 'yyyy-mm-dd HH:MM:SS');
  cputime - tstart
end

and results, without the patch:

octave:1> DN = testfunc4();
ans =  70.594

with the patch:

octave:1> DN = testfunc4();
ans =  1.4648

There are probably other little things that could be done to save some time, but this gets the bulk of it.
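
For readers following along, the idea behind the patch can be sketched roughly as below. This is only an illustration, not the patched datevec code: the strrep chain is a simplified stand-in for what __date_vfmt2sfmt__ does and is only meant to handle this one format string, and the variable names (dates, sf, ok) are made up for the example.

```octave
% Sketch only: hoist the loop-invariant format translation out of the loop.
% The strrep chain below is a simplified, hypothetical stand-in for the
% real translation in datevec.m; it only covers this one format string.
dates = repmat ({"2008-12-16 22:12:00"}, 1000, 1);
f = "yyyy-mm-dd HH:MM:SS";

% Translate the datestr-style format into a strptime format just once.
sf = strrep (f, "yyyy", "%Y");
sf = strrep (sf, "mm", "%m");
sf = strrep (sf, "dd", "%d");
sf = strrep (sf, "HH", "%H");
sf = strrep (sf, "MM", "%M");
sf = strrep (sf, "SS", "%S");

n = numel (dates);
y = zeros (n, 1);
ok = false (n, 1);
for k = 1:n
  % Only the per-string parsing remains inside the loop.
  [tm, nc] = strptime (dates{k}, sf);
  ok(k) = (nc == length (dates{k}) + 1);   % whole string was consumed
  y(k) = tm.year + 1900;                   % tm.year counts from 1900
endfor
```

The work inside the loop is then a single strptime call per string, which is where the patch leaves it as well.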

What I don't understand, though, is that successively calling this test function results in worse performance after the first call:

octave:2> DN = testfunc4();
ans =  2.1877
octave:3> DN = testfunc4();
ans =  2.2437
...

Any ideas?  (I checked the usual suspects, like having to delete the return value, etc.)

Dan

--- datevec.m.orig 2008-12-10 02:35:30.000000000 -0600
+++ datevec.m 2008-12-17 22:09:53.784096244 -0600
@@ -108,9 +108,7 @@
   endif
 
   if (ischar (date))
-    t = date;
-    date = cell (1);
-    date{1} = t;
+    date = cellstr (date);
   endif
 
   if (iscell (date))
@@ -123,7 +121,8 @@
       for k = 1:nd
         found = false;
         for l = 1:nfmt
-          [found y(k) m(k) d(k) h(k) mi(k) s(k)] = __date_str2vec__ (date{k}, std_formats{l}, p);
+          [f, rY, ry, fy, fm, fd, fh, fmi, fs] = __date_vfmt2sfmt__ (std_formats{l});
+          [found y(k) m(k) d(k) h(k) mi(k) s(k)] = __date_str2vec__ (date{k}, p, f, rY, ry, fy, fm, fd, fh, fmi, fs);
           if (found)
             break;
           endif
@@ -133,8 +132,10 @@
         endif
       endfor
     else
+      % Decipher the format string just once for sake of speed.
+      [f, rY, ry, fy, fm, fd, fh, fmi, fs] = __date_vfmt2sfmt__ (f);
       for k = 1:nd
-        [found y(k) m(k) d(k) h(k) mi(k) s(k)] = __date_str2vec__ (date{k}, f, p);
+        [found y(k) m(k) d(k) h(k) mi(k) s(k)] = __date_str2vec__ (date{k}, p, f, rY, ry, fy, fm, fd, fh, fmi, fs);
         if (! found)
           error ("datevec: date not parsed correctly with given format");
         endif
@@ -183,7 +184,7 @@
 
 ### endfunction
 
-function [found, y, m, d, h, mi, s] = __date_str2vec__ (ds, f, p)
+function [f, rY, ry, fy, fm, fd, fh, fmi, fs] = __date_vfmt2sfmt__ (f)
 
   # Play safe with percent signs
   f = strrep(f, "%", "%%");
@@ -245,14 +246,28 @@
   f = strrep (f, "MM", "%M");
   f = strrep (f, "SS", "%S");
 
+  rY = rindex (f, "%Y");
+  ry = rindex (f, "%y");
+
+  # check whether we need to give default values
+  # possible error when string contains "%%"
+  fy = rY || ry;
+  fm = index (f, "%m") || index (f, "%b") || index (f, "%B");
+  fd = index (f, "%d") || index (f, "%a") || index (f, "%A");
+  fh = index (f, "%H") || index (f, "%I");
+  fmi = index (f, "%M");
+  fs = index (f, "%S");
+
+### endfunction
+
+function [found, y, m, d, h, mi, s] = __date_str2vec__ (ds, p, f, rY, ry, fy, fm, fd, fh, fmi, fs)
+
   [tm, nc] = strptime (ds, f);
 
-  if (nc == length (ds) + 1)
+  if (nc == size (ds, 2) + 1)
     y = tm.year + 1900; m = tm.mon + 1; d = tm.mday;
     h = tm.hour; mi = tm.min; s = tm.sec + tm.usec / 1e6;
     found = true;
-    rY = rindex (f, "%Y");
-    ry = rindex (f, "%y");
     if (rY < ry)
       if (y > 1999)
         y -= 2000;
@@ -264,14 +279,6 @@
         y += 100;
       endif
     endif
-    # check whether we need to give default values
-    # possible error when string contains "%%"
-    fy = rY || ry;
-    fm = index (f, "%m") || index (f, "%b") || index (f, "%B");
-    fd = index (f, "%d") || index (f, "%a") || index (f, "%A");
-    fh = index (f, "%H") || index (f, "%I");
-    fmi = index (f, "%M");
-    fs = index (f, "%S");
     if (! fy && ! fm && ! fd)
       tvm = localtime (time ());  ## tvm: this very moment
       y = tvm.year + 1900;

Re: Significant speed-up of datevec

Jaroslav Hajek-2
On Thu, Dec 18, 2008 at 5:33 AM, Daniel J Sebald <[hidden email]> wrote:

> Attached is a patch to significantly speed up datevec() in the case that a
> string matrix or string cell is passed in.  The ~50x speedup comes simply
> from only deciphering the format string once instead of for every entry in
> the cell/string matrix.  Here is a test function:
>
> function DN = testfunc4()
>  date1 = '2008-12-16 22:12:00';
>  D = repmat(date1, 1000, 1);
>  tstart = cputime();
>  DN = datenum(D, 'yyyy-mm-dd HH:MM:SS');
>  cputime - tstart
> end
>
> and results, without the patch:
>
> octave:1> DN = testfunc4();
> ans =  70.594
>
> with the patch:
>
> octave:1> DN = testfunc4();
> ans =  1.4648
>
> There are probably other little things that could be done to save some time,
> but this gets the bulk of it.
>
> What I don't understand, though, is that successively calling this test
> function results in worse performance after the first call:
>
> octave:2> DN = testfunc4();
> ans =  2.1877
> octave:3> DN = testfunc4();
> ans =  2.2437
> ...
>
> Any ideas?  (I checked the usual suspects, like having to delete the return
> value, etc.)
>

Not really. I have, however, noted that "clear all" prevents the
effect, so I suspect some problem with persistent variables. But it's
definitely worth further investigation.
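
One way to look at the effect is to time several back-to-back calls within one session. A rough sketch, reusing the test data from the original post; the run count of 3 and the smaller 200-row matrix are arbitrary choices for illustration, and no particular timings are implied:

```octave
% Sketch only: time repeated datenum calls in the same session.
% The 200-row size and the 3 runs are arbitrary illustration values.
D = repmat ("2008-12-16 22:12:00", 200, 1);
elapsed = zeros (1, 3);
for run = 1:3
  t0 = cputime ();
  DN = datenum (D, "yyyy-mm-dd HH:MM:SS");
  elapsed(run) = cputime () - t0;
endfor
disp (elapsed);
```

Running the same lines again after "clear all" should show whether the first-call timing comes back.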

> Dan
>

I applied this patch. If you intend to make more than a few
contributions to Octave, please consider making yourself familiar with
Mercurial and then using it to create, organize and send patches.

thanks

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: Significant speed-up of datevec

John W. Eaton
On 18-Dec-2008, Jaroslav Hajek wrote:

| I applied this patch. If you intend to make more than a few
| contributions to Octave, please consider making yourself familiar with
| Mercurial and then using it to create, organize and send patches.

These latest two changes should have ChangeLog entries.

Also, what is the purpose of this change in unique.m?

  -  ## I don't know why anyone would need reverse indices, but it
  -  ## was an interesting challenge.  I welcome cleaner solutions.
     if (nargout >= 3)
       j = i;
  -    j(i) = cumsum (prepad (! match, n, 1));
  +    j(i) = cumsum ([1 !match]);


Also, if it is not already in the coding standards, I think we
should add something saying that elements in matrices, cell arrays,
and lists of values returned from functions should be separated by
commas (or semicolons) not just whitespace.
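
To illustrate why whitespace-only separation can be error-prone, here is a small sketch (the variable names are arbitrary):

```octave
% Sketch only: whitespace changes how the minus sign is read in brackets.
a = 1;
b = 2;
v1 = [a - b];    % binary minus: a single element
v2 = [a -b];     % unary minus after whitespace: two elements
v3 = [a, -b];    % explicit comma: unambiguously two elements
```

With the comma, the reader does not have to work out which of the first two forms was intended.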

jwe

Re: Significant speed-up of datevec

Jaroslav Hajek-2
On Thu, Dec 18, 2008 at 8:55 PM, John W. Eaton <[hidden email]> wrote:
> On 18-Dec-2008, Jaroslav Hajek wrote:
>
> | I applied this patch. If you intend to make more than a few
> | contributions to Octave, please consider making yourself familiar with
> | Mercurial and then using it to create, organize and send patches.
>
> These latest two changes should have ChangeLog entries.
>

Uh, sorry, I forgot about those. I'll add them.

> Also, what is the purpose of this change in unique.m?
>
>  -  ## I don't know why anyone would need reverse indices, but it
>  -  ## was an interesting challenge.  I welcome cleaner solutions.
>     if (nargout >= 3)
>       j = i;
>  -    j(i) = cumsum (prepad (! match, n, 1));
>  +    j(i) = cumsum ([1 !match]);
>

I think match is always of length n-1 here, isn't it? If yes, then the
latter seems clearer to me, and likely also faster.
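
A quick check of that equivalence on a small example. The sample vector is arbitrary, and the sort/match lines are a simplified reconstruction for a row vector, not the actual unique.m code:

```octave
% Sketch only: compare the old and new expressions on a small row vector.
x = [3, 1, 2, 3, 1];
[y, i] = sort (x);
n = numel (y);
match = (y(1:n-1) == y(2:n));    % length n-1: marks repeated neighbours

old_idx = cumsum (prepad (! match, n, 1));
new_idx = cumsum ([1, !match]);
assert (isequal (old_idx, new_idx));

% The reverse indices map the unique values back onto the input.
j = i;
j(i) = new_idx;
u = y([true, !match]);           % first element of each run of equal values
assert (isequal (u(j), x));
```

Because match has n-1 elements, prepending a single 1 yields the same length-n vector that prepad builds.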

>
> Also, if it is not already in the coding standards, I think we
> should add something saying that elements in matrices, cell arrays,
> and lists of values returned from functions should be separated by
> commas (or semicolons) not just whitespace.
>

OK.


> jwe
>




Re: Significant speed-up of datevec

Daniel Sebald
In reply to this post by Jaroslav Hajek-2
Jaroslav Hajek wrote:

>>What I don't understand, though, is that successively calling this test
>>function results in worse performance after the first call:
>>
>>octave:2> DN = testfunc4();
>>ans =  2.1877
>>octave:3> DN = testfunc4();
>>ans =  2.2437
>>...
>>
>>Any ideas?  (I checked the usual suspects, like having to delete the return
>>value, etc.)
>>
>
>
> Not really. I have, however, noted that "clear all" prevents the
> effect, so I suspect some problem with persistent variables. But it's
> definitely worth further investigation.

Sure enough.  I didn't notice that.

> I applied this patch.

Thanks.

> If you intend to make more than a few
> contributions to Octave, please consider making yourself familiar with
> Mercurial and then using it to create, organize and send patches.

I'm learning; takes a while.

Dan

Re: Significant speed-up of datevec

Daniel Sebald
In reply to this post by John W. Eaton
John W. Eaton wrote:

> On 18-Dec-2008, Jaroslav Hajek wrote:
>
> | I applied this patch. If you intend to make more than a few
> | contributions to Octave, please consider making yourself familiar with
> | Mercurial and then using it to create, organize and send patches.
>
> These latest two changes should have ChangeLog entries.
>
> Also, what is the purpose of this change in unique.m?
>
>   -  ## I don't know why anyone would need reverse indices, but it
>   -  ## was an interesting challenge.  I welcome cleaner solutions.
>      if (nargout >= 3)
>        j = i;
>   -    j(i) = cumsum (prepad (! match, n, 1));
>   +    j(i) = cumsum ([1 !match]);

Simply to avoid calling an additional script file.  In this case we know
the array is one element short.  Here's a comparison of before/after
with that simple test function I sent.

octave:4> testfunc3();
ans =  12.332
octave:5> clear all
octave:6> testfunc3();
ans =  6.0351
octave:7>


> Also, if it is not already in the coding standards, I think we
> should add something saying that elements in matrices, cell arrays,
> and lists of values returned from functions should be separated by
> commas (or semicolons) not just whitespace.

OK.  I see Jaroslav has it covered; thanks.

Dan