
Significant speed-up of datevec

Attached is a patch to significantly speed up datevec() in the case that a string matrix or string cell is passed in. The roughly 50x speedup comes simply from deciphering the format string only once instead of once for every entry in the cell/string matrix. Here is a test function:

function DN = testfunc4()
  date1 = '2008-12-16 22:12:00';
  D = repmat(date1, 1000, 1);
  tstart = cputime();
  DN = datenum(D, 'yyyy-mm-dd HH:MM:SS');
  cputime - tstart
end

and results, without the patch:

octave:1> DN = testfunc4();
ans =  70.594

with the patch:

octave:1> DN = testfunc4();
ans =  1.4648

There are probably other little things that could be done to save some time, but this gets the bulk of it.

What I don't understand, though, is that successively calling this test function results in worse performance after the first call:

octave:2> DN = testfunc4();
ans =  2.1877
octave:3> DN = testfunc4();
ans =  2.2437
...

Any ideas? (I checked the usual suspects, like having to delete the return value, etc.)

Dan

--- datevec.m.orig 2008-12-10 02:35:30.000000000 -0600
+++ datevec.m 2008-12-17 22:09:53.784096244 -0600
@@ -108,9 +108,7 @@
   endif
 
   if (ischar (date))
-    t = date;
-    date = cell (1);
-    date{1} = t;
+    date = cellstr (date);
   endif
 
   if (iscell (date))
@@ -123,7 +121,8 @@
       for k = 1:nd
         found = false;
         for l = 1:nfmt
-          [found y(k) m(k) d(k) h(k) mi(k) s(k)] = __date_str2vec__ (date{k}, std_formats{l}, p);
+          [f, rY, ry, fy, fm, fd, fh, fmi, fs] = __date_vfmt2sfmt__ (std_formats{l});
+          [found y(k) m(k) d(k) h(k) mi(k) s(k)] = __date_str2vec__ (date{k}, p, f, rY, ry, fy, fm, fd, fh, fmi, fs);
           if (found)
             break;
           endif
@@ -133,8 +132,10 @@
         endif
       endfor
     else
+      % Decipher the format string just once for sake of speed.
+      [f, rY, ry, fy, fm, fd, fh, fmi, fs] = __date_vfmt2sfmt__ (f);
       for k = 1:nd
-        [found y(k) m(k) d(k) h(k) mi(k) s(k)] = __date_str2vec__ (date{k}, f, p);
+        [found y(k) m(k) d(k) h(k) mi(k) s(k)] = __date_str2vec__ (date{k}, p, f, rY, ry, fy, fm, fd, fh, fmi, fs);
         if (! found)
           error ("datevec: date not parsed correctly with given format");
         endif
@@ -183,7 +184,7 @@
 
 ### endfunction
 
-function [found, y, m, d, h, mi, s] = __date_str2vec__ (ds, f, p)
+function [f, rY, ry, fy, fm, fd, fh, fmi, fs] = __date_vfmt2sfmt__ (f)
 
   # Play safe with percent signs
   f = strrep(f, "%", "%%");
@@ -245,14 +246,28 @@
   f = strrep (f, "MM", "%M");
   f = strrep (f, "SS", "%S");
 
+  rY = rindex (f, "%Y");
+  ry = rindex (f, "%y");
+
+  # check whether we need to give default values
+  # possible error when string contains "%%"
+  fy = rY || ry;
+  fm = index (f, "%m") || index (f, "%b") || index (f, "%B");
+  fd = index (f, "%d") || index (f, "%a") || index (f, "%A");
+  fh = index (f, "%H") || index (f, "%I");
+  fmi = index (f, "%M");
+  fs = index (f, "%S");
+
+### endfunction
+
+function [found, y, m, d, h, mi, s] = __date_str2vec__ (ds, p, f, rY, ry, fy, fm, fd, fh, fmi, fs)
+
   [tm, nc] = strptime (ds, f);
 
-  if (nc == length (ds) + 1)
+  if (nc == size (ds, 2) + 1)
     y = tm.year + 1900; m = tm.mon + 1; d = tm.mday;
     h = tm.hour; mi = tm.min; s = tm.sec + tm.usec / 1e6;
     found = true;
-    rY = rindex (f, "%Y");
-    ry = rindex (f, "%y");
     if (rY < ry)
       if (y > 1999)
         y -= 2000;
@@ -264,14 +279,6 @@
         y += 100;
       endif
     endif
-    # check whether we need to give default values
-    # possible error when string contains "%%"
-    fy = rY || ry;
-    fm = index (f, "%m") || index (f, "%b") || index (f, "%B");
-    fd = index (f, "%d") || index (f, "%a") || index (f, "%A");
-    fh = index (f, "%H") || index (f, "%I");
-    fmi = index (f, "%M");
-    fs = index (f, "%S");
     if (! fy && ! fm && ! fd)
       tvm = localtime (time ());  ## tvm: this very moment
       y = tvm.year + 1900;
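The idea behind the patch, doing the format translation once and leaving only the parse inside the loop, can be shown in isolation. The following is a minimal sketch, not the code from datevec.m: the `tokens` table is a hypothetical, heavily reduced stand-in for the strrep chain in `__date_vfmt2sfmt__`, and it handles only the six tokens used by the test format above.

```octave
## Toy stand-in for the format translation (assumption: only these six
## tokens occur).  strrep is case sensitive, so "mm" (month) and "MM"
## (minute) do not interfere with each other.
tokens = {"yyyy", "%Y"; "mm", "%m"; "dd", "%d"; ...
          "HH", "%H"; "MM", "%M"; "SS", "%S"};

vfmt  = "yyyy-mm-dd HH:MM:SS";
dates = cellstr (repmat ("2008-12-16 22:12:00", 5, 1));
nd    = numel (dates);

## Loop-invariant work: translate the format string exactly once.
sfmt = vfmt;
for t = 1:rows (tokens)
  sfmt = strrep (sfmt, tokens{t,1}, tokens{t,2});
endfor

## Per-entry work: only the actual parse remains inside the loop.
y = zeros (nd, 1);
for k = 1:nd
  [tm, nc] = strptime (dates{k}, sfmt);
  if (nc != columns (dates{k}) + 1)
    error ("toy parser: date not parsed correctly with given format");
  endif
  y(k) = tm.year + 1900;
endfor
```

With the translation inside the `for k` loop instead, the strrep chain would run once per date string, which is the per-entry overhead the patch removes.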

Re: Significant speed-up of datevec

On Thu, Dec 18, 2008 at 5:33 AM, Daniel J Sebald <[hidden email]> wrote:
> Attached is a patch to significantly speed up datevec() in the case that a
> string matrix or string cell is passed in.  The roughly 50x speedup comes
> simply from deciphering the format string only once instead of once for
> every entry in the cell/string matrix.  Here is a test function:
>
> function DN = testfunc4()
>   date1 = '2008-12-16 22:12:00';
>   D = repmat(date1, 1000, 1);
>   tstart = cputime();
>   DN = datenum(D, 'yyyy-mm-dd HH:MM:SS');
>   cputime - tstart
> end
>
> and results, without the patch:
>
> octave:1> DN = testfunc4();
> ans =  70.594
>
> with the patch:
>
> octave:1> DN = testfunc4();
> ans =  1.4648
>
> There are probably other little things that could be done to save some time,
> but this gets the bulk of it.
>
> What I don't understand, though, is that successively calling this test
> function results in worse performance after the first call:
>
> octave:2> DN = testfunc4();
> ans =  2.1877
> octave:3> DN = testfunc4();
> ans =  2.2437
> ...
>
> Any ideas?  (I checked the usual suspects, like having to delete the return
> value, etc.)
>

Not really. I have, however, noted that "clear all" prevents the effect, so I suspect some problem with persistent variables. But it's definitely worth further investigation.

> Dan
>

I applied this patch. If you intend to make more than a few contributions to Octave, please consider making yourself familiar with Mercurial and then using it to create, organize and send patches.

thanks

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Re: Significant speed-up of datevec

On 18-Dec-2008, Jaroslav Hajek wrote:

| I applied this patch. If you intend to make more than a few
| contributions to Octave, please consider making yourself familiar with
| Mercurial and then using it to create, organize and send patches.

These latest two changes should have ChangeLog entries.

Also, what is the purpose of this change in unique.m?

  -  ## I don't know why anyone would need reverse indices, but it
  -  ## was an interesting challenge.  I welcome cleaner solutions.
     if (nargout >= 3)
       j = i;
  -    j(i) = cumsum (prepad (! match, n, 1));
  +    j(i) = cumsum ([1 !match]);

Also, if it is not already in the coding standards, I think we should add something saying that elements in matrices, cell arrays, and lists of values returned from functions should be separated by commas (or semicolons), not just whitespace.

jwe

Re: Significant speed-up of datevec

On Thu, Dec 18, 2008 at 8:55 PM, John W. Eaton <[hidden email]> wrote:
> On 18-Dec-2008, Jaroslav Hajek wrote:
>
> | I applied this patch. If you intend to make more than a few
> | contributions to Octave, please consider making yourself familiar with
> | Mercurial and then using it to create, organize and send patches.
>
> These latest two changes should have ChangeLog entries.
>

Uh, sorry, I forgot about those. I'll add them.

> Also, what is the purpose of this change in unique.m?
>
>  -  ## I don't know why anyone would need reverse indices, but it
>  -  ## was an interesting challenge.  I welcome cleaner solutions.
>     if (nargout >= 3)
>       j = i;
>  -    j(i) = cumsum (prepad (! match, n, 1));
>  +    j(i) = cumsum ([1 !match]);
>

I think match is always of length n-1 here, isn't it? If yes, then the latter seems clearer to me, and likely also faster.

>
> Also, if it is not already in the coding standards, I think we
> should add something saying that elements in matrices, cell arrays,
> and lists of values returned from functions should be separated by
> commas (or semicolons), not just whitespace.
>

OK.

> jwe
>

--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
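The claim that the two expressions agree whenever match has length n-1 can be checked on a small input. This is a minimal sketch, not the code from unique.m: the variable names `s`, `u`, `j_old` and `j_new` are chosen for illustration, and `match` is assumed to mark sorted elements that equal their successor. The new form is written with a comma, in line with the separator suggestion above.

```octave
x = [3, 1, 2, 3, 1];
[s, i] = sort (x);               # sorted values and the sort permutation
n = numel (s);
match = (s(1:n-1) == s(2:n));    # length n-1: true where s(k) == s(k+1)

## Old form: pad !match on the left with ones up to length n.
j_old = i;
j_old(i) = cumsum (prepad (! match, n, 1));

## New form: prepend a single 1 directly.
j_new = i;
j_new(i) = cumsum ([1, !match]);

## Unique values (last of each run) and the reverse-index property.
u = s([!match, true]);
```

Here both forms give the same reverse indices, and `u(j_new)` reproduces `x`, which is what the third output of unique is for. If match were ever shorter than n-1, prepad would pad with more than one leading 1 and the two forms would differ, so the equivalence depends on that length assumption.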