Hi all,
I was trying to improve the function hilbert_curve.m in the miscellaneous
package [1], which is recursive. I found a non-recursive algorithm by Jonas
Lundgren using complex numbers. My implementation is here [2]. However, when
I time these two functions I get:

recursive (input arg = 2^9, i.e. 9 levels of recursion):        1.7e-3 +- 0.3e-3 s
non-recursive (without pre-allocation, input args = 9, false):  0.6e-2 +- 0.1e-2 s
non-recursive (with pre-allocation, input args = 9, true):      1.4e-2 +- 0.2e-2 s

These results go against the usual rules of thumb: avoid recursion, and
pre-allocate. Any ideas what the source of the time cost is in the
non-recursive and pre-allocated versions? The Octave profiler is not helping
here, because it shows the time spent on the operations (about 10% of self
time), but not on slicing, which I assume is the culprit.

[1]: https://sourceforge.net/p/octave/miscellaneous/ci/default/tree/inst/hilbert_curve.m
[2]: https://bitbucket.org/KaKiLa/mymfiles/src/tip/hilbert_curve_nr.m

--
JuanPi Carbajal
https://goo.gl/ayiJzi
Public GnuPG key: 9C5B72BF

"Why is thought, being a secretion of the brain, more wonderful than
gravity, a property of matter?" - C. Darwin

_______________________________________________
Help-octave mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/help-octave
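For reference, the non-recursive construction in [2] is based on Jonas
Lundgren's complex-number idea: the order-k curve is assembled from four
rotated/reflected copies of the order-(k-1) curve. A minimal sketch (my own
function and variable names, not the actual hilbert_curve_nr source):

```octave
% Sketch of a Lundgren-style non-recursive Hilbert curve of order n.
% Each iteration maps the current curve into the four sub-quadrants:
% i*conj(z) rotates/reflects, the shifts by a and b place the copies.
function z = hilbert_sketch (n)
  a = 1 + 1i;
  b = 1 - 1i;
  z = 0;
  for k = 1:n
    w = 1i * conj (z);
    % growing concatenation: z quadruples in length every iteration
    z = [w - a; z - b; z + a; -w + b] / 2;
  endfor
endfunction
```

After n iterations z holds 4^n complex points, whose real and imaginary
parts are the (x,y) coordinates of the curve.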
Juan,

I tried both of your algorithms and have no deeper insight into the Hilbert
curves, but I think your algorithm computes "more" than you expected. For
n = 2^2 I get 16 (x,y) coordinates using [1] and 256 complex coordinates
using [2]. Plotting both images shows me that [2] created a much higher
order Hilbert curve.

HTH,
Kai
> I tried both of your algorithms and have no deeper insight into the Hilbert
> curves, but I think your algorithm computes "more" than you expected. For
> n = 2^2 I get 16 (x,y) coordinates using [1] and 256 complex coordinates
> using [2]. Plotting both images shows me that [2] created a much higher
> order Hilbert curve.

Yes, one algorithm takes the size of the matrix, the other just the order:
for the recursive one you need to pass 2^n, for the non-recursive ones just n.
> Yes, one algorithm takes the size of the matrix, the other just the order:
> for the recursive one you need to pass 2^n, for the non-recursive ones just n.

I am sorry, this wasn't clear from my post.
No problem, that problem would have been too obvious ;-)

Now I can reproduce your problem. My next thought was that the complex
computation lowers the performance, so I replaced all complex computations
with meaningless pure real ones. Still this version is slower than the
recursive one by a factor of 2, and with memory allocation by a factor of 10!

So my guess is that recursion for this example is faster than repetitive
memory allocation within for-loops, or than the many data copy operations
between z and zz, where growing implicit memory allocations might also
happen as zz changes its size depending on z(1:nz(k)) within a for-loop?

Kai
> My next thought was that the complex computation lowers the performance,
> so I replaced all complex computations with meaningless pure real ones.

What complex-to-real conversion do you refer to? The non-recursive version
is only complex.

> Still this version is slower than the recursive one by a factor of 2, and
> with memory allocation by a factor of 10! So my guess is that recursion
> for this example is faster than repetitive memory allocation within
> for-loops, or than the many data copy operations between z and zz, where
> growing implicit memory allocations might also happen as zz changes its
> size depending on z(1:nz(k)) within a for-loop?

The non-recursive version with second input argument true has the same
memory allocation as the recursive one (I refer to the final output).

I made some changes to address your comment about copies. Indeed I get a
considerable improvement if I avoid zz. Check the latest version [1]. I get
(I am on a different PC now):

for i=1:100; tic; hilbert_curve_nr(9,true); t(i)=toc; end; [mean(t) std(t)]
ans =

   0.0088032   0.0024393

Still this is slower than without allocation:

for i=1:100; tic; hilbert_curve_nr(9,false); t(i)=toc; end; [mean(t) std(t)]
ans =

   0.0065030   0.0018694

and both are slower than the recursion (using the dev version of
miscellaneous [2]):

for i=1:100; tic; hilbert_curve(2^9); t(i)=toc; end; [mean(t) std(t)]
ans =

   1.9476e-03   8.0688e-04

The results from the profiler do not add up to the total time of the
non-recursive version. I take this as an indication that the time is spent
on the slicing and re-writing of the z vector (not reported by the
profiler), i.e. at each iteration the first 4^i elements are re-written.
How could this re-writing be made faster? Is this something to improve in
core Octave?

(Sadly I have no access to Matlab to check.)

[1]: https://bitbucket.org/KaKiLa/mymfiles/src/tip/hilbert_curve_nr.m
[2]: https://sourceforge.net/p/octave/miscellaneous/ci/default/tree/
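To make the discussion concrete, a sketch (again not the actual
hilbert_curve_nr source) of what the pre-allocated path does: the output is
allocated once at its final size, and each iteration overwrites a growing
leading slice of it.

```octave
% Sketch of the pre-allocated variant: z is allocated once at its final
% size (4^n points) and its first 4*m elements are re-written per iteration.
function z = hilbert_prealloc_sketch (n)
  a = 1 + 1i;
  b = 1 - 1i;
  z = complex (zeros (4^n, 1));  % allocated up front, never grows
  m = 1;                         % current number of valid points
  for k = 1:n
    zk = z(1:m);
    w  = 1i * conj (zk);
    % slice assignment: re-writes the first 4*m elements in place
    z(1:4*m) = [w - a; zk - b; zk + a; -w + b] / 2;
    m = 4 * m;
  endfor
endfunction
```

It produces the same 4^n points as the growing version; the question in this
thread is why the slice assignment ends up costing more than the repeated
concatenation.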
On 07/08/2017 at 11:07, Juan Pablo Carbajal wrote:
> for i=1:100; tic; hilbert_curve_nr(9,true); t(i)=toc; end; [mean(t) std(t)]
> ans =
>
>    0.0088032   0.0024393
>
> Still this is slower than without allocation:
>
> for i=1:100; tic; hilbert_curve_nr(9,false); t(i)=toc; end; [mean(t) std(t)]
> ans =
>
>    0.0065030   0.0018694
>
> and both are slower than the recursion (using the dev version of
> miscellaneous [2]):
>
> for i=1:100; tic; hilbert_curve(2^9); t(i)=toc; end; [mean(t) std(t)]
> ans =
>
>    1.9476e-03   8.0688e-04
>
> [...]
>
> (Sadly I have no access to Matlab to check.)

Octave 4.0.1

  7.34 ms [0.11 ms]
  4.73 ms [0.06 ms]
  1.27 ms [0.03 ms]

Octave 4.3.0+

  7.61 ms [0.16 ms]
  4.82 ms [0.05 ms]
  1.42 ms [0.03 ms]

Matlab R2016b

  8.65 ms [0.58 ms]
  2.98 ms [0.10 ms]
  1.30 ms [0.08 ms]
Same figures here, except for minor differences, using Octave 4.3.0+ and
Matlab R2017a.
>> Octave 4.0.1
>>
>>   7.34 ms [0.11 ms]
>>   4.73 ms [0.06 ms]
>>   1.27 ms [0.03 ms]
>>
>> Octave 4.3.0+
>>
>>   7.61 ms [0.16 ms]
>>   4.82 ms [0.05 ms]
>>   1.42 ms [0.03 ms]
>>
>> Matlab R2016b
>>
>>   8.65 ms [0.58 ms]
>>   2.98 ms [0.10 ms]
>>   1.30 ms [0.08 ms]
>
> Same figures here, except for minor differences, using Octave 4.3.0+ and
> Matlab R2017a.

Thanks jbect and Kai.

The lesson learned seems to be that recursion can beat loops and
pre-allocation. It seems Matlab is even worse than Octave at slicing and
assignment (assuming the second line is the iterations without allocation).
Does anybody know how this is possible? Why is re-writing an array's values
more expensive than extending the array?

I guess we will stick to the recursion then, and maybe move it to C++,
hoping this will further reduce the cost.
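The re-write-vs-extend question can be probed in isolation, stripped of the
Hilbert arithmetic, with a toy benchmark like the following sketch (dummy
data; absolute numbers will of course vary per machine):

```octave
n = 9;
N = 4^n;
src = complex (rand (N, 1), rand (N, 1));

% extending: the array is re-created (and its contents copied) each step
tic;
z = src(1);
for k = 1:n
  z = [z; z; z; z];            % quadruple by concatenation
endfor
t_extend = toc;

% re-writing: pre-allocate once, then overwrite the leading slice
tic;
zz = complex (zeros (N, 1));
zz(1) = src(1);
m = 1;
for k = 1:n
  zz(1:4*m) = [zz(1:m); zz(1:m); zz(1:m); zz(1:m)];
  m = 4 * m;
endfor
t_rewrite = toc;

printf ("extend: %.3g s, re-write: %.3g s\n", t_extend, t_rewrite);
```

Both loops end with 4^9 identical elements, so any timing difference is down
to the allocation/copy strategy rather than the values computed.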
Wow, I tried

z = complex (zeros (nz(end), order+1));
for k = 1:order
  idx = 1:nz(k);
  w = i * conj (z(idx,k));
  z(1:nz(k+1),k+1) = [w-a; z(idx,k)-b; z(idx,k)+a; -w+b] / 2;
endfor

that is, I use the columns of the matrix to store the intermediate results,
and it is even more expensive!

24 ms [3 ms]

Any clues as to what the culprit is? Is it the assignment?
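One way to narrow this down might be to time the bare column-slice
assignment with the arithmetic removed, so that only the indexing and
copying cost remains. A sketch, with a dummy pre-computed right-hand side
standing in for the Hilbert update:

```octave
order = 9;
nz = 4 .^ (0:order);                 % number of valid points at each level
z = complex (zeros (nz(end), order+1));
rhs = complex (rand (nz(end), 1));   % dummy right-hand side, computed once

tic;
for k = 1:order
  % bare column-slice assignment, no arithmetic at all
  z(1:nz(k+1), k+1) = rhs(1:nz(k+1));
endfor
t_assign = toc;

printf ("bare slice assignment: %.3g s\n", t_assign);
```

If this alone accounts for most of the 24 ms, the assignment (rather than
the complex arithmetic) is indeed the culprit.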