I finally got one of the profiling tools to work well enough to identify
some of the hotspots. I ended up using the Linux kernel tool 'perf',
attached to the running process:

    perf record -g -p <PID>

When running the benchmark bm.toeplitz.orig.m, I find that tree_evaluator
and tree_index_expression::lvalue seem to be the time-consuming routines:

    Children    Self  Samples  Command  Shared Object          Symbol
-     81.31%   0.27%      160  QThread  liboctinterp.so.7.0.0  [.] octave::tree_evaluator::visit_simple_assignment

If, instead of the call graph, I look directly at which functions are
consuming the most time, a lot of it does seem to be spent
allocating/freeing memory and creating/destroying class objects.
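
For anyone who wants to reproduce the two views (call graph versus
per-function time), here is a rough sketch of how they might be produced
with perf. The function names (record_proc, report_callgraph, report_flat)
are hypothetical helpers made up for illustration, and the exact report
options used for the numbers above are an assumption on my part.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not from the post.

# Attach to a running process ($1 = PID) and sample with call graphs (-g).
# Stop with Ctrl-C; samples are written to ./perf.data.
record_proc() {
    perf record -g -p "$1"
}

# Call-graph view: the "Children" column accumulates time spent in callees,
# so high-level routines such as the evaluator rank near the top.
# -n adds the "Samples" column.
report_callgraph() {
    perf report --children -n --stdio
}

# Flat view: rank by self time only, which surfaces leaf functions
# such as allocator calls and constructors/destructors.
report_flat() {
    perf report --no-children -n --sort dso,symbol --stdio
}
```

Typical use would be to source the file, run `record_proc <PID>` while the
benchmark executes, interrupt it, and then compare `report_callgraph`
against `report_flat` on the resulting perf.data.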