
jwe,
The executive summary is that there seems to be a slowdown in the
performance of regular, and possibly indexed, assignment.
Starting with the base script bm.toeplitz.orig.m:
 Code bm.toeplitz.orig.m 
runs = 5;
cumulate = 0; b = 0;
for i = 1:runs
b = zeros (620, 620);
tic;
for j = 1:620
for k = 1:620
b(k,j) = abs (j - k) + 1;
end
end
timing = toc;
cumulate = cumulate + timing;
end
## Total time
cumulate
 End Code 
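As a quick sanity check on what the base script computes, the loop fills b(k,j) = |j - k| + 1. That should be the symmetric Toeplitz matrix whose first column is 1:n, which Octave's toeplitz function builds directly. The sketch below uses a smaller n (62, my choice) so it runs quickly.

```octave
## Sanity check: the nested loops from bm.toeplitz.orig.m (with a smaller n)
## should produce the same matrix as toeplitz (1:n), i.e. T(k,j) = |j - k| + 1.
n = 62;
b = zeros (n, n);
for j = 1:n
  for k = 1:n
    b(k,j) = abs (j - k) + 1;
  end
end
## Displays ans = 1 when the two matrices match element for element.
isequal (b, toeplitz (1:n))
```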

First test: remove the loop body entirely.
 Code bm.no_loop_body.m 
...
for j = 1:620
for k = 1:620
## No loop body
end
end
...
 End Code 
Results: no loop body:
3.2.4 : 0.10605
6.0.0 : 0.15496
Comments: no loop body:
A roughly 50% slowdown (0.155 s versus 0.106 s), but the absolute difference
of about 50 milliseconds is not significant compared to the 7 seconds seen
for the original script.
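To put that 50 ms in perspective, it can be spread over the total number of loop iterations. This sketch assumes each variant keeps the wrapper from bm.toeplitz.orig.m (runs = 5 and the 620x620 loop nest), which the "..." elisions suggest but do not show.

```octave
## Convert the no-loop-body totals into an extra cost per loop iteration.
## Assumes runs = 5 and a 620x620 loop nest, as in bm.toeplitz.orig.m.
runs = 5;
iters = runs * 620 * 620;   # total loop iterations across all runs
t_324 = 0.10605;            # total seconds, 3.2.4
t_600 = 0.15496;            # total seconds, 6.0.0
## Extra time per iteration in 6.0.0, in nanoseconds (about 25 ns).
delta_ns = (t_600 - t_324) / iters * 1e9
```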

Test case for straight assignment.
 Code bm.assign.m 
...
for j = 1:620
for k = 1:620
z = 13;
end
end
...
 End Code 
Results: assignment:
3.2.4 : 0.17247
6.0.0 : 0.96006
Comments: assignment:
A 5.6X slowdown between versions. This is significant.

Test case for assignment using 1 index.
 Code bm.1idx_assign.m 
...
for j = 1:620
for k = 1:620
b(k) = 13;
end
end
...
 End Code 
Results: 1-index assignment:
3.2.4 : 1.3076
6.0.0 : 3.3534
Comments: 1-index assignment:
A 2.6X slowdown between versions, and the absolute difference of about 2
seconds is significant. Note that even in 3.2.4 the step-up from scalar
assignment to matrix assignment is 7.6X.

Test case for assignment using 2 indexes.
 Code bm.idx_assign.m 
...
for j = 1:620
for k = 1:620
b(k,j) = 13;
end
end
...
 End Code 
Results: 2-index assignment:
3.2.4 : 2.2823
6.0.0 : 4.9126
Comments: 2-index assignment:
A 2.15X slowdown between versions, and the absolute difference of about 2.6
seconds is significant. In 3.2.4 this case is also about 1 second slower
than the 1-index case. It would be worthwhile to check whether performance
scales linearly with the number of indices (e.g., 3-D and 4-D arrays). This
is also the baseline I use for further comparisons.
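One way to run the 3-index check is sketched below as a hypothetical bm.3idx_assign.m. The 62x62x100 shape is my own choice so that the number of assignments (384400) matches the 620x620 case. Note the loop nest is different (three loops, shorter inner loop), so loop overhead is not identical to the 2-index test, although the no-loop-body result suggests loops are cheap. A 4-D variant could use zeros (62, 62, 10, 10) with a fourth loop.

```octave
## Hypothetical bm.3idx_assign.m: same total number of assignments as the
## 2-index test (62*62*100 = 620*620), but each store uses 3 indices.
runs = 5;
cumulate = 0;
for i = 1:runs
  b = zeros (62, 62, 100);
  tic;
  for m = 1:100
    for j = 1:62
      for k = 1:62
        b(k,j,m) = 13;
      end
    end
  end
  timing = toc;
  cumulate = cumulate + timing;
end
## Total time
cumulate
```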

Test case for single loop versus nested loops.
 Code bm.1loop.m 
...
for j = 1:(620*620)
b(j) = 13;
end
...
 End Code 
Results: single loop:
3.2.4 : 1.3396
6.0.0 : 3.3557
Comments: single loop:
This is nearly identical to the results for 1-index assignment, so the
loops themselves do not appear to be the problem. This is corroborated by
the first test, where removing the loop body showed that the loops alone
run fast.
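A related observation: because Octave stores arrays in column-major order, the single loop's b(j) for j = 1:620*620 touches the same elements, in the same order, as b(k,j) in the nested 2-index loops (k innermost). The check below verifies that correspondence with sub2ind on a small n (4 instead of 620, for brevity). If it holds, the roughly 1 second gap in 3.2.4 between the single loop and the 2-index case is presumably due to handling the second index rather than a different memory access pattern; that is an inference, not something these timings isolate.

```octave
## Record the linear index that each b(k,j) store in the nested loops maps
## to, and compare it with the sequence 1:n*n used by the single loop.
n = 4;
order2 = zeros (1, n*n);
pos = 0;
for j = 1:n
  for k = 1:n
    pos = pos + 1;
    order2(pos) = sub2ind ([n n], k, j);   # equals k + (j-1)*n
  end
end
order1 = 1:(n*n);
## Displays ans = 1 when both loops visit elements in the same order.
isequal (order1, order2)
```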

Test case for arithmetic expression.
 Code bm.arithmetic_op.m 
...
for j = 1:620
for k = 1:620
b(k,j) = k + 1;
end
end
...
 End Code 
Results: arithmetic op:
3.2.4 : 2.8724
6.0.0 : 5.9112
Comments: arithmetic op:
This is quite close to the results for 2-index assignment. In fact, the
slowdown is 2.06X versus the 2.15X seen for 2-index assignment, so it is
likely that essentially all of the slowdown is due to the issues with
assignment.

Test case for function called with constant value.
 Code bm.fcn_const_val.m 
...
for j = 1:620
for k = 1:620
b(k,j) = abs (13);
end
end
...
 End Code 
Results: function w/constant value:
3.2.4 : 4.8345
6.0.0 : 10.811
Comments: function w/constant value:
Slowdown is 2.24X. The 2.15X slowdown for 2-index assignment would
account for 2.15/2.24 = 96% of this.
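The ratios quoted in these comments can be recomputed from the totals reported above. The sketch below copies those totals (seconds, 3.2.4 in the first column and 6.0.0 in the second); the variable names are only illustrative. Using the unrounded ratios, the "explained" fraction comes out at about 0.963, consistent with the 96% figure.

```octave
## Totals (seconds) from the results above: column 1 is 3.2.4, column 2 is 6.0.0.
names = {"no loop body", "assignment", "1-index assignment", ...
         "2-index assignment", "single loop", "arithmetic op", ...
         "function w/constant value"};
t = [0.10605  0.15496;
     0.17247  0.96006;
     1.3076   3.3534;
     2.2823   4.9126;
     1.3396   3.3557;
     2.8724   5.9112;
     4.8345   10.811];
## Slowdown factor for each test: 6.0.0 time divided by 3.2.4 time.
slowdown = t(:,2) ./ t(:,1);
for n = 1:numel (names)
  printf ("%-26s %5.2fX\n", names{n}, slowdown(n));
end
## Fraction of the function-call slowdown matched by 2-index assignment.
explained = slowdown(4) / slowdown(7)
```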

Test case for function called with 1 lookup.
 Code bm.fcn_1lookup.m 
...
for j = 1:620
for k = 1:620
b(k,j) = abs (k);
end
end
...
 End Code 
Results: function w/1 lookup:
3.2.4 : 4.9847
6.0.0 : 10.979
Comments: function w/1 lookup:
Slowdown is 2.20X. The 2.15X slowdown for 2-index assignment would
account for 2.15/2.20 = 98% of this.
Rik
