Sandy Bridge

Graaf_van_Vlaanderen
I was just wondering: how well is GNU Octave compiled for recent CPU architectures?
When I run the script below, my CPU simply stays at its lowest speed, 1.6GHz.
'Turbo mode' should kick in and take the CPU up to 3.8GHz, but it doesn't.
When I load the CPU with another program at the same time, the script runs twice as fast.
Is there a solution for this problem?

CPU: i7-2600k
Linux Kernel: 3.5.0-17
GNU Octave: 3.6.4-rc0


runs=3;
% (5)
cumulate = 0; p = 0; vt = 0; vr = 0; vrt = 0; rvt = 0; RV = 0; j = 0; k = 0;
x2 = 0; R = 0; Rxx = 0; Ryy = 0; Rxy = 0; Ryx = 0; Rvmax = 0; f = 0;
for i = 1:runs
  x = abs(randn(100,100));
  tic;
    % Calculation of Escoufier's equivalent vectors
    p = size(x, 2);
    vt = [1:p];                                % Variables to test
    vr = [];                                   % Result: ordered variables
    RV = [1:p];                                % Result: correlations
    for j = 1:p                                % loop on the variable number
      Rvmax = 0;
      for k = 1:(p-j+1)                        % loop on the variables
        if j == 1
          x2 = [x, x(:, vt(k))];
        else
          x2 = [x, x(:, vr), x(:, vt(k))];     % New table to test
        end
        R = corr(x2);                      % Correlations table
        Ryy = R(1:p, 1:p);
        Rxx = R(p+1:p+j, p+1:p+j);
        Rxy = R(p+1:p+j, 1:p);
        Ryx = Rxy';
        rvt = trace(Ryx*Rxy)/((trace(Ryy^2)*trace(Rxx^2))^0.5); % RV calculation
        if rvt > Rvmax
          Rvmax = rvt;                         % test of RV
          vrt(j) = vt(k);                      % temporary held variable
        end
      end
      vr(j) = vrt(j);                          % Result: variable
      RV(j) = Rvmax;                           % Result: correlation
      f = find(vt~=vr(j));                     % identify the held variable
      vt = vt(f);                              % reidentify variables to test
    end
  timing = toc;
  cumulate = cumulate + timing;
end
timing = cumulate/runs;                        % average over all runs, not just the last one
times(5, 3) = timing;
disp(['Escoufier''s method on a 100x100 matrix (mixed)________ (sec): ' num2str(timing)])
clear x; clear p; clear vt; clear vr; clear vrt; clear rvt; clear RV; clear j; clear k;
clear x2; clear R; clear Rxx; clear Ryy; clear Rxy; clear Ryx; clear Rvmax; clear f;

Re: Sandy Bridge

Sergei Steshenko

----- Original Message -----
> From: Graaf_van_Vlaanderen <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Saturday, October 20, 2012 11:08 AM
> Subject: Sandy Bridge
>
[snip]
> Is there a solution for this problem?
[snip]

http://linux.die.net/man/1/cpufreq-set - at least a workaround.
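
A minimal sketch of how that workaround might be applied, assuming cpufreq-set (from the cpufrequtils package) is installed and that a governor named "performance" is available on the machine; neither is confirmed in this thread. The helper name print_governor_cmds is made up for illustration. It only prints the commands, since actually running them needs root.

```shell
#!/bin/sh
# Hypothetical helper: print one cpufreq-set command per logical CPU.
# Usage: print_governor_cmds NUM_CPUS [GOVERNOR]
print_governor_cmds() {
    ncpus=$1
    governor=${2:-performance}   # assumed governor name, check what your system offers
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        # -c selects the CPU, -g selects the governor
        echo "cpufreq-set -c $cpu -g $governor"
        cpu=$((cpu + 1))
    done
}

# An i7-2600k shows 8 logical CPUs when Hyper-Threading is enabled.
print_governor_cmds 8
```

The printed lines can be reviewed and then run as root, one per CPU.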

Regards,
  Sergei.

_______________________________________________
Help-octave mailing list
[hidden email]
https://mailman.cae.wisc.edu/listinfo/help-octave

Re: Sandy Bridge

martin_helm
In reply to this post by Graaf_van_Vlaanderen
If and when your CPU steps to a different speed has absolutely nothing
to do with how a program is compiled, it solely depends on the settings
of the CPU governor in your operating system.
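
To see which governor is active, something like the following sketch could be used. It assumes the kernel exposes the cpufreq interface under /sys/devices/system/cpu (the usual location, but not verified for the kernel mentioned in this thread). The function name show_governors and its optional root argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the governor and current frequency of each CPU.
# The optional argument overrides the sysfs root directory.
show_governors() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        gov_file=$dir/cpufreq/scaling_governor
        freq_file=$dir/cpufreq/scaling_cur_freq
        # Skip entries without a readable governor file (no cpufreq support)
        [ -r "$gov_file" ] || continue
        freq=unknown
        if [ -r "$freq_file" ]; then
            freq=$(cat "$freq_file")     # value is reported in kHz
        fi
        echo "${dir##*/}: governor=$(cat "$gov_file") cur_freq_khz=$freq"
    done
}

show_governors
```

Running it once while the system is idle and once while the Octave script is running would show whether the frequency actually changes under load.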

Re: Sandy Bridge

Sergei Steshenko

----- Original Message -----

> From: Martin Helm <[hidden email]>
> To: Graaf_van_Vlaanderen <[hidden email]>; "[hidden email]" <[hidden email]>
> Cc:
> Sent: Monday, October 22, 2012 2:05 AM
> Subject: Re: Sandy Bridge
>
> If and when your CPU steps to a different speed has absolutely nothing
> to do with how a program is compiled, it solely depends on the settings
> of the CPU governor in your operating system.
>

I am not sure that the situation is that straightforward.

For example, it is known that RAM accesses are much slower than cache accesses. If the CPU hardware sees too many RAM accesses, it may not raise the clock speed, because the CPU would still be waiting for data from RAM.

If, on the other hand, there are a lot of cache accesses, it makes sense for the CPU to run faster.

The balance between cache and RAM accesses may depend on compiler optimizations; compilers are often aware of cache friendliness.

And, specifically, ATLAS is built to be cache-aware.


Regards,
  Sergei.

Re: Sandy Bridge

Graaf_van_Vlaanderen
Thanks for the feedback so far.
When I run the script on older Intel CPUs, like a Q6600 or E5400, the system steps up to the maximum CPU speed. So yes, there is probably something missing in the Linux kernel for this.
Regarding compilation, there are some tweaks available in the GNU compiler for Sandy Bridge and the latest Ivy Bridge.