why is octave 4.2.0 x86_64-w64-mingw32 so slow?


why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Ray Tayek
hi, new to octave. taking a machine learning course. i have a 4 core/8
thread 4ghz 16gb pc.

cpu usage (20%) and memory usage (60% with other programs open) seem low
when running the course programs.

one epoch is taking a couple of minutes.

any way to speed this up?

any pointers will be appreciated.

thanks


--
Honesty is a very expensive gift. So, don't expect it from cheap people
- Warren Buffett
http://tayek.com/


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

NJank
one epoch is taking a couple of minutes.

any way to speed this up?


Since we have no idea what you're actually trying to do in Octave, it's hard to give you any advice to make it run faster. Since you are running on Windows, I assume you've done the usual things like closing all other programs. Have you checked your power settings to make sure they haven't stepped back your CPU speed? If you hold down the Windows key and tap X, it will bring up a menu that includes the Mobility Center. That has shortcuts to the power plans that are mainly relevant if you're running on a laptop. If you are in a power saver or balanced mode, you may be running with reduced CPU speed. Make sure it says high performance or something equivalent.

That said, it's entirely possible you're running an infinite loop inside of Octave and you may be waiting a while.




Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Ray Tayek
On 10/21/2017 3:04 AM, Nicholas Jankowski wrote:
>     one epoch is taking a couple of minutes.
>
>     any way to speed this up?
>
>
>
> Since we have no idea what you're actually trying to do in octave, it's
> hard to give you any advice to make it run faster. ...

please see code below. it's just doing some standard nn stuff.

> ... Since you are running
> on Windows I assume you've done the usual things  ... if you're running on a laptop.
it's a desktop.

> That said it's entirely possible you're running an infinite loop inside
> of octave and you may be waiting a while
>

the program terminates and seems to behave in a rational way.

it's not doing any i/o. 50% of memory is unused.

i would expect to see at least one thread running near 100%.

so i am puzzled.

thanks

% This function trains a neural network language model.
function [model] = train(epochs)
% Inputs:
%   epochs: Number of epochs to run.
% Output:
%   model: A struct containing the learned weights and biases and vocabulary.

if size(ver('Octave'),1)
   OctaveMode = 1;
   warning('error', 'Octave:broadcast');
   start_time = time;
else
   OctaveMode = 0;
   start_time = clock;
end

% SET HYPERPARAMETERS HERE.
batchsize = 100;  % Mini-batch size.
learning_rate = .1;  % Learning rate; default = 0.1.
momentum = 0.9;  % Momentum; default = 0.9.
numhid1 = 50;  % Dimensionality of embedding space; default = 50.
numhid2 = 200;  % Number of units in hidden layer; default = 200.
init_wt = 0.01;  % Standard deviation of the normal distribution
                  % which is sampled to get the initial weights; default = 0.01

% VARIABLES FOR TRACKING TRAINING PROGRESS.
show_training_CE_after = 100;
show_validation_CE_after = 1000;

% LOAD DATA.
[train_input, train_target, valid_input, valid_target, ...
   test_input, test_target, vocab] = load_data(batchsize);
[numwords, batchsize, numbatches] = size(train_input);
vocab_size = size(vocab, 2);

% INITIALIZE WEIGHTS AND BIASES.
word_embedding_weights = init_wt * randn(vocab_size, numhid1);
embed_to_hid_weights = init_wt * randn(numwords * numhid1, numhid2);
hid_to_output_weights = init_wt * randn(numhid2, vocab_size);
hid_bias = zeros(numhid2, 1);
output_bias = zeros(vocab_size, 1);

word_embedding_weights_delta = zeros(vocab_size, numhid1);
word_embedding_weights_gradient = zeros(vocab_size, numhid1);
embed_to_hid_weights_delta = zeros(numwords * numhid1, numhid2);
hid_to_output_weights_delta = zeros(numhid2, vocab_size);
hid_bias_delta = zeros(numhid2, 1);
output_bias_delta = zeros(vocab_size, 1);
expansion_matrix = eye(vocab_size);
count = 0;
tiny = exp(-30);

% TRAIN.
for epoch = 1:epochs
   fprintf(1, 'Epoch %d\n', epoch);
   this_chunk_CE = 0;
   trainset_CE = 0;
   % LOOP OVER MINI-BATCHES.
   for m = 1:numbatches
     input_batch = train_input(:, :, m);
     target_batch = train_target(:, :, m);

     % FORWARD PROPAGATE.
     % Compute the state of each layer in the network given the input batch
     % and all weights and biases
     [embedding_layer_state, hidden_layer_state, output_layer_state] = ...
       fprop(input_batch, ...
             word_embedding_weights, embed_to_hid_weights, ...
             hid_to_output_weights, hid_bias, output_bias);

     % COMPUTE DERIVATIVE.
     %% Expand the target to a sparse 1-of-K vector.
     expanded_target_batch = expansion_matrix(:, target_batch);
     %% Compute derivative of cross-entropy loss function.
     error_deriv = output_layer_state - expanded_target_batch;

     % MEASURE LOSS FUNCTION.
     CE = -sum(sum(...
       expanded_target_batch .* log(output_layer_state + tiny))) / batchsize;
     count =  count + 1;
     this_chunk_CE = this_chunk_CE + (CE - this_chunk_CE) / count;
     trainset_CE = trainset_CE + (CE - trainset_CE) / m;
     fprintf(1, '\rBatch %d Train CE %.3f', m, this_chunk_CE);
     if mod(m, show_training_CE_after) == 0
       fprintf(1, '\n');
       count = 0;
       this_chunk_CE = 0;
     end
     if OctaveMode
       fflush(1);
     end

     % BACK PROPAGATE.
     %% OUTPUT LAYER.
     hid_to_output_weights_gradient =  hidden_layer_state * error_deriv';
     output_bias_gradient = sum(error_deriv, 2);
     back_propagated_deriv_1 = (hid_to_output_weights * error_deriv) ...
       .* hidden_layer_state .* (1 - hidden_layer_state);

     %% HIDDEN LAYER.
     % FILL IN CODE. Replace the line below by one of the options.
     embed_to_hid_weights_gradient = embedding_layer_state * back_propagated_deriv_1'; % was zeros(numhid1 * numwords, numhid2);
     % Options:
     % (a) embed_to_hid_weights_gradient = back_propagated_deriv_1' * embedding_layer_state;
     % (b) embed_to_hid_weights_gradient = embedding_layer_state * back_propagated_deriv_1';
     % (c) embed_to_hid_weights_gradient = back_propagated_deriv_1;
     % (d) embed_to_hid_weights_gradient = embedding_layer_state;

     % FILL IN CODE. Replace the line below by one of the options.
     hid_bias_gradient = sum(back_propagated_deriv_1, 2); % was zeros(numhid2, 1);
     % Options
     % (a) hid_bias_gradient = sum(back_propagated_deriv_1, 2);
     % (b) hid_bias_gradient = sum(back_propagated_deriv_1, 1);
     % (c) hid_bias_gradient = back_propagated_deriv_1;
     % (d) hid_bias_gradient = back_propagated_deriv_1';

     % FILL IN CODE. Replace the line below by one of the options.
     back_propagated_deriv_2 = embed_to_hid_weights * back_propagated_deriv_1; % was zeros(numhid2, batchsize);
     % Options
     % (a) back_propagated_deriv_2 = embed_to_hid_weights * back_propagated_deriv_1;
     % (b) back_propagated_deriv_2 = back_propagated_deriv_1 * embed_to_hid_weights;
     % (c) back_propagated_deriv_2 = back_propagated_deriv_1' * embed_to_hid_weights;
     % (d) back_propagated_deriv_2 = back_propagated_deriv_1 * embed_to_hid_weights';

     word_embedding_weights_gradient(:) = 0;
     %% EMBEDDING LAYER.
     for w = 1:numwords
        word_embedding_weights_gradient = word_embedding_weights_gradient + ...
          expansion_matrix(:, input_batch(w, :)) * ...
          (back_propagated_deriv_2(1 + (w - 1) * numhid1 : w * numhid1, :)');
     end

     % UPDATE WEIGHTS AND BIASES.
     word_embedding_weights_delta = ...
       momentum .* word_embedding_weights_delta + ...
       word_embedding_weights_gradient ./ batchsize;
     word_embedding_weights = word_embedding_weights...
       - learning_rate * word_embedding_weights_delta;

     embed_to_hid_weights_delta = ...
       momentum .* embed_to_hid_weights_delta + ...
       embed_to_hid_weights_gradient ./ batchsize;
     embed_to_hid_weights = embed_to_hid_weights...
       - learning_rate * embed_to_hid_weights_delta;

     hid_to_output_weights_delta = ...
       momentum .* hid_to_output_weights_delta + ...
       hid_to_output_weights_gradient ./ batchsize;
     hid_to_output_weights = hid_to_output_weights...
       - learning_rate * hid_to_output_weights_delta;

     hid_bias_delta = momentum .* hid_bias_delta + ...
       hid_bias_gradient ./ batchsize;
     hid_bias = hid_bias - learning_rate * hid_bias_delta;

     output_bias_delta = momentum .* output_bias_delta + ...
       output_bias_gradient ./ batchsize;
     output_bias = output_bias - learning_rate * output_bias_delta;

     % VALIDATE.
     if mod(m, show_validation_CE_after) == 0
       fprintf(1, '\rRunning validation ...');
       if OctaveMode
         fflush(1);
       end
       [embedding_layer_state, hidden_layer_state, output_layer_state] = ...
         fprop(valid_input, word_embedding_weights, embed_to_hid_weights,...
               hid_to_output_weights, hid_bias, output_bias);
       datasetsize = size(valid_input, 2);
       expanded_valid_target = expansion_matrix(:, valid_target);
       CE = -sum(sum(...
         expanded_valid_target .* log(output_layer_state + tiny))) / datasetsize;
       fprintf(1, ' Validation CE %.3f\n', CE);
       if OctaveMode
         fflush(1);
       end
     end
   end
   fprintf(1, '\rAverage Training CE %.3f\n', trainset_CE);
end
fprintf(1, 'Finished Training.\n');
if OctaveMode
   fflush(1);
end
fprintf(1, 'Final Training CE %.3f\n', trainset_CE);

% EVALUATE ON VALIDATION SET.
fprintf(1, '\rRunning validation ...');
if OctaveMode
   fflush(1);
end
[embedding_layer_state, hidden_layer_state, output_layer_state] = ...
   fprop(valid_input, word_embedding_weights, embed_to_hid_weights,...
         hid_to_output_weights, hid_bias, output_bias);
datasetsize = size(valid_input, 2);
expanded_valid_target = expansion_matrix(:, valid_target);
CE = -sum(sum(...
   expanded_valid_target .* log(output_layer_state + tiny))) / datasetsize;
fprintf(1, '\rFinal Validation CE %.3f\n', CE);
if OctaveMode
   fflush(1);
end

% EVALUATE ON TEST SET.
fprintf(1, '\rRunning test ...');
if OctaveMode
   fflush(1);
end
[embedding_layer_state, hidden_layer_state, output_layer_state] = ...
   fprop(test_input, word_embedding_weights, embed_to_hid_weights,...
         hid_to_output_weights, hid_bias, output_bias);
datasetsize = size(test_input, 2);
expanded_test_target = expansion_matrix(:, test_target);
CE = -sum(sum(...
   expanded_test_target .* log(output_layer_state + tiny))) / datasetsize;
fprintf(1, '\rFinal Test CE %.3f\n', CE);
if OctaveMode
   fflush(1);
end

model.word_embedding_weights = word_embedding_weights;
model.embed_to_hid_weights = embed_to_hid_weights;
model.hid_to_output_weights = hid_to_output_weights;
model.hid_bias = hid_bias;
model.output_bias = output_bias;
model.vocab = vocab;

% In MATLAB replace line below with 'end_time = clock;'
if OctaveMode
   end_time = time;
   diff = end_time - start_time;
else
   end_time = clock;
   diff = etime(end_time, start_time);
end
fprintf(1, 'Training took %.2f seconds\n', diff);
end



--
Honesty is a very expensive gift. So, don't expect it from cheap people
- Warren Buffett
http://tayek.com/


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

siko1056
Ray Tayek wrote
> hi, new to octave. taking a machine learning course.

Nice, which course are you taking? Maybe some other course participants
can help you out here or benefit from the answers.


Ray Tayek wrote
> it's just doing some standard nn stuff.

Most of us have a great idea about standard Octave stuff, but are not
necessarily experts on neural networks, or on the code you are trying to run
;-)


Ray Tayek wrote

>> ... Since you are running
>> on Windows I assume you've done the usual things  ... if you're running
>> on a laptop.
> it's a desktop.
>
>> That said it's entirely possible you're running an infinite loop inside
>> of octave and you may be waiting a while
>>
>
> the program terminates and seems to behave in a rational way.

This is good news.


Ray Tayek wrote
> it's not doing any i/o. 50% of memory is unused.
>
> i would expect to see at least one thread running near 100%.
>
> so i am puzzled.

And I am puzzled by the huge and still incomplete example you provided (what
are all the sub-functions doing, i/o maybe?).  Consider using an MCVE
<https://stackoverflow.com/help/mcve> if you are honestly looking for help.
Maybe attach long files to your email so they do not get wrapped, or better,
upload all of the required code to some public service and attach a link to it.
Like NJank said, what we can do for you at the moment is more guessing than
answering.

HTH,
Kai




Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Ray Tayek
On 10/23/2017 3:25 PM, siko1056 wrote:
> Ray Tayek wrote
>> hi, new to octave. taking a machine learning course.
>
> Nice, which course are you taking? Maybe some other course participants
> can help you out here or benefit from the answers.

i am taking a few, but this one uses octave:
https://www.coursera.org/learn/neural-networks/home/welcome

i have posted to their forum; they suggested closing other windows. closing
windows reduces the time from 1400 seconds to 1200 seconds.

most of the time, all 8 cpu's are between 10%-20% utilization. windoze
has a lot of services running, so it is hard to know if octave is using
the cores or not.

thanks


>
>
> Ray Tayek wrote
>> it's just doing some standard nn stuff.
>
> Most of us have a great idea about standard Octave stuff, but are not
> necessarily experts on neural networks, or on the code you are trying to run
> ;-)
>
>
> Ray Tayek wrote
>>> ... Since you are running
>>> on Windows I assume you've done the usual things  ... if you're running
>>> on a laptop.
>> it's a desktop.
>>
>>> That said it's entirely possible you're running an infinite loop inside
>>> of octave and you may be waiting a while
>>>
>>
>> the program terminates and seems to behave in a rational way.
>
> This is good news.
>
>
> Ray Tayek wrote
>> it's not doing any i/o. 50% of memory is unused.
>>
>> i would expect to see at least one thread running near 100%.
>>
>> so i am puzzled.
>
> And I am puzzled by the huge and still incomplete example you provided (what
> are all the sub-functions doing, i/o maybe?).  Consider using an MCVE
> <https://stackoverflow.com/help/mcve> if you are honestly looking for help.
> Maybe attach long files to your email so they do not get wrapped, or better,
> upload all of the required code to some public service and attach a link to it.
> Like NJank said, what we can do for you at the moment is more guessing than
> answering.
>
> HTH,
> Kai
>
>
>


--
Honesty is a very expensive gift. So, don't expect it from cheap people
- Warren Buffett
http://tayek.com/


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

NJank


On Oct 23, 2017 7:20 PM, "Ray Tayek" <[hidden email]> wrote:
On 10/23/2017 3:25 PM, siko1056 wrote:
Ray Tayek wrote
.

And I am puzzled by the huge and still incomplete example you provided (what
are all the sub-functions doing, i/o maybe?).  Consider using an MCVE
<https://stackoverflow.com/help/mcve> if you are honestly looking for help.

Short of trusting us to debug code you know far better than we do, have you tried using the code profiler? It's not as thorough as MATLAB's, but it can give you a good idea of what is chewing up your time, which functions are called the most, etc.

The short instructions are to run:

>> profile on 

And 
>> profile off

Before and after your code. Then you can use profexplore and profshow (I think, check the help) to see the results.
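
For example, a full session might look something like this (train(1) below is just a stand-in for whatever call you want to measure):

>> profile clear                % throw away any old profiling data
>> profile on
>> train(1);                    % run the code being measured
>> profile off
>> profshow                     % flat table of the most expensive functions
>> profexplore                  % walk the call tree interactively
>> T = profile ("info");        % or keep the raw results as a struct

That's roughly the sequence; check the help for the exact argument forms.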


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Ray Tayek
On 10/23/2017 4:31 PM, Nicholas Jankowski wrote:


> ...
> Short of trusting us to debug code you know far better than we do, have you
> tried using the code profiler? It's not as thorough as MATLAB's, but it
> can give you a good idea of what is chewing up your time, which
> functions are called the most, etc.
>

i did a profile (please see below). but nothing jumps out at me unless
log is real slow.

i would think that one of the cores/threads would be saturated.

thanks

 

 >> profshow
    #  Function Attr     Time (s)   Time (%)        Calls
--------------------------------------------------------
   47       log           310.235      24.39        37282
   28       exp           299.600      23.55        74565
    1     train           172.548      13.56            1
   32    repmat           133.316      10.48       149128
   30     fprop           115.611       9.09        37282
   21  binary *            58.261       4.58       819795
   23  binary +            40.484       3.18       670631
   45 binary ./            39.241       3.08       260814
   48 binary .*            25.855       2.03       298032
   18  binary -            22.245       1.75       484284
   46       sum            13.330       1.05       186346
   50    fflush             7.749       0.61        37315
   31 postfix '             7.535       0.59        37282
   27  prefix -             4.467       0.35        74565
   37       max             4.209       0.33       186410
   29   fprintf             3.656       0.29        37707
   35       all             2.251       0.18       447384
   22   reshape             1.546       0.12       372822
   13      size             1.535       0.12       410139
   44      ones             1.265       0.10       298256
 >>

--
Honesty is a very expensive gift. So, don't expect it from cheap people
- Warren Buffett
http://tayek.com/


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

NJank


>> profshow
   #  Function Attr     Time (s)   Time (%)        Calls
--------------------------------------------------------
  47       log           310.235      24.39        37282
  28       exp           299.600      23.55        74565
   1     train           172.548      13.56            1
  32    repmat           133.316      10.48       149128
  30     fprop           115.611       9.09        37282
  21  binary *            58.261       4.58       819795
  23  binary +            40.484       3.18       670631
  45 binary ./            39.241       3.08       260814
  48 binary .*            25.855       2.03       298032
  18  binary -            22.245       1.75       484284
  46       sum            13.330       1.05       186346
  50    fflush             7.749       0.61        37315
  31 postfix '             7.535       0.59        37282
  27  prefix -             4.467       0.35        74565
  37       max             4.209       0.33       186410
  29   fprintf             3.656       0.29        37707
  35       all             2.251       0.18       447384
  22   reshape             1.546       0.12       372822
  13      size             1.535       0.12       410139
  44      ones             1.265       0.10       298256

>>

Ok, still flying blind here, but that seems like a lot of repmat calls. I seem to recall, when working on repelem, some people trying to avoid repmat because it was slow. If I can find the reference or the reason I'll share it. Log and exp seem high, but it really depends on how big a matrix you're calling them on.
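
Just to illustrate the kind of substitution people make (variable names made up, since I still haven't seen your code), adding a bias vector to every column of a matrix with repmat versus letting bsxfun expand it implicitly:

% repmat materializes a full numrows x batchsize copy of the bias first
y = W' * X + repmat (b, 1, batchsize);

% bsxfun applies the operation with virtual expansion, no temporary copy
y = bsxfun (@plus, W' * X, b);

% same idea for a column-wise normalization
P = bsxfun (@rdivide, E, sum (E, 1));

Whether that buys you anything here depends on the sizes involved, so take it as a sketch, not a diagnosis.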

I'd recommend trying profexplore and diving down to each function call. It may show you which parts of your program are more expensive, so you can focus your optimization efforts there.



Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Doug Stewart-4
In reply to this post by Ray Tayek


On Mon, Oct 23, 2017 at 9:11 PM, Ray Tayek <[hidden email]> wrote:
On 10/23/2017 4:31 PM, Nicholas Jankowski wrote:


...
Short of trusting us to debug code you know far better than we do, have you tried using the code profiler? It's not as thorough as MATLAB's, but it can give you a good idea of what is chewing up your time, which functions are called the most, etc.


i did a profile (please see below). but nothing jumps out at me unless log is real slow.

i would think that one of the cores/threads would be saturated.

thanks



>> profshow
   #  Function Attr     Time (s)   Time (%)        Calls
--------------------------------------------------------
  47       log           310.235      24.39        37282
  28       exp           299.600      23.55        74565
   1     train           172.548      13.56            1
  32    repmat           133.316      10.48       149128
  30     fprop           115.611       9.09        37282
  21  binary *            58.261       4.58       819795
  23  binary +            40.484       3.18       670631
  45 binary ./            39.241       3.08       260814
  48 binary .*            25.855       2.03       298032
  18  binary -            22.245       1.75       484284
  46       sum            13.330       1.05       186346
  50    fflush             7.749       0.61        37315
  31 postfix '             7.535       0.59        37282
  27  prefix -             4.467       0.35        74565
  37       max             4.209       0.33       186410
  29   fprintf             3.656       0.29        37707
  35       all             2.251       0.18       447384
  22   reshape             1.546       0.12       372822
  13      size             1.535       0.12       410139
  44      ones             1.265       0.10       298256

>>

-- 



I think you may be misreading how the OS handles the cores and the load.
With a single-threaded program running flat out, the OS usually farms the load out across all the cores, so 1 core at 100%,
when it is shared around, will show up as 4 cores at 25%. I think this is what you are seeing.





Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Ray Tayek
On 10/23/2017 6:23 PM, Doug Stewart wrote:

>
>
> On Mon, Oct 23, 2017 at 9:11 PM, Ray Tayek <[hidden email]
>     i did a profile ...
>
>     i would think that one of the cores/threads would be saturated. ...
>
>
> I think that you don't understand how the os handles the cores and the load.
> With a single thread program running full out the os usually farms the
> load out between all the cores. thus 1 core at 100%
> when it is shared will show up as 4 cores at 25%. I think this is what
> you are seeing.

maybe. the cpu utilization shows 10%-20% across all cores most of the
time. so it could be spread out. otoh, there are still dozens of windoze
services that are running, so it's hard to tell.

one of the other people taking the course reported that Octave 4.0.3
runs much faster.

i did pass the quiz, so this is not urgent. i am just curious why none
of the cpu's saturate.

thanks


--
Honesty is a very expensive gift. So, don't expect it from cheap people
- Warren Buffett
http://tayek.com/


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

NJank



On Oct 23, 2017 9:40 PM, "Ray Tayek" <[hidden email]> wrote:
On 10/23/2017 6:23 PM, Doug Stewart wrote:


On Mon, Oct 23, 2017 at 9:11 PM, Ray Tayek <[hidden email]> wrote:
    i did a profile ...

    i would think that one of the cores/threads would be saturated. ...



I think you may be misreading how the OS handles the cores and the load.
With a single-threaded program running flat out, the OS usually farms the load out across all the cores, so 1 core at 100%,
when it is shared around, will show up as 4 cores at 25%. I think this is what you are seeing.

maybe. the cpu utilization shows 10%-20% across all cores most of the time. so it could be spread out. otoh, there are still dozens of windoze services that are running, so it's hard to tell.

one of the other people taking the course reported that Octave 4.0.3 runs much faster.

i did pass the quiz, so this is not urgent. i am just curious why none of the cpu's saturate.

thanks



if you have 8 cores, do you ever see a process over 12 or 13%?


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Carlo de Falco-2
In reply to this post by NJank

> On 24 Oct 2017, at 03:17, Nicholas Jankowski <[hidden email]> wrote:
>
>  but it really depends on how big a matrix you're calling them on.
From the large number of calls to log and exp it seems they are not being called on a matrix but rather called repeatedly in a loop.
As log and exp are taking around 50% of the total run time, I think the absence of vectorization is most likely the cause of the slowdown
in this case:

  https://www.gnu.org/software/octave/doc/interpreter/Vectorization-and-Faster-Code-Execution.html#Vectorization-and-Faster-Code-Execution

of course this is just a guess as I haven't actually seen the code.
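
Purely to illustrate the difference I mean (the matrix size is invented, say x = randn (200, 100)):

% scalar calls inside a loop: one exp() call per element
y = zeros (size (x));
for i = 1:numel (x)
  y(i) = 1 / (1 + exp (-x(i)));
end

% vectorized: a single exp() call on the whole matrix
y = 1 ./ (1 + exp (-x));

The second form does the same arithmetic but pays the interpreter overhead once per matrix instead of once per element.
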
c.




Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Ray Tayek
On 10/24/2017 1:55 AM, Carlo De Falco wrote:
>
>> On 24 Oct 2017, at 03:17, Nicholas Jankowski <[hidden email]> wrote:
>>
>>   but it really depends on how big a matrix you're calling them on.
> From the large number of calls to log and exp it seems they are not being called on a matrix but rather called repeatedly in a loop.
> As log and exp are taking around 50% of the total run time, I think the absence of vectorization is most likely the cause of the slowdown ...

i thought i posted the code (please see below). i am new to octave so it
looks vectorized to the dummy here.

thanks

function [embedding_layer_state, hidden_layer_state, output_layer_state] = ...
   fprop(input_batch, word_embedding_weights, embed_to_hid_weights,...
   hid_to_output_weights, hid_bias, output_bias)
% This method forward propagates through a neural network.
% Inputs:
%   input_batch: The input data as a matrix of size numwords X batchsize where,
%     numwords is the number of words, batchsize is the number of data points.
%     So, if input_batch(i, j) = k then the ith word in data point j is word
%     index k of the vocabulary.
%
%   word_embedding_weights: Word embedding as a matrix of size
%     vocab_size X numhid1, where vocab_size is the size of the vocabulary
%     numhid1 is the dimensionality of the embedding space.
%
%   embed_to_hid_weights: Weights between the word embedding layer and hidden
%     layer as a matrix of size numhid1*numwords X numhid2, numhid2 is the
%     number of hidden units.
%
%   hid_to_output_weights: Weights between the hidden layer and output softmax
%               unit as a matrix of size numhid2 X vocab_size
%
%   hid_bias: Bias of the hidden layer as a matrix of size numhid2 X 1.
%
%   output_bias: Bias of the output layer as a matrix of size vocab_size X 1.
%
% Outputs:
%   embedding_layer_state: State of units in the embedding layer as a matrix of
%     size numhid1*numwords X batchsize
%
%   hidden_layer_state: State of units in the hidden layer as a matrix of size
%     numhid2 X batchsize
%
%   output_layer_state: State of units in the output layer as a matrix of size
%     vocab_size X batchsize
%

[numwords, batchsize] = size(input_batch);
[vocab_size, numhid1] = size(word_embedding_weights);
numhid2 = size(embed_to_hid_weights, 2);

%% COMPUTE STATE OF WORD EMBEDDING LAYER.
% Look up the inputs word indices in the word_embedding_weights matrix.
embedding_layer_state = reshape(...
   word_embedding_weights(reshape(input_batch, 1, []),:)',...
   numhid1 * numwords, []);

%% COMPUTE STATE OF HIDDEN LAYER.
% Compute inputs to hidden units.
inputs_to_hidden_units = embed_to_hid_weights' * embedding_layer_state + ...
   repmat(hid_bias, 1, batchsize);

% Apply logistic activation function.
% FILL IN CODE. Replace the line below by one of the options.
hidden_layer_state = 1 ./ (1 + exp(-inputs_to_hidden_units)); % was zeros(numhid2, batchsize)
% Options
% (a) hidden_layer_state = 1 ./ (1 + exp(inputs_to_hidden_units));
% (b) hidden_layer_state = 1 ./ (1 - exp(-inputs_to_hidden_units));
% (c) hidden_layer_state = 1 ./ (1 + exp(-inputs_to_hidden_units));
% (d) hidden_layer_state = -1 ./ (1 + exp(-inputs_to_hidden_units));

%% COMPUTE STATE OF OUTPUT LAYER.
% Compute inputs to softmax.
% FILL IN CODE. Replace the line below by one of the options.
inputs_to_softmax = hid_to_output_weights' * hidden_layer_state + ...
  repmat(output_bias, 1, batchsize); % was zeros(vocab_size, batchsize);
% Options
% (a) inputs_to_softmax = hid_to_output_weights' * hidden_layer_state + repmat(output_bias, 1, batchsize);
% (b) inputs_to_softmax = hid_to_output_weights' * hidden_layer_state + repmat(output_bias, batchsize, 1);
% (c) inputs_to_softmax = hidden_layer_state * hid_to_output_weights' + repmat(output_bias, 1, batchsize);
% (d) inputs_to_softmax = hid_to_output_weights * hidden_layer_state + repmat(output_bias, batchsize, 1);

% Subtract maximum.
% Remember that adding or subtracting the same constant from each input to a
% softmax unit does not affect the outputs. Here we are subtracting maximum to
% make all inputs <= 0. This prevents overflows when computing their
% exponents.
inputs_to_softmax = inputs_to_softmax...
   - repmat(max(inputs_to_softmax), vocab_size, 1);

% Compute exp.
output_layer_state = exp(inputs_to_softmax);

% Normalize to get probability distribution.
output_layer_state = output_layer_state ./ repmat(...
   sum(output_layer_state, 1), vocab_size, 1);






--
Honesty is a very expensive gift. So, don't expect it from cheap people
- Warren Buffett
http://tayek.com/


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

NJank
On Tue, Oct 24, 2017 at 5:41 AM, Ray Tayek <[hidden email]> wrote:


I was debating giving your script a test run, but you didn't provide any sample, preferably simplified, inputs.


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

NJank


On Tue, Oct 24, 2017 at 8:35 AM, Nicholas Jankowski <[hidden email]> wrote:
On Tue, Oct 24, 2017 at 5:41 AM, Ray Tayek <[hidden email]> wrote:


I was debating giving your script a test run, but you didn't provide any sample, preferably simplified, inputs.

oh, wait, you posted two separate functions.  So, 'train' that you posted first calls 'fprop', correct? no other functions missing? If I run train(1) with both functions present, i should see what you're seeing?


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

NJank
On Tue, Oct 24, 2017 at 8:38 AM, Nicholas Jankowski <[hidden email]> wrote:


On Tue, Oct 24, 2017 at 8:35 AM, Nicholas Jankowski <[hidden email]> wrote:
On Tue, Oct 24, 2017 at 5:41 AM, Ray Tayek <[hidden email]> wrote:


I was debating giving your script a test run, but you didn't provide any sample, preferably simplified, inputs.

oh,wait, you posted two separate functions.  So, 'train' that you posted first calls 'fprop', correct? no other functions missing? If I run train(1) with both functions present, i should see what you're seeing?


nope, apparently there's at least a custom load_data function that we don't have. 

so, it seems unless you can provide a minimal, complete example to test we're not going to be able to help much.


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

lascott
In reply to this post by Ray Tayek
It seems you've run into why people care about fast implementations of log/exp.
That's where your profiler shows you are spending 50% of your time.


More generally, to check your machine and octave version, try

On Tue, Oct 24, 2017 at 1:41 PM, Nicholas Jankowski <[hidden email]> wrote:
On Tue, Oct 24, 2017 at 8:38 AM, Nicholas Jankowski <[hidden email]> wrote:


On Tue, Oct 24, 2017 at 8:35 AM, Nicholas Jankowski <[hidden email]> wrote:
On Tue, Oct 24, 2017 at 5:41 AM, Ray Tayek <[hidden email]> wrote:


I was debating giving your script a test run, but you didn't provide any sample, preferably simplified, inputs.

oh,wait, you posted two separate functions.  So, 'train' that you posted first calls 'fprop', correct? no other functions missing? If I run train(1) with both functions present, i should see what you're seeing?


nope, apparently there's at least a custom load_data function that we don't have. 

so, it seems unless you can provide a minimal, complete example to test we're not going to be able to help much.


Re: why is octave 4.2.0 x86_64-w64-mingw32 so slow?

Ray Tayek
In reply to this post by NJank
On 10/24/2017 5:41 AM, Nicholas Jankowski wrote:
> On Tue, Oct 24, 2017 at 8:38 AM, Nicholas Jankowski <[hidden email]
> <mailto:[hidden email]>> wrote:
...
>
>     oh,wait, you posted two separate functions.  So, 'train' that you
>     posted first calls 'fprop', correct? no other functions missing? If
>     I run train(1) with both functions present, i should see what you're
>     seeing?
>
>

i hope so.


>
> nope, apparently there's at least a custom load_data function that we
> don't have.
>
> so, it seems unless you can provide a minimal, complete example to test
> we're not going to be able to help much.

please see load_data below. unfortunately the data.mat is 7 mb and in some
binary format. i tried to look at it and it's a bunch of structures: indices
that select the training, validation, and test sets, and the data itself.

i downloaded the benchmark, but the configure script and makefile have
carriage returns in them and cygwin complains.

maybe it's just a slow log.
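
a quick way to sanity-check that (the 250 x 100 shape below is just my guess
at vocab_size x batchsize for one mini-batch) would be something like:

x = rand (250, 100) + 1e-2;   % keep the values strictly positive
n = 1000;
tic;
for k = 1:n
  y = log (x);
end
printf ('%.3f ms per log call\n', 1000 * toc / n);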

thanks



function [train_input, train_target, valid_input, valid_target, ...
  test_input, test_target, vocab] = load_data(N)
% This method loads the training, validation and test set.
% It also divides the training set into mini-batches.
% Inputs:
%   N: Mini-batch size.
% Outputs:
%   train_input: An array of size D X N X M, where
%                 D: number of input dimensions (in this case, 3).
%                 N: size of each mini-batch (in this case, 100).
%                 M: number of minibatches.
%   train_target: An array of size 1 X N X M.
%   valid_input: An array of size D X number of points in the validation set.
%   test: An array of size D X number of points in the test set.
%   vocab: Vocabulary containing index to word mapping.

load data.mat;
numdims = size(data.trainData, 1);
D = numdims - 1;
M = floor(size(data.trainData, 2) / N);
train_input = reshape(data.trainData(1:D, 1:N * M), D, N, M);
train_target = reshape(data.trainData(D + 1, 1:N * M), 1, N, M);
valid_input = data.validData(1:D, :);
valid_target = data.validData(D + 1, :);
test_input = data.testData(1:D, :);
test_target = data.testData(D + 1, :);
vocab = data.vocab;
end


--
Honesty is a very expensive gift. So, don't expect it from cheap people
- Warren Buffett
http://tayek.com/
