Andy Adler wrote:
>I propose the following patch to hist.m;
>it results in about 2.5x speedup.
>

At one point I rewrote hist to not have any
loops. I was doing a whole lot of really large
histograms (e.g., 100000 values drawn from
a Poisson distribution). I had to rewrite the
Poisson generator too ;-)
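
To give a feel for the loop-free counting, here is a minimal sketch on a toy case. The bin indices and the bin count are made-up values chosen for illustration, and the indices are assumed to be already sorted and 0-origin:

```octave
## Toy input (assumed for illustration): sorted 0-origin bin indices
q = [0 0 0 1 1 3];
n = 4;                          # number of bins
L = length (q);

## true where an element shares its bin with its right-hand neighbour
same = (q == [q(2:L), -Inf]);

## the last element of each run identifies one non-empty bin
active = q(~same);

## running count of "same" flags, sampled at the end of each run
f = cumsum (same);
f = f(~same);

## difference the running count; add 1 for the run-ending element,
## which was not flagged in "same"
f = [f(1), diff(f)] + 1;

## scatter the counts into the full set of bins; empty bins stay zero
freq = zeros (1, n);
freq(active+1) = f;             # freq is now [3 2 0 1]
```

Bin 2 never appears in q, so it is simply left at zero by the final assignment.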

Please check it out and tell me if it is any faster
than what you've got. I think it might be slower
if you have a histogram with a lot of empty bins.
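
A rough way to try this, as a sketch only: the data below are an arbitrary stand-in (not a real Poisson generator), the bin counts 50 and 5000 are assumptions picked to contrast a dense histogram with one that has many empty bins, and the timings apply to whichever hist.m is first on the load path.

```octave
## Stand-in data (illustrative assumption): 100000 non-negative integers
y = floor (-10 * log (rand (1, 100000)));

## few bins: most bins are occupied
tic;
[nn, xx] = hist (y, 50);
t_dense = toc;

## many bins: integer data leaves most of them empty
tic;
[nn2, xx2] = hist (y, 5000);
t_sparse = toc;

printf ("50 bins:   %g s\n", t_dense);
printf ("5000 bins: %g s\n", t_sparse);
```

Running the same script against each version of hist.m in turn should show whether the empty-bin case really is the slow one.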

Thanks,

Paul Kienzle

## Copyright (C) 1996, 1997 John W. Eaton
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2, or (at your option)
## any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING. If not, write to the Free
## Software Foundation, 59 Temple Place - Suite 330, Boston, MA
## 02111-1307, USA.

## -*- texinfo -*-
## @deftypefn {Function File} {} hist (@var{y}, @var{x}, @var{norm})
## Produce histogram counts or plots.
##
## With one vector input argument, plot a histogram of the values with
## 10 bins. The range of the histogram bins is determined by the range
## of the data.
##
## Given a second scalar argument, use that as the number of bins.
##
## Given a second vector argument, use that as the centers of the bins,
## with the width of the bins determined from the adjacent values in
## the vector.
##
## If a third argument is provided, the histogram is normalised such that
## the sum of the bars is equal to @var{norm}.
##
## Extreme values are lumped in the first and last bins.
##
## With two output arguments, produce the values @var{nn} and @var{xx} such
## that @code{bar (@var{xx}, @var{nn})} will plot the histogram.
## @end deftypefn
## @seealso{bar}

## Author: jwe

function [nn, xx] = hist (y, x, norm)

  if (nargin < 1 || nargin > 3)
    usage ("[nn, xx] = hist (y, x, norm)");
  endif

  if (is_vector (y))
    max_val = max (y);
    min_val = min (y);
  else
    error ("hist: first argument must be a vector");
  endif

  if (nargin == 1)
    x = 10;
  endif

  if (is_scalar (x))
    n = x;
    if (n <= 0 || n != fix (n))
      error ("hist: number of bins must be a positive integer");
    endif
    delta = (max_val - min_val) / n / 2;
    x = linspace (min_val+delta, max_val-delta, n);
    freq = zeros (1, n);
    q = sort (y(:).');
    L = length (q);
    if (q(1) == q(L))
      ## all values are identical: count them all in the last bin
      freq(n) = L;
    else
      q = (q - q(1)) / (q(L) - q(1)) / (1 + eps);  # set y-range to [0,1)
      q = fix (q*n);                     # split into n bins
      same = (q == [q(2:L), -Inf]);      # true if neighbours are in the same bin
      q = q(~same);                      # q lists the 'active' bins (0-origin)
      f = cumsum (same);                 # cumulative histogram
      f = f(~same);
      ## cumulative histogram -> histogram; we need to add 1 since we
      ## did not count the point at the boundary between bins (it was
      ## turned into zero)
      f = [f(1), diff(f)] + 1;
      ## distribute f to the active bins, leaving the remaining bins empty
      freq(q+1) = f;
    endif
  elseif (is_vector (x))
    tmp = sort (x);
    if (any (tmp != x))
      warning ("hist: bin values not sorted on input");
      x = tmp;
    endif
    n = length (x);
    cutoff = (x(1:n-1) + x(2:n)) / 2;     # find bin boundaries
    [s, idx] = sort ([cutoff(:); y(:)]);  # put elements between boundaries
    ## cutoff holds n-1 boundaries, so indices >= n refer to elements of y
    chist = cumsum (idx >= n);            # integrate over all elements
    chist = [chist(idx<n); chist(length (chist))];  # keep totals at boundaries
    freq = [chist(1); diff(chist)];       # differentiate for histogram
  else
    error ("hist: second argument must be a scalar or a vector");
  endif

  if (nargin == 3)
    ## Normalise the histogram.
    freq = freq / length (y) * norm;
  endif

  if (nargout > 0)
    nn = freq;
    xx = x;
  else
    bar (x, freq);
  endif

endfunction
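
A small sketch of the merge-and-count idea behind the bin-centre case, worked on made-up values. The centres and data below are illustrative assumptions, and this is a stripped-down illustration rather than the function itself:

```octave
## Toy input (assumed for illustration)
x = [1 2 3];                  # bin centres, already sorted
y = [1 2 2 3];                # data
n = length (x);

## bin boundaries halfway between adjacent centres: n-1 of them
cutoff = (x(1:n-1) + x(2:n)) / 2;

## sort boundaries and data together; idx records where each came from
[s, idx] = sort ([cutoff(:); y(:)]);

## positions 1..n-1 of the combined vector are boundaries, so
## idx >= n marks data points; the running sum counts data seen so far
chist = cumsum (idx >= n);

## sample the running count at each boundary, then append the total
chist = [chist(idx < n); chist(end)];

## differences between successive samples are the per-bin counts
freq = [chist(1); diff(chist)];   # freq is now [1; 2; 1]
```

Because sort is stable and the boundaries come first in the combined vector, a data point that falls exactly on a boundary is counted in the upper bin.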