# Re: [OctDev] Fwd: Re: Problems with Statistics package Classic List Threaded 3 messages Open this post in threaded view
|

## Re: [OctDev] Fwd: Re: Problems with Statistics package

 On Wed, 30 May 2012 11:25:14, marco atzeri wrote: > Hi Hannes, octave-forge mailing list is the right place for bugs and > patches on the statistics package anova.m is not part of the Octave Forge Statistics package. It is part of Octave core. I am cc'ing this mail along with the patch to the Octave maintainers list. Arno > -------- Original Message -------- > Subject: Re: Problems with Statistics package > Date: Wed, 30 May 2012 11:06:42 +0200 > From: Hannes <[hidden email]> > To: [hidden email] > > Ok, I think I got it pinned downed and solved. I have attached a patch. > > Heres the explenation: > >>  total_mean = mean (y(:)); >>  SSB = sum (group_count .* (group_mean - total_mean) .^ 2); > is always positive [sum of squares between] > >>  SST = sumsq (reshape (y, n, 1) - total_mean); > is always positive and >= SSB [sum of squares total] > >>  SSW = SST - SSB; > => should always be >=0 > but it is an ill conditioned operation if SST==SSB resulting in > possibly negative error. Note that if the error is positive, it > doesn't matter because it will produce f == Inf and p = 0 as expected. > >>  df_b = k - 1; >>  df_w = n - k; >>  v_b = SSB / df_b; >>  v_w = SSW / df_w; > this can become negative, if the SSW is negative or 0 if SSW is zero > or close to zero. > > >>  f = v_b / v_w; > this results in divide by zero if v_w is zero and results in *wrong > result without warning* if v_w is negative. > > > My patch changes this whole block to: > >    SSB = sum (group_count .* (group_mean - total_mean) .^ 2); >    SST = sumsq (reshape (y, n, 1) - total_mean); >    SSW = SST - SSB; >    if (SSW < 0) # machine error if SSB == SSW >      SSW = 0; >    endif >    df_b = k - 1; >    df_w = n - k; >    v_b = SSB / df_b; >    v_w = SSW / df_w; >    if (v_w != 0) >      f = v_b / v_w; >      pval = 1 - fcdf (f, df_b, df_w); >    elseif (v_b == 0) >      f = NaN; >      pval = 1; > > 0 / 0 is  NaN and the hypothesis mus be rejected > >    else >      f = Inf; >      pval = 0; > > x / 0 is Inf and hypothesis must be accepted (because there is no > variance inside the sample but there is outside > >    endif > > HTH, > Hannes > patch_anova.m (771 bytes) Download Attachment
 Concerning the matter below, I didn't yet get a response. However I   realised that most of the tests fail or produce (divide by zero   errors) if the variance of a sample is 0. Since errors imply a wrong   behaviour by the user (and e.g. testing two samples with variance 0 in   var_test is completely legal) I think this should be fixed. I have attached patches for t_test_2 and var_test. Other tests can   fixed accordingly. Thanks for hacking gnu-octave. Best, Hannes Quoting "Arno Onken" <[hidden email]>: > On Wed, 30 May 2012 11:25:14, marco atzeri wrote: >> Hi Hannes, octave-forge mailing list is the right place for bugs and >> patches on the statistics package > > anova.m is not part of the Octave Forge Statistics package. It is part > of Octave core. I am cc'ing this mail along with the patch to the Octave > maintainers list. > > Arno > >> -------- Original Message -------- >> Subject: Re: Problems with Statistics package >> Date: Wed, 30 May 2012 11:06:42 +0200 >> From: Hannes <[hidden email]> >> To: [hidden email] >> >> Ok, I think I got it pinned downed and solved. I have attached a patch. >> >> Heres the explenation: >> >>>  total_mean = mean (y(:)); >>>  SSB = sum (group_count .* (group_mean - total_mean) .^ 2); >> is always positive [sum of squares between] >> >>>  SST = sumsq (reshape (y, n, 1) - total_mean); >> is always positive and >= SSB [sum of squares total] >> >>>  SSW = SST - SSB; >> => should always be >=0 >> but it is an ill conditioned operation if SST==SSB resulting in >> possibly negative error. Note that if the error is positive, it >> doesn't matter because it will produce f == Inf and p = 0 as expected. >> >>>  df_b = k - 1; >>>  df_w = n - k; >>>  v_b = SSB / df_b; >>>  v_w = SSW / df_w; >> this can become negative, if the SSW is negative or 0 if SSW is zero >> or close to zero. >> >> >>>  f = v_b / v_w; >> this results in divide by zero if v_w is zero and results in *wrong >> result without warning* if v_w is negative. >> >> >> My patch changes this whole block to: >> >>    SSB = sum (group_count .* (group_mean - total_mean) .^ 2); >>    SST = sumsq (reshape (y, n, 1) - total_mean); >>    SSW = SST - SSB; >>    if (SSW < 0) # machine error if SSB == SSW >>      SSW = 0; >>    endif >>    df_b = k - 1; >>    df_w = n - k; >>    v_b = SSB / df_b; >>    v_w = SSW / df_w; >>    if (v_w != 0) >>      f = v_b / v_w; >>      pval = 1 - fcdf (f, df_b, df_w); >>    elseif (v_b == 0) >>      f = NaN; >>      pval = 1; >> >> 0 / 0 is  NaN and the hypothesis mus be rejected >> >>    else >>      f = Inf; >>      pval = 0; >> >> x / 0 is Inf and hypothesis must be accepted (because there is no >> variance inside the sample but there is outside >> >>    endif >> >> HTH, >> Hannes >> > patch-t_test_2.m (1K) Download Attachment patch-var_test.m (1K) Download Attachment