# Problems with Statistics package

4 messages
Open this post in threaded view
|
Report Content as Inappropriate

## Problems with Statistics package

 Hi everyone, I am using the octave statistics package right now, but have some   unexpected (possibly bugged) results. I have attached a log that shows the problem. On a perfect grouping,   anova produces p-val of 1 instead of 0. The f-statistic being   displayed as -94353808874586048.0000 leads me to the conclusion that some internal number limits where   exceeded, which is strange, because the example is very simple, the   involved numbers very low etc. Could someone help me with that? Thank you very much! Best, Hannes PS: System is: octave:83> version ans = 3.2.4 octave:84> uname ans = {    sysname = Linux    nodename = research-macpro01    release = 3.2.0-24-generic    version = #38-Ubuntu SMP Tue May 1 16:18:50 UTC 2012    machine = x86_64 } _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave octave_anova.log (2K) Download Attachment
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Problems with Statistics package

 On 5/30/2012 7:56 AM, Hannes wrote: > Hi everyone, > > I am using the octave statistics package right now, but have some > unexpected (possibly bugged) results. > > I have attached a log that shows the problem. On a perfect grouping, > anova produces p-val of 1 instead of 0. The f-statistic being displayed as > -94353808874586048.0000 > leads me to the conclusion that some internal number limits where > exceeded, which is strange, because the example is very simple, the > involved numbers very low etc. > > Could someone help me with that? > > Thank you very much! > > Best, > Hannes > > PS: System is: > > octave:83> version > ans = 3.2.4 > octave:84> uname > ans = > { > sysname = Linux > nodename = research-macpro01 > release = 3.2.0-24-generic > version = #38-Ubuntu SMP Tue May 1 16:18:50 UTC 2012 > machine = x86_64 > } > > There is an overflow: octave:4> anova(samples,grps) warning: division by zero One-way ANOVA Table: Source of Variation   Sum of Squares    df  Empirical Var ********************************************************* Between Groups              944.4969     1       944.4969 Within Groups                 0.0000   159         0.0000 --------------------------------------------------------- Total                       944.4969   160 Test Statistic f                 Inf p-value                       0.0000 ans = 0 octave:5> version ans = 3.6.1 octave:6> uname ans =    scalar structure containing the fields:      sysname = CYGWIN_NT-6.1-WOW64      nodename = MARCOATZERI      release = 1.7.16s(0.261/5/3)      version = 20120525 17:01:08      machine = i686 with statistics-1.1.0 Regards Marco _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave
Open this post in threaded view
|
Report Content as Inappropriate

## Re: Problems with Statistics package

 > > There is an overflow: > > octave:4> anova(samples,grps) > warning: division by zero Note that I do not get a division by 0 warning on this example. But I   do get quite a few division by zero warnings on other examples (always   when the sample values are all identical). Because some of the results   seemed right, I just disabled the warnings, but I don't feel I can   trust the results at all right now. > One-way ANOVA Table: > > Source of Variation   Sum of Squares    df  Empirical Var > ********************************************************* > Between Groups              944.4969     1       944.4969 > Within Groups                 0.0000   159         0.0000 > --------------------------------------------------------- > Total                       944.4969   160 > > Test Statistic f                 Inf > p-value                       0.0000 > > ans = 0 At least you get the desired result (pval==0), while I get the opposite... I have looked into the source code, which states the following:    total_mean = mean (y(:));    SSB = sum (group_count .* (group_mean - total_mean) .^ 2);    SST = sumsq (reshape (y, n, 1) - total_mean);    SSW = SST - SSB;    df_b = k - 1;    df_w = n - k;    v_b = SSB / df_b;    v_w = SSW / df_w;    f = v_b / v_w;    pval = 1 - fcdf (f, df_b, df_w); Now the place where the problematic division takes place is    f = v_b / v_w which is variance_between by variance_within. This is the right thing   to do according to Wikipedia, but of course variance_within can quite   possibly be 0. Now from my statistical intution the following results are expected: v_b != 0 && v_w == 0 -> pval = 1 v_b == 0 && v_w == 0 -> pval = 0 But I am no expert on this... also I don't know what to do if v_w is   really close to 0 but not 0. Does anyone know more? > octave:5> version > ans = 3.6.1 I updated to 3.6.1 as well, no change. Best, Hannes _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave