Hi everyone,
I am using the octave statistics package right now, but have some unexpected (possibly bugged) results. I have attached a log that shows the problem. On a perfect grouping, anova produces p-val of 1 instead of 0. The f-statistic being displayed as -94353808874586048.0000 leads me to the conclusion that some internal number limits where exceeded, which is strange, because the example is very simple, the involved numbers very low etc. Could someone help me with that? Thank you very much! Best, Hannes PS: System is: octave:83> version ans = 3.2.4 octave:84> uname ans = { sysname = Linux nodename = research-macpro01 release = 3.2.0-24-generic version = #38-Ubuntu SMP Tue May 1 16:18:50 UTC 2012 machine = x86_64 } _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave octave_anova.log (2K) Download Attachment |
On 5/30/2012 7:56 AM, Hannes wrote:
> Hi everyone, > > I am using the octave statistics package right now, but have some > unexpected (possibly bugged) results. > > I have attached a log that shows the problem. On a perfect grouping, > anova produces p-val of 1 instead of 0. The f-statistic being displayed as > -94353808874586048.0000 > leads me to the conclusion that some internal number limits where > exceeded, which is strange, because the example is very simple, the > involved numbers very low etc. > > Could someone help me with that? > > Thank you very much! > > Best, > Hannes > > PS: System is: > > octave:83> version > ans = 3.2.4 > octave:84> uname > ans = > { > sysname = Linux > nodename = research-macpro01 > release = 3.2.0-24-generic > version = #38-Ubuntu SMP Tue May 1 16:18:50 UTC 2012 > machine = x86_64 > } > > There is an overflow: octave:4> anova(samples,grps) warning: division by zero One-way ANOVA Table: Source of Variation Sum of Squares df Empirical Var ********************************************************* Between Groups 944.4969 1 944.4969 Within Groups 0.0000 159 0.0000 --------------------------------------------------------- Total 944.4969 160 Test Statistic f Inf p-value 0.0000 ans = 0 octave:5> version ans = 3.6.1 octave:6> uname ans = scalar structure containing the fields: sysname = CYGWIN_NT-6.1-WOW64 nodename = MARCOATZERI release = 1.7.16s(0.261/5/3) version = 20120525 17:01:08 machine = i686 with statistics-1.1.0 Regards Marco _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave |
> > There is an overflow: > > octave:4> anova(samples,grps) > warning: division by zero Note that I do not get a division by 0 warning on this example. But I do get quite a few division by zero warnings on other examples (always when the sample values are all identical). Because some of the results seemed right, I just disabled the warnings, but I don't feel I can trust the results at all right now. > One-way ANOVA Table: > > Source of Variation Sum of Squares df Empirical Var > ********************************************************* > Between Groups 944.4969 1 944.4969 > Within Groups 0.0000 159 0.0000 > --------------------------------------------------------- > Total 944.4969 160 > > Test Statistic f Inf > p-value 0.0000 > > ans = 0 At least you get the desired result (pval==0), while I get the opposite... I have looked into the source code, which states the following: total_mean = mean (y(:)); SSB = sum (group_count .* (group_mean - total_mean) .^ 2); SST = sumsq (reshape (y, n, 1) - total_mean); SSW = SST - SSB; df_b = k - 1; df_w = n - k; v_b = SSB / df_b; v_w = SSW / df_w; f = v_b / v_w; pval = 1 - fcdf (f, df_b, df_w); Now the place where the problematic division takes place is f = v_b / v_w which is variance_between by variance_within. This is the right thing to do according to Wikipedia, but of course variance_within can quite possibly be 0. Now from my statistical intution the following results are expected: v_b != 0 && v_w == 0 -> pval = 1 v_b == 0 && v_w == 0 -> pval = 0 But I am no expert on this... also I don't know what to do if v_w is really close to 0 but not 0. Does anyone know more? > octave:5> version > ans = 3.6.1 I updated to 3.6.1 as well, no change. Best, Hannes _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave |
Ok, I think I got it pinned downed and solved. I have attached a patch.
Heres the explenation: > total_mean = mean (y(:)); > SSB = sum (group_count .* (group_mean - total_mean) .^ 2); is always positive [sum of squares between] > SST = sumsq (reshape (y, n, 1) - total_mean); is always positive and >= SSB [sum of squares total] > SSW = SST - SSB; => should always be >=0 but it is an ill conditioned operation if SST==SSB resulting in possibly negative error. Note that if the error is positive, it doesn't matter because it will produce f == Inf and p = 0 as expected. > df_b = k - 1; > df_w = n - k; > v_b = SSB / df_b; > v_w = SSW / df_w; this can become negative, if the SSW is negative or 0 if SSW is zero or close to zero. > f = v_b / v_w; this results in divide by zero if v_w is zero and results in *wrong result without warning* if v_w is negative. My patch changes this whole block to: SSB = sum (group_count .* (group_mean - total_mean) .^ 2); SST = sumsq (reshape (y, n, 1) - total_mean); SSW = SST - SSB; if (SSW < 0) # machine error if SSB == SSW SSW = 0; endif df_b = k - 1; df_w = n - k; v_b = SSB / df_b; v_w = SSW / df_w; if (v_w != 0) f = v_b / v_w; pval = 1 - fcdf (f, df_b, df_w); elseif (v_b == 0) f = NaN; pval = 1; 0 / 0 is NaN and the hypothesis mus be rejected else f = Inf; pval = 0; x / 0 is Inf and hypothesis must be accepted (because there is no variance inside the sample but there is outside endif HTH, Hannes _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave patch_anova.m (769 bytes) Download Attachment |
Free forum by Nabble | Edit this page |