Problems with Statistics package Classic List Threaded 4 messages Open this post in threaded view
|

Problems with Statistics package

 Hi everyone, I am using the octave statistics package right now, but have some   unexpected (possibly bugged) results. I have attached a log that shows the problem. On a perfect grouping,   anova produces p-val of 1 instead of 0. The f-statistic being   displayed as -94353808874586048.0000 leads me to the conclusion that some internal number limits where   exceeded, which is strange, because the example is very simple, the   involved numbers very low etc. Could someone help me with that? Thank you very much! Best, Hannes PS: System is: octave:83> version ans = 3.2.4 octave:84> uname ans = {    sysname = Linux    nodename = research-macpro01    release = 3.2.0-24-generic    version = #38-Ubuntu SMP Tue May 1 16:18:50 UTC 2012    machine = x86_64 } _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave octave_anova.log (2K) Download Attachment
Open this post in threaded view
|

Re: Problems with Statistics package

 On 5/30/2012 7:56 AM, Hannes wrote: > Hi everyone, > > I am using the octave statistics package right now, but have some > unexpected (possibly bugged) results. > > I have attached a log that shows the problem. On a perfect grouping, > anova produces p-val of 1 instead of 0. The f-statistic being displayed as > -94353808874586048.0000 > leads me to the conclusion that some internal number limits where > exceeded, which is strange, because the example is very simple, the > involved numbers very low etc. > > Could someone help me with that? > > Thank you very much! > > Best, > Hannes > > PS: System is: > > octave:83> version > ans = 3.2.4 > octave:84> uname > ans = > { > sysname = Linux > nodename = research-macpro01 > release = 3.2.0-24-generic > version = #38-Ubuntu SMP Tue May 1 16:18:50 UTC 2012 > machine = x86_64 > } > > There is an overflow: octave:4> anova(samples,grps) warning: division by zero One-way ANOVA Table: Source of Variation   Sum of Squares    df  Empirical Var ********************************************************* Between Groups              944.4969     1       944.4969 Within Groups                 0.0000   159         0.0000 --------------------------------------------------------- Total                       944.4969   160 Test Statistic f                 Inf p-value                       0.0000 ans = 0 octave:5> version ans = 3.6.1 octave:6> uname ans =    scalar structure containing the fields:      sysname = CYGWIN_NT-6.1-WOW64      nodename = MARCOATZERI      release = 1.7.16s(0.261/5/3)      version = 20120525 17:01:08      machine = i686 with statistics-1.1.0 Regards Marco _______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave
 Ok, I think I got it pinned downed and solved. I have attached a patch. Heres the explenation: >   total_mean = mean (y(:)); >   SSB = sum (group_count .* (group_mean - total_mean) .^ 2); is always positive [sum of squares between] >   SST = sumsq (reshape (y, n, 1) - total_mean); is always positive and >= SSB [sum of squares total] >   SSW = SST - SSB; => should always be >=0 but it is an ill conditioned operation if SST==SSB resulting in   possibly negative error. Note that if the error is positive, it   doesn't matter because it will produce f == Inf and p = 0 as expected. >   df_b = k - 1; >   df_w = n - k; >   v_b = SSB / df_b; >   v_w = SSW / df_w; this can become negative, if the SSW is negative or 0 if SSW is zero   or close to zero. >   f = v_b / v_w; this results in divide by zero if v_w is zero and results in *wrong   result without warning* if v_w is negative. My patch changes this whole block to:    SSB = sum (group_count .* (group_mean - total_mean) .^ 2);    SST = sumsq (reshape (y, n, 1) - total_mean);    SSW = SST - SSB;    if (SSW < 0) # machine error if SSB == SSW      SSW = 0;    endif    df_b = k - 1;    df_w = n - k;    v_b = SSB / df_b;    v_w = SSW / df_w;    if (v_w != 0)      f = v_b / v_w;      pval = 1 - fcdf (f, df_b, df_w);    elseif (v_b == 0)      f = NaN;      pval = 1; 0 / 0 is  NaN and the hypothesis mus be rejected    else      f = Inf;      pval = 0; x / 0 is Inf and hypothesis must be accepted (because there is no   variance inside the sample but there is outside    endif HTH, Hannes_______________________________________________ Help-octave mailing list [hidden email] https://mailman.cae.wisc.edu/listinfo/help-octave patch_anova.m (769 bytes) Download Attachment