Re: [OctDev] Fwd: Re: Problems with Statistics package

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: [OctDev] Fwd: Re: Problems with Statistics package

Arno Onken-2
On Wed, 30 May 2012 11:25:14, marco atzeri wrote:
> Hi Hannes, octave-forge mailing list is the right place for bugs and
> patches on the statistics package

anova.m is not part of the Octave Forge Statistics package. It is part
of Octave core. I am cc'ing this mail along with the patch to the Octave
maintainers list.

Arno

> -------- Original Message --------
> Subject: Re: Problems with Statistics package
> Date: Wed, 30 May 2012 11:06:42 +0200
> From: Hannes <[hidden email]>
> To: [hidden email]
>
> Ok, I think I got it pinned downed and solved. I have attached a patch.
>
> Heres the explenation:
>
>>  total_mean = mean (y(:));
>>  SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
> is always positive [sum of squares between]
>
>>  SST = sumsq (reshape (y, n, 1) - total_mean);
> is always positive and >= SSB [sum of squares total]
>
>>  SSW = SST - SSB;
> => should always be >=0
> but it is an ill conditioned operation if SST==SSB resulting in
> possibly negative error. Note that if the error is positive, it
> doesn't matter because it will produce f == Inf and p = 0 as expected.
>
>>  df_b = k - 1;
>>  df_w = n - k;
>>  v_b = SSB / df_b;
>>  v_w = SSW / df_w;
> this can become negative, if the SSW is negative or 0 if SSW is zero
> or close to zero.
>
>
>>  f = v_b / v_w;
> this results in divide by zero if v_w is zero and results in *wrong
> result without warning* if v_w is negative.
>
>
> My patch changes this whole block to:
>
>    SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
>    SST = sumsq (reshape (y, n, 1) - total_mean);
>    SSW = SST - SSB;
>    if (SSW < 0) # machine error if SSB == SSW
>      SSW = 0;
>    endif
>    df_b = k - 1;
>    df_w = n - k;
>    v_b = SSB / df_b;
>    v_w = SSW / df_w;
>    if (v_w != 0)
>      f = v_b / v_w;
>      pval = 1 - fcdf (f, df_b, df_w);
>    elseif (v_b == 0)
>      f = NaN;
>      pval = 1;
>
> 0 / 0 is  NaN and the hypothesis mus be rejected
>
>    else
>      f = Inf;
>      pval = 0;
>
> x / 0 is Inf and hypothesis must be accepted (because there is no
> variance inside the sample but there is outside
>
>    endif
>
> HTH,
> Hannes
>

patch_anova.m (771 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [OctDev] Fwd: Re: Problems with Statistics package

Hannes
Concerning the matter below, I didn't yet get a response. However I  
realised that most of the tests fail or produce (divide by zero  
errors) if the variance of a sample is 0. Since errors imply a wrong  
behaviour by the user (and e.g. testing two samples with variance 0 in  
var_test is completely legal) I think this should be fixed.

I have attached patches for t_test_2 and var_test. Other tests can  
fixed accordingly.

Thanks for hacking gnu-octave.

Best,
Hannes



Quoting "Arno Onken" <[hidden email]>:

> On Wed, 30 May 2012 11:25:14, marco atzeri wrote:
>> Hi Hannes, octave-forge mailing list is the right place for bugs and
>> patches on the statistics package
>
> anova.m is not part of the Octave Forge Statistics package. It is part
> of Octave core. I am cc'ing this mail along with the patch to the Octave
> maintainers list.
>
> Arno
>
>> -------- Original Message --------
>> Subject: Re: Problems with Statistics package
>> Date: Wed, 30 May 2012 11:06:42 +0200
>> From: Hannes <[hidden email]>
>> To: [hidden email]
>>
>> Ok, I think I got it pinned downed and solved. I have attached a patch.
>>
>> Heres the explenation:
>>
>>>  total_mean = mean (y(:));
>>>  SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
>> is always positive [sum of squares between]
>>
>>>  SST = sumsq (reshape (y, n, 1) - total_mean);
>> is always positive and >= SSB [sum of squares total]
>>
>>>  SSW = SST - SSB;
>> => should always be >=0
>> but it is an ill conditioned operation if SST==SSB resulting in
>> possibly negative error. Note that if the error is positive, it
>> doesn't matter because it will produce f == Inf and p = 0 as expected.
>>
>>>  df_b = k - 1;
>>>  df_w = n - k;
>>>  v_b = SSB / df_b;
>>>  v_w = SSW / df_w;
>> this can become negative, if the SSW is negative or 0 if SSW is zero
>> or close to zero.
>>
>>
>>>  f = v_b / v_w;
>> this results in divide by zero if v_w is zero and results in *wrong
>> result without warning* if v_w is negative.
>>
>>
>> My patch changes this whole block to:
>>
>>    SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
>>    SST = sumsq (reshape (y, n, 1) - total_mean);
>>    SSW = SST - SSB;
>>    if (SSW < 0) # machine error if SSB == SSW
>>      SSW = 0;
>>    endif
>>    df_b = k - 1;
>>    df_w = n - k;
>>    v_b = SSB / df_b;
>>    v_w = SSW / df_w;
>>    if (v_w != 0)
>>      f = v_b / v_w;
>>      pval = 1 - fcdf (f, df_b, df_w);
>>    elseif (v_b == 0)
>>      f = NaN;
>>      pval = 1;
>>
>> 0 / 0 is  NaN and the hypothesis mus be rejected
>>
>>    else
>>      f = Inf;
>>      pval = 0;
>>
>> x / 0 is Inf and hypothesis must be accepted (because there is no
>> variance inside the sample but there is outside
>>
>>    endif
>>
>> HTH,
>> Hannes
>>
>


patch-t_test_2.m (1K) Download Attachment
patch-var_test.m (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [OctDev] Fwd: Re: Problems with Statistics package

Carnë Draug-2
Hi Hannes

thanks for this. As it was mentioned before, the bug is on octave
core, not in the statistics package. Do you think you could report
this here http://savannah.gnu.org/bugs/?group=octave with the patch? I
fear that the subject line may be a reason for the octave core devs to
ignore it and the bug tracker is the recommended place for them. If
you can generate an hg changeset, that would be even better.

Thank you,
Carnë

On 12 June 2012 10:34, Hannes <[hidden email]> wrote:

> Concerning the matter below, I didn't yet get a response. However I realised
> that most of the tests fail or produce (divide by zero errors) if the
> variance of a sample is 0. Since errors imply a wrong behaviour by the user
> (and e.g. testing two samples with variance 0 in var_test is completely
> legal) I think this should be fixed.
>
> I have attached patches for t_test_2 and var_test. Other tests can fixed
> accordingly.
>
> Thanks for hacking gnu-octave.
>
> Best,
> Hannes
>
>
>
>
> Quoting "Arno Onken" <[hidden email]>:
>
>> On Wed, 30 May 2012 11:25:14, marco atzeri wrote:
>>>
>>> Hi Hannes, octave-forge mailing list is the right place for bugs and
>>> patches on the statistics package
>>
>>
>> anova.m is not part of the Octave Forge Statistics package. It is part
>> of Octave core. I am cc'ing this mail along with the patch to the Octave
>> maintainers list.
>>
>> Arno
>>
>>> -------- Original Message --------
>>> Subject: Re: Problems with Statistics package
>>> Date: Wed, 30 May 2012 11:06:42 +0200
>>> From: Hannes <[hidden email]>
>>> To: [hidden email]
>>>
>>> Ok, I think I got it pinned downed and solved. I have attached a patch.
>>>
>>> Heres the explenation:
>>>
>>>>  total_mean = mean (y(:));
>>>>  SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
>>>
>>> is always positive [sum of squares between]
>>>
>>>>  SST = sumsq (reshape (y, n, 1) - total_mean);
>>>
>>> is always positive and >= SSB [sum of squares total]
>>>
>>>>  SSW = SST - SSB;
>>>
>>> => should always be >=0
>>> but it is an ill conditioned operation if SST==SSB resulting in
>>> possibly negative error. Note that if the error is positive, it
>>> doesn't matter because it will produce f == Inf and p = 0 as expected.
>>>
>>>>  df_b = k - 1;
>>>>  df_w = n - k;
>>>>  v_b = SSB / df_b;
>>>>  v_w = SSW / df_w;
>>>
>>> this can become negative, if the SSW is negative or 0 if SSW is zero
>>> or close to zero.
>>>
>>>
>>>>  f = v_b / v_w;
>>>
>>> this results in divide by zero if v_w is zero and results in *wrong
>>> result without warning* if v_w is negative.
>>>
>>>
>>> My patch changes this whole block to:
>>>
>>>   SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
>>>   SST = sumsq (reshape (y, n, 1) - total_mean);
>>>   SSW = SST - SSB;
>>>   if (SSW < 0) # machine error if SSB == SSW
>>>     SSW = 0;
>>>   endif
>>>   df_b = k - 1;
>>>   df_w = n - k;
>>>   v_b = SSB / df_b;
>>>   v_w = SSW / df_w;
>>>   if (v_w != 0)
>>>     f = v_b / v_w;
>>>     pval = 1 - fcdf (f, df_b, df_w);
>>>   elseif (v_b == 0)
>>>     f = NaN;
>>>     pval = 1;
>>>
>>> 0 / 0 is  NaN and the hypothesis mus be rejected
>>>
>>>   else
>>>     f = Inf;
>>>     pval = 0;
>>>
>>> x / 0 is Inf and hypothesis must be accepted (because there is no
>>> variance inside the sample but there is outside
>>>
>>>   endif
>>>
>>> HTH,
>>> Hannes
>>>
>>
>