Chi2 test

Chi2 test

Javier Garcia Falaguera
Good afternoon, 

I'm trying to perform a chi-squared test in Octave, but I cannot find any command or function to do it.
My Octave version is 4.4.0.

I have also tried to install the statistics package, but I get the following error:

error: the following dependencies were unsatisfied:
   statistics needs io >= 1.0.18

I do not know how to continue.

Many thanks for the help.

KR

JGF



Re: Chi2 test

James Sherman
On Sat, Oct 13, 2018 at 12:21 PM Javier Garcia Falaguera <[hidden email]> wrote:
I have tried also to install the package but I have the following error

error: the following dependencies were unsatisfied:
   statistics needs io >= 1.0.18


Hi Javier,

That error message is telling you that you need to install (and load) the io package, version 1.0.18 or later. If you can access the Octave Forge repository, you can do:
>>pkg install -forge io
>>pkg load io

and you should be good to go.

Hope this helps,

James Sherman Jr.



Re: Chi2 test

Javier Garcia Falaguera
Good evening James, 

Could you please take a look at this problem I have?

I'm trying to run chisquare_test_homogeneity on two different vectors.

The data is:

d1=[417.76 3753.7 2977.3 1780.7 1068.4 651.9 447.33 264.95 189.78 145.41 101.05 86.262 73.939 64.081 40.667 34.505 33.273 30.808 22.182 20.949 13.556]
d2=[1683 3811 3053 1886 874 382 213 95 88 39 28 22 13 7 8 5 2 3 2 2 2]
c1=[0.023734 0.071202 0.11867 0.16614 0.21361 0.26107 0.30854 0.35601 0.40348 0.45095 0.49842 0.54588 0.59335 0.64082 0.68829 0.73576 0.78322 0.83069 0.87816 0.92563 0.9731]

The call I am using is the following:

[pval_mu, chisq_mu, df_mu] = chisquare_test_homogeneity (d1, d2, c1);

After the execution, the values are:

pval_mu = 0;
chisq_mu = NaN;
df_mu = 21;

I don't understand this behavior, because I tried another set of data and it worked.

This is in a .m file where I use the test twice.

Could you please take a look?

Kind regards, 


JGF 

On Sat, Oct 13, 2018 at 6:45 PM, James Sherman Jr. (<[hidden email]>) wrote:
Hi Javier,

That error message is telling you that you need to install (and load) the io package, version 1.0.18 or later. If you can access the Octave Forge repository, you can do:
>>pkg install -forge io
>>pkg load io

and you should be good to go.

Hope this helps,

James Sherman Jr.



Re: Chi2 test

nrjank

After the execution, the values are:

pval_mu = 0;
chisq_mu = NaN;
df_mu = 21;

I don't understand this behavior, because I tried another set of data and it worked.

What do you expect the correct output to be?



Re: Chi2 test

Javier Garcia Falaguera
Hi Nicholas,

Thanks for the support.

The expected result is something like:

pval_mu = something close to 0, but not exactly 0.0
chisq_mu = a positive value
df_mu = 21 (not a value of interest);

Kind regards, 

JGF

On Wed, Feb 13, 2019 at 10:54 PM, Nicholas Jankowski (<[hidden email]>) wrote:

What do you expect the correct output to be?



Re: Chi2 test

nrjank
(FYI, the convention on this help list is to trim quoted text and bottom-post.)

On Wed, Feb 13, 2019 at 4:58 PM Javier Garcia Falaguera <[hidden email]> wrote:
Hi Nicholas,

Thanks for the support.

The expected result is something like:

pval_mu = something close to 0, but not exactly 0.0
chisq_mu = a positive value
df_mu = 21 (not a value of interest);



Running the same data, I get the following:

pval_mu =  NaN
chisq_mu =  NaN
df_mu =  21


Looking at the function (you can type 'edit chisquare_test_homogeneity' to open the m-file and step through the code):

df is just the length of c, so df = 21.

chisq is NaN because the values used for d1 and c, and likewise d2 and c, lead to a division of zero by zero. See the operations performed on lines 55, 58, and then 59.

I don't know the chi-square test well enough to identify any errors in the code, but essentially the values of c are very small relative to the d1 and d2 values, which makes the internal counts n_x and n_y zero. The chisq calculation divides by (n_x + n_y), so it evaluates 0/0, which is NaN.

I have access to a copy of MATLAB and it appears not to have this function. Can you point to some other implementation of the test that we can compare results against?
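
The NaN described here can be reproduced without the statistics package. The following is only a minimal sketch, assuming the function's internals match lines 55, 58 and 59 as they are quoted later in this thread. `count_below` is a hypothetical helper introduced for this illustration, and the extra Inf cut point is an assumption inferred from the `df+1` in those lines.

```octave
% Data exactly as posted in the thread.
d1 = [417.76 3753.7 2977.3 1780.7 1068.4 651.9 447.33 264.95 189.78 145.41 101.05 86.262 73.939 64.081 40.667 34.505 33.273 30.808 22.182 20.949 13.556];
d2 = [1683 3811 3053 1886 874 382 213 95 88 39 28 22 13 7 8 5 2 3 2 2 2];
c1 = [0.023734 0.071202 0.11867 0.16614 0.21361 0.26107 0.30854 0.35601 0.40348 0.45095 0.49842 0.54588 0.59335 0.64082 0.68829 0.73576 0.78322 0.83069 0.87816 0.92563 0.9731];

% Hypothetical helper: for each cut point (plus an assumed final Inf),
% count how many values of v lie strictly below it.
count_below = @(v, c) sum (v(:) < [c(:).', Inf], 1);

n_x = count_below (d1, c1);
n_y = count_below (d2, c1);
l_x = numel (d1);
l_y = numel (d2);

% Same expression as line 59 quoted in the thread.
chisq = l_x * l_y * sum ((n_x / l_x - n_y / l_y) .^ 2 ./ (n_x + n_y));

disp (n_x + n_y);  % zero at every finite cut point, since all data exceed max(c1)
disp (chisq);      % NaN, because 0 / 0 is NaN and sum propagates it
```

Every value in d1 and d2 is larger than the largest entry of c1, so each finite cut point contributes a 0/0 term to the sum.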



Re: Chi2 test

Juan Pablo Carbajal-2
The problem is that your binning vector C doesn't cover the values
of the variables d1 and d2.
The function does a very crude
quantization/binning/cutting/partition of the data, similar to what
hist does, but in a less efficient way.
You need to make sure the vector C covers the span of the data (or
something close to it); currently they differ by orders of magnitude.
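
One way to act on this advice is to derive the cut points from the data itself. The sketch below is an illustration under assumptions, not a recommendation from the thread: the choice of five evenly spaced interior cut points is arbitrary, `count_below` is a hypothetical helper, and the chisq expression simply mirrors line 59 as quoted elsewhere in this thread. Whether the resulting statistic is meaningful for this data is a separate question.

```octave
% Data exactly as posted in the thread.
d1 = [417.76 3753.7 2977.3 1780.7 1068.4 651.9 447.33 264.95 189.78 145.41 101.05 86.262 73.939 64.081 40.667 34.505 33.273 30.808 22.182 20.949 13.556];
d2 = [1683 3811 3053 1886 874 382 213 95 88 39 28 22 13 7 8 5 2 3 2 2 2];

% Build cut points that span the pooled data. Both endpoints are dropped
% so that at least one value lies strictly below the first cut point.
pooled = [d1, d2];
k      = 5;  % number of interior cut points (arbitrary choice)
edges  = linspace (min (pooled), max (pooled), k + 2);
c      = edges(2:end-1);

% Hypothetical helper: counts of values strictly below each cut point,
% with an assumed final Inf cut point.
count_below = @(v, c) sum (v(:) < [c(:).', Inf], 1);

n_x = count_below (d1, c);
n_y = count_below (d2, c);
l_x = numel (d1);
l_y = numel (d2);

% Same expression as line 59 quoted in the thread.
chisq = l_x * l_y * sum ((n_x / l_x - n_y / l_y) .^ 2 ./ (n_x + n_y));

disp (min (n_x + n_y));  % strictly positive, so no 0 / 0 term appears
disp (chisq);            % finite
```

With cut points inside the range of the data, every denominator n_x + n_y is nonzero and the statistic is finite.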



Re: Chi2 test

Javier Garcia Falaguera
Hi people, 

Many thanks for the support. I just found out that this function
[pval_mu, chisq_mu, df_mu] = chisquare_test_homogeneity (d1, d2, c1)

doesn't allow any of the values of c1 to be less than 1 (even 0.99 leads to an error).

As mentioned in an earlier mail, the result is a division by 0, which leads to NaN. It would be great to check this and change it at some point.

If you had the time to check it soon, it would be very helpful.

Kind regards, 

JGF

On Mon, Feb 18, 2019 at 9:19 AM, Juan Pablo Carbajal (<[hidden email]>) wrote:
The problem is that your binning vector C doesn't cover the values
of the variables d1 and d2.
The function does a very crude
quantization/binning/cutting/partition of the data, similar to what
hist does, but in a less efficient way.
You need to make sure the vector C covers the span of the data (or
something close to it); currently they differ by orders of magnitude.



Re: Chi2 test

nrjank
do you have a 

On Thu, Feb 21, 2019 at 12:44 PM Javier Garcia Falaguera <[hidden email]> wrote:
Hi people, 

Many thanks for the support. I just found out that this function
[pval_mu, chisq_mu, df_mu] = chisquare_test_homogeneity (d1, d2, c1)

doesn't allow any of the values of c1 to be less than 1 (even 0.99 leads to an error).

As mentioned in an earlier mail, the result is a division by 0, which leads to NaN. It would be great to check this and change it at some point.


As I asked in an earlier email, do you have a good reference we can use for expected behavior and inputs/outputs, maybe a paper or another free open source implementation that we could use as an example... anything?



Re: Chi2 test

nrjank


On Thu, Feb 21, 2019 at 6:34 PM Nicholas Jankowski <[hidden email]> wrote:
As I asked in an earlier email, do you have a good reference we can use for expected behavior and inputs/outputs, maybe a paper or another free open source implementation that we could use as an example... anything?

A trivial test:

>> [a b c]=chisquare_test_homogeneity([.1 .2 .3],[.2 .3 .1],[.7 .8 .9])
a =  1
b = 0
c =  3

No error, and c < 1.

Just to clarify, NaN is not necessarily an error. It's a result (unless you are getting an actual error message, in which case please report it). In this case a divide by zero is caused by these lines:

(x is your d1, y is your d2, c is your c1)

55: n_x   = sum (x * ones (1, df+1) < ones (l_x, 1) * c);
58: n_y   = sum (y * ones (1, df+1) < ones (l_y, 1) * c);
59: chisq = l_x * l_y * sum ((n_x/l_x - n_y/l_y).^2 ./ (n_x + n_y));

I can't see any error from passing c < 1. The NaN comes from c being smaller than d1 and d2, which makes n_x and n_y zero, so chisq becomes 0/0.

If this isn't what should happen, you should provide some reference that can describe a desired result.
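
The trivial test and the explanation above can be checked by hand without the statistics package. This is a rough sketch, assuming the internals match the three quoted lines; `count_below` and `chisq_of` are hypothetical helpers, the final Inf cut point is inferred from the `df+1` in lines 55 and 58, and the p-value uses the core function `gammainc` in place of `chi2cdf`, relying on the identity chi2cdf(x, k) = gammainc(x/2, k/2).

```octave
% Hypothetical helpers mirroring the quoted lines 55/58 and 59.
count_below = @(v, c) sum (v(:) < [c(:).', Inf], 1);
chisq_of    = @(n_x, n_y, l_x, l_y) ...
              l_x * l_y * sum ((n_x / l_x - n_y / l_y) .^ 2 ./ (n_x + n_y));

x = [.1 .2 .3];
y = [.2 .3 .1];
c = [.7 .8 .9];

% Case 1: every cut point is below 1, but all data lie below the cut points.
chisq_small = chisq_of (count_below (x, c), count_below (y, c), ...
                        numel (x), numel (y));
% Upper tail of the chi-squared distribution with numel(c) degrees of freedom.
pval_small  = 1 - gammainc (chisq_small / 2, numel (c) / 2);

% Case 2: same cut points, data scaled by 10 so that all data lie above them.
chisq_big = chisq_of (count_below (10 * x, c), count_below (10 * y, c), ...
                      numel (x), numel (y));

disp (chisq_small);  % 0, matching b in the trivial test
disp (pval_small);   % 1, matching a in the trivial test
disp (chisq_big);    % NaN: the counts below each finite cut point are all zero
```

The cut points are identical in both cases, so under these assumptions the NaN depends on where the data sit relative to c, not on c being less than 1.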