Radiation Oncology/Medical Statistics/Chi Squared

Χ² (Chi-Squared)

Overview

 * Used to compare two classification schemes, each of which may have multiple categories
 * The purpose is to determine the probability that the observed data are (or are not) consistent with the null hypothesis H0: the probability of each outcome is the same in the different groups
 * Used to approximate Fisher's Exact Test (2x2) when the numbers are large:
 * The accuracy of the approximation depends on the number of observations in each cell
 * The expected number of observations (calculated from the actual observations; see below) should be at least 5 in each cell
 * Used to extend Fisher's Exact Test to comparisons of classification schemes with more than 2 categories
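
The rule of thumb above (an expected count of at least 5 in every cell) can be checked directly from the observed table. The following is a minimal Python sketch; the counts are hypothetical and the function names are this sketch's own, not part of any library.

```python
def expected_counts(observed):
    """Expected cell counts under H0 for a table given as a list of rows.

    Each expected count is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared_is_reasonable(observed, minimum=5):
    """True if every expected count reaches the rule-of-thumb minimum."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


# Hypothetical observed tables (rows = groups, columns = outcomes).
large = [[12, 8], [5, 15]]
small = [[3, 1], [2, 6]]

print(expected_counts(large))            # smallest expected count is 8.5
print(chi_squared_is_reasonable(large))  # True
print(chi_squared_is_reasonable(small))  # False: use Fisher's Exact Test instead
```

The check works for tables of any size, since the expected counts depend only on the row and column totals.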

Χ² for 2x2 Table

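 * Used for tables whose counts are too large for Fisher's Exact Test
 * The process parallels Fisher's Exact Test; please see that page for details of the initial set-up
 * Start by assuming that H0 is true, i.e. that the success rates in Group 1 (p1) and Group 2 (p2) both equal a common rate p0: p0 = p1 = p2
 * Calculate the expected 2x2 table based on the observed total numbers
 * The expected population "success rate" is p0 = C1 / N, where C1 is the total of the first outcome column and N is the total number of observations
 * Using p0 and the observed Group 1/2 and Outcome 1/2 totals, calculate the expected 2x2 table (for example, E11 = R1 * p0 = R1 * C1 / N, where R1 is the Group 1 row total)
 * Compare the expected table to the observed table by calculating the test statistic T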
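 * One way of calculating T is to take, in each cell, the squared difference between the observed and expected values relative to the expected value, and then sum over all four cells
 * $T = \frac{(O_{11}-E_{11})^2}{E_{11}} + \frac{(O_{12}-E_{12})^2}{E_{12}} + \frac{(O_{21}-E_{21})^2}{E_{21}} + \frac{(O_{22}-E_{22})^2}{E_{22}}$
 * After some nifty mathematics, T can be calculated more simply from the original observed table, using the row totals R1, R2, the column totals C1, C2, and the grand total N. The N/2 term in this version is a continuity correction, so the result is generally somewhat smaller than the uncorrected sum above
 * $T = \frac{N \left( |O_{11} O_{22} - O_{12} O_{21}| - \frac{1}{2} N \right)^2}{R_1 R_2 C_1 C_2}$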
 * Because T is derived from the observed-expected differences, the larger T is, the more the two tables differ, and the less plausible H0 becomes
 * To calculate the significance level, we need to evaluate the probability that the observed table arose from random sampling alone, which is related to the size of T. We also need to evaluate the probability of all the other possible tables that could have been observed (again, the same as in Fisher's test)
 * When H0 is true, the probability distribution of T is approximately the same as the Χ² distribution (with 1 degree of freedom for a 2x2 table)
 * We can therefore approximately determine the probability of the observed T by evaluating the Χ² distribution at the value of T (by looking it up in a table)
 * Because these are approximations, the table typically gives critical values of T for a few selected probability levels
 * The probability level is the probability that the observed outcome (and any possible outcomes less likely than this one) occurred due to random sampling only
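
The steps above (expected table, then T) can be sketched in Python as follows. The counts are hypothetical and chosen only for illustration; they are not the data used in the example. The function names are this sketch's own. The sketch also shows that the shortcut formula equals the cell-by-cell sum once a continuity correction of 1/2 is subtracted from each absolute observed-expected difference.

```python
def expected_table(o11, o12, o21, o22):
    """Expected 2x2 table under H0, built from the observed totals."""
    r1, r2 = o11 + o12, o21 + o22   # row (group) totals
    c1, c2 = o11 + o21, o12 + o22   # column (outcome) totals
    n = r1 + r2                     # grand total
    return [[r1 * c1 / n, r1 * c2 / n],
            [r2 * c1 / n, r2 * c2 / n]]


def t_sum(o11, o12, o21, o22, correction=0.0):
    """T as the sum over cells of (|O - E| - correction)^2 / E."""
    observed = [[o11, o12], [o21, o22]]
    expected = expected_table(o11, o12, o21, o22)
    return sum((abs(o - e) - correction) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))


def t_shortcut(o11, o12, o21, o22):
    """T from the observed table alone, with the N/2 continuity correction."""
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    n = r1 + r2
    return n * (abs(o11 * o22 - o12 * o21) - n / 2) ** 2 / (r1 * r2 * c1 * c2)


# Hypothetical observed counts: rows = groups, columns = outcomes.
table = (12, 8, 5, 15)

print(t_sum(*table))                  # uncorrected sum over the four cells
print(t_sum(*table, correction=0.5))  # continuity-corrected sum
print(t_shortcut(*table))             # agrees with the corrected sum
```

The shortcut avoids computing the expected table at all, which is convenient for hand calculation.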

Χ² for 2x2 Table Example

 * Based on data from PMID 136605, as shown in Using and Understanding Medical Statistics
 * The test question is whether the size of the bone marrow cell dose correlates with the graft rejection rate


 * T = 8.01
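 * From a Χ² table for 1 degree of freedom, T = 8.01 falls between the critical values 7.88 (p = 0.005) and 10.83 (p = 0.001), so p is between 0.001 and 0.005
 * We can therefore conclude that p < 0.005 and that high cell dose is significantly associated with engraftment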
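
As a cross-check on the table lookup, the tail probability for T = 8.01 can be computed directly. This is a minimal sketch that assumes 1 degree of freedom (as for any 2x2 table), in which case the upper tail of the Χ² distribution reduces to the complementary error function from Python's standard library. The function name is this sketch's own.

```python
import math


def chi2_upper_tail_1df(t):
    """P(X >= t) for a chi-squared variable with 1 degree of freedom.

    With 1 degree of freedom, X is the square of a standard normal
    variable, so the upper tail equals erfc(sqrt(t / 2)).
    """
    return math.erfc(math.sqrt(t / 2))


p = chi2_upper_tail_1df(8.01)   # T from the example above
print(f"p = {p:.4f}")
print(0.001 < p < 0.005)        # True, consistent with the table lookup
```

A computed value replaces the bracketing by critical values, but it is still based on the same Χ² approximation to the distribution of T.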