How to handle missing data when determining differences between groups using chi-squared or Fisher's exact test

Question

I have 168 rows of patient data: 104 controls and 64 cases. I want to know if albumin status (low or high) is related to case/control status. I made a table using R:

> table(Albumin, Status, useNA = "ifany")
Albumin    Control  Case
    Low         51    16
    High        39    32
    <NA>        14    16

As you can see, I have missing data. I did a chi-squared test on the entire table:

> chisq.test(table(Albumin, Status, useNA = "ifany"))$p.value
[1] 0.006222513

Question: Should I perform the test on the 3x2 table above that includes the missing data? Or should I perform it on a 2x2 table that excludes the missing data, as shown below?

> chisq.test(table(Albumin, Status))$p.value
[1] 0.01496166

Problem: In this example, both approaches yield significant p-values. However, I have other variables for which the difference is not significant when missing values are excluded, but significant when they are included. I also have some variables with only one missing value.

Question: How should I apply the chi-squared test in those situations? Is my choice of test correct, or should I be using Fisher's exact test or some other test? And are there any diagnostics that I need to do before even applying these tests?

Peter Flom · Accepted Answer · 2012-02-18 17:04:55Z


Unless there is a specific reason why people have missing values, and you are interested in that reason, I would exclude the people with missing values from the test.
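One way to judge whether missingness might have "some specific reason" related to group is to tabulate missing versus observed albumin by status. The following is a minimal sketch that uses only the counts from the question's table; the object names (`counts`, `miss_tab`, `prop_missing`, `miss_test`) are illustrative and not from the original post.

```r
# Counts copied from the question's 3x2 table (filled column by column).
# "Missing" stands in for the <NA> row.
counts <- matrix(c(51, 39, 14,
                   16, 32, 16),
                 nrow = 3,
                 dimnames = list(Albumin = c("Low", "High", "Missing"),
                                 Status  = c("Control", "Case")))

# Collapse Low and High into "Observed" to get a 2x2 missingness-by-group table.
miss_tab <- rbind(Observed = counts["Low", ] + counts["High", ],
                  Missing  = counts["Missing", ])
print(miss_tab)

# Proportion of patients with missing albumin in each group.
prop_missing <- miss_tab["Missing", ] / colSums(miss_tab)
print(prop_missing)

# Chi-squared test of association between missingness and case/control status.
miss_test <- chisq.test(miss_tab)
print(miss_test$p.value)
```

A clear difference in the missing proportions would be a reason to ask why albumin is missing before simply dropping those patients; this check does not by itself say which analysis is correct.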

You don't need an exact test here; all the cell sizes are reasonable.
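The "reasonable cell sizes" claim can be checked through the expected counts that `chisq.test` returns. This is a sketch built from the 2x2 counts in the question; `tab2` and `chi` are illustrative names, and the common "expected count of at least 5" rule of thumb is an assumption added here, not something stated in the post.

```r
# 2x2 table from the question, excluding the missing row (filled column by column).
tab2 <- matrix(c(51, 39,
                 16, 32),
               nrow = 2,
               dimnames = list(Albumin = c("Low", "High"),
                               Status  = c("Control", "Case")))

# For a 2x2 table, chisq.test applies Yates' continuity correction by default.
chi <- chisq.test(tab2)

# Expected counts under independence; a common rule of thumb wants all of them >= 5.
print(chi$expected)
print(min(chi$expected))

# Chi-squared p-value, with Fisher's exact test alongside for comparison.
print(chi$p.value)
print(fisher.test(tab2)$p.value)
```

With these counts the chi-squared call should reproduce the p-value reported in the question for the 2x2 table.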

However: 1) Don't you want some form of regression instead? 2) Why is albumin dichotomized into low and high? Dichotomizing continuous variables is usually a bad idea (see Royston, Altman & Sauerbrei).

If you have the actual albumin values, I suggest a linear regression of albumin on case status (albumin~case), possibly with other covariates added if you have them. This is especially important if this is an observational study, but it is still worthwhile in an experimental one, because covariates can vary between groups even if assignment is random, and because covariates can affect other regressors.
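A minimal sketch of such a regression follows. The continuous albumin values are not given in the question, so everything in `dat` is simulated placeholder data: the albumin mean and standard deviation, the `age` covariate, and all column names are hypothetical. Only the group sizes (104 and 64) and the numbers of missing values (14 and 16) come from the question.

```r
set.seed(1)  # for reproducible placeholder data only

n_control <- 104
n_case    <- 64

dat <- data.frame(
  status  = factor(rep(c("Control", "Case"), c(n_control, n_case)),
                   levels = c("Control", "Case")),
  albumin = rnorm(n_control + n_case, mean = 40, sd = 5),   # SIMULATED, arbitrary scale
  age     = rnorm(n_control + n_case, mean = 60, sd = 10)   # hypothetical covariate
)

# Blank out albumin for 14 controls and 16 cases, matching the question's counts.
dat$albumin[c(1:14, n_control + 1:16)] <- NA

# lm() drops rows with missing values by default (na.action = na.omit),
# which corresponds to excluding the patients with missing albumin.
fit <- lm(albumin ~ status + age, data = dat)

print(summary(fit)$coefficients)  # the "statusCase" row is the adjusted group difference
print(nobs(fit))                  # number of complete cases actually used
```

Because the data are simulated, the coefficients themselves mean nothing; the point is only the model formula and the complete-case handling.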



+1, thanks for your suggestion! [Also, I didn’t explain the full scope of my study above. It’s a study of new molecular prognostic markers in cancer. Albumin status is one element of a previously published prognostic score that is currently used in the clinic, and the published guidelines call for it to be dichotomized based on the results of previous studies. I am trying to set up “Table 1” (i.e., study population characteristics) of my paper and want to report all elements of the currently accepted prognostic score in my patients.]

– Alexander, commented Feb 18, 2012 at 17:45
