I have 168 rows of patient data: 104 controls and 64 cases. I want to know if albumin status (low or high) is related to case/control status. I made a table using R:
> table(Albumin, Status, useNA = "ifany")
Albumin Control Case
   Low       51   16
   High      39   32
   <NA>      14   16
As you can see, albumin is missing for 30 patients (14 controls and 16 cases). I ran a chi-squared test on the entire table, which treats <NA> as a third albumin category:
> chisq.test(table(Albumin, Status, useNA = "ifany"))$p.value
[1] 0.006222513
Question: Should I perform the test on the 3x2 table above, which includes the missing data as its own row, or on a 2x2 table that excludes the missing data, as shown below?
> chisq.test(table(Albumin, Status))$p.value
[1] 0.01496166
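For anyone who wants to reproduce this without my data, here is a minimal sketch that rebuilds both tables from the counts above. The object names (tab3, tab2, p3, p2) are made up for this example, and the "<NA>" row is just a text label standing in for the missing values.

```r
# Counts copied from the table above. matrix() fills column by column,
# so the first three numbers are the Control column.
tab3 <- matrix(c(51, 39, 14,
                 16, 32, 16),
               nrow = 3,
               dimnames = list(Albumin = c("Low", "High", "<NA>"),
                               Status  = c("Control", "Case")))

# 2x2 table: drop the row of missing albumin values
tab2 <- tab3[c("Low", "High"), ]

# 3x2 test: Pearson chi-squared with 2 degrees of freedom
p3 <- chisq.test(tab3)$p.value

# 2x2 test: chisq.test() applies Yates' continuity correction by default
# to 2x2 tables (correct = TRUE), so the two p-values are not computed
# in exactly the same way
p2 <- chisq.test(tab2)$p.value

print(c(with_missing = p3, without_missing = p2))
```

Note that part of the difference between the two p-values may come from the continuity correction, which R only applies in the 2x2 case.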
Problem: In this example, both approaches yield significant p-values. However, I have other variables for which the result is not significant when missing values are excluded, but significant when they are included. I also have some variables with only one missing value.
Question: How should I apply the chi-squared test in those situations? Is my choice of test correct, or should I be using Fisher's exact test or some other test? And are there any diagnostics that I should run before applying these tests?
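To make the last question concrete, the only checks I am aware of are sketched below: looking at the expected cell counts (the common rule of thumb is that they should all be at least 5) and running Fisher's exact test on the 2x2 table for comparison. This is only my guess at what "diagnostics" might mean here, not something I know to be sufficient, and the object names are again made up for the example.

```r
# 2x2 table of the non-missing counts from the question
tab2 <- matrix(c(51, 39,
                 16, 32),
               nrow = 2,
               dimnames = list(Albumin = c("Low", "High"),
                               Status  = c("Control", "Case")))

# Expected counts under independence, as computed by chisq.test()
expected <- chisq.test(tab2)$expected
print(expected)

# Rule-of-thumb check: are all expected counts at least 5?
all_large_enough <- all(expected >= 5)
print(all_large_enough)

# Fisher's exact test on the same 2x2 table, for comparison
p_fisher <- fisher.test(tab2)$p.value
print(p_fisher)
```

For the variables with only one missing value, the <NA> row of the 3x2 table would have very small expected counts, which is part of why I am unsure the chi-squared test is appropriate there.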