
I have a dataset on which I need to perform a Poisson regression analysis of how the number of visits to the doctor in a two-week period varies as a function of age group (i.e. <30, between 30 and 50, and >50), sex and illness, holding sex and number of illnesses constant at their mean values.

Here is a sample of my data:

 visits gender age illness
     1 female  19       1
     1 female  19       1
     1   male  19       3
     1   male  19       1
     1   male  19       2
     1 female  19       5
     1 female  19       4
     1 female  19       3
     1 female  19       2
     1   male  19       1

However, I don't know how to go about this, because I don't know how to input these age groups correctly. I need to find the predicted rates of visits to a doctor over a two-week period for the different age groups.

I know how to fit the initial model:

    glm(visits ~ age + gender + illness, data = DoctorVisits, family = poisson)

But I don't know how to use the predict function to get the predicted rates.
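For reference, with the default log link of `family = poisson`, the glm() call above fits the log-linear model in the first line below. R treats "female" as the reference level of gender (factor levels are sorted alphabetically), so gender enters as a 0/1 indicator for male. If age is replaced by a three-level age-group factor with "<30" as the baseline (an assumption about how the groups are coded), the age term becomes two indicators, and the predicted rate for group g with sex and illness held at their means is the exponential of the linear predictor, where the mean of the male indicator is the proportion of males:

$$\text{visits}_i \sim \text{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{male}_i + \beta_3\,\text{illness}_i$$

$$\log \mu_i = \beta_0 + \beta_{30\text{-}50}\,D_{30\text{-}50,i} + \beta_{>50}\,D_{>50,i} + \beta_{m}\,\text{male}_i + \beta_{ill}\,\text{illness}_i$$

$$\hat{\mu}_g = \exp\!\left(\hat{\beta}_0 + \hat{\beta}_g + \hat{\beta}_{m}\,\overline{\text{male}} + \hat{\beta}_{ill}\,\overline{\text{illness}}\right), \qquad \hat{\beta}_{<30} = 0$$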

  • Start by creating age groups in your training data using cut. Commented Dec 28, 2016 at 16:23
  • @Roland What function or operator would I use for the age group 'between 30 and 50'? Thanks for the help. Commented Dec 28, 2016 at 16:27
  • I have told you: cut. Study its documentation. Commented Dec 28, 2016 at 16:29
  • To use the predict function with the output of the glm, first create a data frame whose column names match the variables in your model; in this case, a three-column data frame with age, gender and illness. Assuming age is an integer, to obtain a prediction for the 30-50 group you will either have to take some kind of average, or go back to the original dataset and turn the age column into a factor with the cut function, as Roland suggested. Commented Dec 28, 2016 at 16:44
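A minimal sketch of the cut() suggestion from the comments. The real DoctorVisits data is not shown beyond the ten rows above, so the data frame here is simulated and every value in it is hypothetical. It assumes ages are whole numbers: cut() uses right-closed intervals by default, so breaks at 29 and 50 put ages 30 and 50 in the middle group (a non-integer age such as 29.5 would land there too). The name `agegroup` is an illustrative choice.

```r
# Hypothetical stand-in for DoctorVisits (simulated, not real data):
# integer ages 19-72, a gender factor and an illness count 0-5.
set.seed(1)
n <- 500
DoctorVisits <- data.frame(
  gender  = factor(sample(c("female", "male"), n, replace = TRUE)),
  age     = sample(19:72, n, replace = TRUE),
  illness = sample(0:5, n, replace = TRUE)
)
DoctorVisits$visits <- rpois(n, lambda = exp(-1.5 + 0.2 * DoctorVisits$illness))

# Right-closed intervals (the cut() default): with integer ages these are
# (-Inf, 29] = "<30", (29, 50] = "30-50", (50, Inf] = ">50".
DoctorVisits$agegroup <- cut(DoctorVisits$age,
                             breaks = c(-Inf, 29, 50, Inf),
                             labels = c("<30", "30-50", ">50"))

# Fit the Poisson model with the age-group factor instead of numeric age;
# "<30" (the first level) becomes the baseline category.
fit <- glm(visits ~ agegroup + gender + illness,
           data = DoctorVisits, family = poisson)

table(DoctorVisits$agegroup)  # group sizes
coef(fit)                     # agegroup30-50, agegroup>50 are log rate ratios vs "<30"
```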

1 Answer


Say you want a prediction for a 21-year-old male with illness = 3:

    predict(your_glm,
            newdata = data.frame(gender = "male", age = 21, illness = 3),
            type = "response")

Here you are creating, inside the call, a data frame with the observations you want predictions for. If you want predictions for several observations, it may be more sensible to create the data frame separately first and then pass it to predict through the same newdata = argument. predict() for glm objects has no data argument for new observations, so a data frame passed as data = would not be used for the predictions.
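Applied to the question's actual goal (predicted two-week visit rates per age group, with sex and illness held at their means), a sketch could look like the following. Assumptions: the data is simulated stand-in data, ages are grouped with cut() as suggested in the comments, and gender is recoded as a hypothetical numeric 0/1 column `male` so that it can be held "at its mean" (the proportion of males), which a factor cannot be. The prediction data frame `newdat` is built separately and passed through newdata.

```r
# Simulated stand-in data (hypothetical; replace with the real DoctorVisits).
set.seed(1)
n <- 500
DoctorVisits <- data.frame(
  gender  = sample(c("female", "male"), n, replace = TRUE),
  age     = sample(19:72, n, replace = TRUE),
  illness = sample(0:5, n, replace = TRUE)
)
DoctorVisits$visits <- rpois(n, lambda = exp(-1.5 + 0.2 * DoctorVisits$illness))
DoctorVisits$agegroup <- cut(DoctorVisits$age, breaks = c(-Inf, 29, 50, Inf),
                             labels = c("<30", "30-50", ">50"))

# Numeric 0/1 indicator so gender can be held at its mean (share of males)
DoctorVisits$male <- as.numeric(DoctorVisits$gender == "male")

fit <- glm(visits ~ agegroup + male + illness,
           data = DoctorVisits, family = poisson)

# One row per age group; the other covariates are fixed at their sample means.
# agegroup keeps the same factor levels as in the training data.
newdat <- data.frame(
  agegroup = factor(levels(DoctorVisits$agegroup),
                    levels = levels(DoctorVisits$agegroup)),
  male     = mean(DoctorVisits$male),
  illness  = mean(DoctorVisits$illness)
)

# Predicted expected number of visits in two weeks for each age group
newdat$rate <- predict(fit, newdata = newdat, type = "response")
newdat
```

Holding a factor at its mean in this way mirrors the "at means" convention; an alternative would be to predict separately for males and females and average, which gives slightly different numbers because exp() is nonlinear.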

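type = "response" gives predictions on the scale of the response, i.e. the expected number of visits. Without it, predict uses the default type = "link" and returns values on the scale of the linear predictor, which for a Poisson model with the default log link is the log of the expected count (not log odds, which is what the link scale means in logistic regression).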
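A small check of the two scales, using hypothetical simulated data with a single predictor: for the log link, the response-scale prediction is just exp() of the link-scale prediction.

```r
# Hypothetical simulated data, only to compare the two prediction scales
set.seed(2)
d <- data.frame(illness = sample(0:5, 200, replace = TRUE))
d$visits <- rpois(200, lambda = exp(-1 + 0.3 * d$illness))
fit <- glm(visits ~ illness, data = d, family = poisson)

nd  <- data.frame(illness = c(1, 3))
eta <- predict(fit, newdata = nd)                     # default type = "link": log expected count
mu  <- predict(fit, newdata = nd, type = "response")  # expected number of visits

all.equal(exp(eta), mu)  # TRUE: the inverse of the log link is exp()
```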
