Timeseries binary-decision problem

I have data from an experiment in which subjects repeatedly completed a specific task that required a forced binary decision (option A vs. B). I only observe this decision at the end of each recording. As predictors/IVs, I collect sensor data at ~100 Hz, which usually changes rather slowly over time.

I would like to take the sensor data as predictors and develop a model of the decision-making process: the binary decision and its uncertainty. The model should then be able to predict the decision/probability and its uncertainty given new sensor data. I suppose what I need in PyMC is some sort of Bernoulli/Beta regression, but I am unsure how best to incorporate/address the time dependency. Any thoughts/advice would be greatly appreciated. :slight_smile:

Perhaps @jessegrabowski has some idea


If you could directly observe the “intent to choose A”, what function would you use to connect the sensor readings to that intent? The answer to that question will guide how to build a latent timeseries model.

Thanks for the prompt response. And good question. One option could be that the sensor readings map directly to the decision/probability, e.g., through some linear/logistic function. But I could also imagine that the sensor readings resemble pieces of evidence that, when accumulated over time, drive the decision towards either A or B.

If there’s no accumulation process, there’s no time series model (since only the sensor reading at the time of the decision would matter). So I think it would be good to think about how that accumulation should look; then you can take the last value of the latent process and use it as the logit for the Bernoulli probability.


Sorry, I missed one possibly important detail: I don’t know exactly when subjects made the decision. I can only observe it towards the end of each recording, but the subject likely decided seconds earlier.

One approach here is to do what Don Rubin does and ask yourself what you’d do if you were able to observe all the data. Here, what you have is a sequence of covariates x_1, \ldots, x_N, and you know a decision y was made at some step n \leq N. The problem is that you don’t observe n, which is not an obstacle in either Bayesian or frequentist inference, because you can marginalize it out of the likelihood.

A simple way to do all of this conceptually is to let \phi = f(x, \theta) \in \Delta^{N-1} be a simplex (i.e., N non-negative values that sum to 1) defined as a function of the covariates x_{1, \ldots, N} and the model parameters \theta (presumably via a regression that only depends on the past at each point, perhaps as a sequential Bernoulli decision process), so that your unknown decision time is generated as

  • n \sim \textrm{categorical}(\phi)

Then if you have the outcome generated conditional on the choice time as

  • Y \sim p(x, n, \theta),

presumably also only depending on the values of x_{1, \ldots, n}, then you can marginalize out the unobserved n to get

  • p(Y | x, \theta) = \sum_{n=1}^N p(Y, n \mid x, \theta).

You can define a simplex sequentially through a sequential decision process, for example, letting

\Pr[n = n' \mid n \geq n'] = g(x_{1, \ldots, n'}, \theta),

and then multiplying out to get

  • \Pr[n = 0] = 0
  • \Pr[n = n'] = \Pr[n = n' \mid n \geq n'] \cdot \prod_{n'' < n'} (1 - \Pr[n = n'' \mid n \geq n'']).
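As a sanity check of this construction, here is a small NumPy sketch. The logistic hazard (as a function of the running mean of a 1-D sensor stream) and the evidence-accumulation form of p(Y \mid x, n, \theta) are illustrative assumptions, not part of the construction itself.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
x = rng.normal(size=N)   # hypothetical 1-D sensor stream
theta = 0.8              # hypothetical hazard weight

# Hazard at each step: Pr[n = n' | n >= n'] as a function of x_{1..n'};
# here a logistic of the running mean, purely for illustration.
running_mean = np.cumsum(x) / np.arange(1, N + 1)
hazard = 1.0 / (1.0 + np.exp(-theta * running_mean))
hazard[-1] = 1.0  # the decision must have happened by the end of the trial

# Multiply out to get the marginal Pr[n = n'] -- a simplex over 1..N.
survival = np.concatenate([[1.0], np.cumprod(1.0 - hazard[:-1])])
phi = hazard * survival
assert np.isclose(phi.sum(), 1.0)

# Marginalize the unknown decision time out of the likelihood:
# p(Y | x, theta) = sum_n Pr[n = n'] * p(Y | x, n, theta),
# with p(Y = 1 | x, n) here a logistic of the evidence accumulated up to n.
p_choice_given_n = 1.0 / (1.0 + np.exp(-np.cumsum(x)))
p_choice = float(np.sum(phi * p_choice_given_n))
```

Setting the final hazard to 1 encodes the fact that a decision is always observed by the end of the recording; without it the simplex would place leftover mass past step N.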