
I have a dataset with 6 input features and 5 output features. I want to use a Keras sequential model to estimate the mean vector and covariance matrix from any row of input features, assuming the output features follow a multivariate normal distribution.

That is, for any row of 6 input features in my dataset, I want to get a mean vector of 5 values and a 5×5 covariance matrix.

import pandas as pd

sample=pd.DataFrame({'X1':[1,2,3,4,5,6],
              'X2':[1,3,1,5,2,7],
              'X3':[3,0,0,7,5,0],
              'X4':[0,4,3,2,5,8],
              'X5':[9,7,0,2,4,5],
              'X6':[1,1,8,7,0,0],
              'Y1':[0.5,1.2,6.3,4.5,1.5,6.6],
              'Y2':[6.1,4.3,2.1,1.5,4.2,8.7],
              'Y3':[0,0,3.2,3.7,5.5,0.2],
              'Y4':[0.5,1.4,8.3,5.2,1.5,1.8],
              'Y5':[2.9,1.7,6.3,5.2,9.4,1.5]})
sample
    X1  X2  X3  X4  X5  X6  Y1  Y2  Y3  Y4  Y5
0   1   1   3   0   9   1   0.5 6.1 0.0 0.5 2.9
1   2   3   0   4   7   1   1.2 4.3 0.0 1.4 1.7
2   3   1   0   3   0   8   6.3 2.1 3.2 8.3 6.3
3   4   5   7   2   2   7   4.5 1.5 3.7 5.2 5.2
4   5   2   5   5   4   0   1.5 4.2 5.5 1.5 9.4
5   6   7   0   8   5   0   6.6 8.7 0.2 1.8 1.5
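To feed this table to a Keras model, the inputs and the outputs have to be split into two arrays. Here is a minimal sketch. It assumes the X/Y column-name prefixes used above, and the names `X` and `y` are only illustrative:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'X1': [1, 2, 3, 4, 5, 6],
                       'X2': [1, 3, 1, 5, 2, 7],
                       'X3': [3, 0, 0, 7, 5, 0],
                       'X4': [0, 4, 3, 2, 5, 8],
                       'X5': [9, 7, 0, 2, 4, 5],
                       'X6': [1, 1, 8, 7, 0, 0],
                       'Y1': [0.5, 1.2, 6.3, 4.5, 1.5, 6.6],
                       'Y2': [6.1, 4.3, 2.1, 1.5, 4.2, 8.7],
                       'Y3': [0, 0, 3.2, 3.7, 5.5, 0.2],
                       'Y4': [0.5, 1.4, 8.3, 5.2, 1.5, 1.8],
                       'Y5': [2.9, 1.7, 6.3, 5.2, 9.4, 1.5]})

# Select the input and output columns by the letter in their names and
# cast to float32, the default float type of Keras layers.
X = sample.filter(like='X').to_numpy(dtype=np.float32)  # shape (n, 6)
y = sample.filter(like='Y').to_numpy(dtype=np.float32)  # shape (n, 5)
```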

For the loss function I am using the following, which maximizes the log probability by minimizing the negative log-likelihood:

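import tensorflow as tf
import tensorflow_probability as tfp

def lossF(y_true, mu, cov):
  # Negative log-likelihood of y_true under N(mu, cov), averaged over the batch;
  # MultivariateNormalTriL is parameterized by the lower-triangular Cholesky factor of cov
  dist = tfp.distributions.MultivariateNormalTriL(loc=mu, scale_tril=tf.linalg.cholesky(cov))
  return tf.reduce_mean(-dist.log_prob(y_true))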
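For reference, the per-row quantity -dist.log_prob(y_true) is the Gaussian negative log-likelihood ½ (y − μ)ᵀ Σ⁻¹ (y − μ) + ½ log|Σ| + (k/2) log 2π. The NumPy sketch below evaluates it through the Cholesky factor of Σ, the same factor that MultivariateNormalTriL takes as scale_tril. `mvn_nll` is a hypothetical helper for checking values by hand; it is not part of TFP:

```python
import numpy as np

def mvn_nll(y, mu, cov):
    """Negative log-density of one row y under N(mu, cov), via the Cholesky factor of cov."""
    k = mu.shape[-1]
    L = np.linalg.cholesky(cov)                  # cov = L @ L.T, L lower triangular
    z = np.linalg.solve(L, y - mu)               # whitened residual: L z = y - mu
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log|cov| = 2 * sum(log(diag(L)))
    return 0.5 * (z @ z + log_det + k * np.log(2.0 * np.pi))

# Usage: a 5-dimensional standard normal evaluated at its mean
print(mvn_nll(np.zeros(5), np.zeros(5), np.eye(5)))  # equals 2.5 * log(2 * pi), about 4.595
```

Working through the Cholesky factor avoids forming Σ⁻¹ explicitly, and it fails loudly if cov is not positive definite.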

I am trying something like the code below, but I get confused in the middle.

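from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# X_train has 6 values in each row
# y_train has 5 values in each row
# y_pred should be either a distribution object or mu & cov for each row

opt = Adam(learning_rate=0.001)
inputs = Input(shape=(6,))
layer1 = Dense(24, activation='relu')(inputs)
layer2 = Dense(12, activation='relu')(layer1)
predictions = ???
model = Model(inputs=???, outputs=???)
# Keras calls a loss as loss(y_true, y_pred), so lossF's (y_true, mu, cov)
# signature has to be adapted to receive a single y_pred tensor
model.compile(optimizer=opt, loss=lossF)
model.fit(X_train, y_train, epochs=100, batch_size=100)
y_pred = model.predict(X_test)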

Note: instead of getting mu and cov separately, if it's possible to get a distribution object as the output, that would be helpful too.

  • ML models learn from a huge number of instances. What you are trying to do doesn't really make sense. Commented Dec 3, 2020 at 23:46
  • @lcrmorin I do have a huge number of instances. The dataset in the question is just an example of what the data looks like. Commented Dec 5, 2020 at 4:53

1 Answer


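Given that the covariance matrix has to be positive definite, the Cholesky decomposition is a good way to solve this problem. Every positive definite matrix Σ can be written uniquely as Σ = T Tᵀ, where T is lower triangular with a strictly positive diagonal. The diagonal of Σ then holds the variances, that is, the squared standard deviations. So the output of the network will be the mean vector mu and the lower-triangular Cholesky factor T, which is the form tfd.MultivariateNormalTriL expects for scale_tril (it ignores the upper triangle). The diagonal elements of T must be positive, hence the exponential activation; the off-diagonal elements can be any real numbers: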

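import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

p = y_train.shape[1]  # dimension of the covariance matrix
inputs = Input(shape=(6,))
layer1 = Dense(24, activation='relu')(inputs)
layer2 = Dense(12, activation='relu')(layer1)
mu = Dense(p, activation="linear")(layer2)                 # mean vector
T1 = Dense(p, activation="exponential")(layer2)            # diagonal of T, strictly positive
T2 = Dense(p * (p - 1) // 2, activation="linear")(layer2)  # strictly lower-triangular entries of T
outputs = Concatenate()([mu, T1, T2])                       # shape (batch, 2p + p(p-1)/2)
model = Model(inputs=inputs, outputs=outputs)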
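Why this parameterization is safe: the T2 head can produce any real values and the exponential T1 head produces positive values, and in either case Σ = T Tᵀ is symmetric positive definite with T as its Cholesky factor. The NumPy sketch below checks this with random numbers standing in for one row of the two heads. The seed and values are illustrative, not outputs of a trained network. It also counts the outputs per row:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5  # e.g. the 5 outputs Y1..Y5

# Illustrative stand-ins for one row of the network heads
T1 = np.exp(rng.normal(size=p))          # "exponential" activation: strictly positive diagonal
T2 = rng.normal(size=p * (p - 1) // 2)   # "linear" activation: any real values

T = np.diag(T1)
T[np.tril_indices(p, k=-1)] = T2         # fill the strictly lower triangle, row by row

sigma = T @ T.T                          # implied covariance matrix
eigvals = np.linalg.eigvalsh(sigma)      # all strictly positive => positive definite
variances = np.diag(sigma)               # row sums of T**2; square roots are standard deviations
n_outputs = p + p + p * (p - 1) // 2     # mu, T1 and T2 heads concatenated

print(n_outputs)                                   # 20 for p = 5
print(np.allclose(np.linalg.cholesky(sigma), T))   # True: the Cholesky factor of sigma is T
```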

Now let's define the loss function. First, let's define a function that splits the network output back into mu and the Cholesky factor sigma:

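def mu_sigma(output):
    # output has shape (batch, 2p + p(p-1)/2); slice along the feature axis
    # (not output[0], which would keep only the first row of the batch)
    mu = output[:, 0:p]       # (batch, p)
    T1 = output[:, p:2*p]     # (batch, p), positive diagonal of T
    T2 = output[:, 2*p:]      # (batch, p(p-1)/2), strictly lower entries of T
    ones = tf.ones((p, p), dtype=tf.float32)
    mask_a = tf.linalg.band_part(ones, -1, 0)  # lower triangle including the diagonal
    mask_b = tf.linalg.band_part(ones, 0, 0)   # diagonal only
    mask = tf.subtract(mask_a, mask_b)         # strictly lower triangle
    zero = tf.constant(0, dtype=tf.float32)
    non_zero = tf.not_equal(mask, zero)
    indices = tf.where(non_zero)               # (p(p-1)/2, 2) int64, row-major order

    def to_dense(t2):
        # scatter one row's off-diagonal values into a (p, p) matrix
        T2_row = tf.sparse.SparseTensor(indices, t2,
                                        dense_shape=tf.cast((p, p), dtype=tf.int64))
        return tf.sparse.to_dense(T2_row)

    T2 = tf.map_fn(to_dense, T2)   # (batch, p, p)
    T1 = tf.linalg.diag(T1)        # (batch, p, p)
    sigma = T1 + T2                # lower-triangular Cholesky factor for each row
    return mu, sigma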

Now for the loss function:

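import tensorflow as tf
from tensorflow_probability import distributions as tfd

def gnll_loss(y, pred):
    # Gaussian negative log-likelihood of y under N(mu, sigma @ sigma^T)
    mu, sigma = mu_sigma(pred)
    gm = tfd.MultivariateNormalTriL(loc=mu, scale_tril=sigma)
    log_likelihood = gm.log_prob(y)              # one value per row of the batch
    return - tf.math.reduce_sum(log_likelihood)  # summed over the batch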
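Note that the network outputs the Cholesky factor, not the covariance itself. After training, model.predict returns one raw row per input, and the covariance for each row is Σ = T Tᵀ. Here is a NumPy sketch of that post-processing step. It assumes each row is laid out as [mu (p values), diagonal of T (p values), strictly lower entries of T in row-major order]. `unpack_predictions` is a hypothetical helper name:

```python
import numpy as np

def unpack_predictions(pred, p):
    """Split raw outputs of shape (n, 2p + p(p-1)/2) into means (n, p),
    Cholesky factors (n, p, p) and covariance matrices (n, p, p)."""
    n = pred.shape[0]
    mu = pred[:, :p]
    diag = pred[:, p:2 * p]                 # already positive (exponential activation)
    off = pred[:, 2 * p:]                   # strictly lower entries, row-major order
    rows, cols = np.tril_indices(p, k=-1)   # row-major, like tf.where on a lower mask
    T = np.zeros((n, p, p), dtype=pred.dtype)
    T[:, rows, cols] = off
    idx = np.arange(p)
    T[:, idx, idx] = diag
    cov = T @ np.transpose(T, (0, 2, 1))    # Sigma = T T^T for every row
    return mu, T, cov

# Usage (with the trained model and X_test from the question):
# mu, T, cov = unpack_predictions(model.predict(X_test), p)
```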
  • Regarding tf.sparse.SparseTensor(indices,sigma2,dense_shape=tf.cast((10,10), dtype=tf.int64)): could you please tell me what sigma2 is, and why dense_shape=tf.cast((10,10), dtype=tf.int64)? Thanks for your answer. Commented Apr 29, 2021 at 18:27
  • Sorry, it's tf.sparse.SparseTensor(indices,T2,dense_shape=tf.cast((p,p), dtype=tf.int64)). Commented Apr 30, 2021 at 6:59
