
I have an idea for a tensor operation that would not be difficult to implement via iteration, with batch size one. However I would like to parallelize it as much as possible.

I have two tensors with shape (n, 5) called X and Y. X is actually supposed to represent 5 one-dimensional tensors with shape (n, 1): (x_1, ..., x_5). Ditto for Y.

I would like to compute a tensor with shape (n, 25) where each column represents the output of the tensor operation f(x_i, y_j), where f is fixed for all 1 <= i, j <= 5. The operation f has output shape (n, 1), just like x_i and y_j.

I feel it is important to clarify that f is essentially a fully-connected layer from the concatenated [...x_i, ...y_i] tensor with shape (1, 10), to an output layer with shape (1,5).

Again, it is easy to see how to do this manually with iteration and slicing. However, this is probably very slow. Performing this operation in batches, where the tensors X, Y now have shape (n, 5, batch_size), is also desirable, particularly for mini-batch gradient descent.

It is difficult to really articulate here why I desire to create this network; I feel it is suited for my domain of 'itemized tabular data' and cuts down significantly on the number of weights per operation, compared to a fully connected network.

Is this possible using tensorflow? Certainly not using just keras. Below is an example in numpy, per AloneTogether's request:

import numpy as np

features = 16
batch_size = 256

X_batch = np.random.random((features, 5, batch_size))
Y_batch = np.random.random((features, 5, batch_size))

# one tensor operation to reduce weights in this custom 'layer'
f = np.random.random((features, 2 * features))

# output: 25 columns per batch element, one per (i, j) pair
out = np.zeros((features, 25, batch_size))

for b in range(batch_size):
    X = X_batch[:, :, b]
    Y = Y_batch[:, :, b]
    for i in range(5):
        x_i = X[:, i:i+1]
        for j in range(5):
            y_j = Y[:, j:j+1]

            x_i_y_j = np.concatenate([x_i, y_j], axis=0)

            # f(x_i, y_j), implemented by a fully-connected layer
            f_i_j = np.matmul(f, x_i_y_j)
            out[:, 5 * i + j, b] = f_i_j[:, 0]
  • Good question, could you provide a simple example and your desired output? Commented Mar 30, 2022 at 6:03
  • I've added a simple example in numpy. Commented Mar 30, 2022 at 6:20
  • This is intended to be the first half of a residual block design. The outputs are intended to be X' and Y', with the same shape as X, Y (finally, a skip connection is added). If x_i, y_j are two neurons in the input tensor, then say z_i_j = ReLU(f_i_j) is an output neuron in the middle. Then the output neuron x'_i = ReLU(g(z_i_1) + g(z_i_2) + ... + g(z_i_5)) for a fixed tensor operator g. Similarly for the y'. The idea is that the res blocks are invariant to permutations S_5 of X and Y. I wanted to keep my question simple. Really, X and Y are just two classes of many. Commented Mar 30, 2022 at 7:51
  • Although the number and size of the classes are hyperparameters for the network. I think it may be possible to tackle the batch size issue by 'cloning' the params of f along a certain axis. It's just the crisscross operation of the X and Y that confuses me. I can actually apply f, in a slightly altered form, to X and Y separately. It's just adding f(X) and f(Y) in the way described to get the tensor with shape (n_features, 25) that confounds me. Commented Mar 30, 2022 at 8:03

1 Answer


All operations you need (concatenation and matrix multiplication) can be batched. The difficult part is that you want to concatenate the features of every item in X with the features of every item in Y (all combinations). My recommended solution is to expand the dimensions of X to [batch, features, 5, 1] and the dimensions of Y to [batch, features, 1, 5], then tf.repeat() both tensors so their shapes become [batch, features, 5, 5]. Now you can concatenate X and Y along the feature axis, which gives a tensor of shape [batch, 2*features, 5, 5]. Observe that this builds all combinations. The next step is matrix multiplication. tf.matmul() can also do batched matrix multiplication, but I use tf.einsum() here because I want more control over which dimensions are treated as batch dimensions. Full code:

import tensorflow as tf
import numpy as np

batch_size = 3
features = 6
items = 5

x = np.random.uniform(size=[batch_size, features, items])
y = np.random.uniform(size=[batch_size, features, items])

f = np.random.uniform(size=[2 * features, features])

# Add a new axis to each tensor and repeat along it,
# so both become [batch, features, items, items].
x_reps = tf.repeat(x[:, :, :, tf.newaxis], items, axis=3)
y_reps = tf.repeat(y[:, :, tf.newaxis, :], items, axis=2)

# Concatenate along the feature axis: [batch, 2*features, items, items].
# Entry (i, j) now holds the concatenation of item i of x with item j of y.
xy_conc = tf.concat([x_reps, y_reps], axis=1)

# Batched matrix multiplication over the feature axis; b, i, j act as batch dims.
f_i_j = tf.einsum("bfij, fg->bgij", xy_conc, f)

# Flatten the (i, j) grid: [batch, features, items*items].
f_i_j = tf.reshape(f_i_j, [batch_size, features, items * items])
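
For a quick sanity check (a minimal sketch, assuming TF2 eager execution; the indices b, i, j below are arbitrary picks, not part of the original answer), one entry of the batched result can be compared against the naive per-pair computation from the question:

# Compare one (b, i, j) entry of the einsum result with the naive computation.
b, i, j = 0, 2, 4
concat_vec = np.concatenate([x[b, :, i], y[b, :, j]])  # shape [2*features]
naive = concat_vec @ f                                 # shape [features]
batched = f_i_j[b, :, i * items + j]                   # column for pair (i, j)
print(np.allclose(naive, batched.numpy()))             # expected: True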

2 Comments

Thank you very much for your answer. I will look more into einsum before I accept this as the answer. I mentioned in the comments that I was originally going to break f into f_X and f_Y, both of shape (features, features), and then simply add them to get the z's. But I suspect I can do this and the second layer with einsum too!
Side note: I think breaking them up and adding is literally the same in the case of just X and Y, but I'm going to re-use f_X when I do this same operation with the classes X and W etc., thus reducing the number of weights even more.
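
Following up on these comments, here is a minimal sketch of the split-and-add variant and the second layer described above (the names f_X, f_Y, g, x_prime and y_prime are illustrative assumptions, not part of the accepted answer). Splitting f into two (features, features) blocks and adding the two einsum results is equivalent to concatenating and multiplying by the stacked matrix, and the sum over j (or i) for the second layer can be folded into another einsum:

import tensorflow as tf
import numpy as np

batch_size, features, items = 3, 6, 5

x = np.random.uniform(size=[batch_size, features, items])
y = np.random.uniform(size=[batch_size, features, items])

# Split weights: f_X acts on the x half, f_Y on the y half (stacking them
# vertically recovers the [2*features, features] matrix f from the answer).
f_X = np.random.uniform(size=[features, features])
f_Y = np.random.uniform(size=[features, features])

# z[b, g, i, j] = ReLU( sum_f x[b, f, i] * f_X[f, g] + sum_f y[b, f, j] * f_Y[f, g] )
z = tf.nn.relu(
    tf.einsum("bfi,fg->bgi", x, f_X)[:, :, :, tf.newaxis]
    + tf.einsum("bfj,fg->bgj", y, f_Y)[:, :, tf.newaxis, :]
)  # shape [batch, features, items, items]

# Second layer from the comments: x'_i = ReLU(sum_j g(z_i_j)), with one shared operator g.
g = np.random.uniform(size=[features, features])
x_prime = tf.nn.relu(tf.einsum("bfij,fh->bhi", z, g))  # sums over j
y_prime = tf.nn.relu(tf.einsum("bfij,fh->bhj", z, g))  # sums over i
# x_prime and y_prime have shape [batch, features, items], matching x and y for the skip connection.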
