I need to load rows from text files that contain string representations of 2D arrays, for later use in training a TensorFlow CNN, but I cannot convert the strings into a format TensorFlow accepts. I have tried many combinations of apply/map with various functions, but I always get some cryptic error. Below is a toy example that is close to working but still throws this error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray)

import tensorflow as tf
import numpy as np
import pandas as pd
from ast import literal_eval

def df_to_dataset(dataframe):
    Y = tf.convert_to_tensor( dataframe['Y'].values )
    X = tf.convert_to_tensor(
         dataframe['X'].apply(literal_eval).apply(np.array).values
       )
    return tf.data.Dataset.from_tensor_slices( ( X , Y ) )

data = [[ 1, "[[0,1],[0,1]]" ] , [ 0 , "[[1,0],[1,0]]" ]]
df = pd.DataFrame(data, columns=['Y','X'])
dataset = df_to_dataset(df)
for feature in dataset.take(1):
    print( feature )

1 Answer

So your dataframe displays as:

In [161]: df
Out[161]: 
   Y              X
0  1  [[0,1],[0,1]]
1  0  [[1,0],[1,0]]

The display omits the quotes, but each X entry is a string. The Y column is a plain integer array:

In [162]: df['Y'].values
Out[162]: array([1, 0])

The X column is a 1D array of strings with object dtype:

In [163]: df['X'].values
Out[163]: array(['[[0,1],[0,1]]', '[[1,0],[1,0]]'], dtype=object)

After applying literal_eval, values is an object-dtype array of lists:

In [164]: from ast import literal_eval
In [165]: df['X'].apply(literal_eval)
Out[165]: 
0    [[0, 1], [0, 1]]
1    [[1, 0], [1, 0]]
Name: X, dtype: object
In [166]: df['X'].apply(literal_eval).values
Out[166]: array([list([[0, 1], [0, 1]]), list([[1, 0], [1, 0]])], dtype=object)

But if instead we extract it as a list:

In [168]: df['X'].apply(literal_eval).to_list()
Out[168]: [[[0, 1], [0, 1]], [[1, 0], [1, 0]]]

We can easily turn that into an array:

In [169]: np.array(_)
Out[169]: 
array([[[0, 1],
        [0, 1]],

       [[1, 0],
        [1, 0]]])

Going back to the object-array form, we can combine its elements into a single numeric array with np.stack:

In [170]: np.stack(df['X'].apply(literal_eval).values)
Out[170]: 
array([[[0, 1],
        [0, 1]],

       [[1, 0],
        [1, 0]]])

np.stack is like np.concatenate or np.vstack, except that it joins the inputs along a new axis, so it acts more like np.array.
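To make the difference concrete, here is a small sketch comparing the result shapes. The arrays a and b are illustrative stand-ins for the two parsed rows, not names from the question.

```python
import numpy as np

# Two 2x2 arrays standing in for the parsed rows (illustrative names).
a = np.array([[0, 1], [0, 1]])
b = np.array([[1, 0], [1, 0]])

stacked = np.stack([a, b])        # joins along a new leading axis
joined = np.concatenate([a, b])   # joins along the existing axis 0
vjoined = np.vstack([a, b])       # same result as concatenate for 2D inputs

print(stacked.shape)   # (2, 2, 2)
print(joined.shape)    # (4, 2)
print(vjoined.shape)   # (4, 2)
```

Only the stacked result keeps one 2x2 block per row, which is the layout wanted for one sample per dataset element.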

With a regular numeric array like this instead of an object array, the TensorFlow conversion should work.

Your second apply only changes the object array of lists into an object array of arrays:

In [174]: df['X'].apply(literal_eval).apply(np.array).values
Out[174]: 
array([array([[0, 1],
              [0, 1]]), array([[1, 0],
                               [1, 0]])], dtype=object)

np.stack works on that as well.
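Putting this together, a corrected version of the question's df_to_dataset might look like the sketch below. It assumes every row of X parses to an array of the same shape (np.stack raises an error otherwise), and it uses to_numpy() in place of values, which is an equivalent choice on my part rather than something the question requires.

```python
from ast import literal_eval

import numpy as np
import pandas as pd
import tensorflow as tf


def df_to_dataset(dataframe):
    # The labels are already a plain integer column.
    y = tf.convert_to_tensor(dataframe["Y"].to_numpy())
    # Parse each string into nested lists, then stack the rows into one
    # regular (n_rows, 2, 2) numeric array instead of an object array.
    x = np.stack(dataframe["X"].apply(literal_eval).to_numpy())
    return tf.data.Dataset.from_tensor_slices((tf.convert_to_tensor(x), y))


data = [[1, "[[0,1],[0,1]]"], [0, "[[1,0],[1,0]]"]]
df = pd.DataFrame(data, columns=["Y", "X"])
dataset = df_to_dataset(df)
for features, label in dataset.take(1):
    print(features.numpy(), label.numpy())
```

Each dataset element is then a pair of a 2x2 feature tensor and a scalar label.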
