
I have a CSV file in which one column holds image data. Before saving to CSV, each image was a 3D numpy array, so each cell of this column was a 3D array. After saving to CSV and reading the file back with pandas, the arrays have been converted to strings. Now I want to recreate the arrays from those strings. Below is a sample of the kind of string I want to convert to a 3D numpy array.

import numpy as np

my_string_array = str(np.random.randint(0, high=255, size=(51, 52, 3)))

I tried the approaches described in how to read numpy 2D array from string?, but it seems that I need something different, since I have a 3D array.

I know that if the arrays had been converted to lists before saving to CSV, then

import ast
my_array = np.array(ast.literal_eval(my_string_array))

would work, but unfortunately that is not the case here. Running it on my string raises an error:

Traceback (most recent call last):

  File "/opt/lyp-venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-25-3e5a6dae7682>", line 2, in <module>
    my_array = np.array(ast.literal_eval(my_string_array))

  File "/usr/lib/python3.7/ast.py", line 46, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')

  File "/usr/lib/python3.7/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)

  File "<unknown>", line 1
    [[[205  60 145]
             ^  
SyntaxError: invalid syntax
  • If your string really has ... in it, you won't be able to recover the data. Commented Jan 30, 2020 at 15:19
  • The linked question fully answers yours. It is not limited to 2D Commented Jan 30, 2020 at 15:21
  • @MadPhysicist can you please specify which one ? It does not contain ... you will not see them if you use print( my_string_array) . Commented Jan 30, 2020 at 15:23
  • print(str(np.random.randint(0, high=255, size=(51, 52, 3)))) results in seven ellipses Commented Jan 30, 2020 at 15:24
  • @MadPhysicist yep you are right, it contains ... when print, but what is the issue with that? Commented Jan 30, 2020 at 15:34

1 Answer

Regarding the error that you added:

ast.literal_eval(my_string_array)
....
[[[205  60 145]
         ^  
SyntaxError: invalid syntax

literal_eval works on a limited subset of Python syntax. For example, it will work on a valid list input such as "[[205, 60, 145]]". But the string in the error message does not match that: it is missing the commas. str(an_array) omits the commas; str(an_array.tolist()) does not.

Most of the answers that deal with loading csv files like this stress that you will need to replace the spaces (or blank delimiters) with commas.

So in this case the error has nothing to do with the array being 3d.

Let me illustrate:

Make a 3d array:

In [720]: arr = np.arange(24).reshape(2,3,4)                                                     

In [722]: arr                                                                                    
Out[722]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

Its str representation, which is probably what pandas writes to the csv:

In [723]: str(arr)                                                                               
Out[723]: '[[[ 0  1  2  3]\n  [ 4  5  6  7]\n  [ 8  9 10 11]]\n\n [[12 13 14 15]\n  [16 17 18 19]\n  [20 21 22 23]]]'

Compare that with what a list str looks like:

In [724]: arr.tolist()                                                                           
Out[724]: 
[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
 [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]
In [725]: str(arr.tolist())                                                                      
Out[725]: '[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]], [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]'

literal_eval has no problem with this triple nested list string:

In [726]: ast.literal_eval(_)                                                                    
Out[726]: 
[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
 [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]
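Applied to the CSV workflow from the question, this suggests converting each array to a list string before writing. The following is only a sketch under assumptions: the column name `image`, the in-memory buffer standing in for a file, and the sample arrays are all made up for illustration.

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical sample images; any integer arrays would do.
images = [np.arange(24).reshape(2, 3, 4), np.ones((2, 3, 4), dtype=int)]

# Store the list string (with commas), not str(array).
df = pd.DataFrame({"image": [str(img.tolist()) for img in images]})

# An in-memory buffer stands in for the csv file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Each cell comes back as a string that literal_eval can parse.
loaded = pd.read_csv(buffer)
restored = [np.array(ast.literal_eval(cell)) for cell in loaded["image"]]
```

This only helps when you control how the csv is written; it does nothing for a file that already contains str(an_array) output.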

literal_eval applied to the array string (Out[721] in my session, the same string as Out[723] above) produces your error:

In [727]: ast.literal_eval(Out[721])                                                             
Traceback (most recent call last):

  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-727-700e3f960e29>", line 1, in <module>
    ast.literal_eval(Out[721])

  File "/usr/lib/python3.6/ast.py", line 48, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')

  File "/usr/lib/python3.6/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)

  File "<unknown>", line 1
    [[[ 0  1  2  3]
           ^
SyntaxError: invalid syntax

I might be able to fix that with a couple of string substitutions, effectively converting the array string (Out[723]) into the list string (Out[725]).
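One way such a substitution could look is sketched below. The helper name `parse_array_string` is made up for this example, and the regex assumes an integer array whose string was not summarized with '...'; float arrays would need a different pattern.

```python
import ast
import re

import numpy as np


def parse_array_string(text):
    """Turn str(integer_ndarray) back into an ndarray (hypothetical helper)."""
    if "..." in text:
        raise ValueError("string was summarized with '...'; the data is gone")
    # Replace whitespace that sits between a digit or ']' and the next
    # number (possibly negative) or '[' with a comma.
    with_commas = re.sub(r"(?<=[\d\]])\s+(?=[-\d\[])", ",", text)
    return np.array(ast.literal_eval(with_commas))


arr = np.arange(24).reshape(2, 3, 4)
restored = parse_array_string(str(arr))
```

The lookbehind and lookahead leave the padding after an opening bracket alone (as in "[ 0"), so only the gaps between elements and between sub-arrays receive a comma.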

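@MadPhysicist pointed out that if the array is large enough (over 1000 elements with numpy's default print options), str will produce a summarized version, replacing most of the values with '...'. You can verify that yourself. Your (51, 52, 3) sample has 7956 elements, so it is affected. If that is the case, no amount of string editing will fix the problem: the omitted values were never written, so that string is useless.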
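A quick check of that behaviour, assuming numpy's default print options are in effect. The last part is a sketch of one way to get a complete, comma-separated string at write time with np.array2string; it is not something the original csv was produced with.

```python
import ast
import sys

import numpy as np

small = np.arange(24).reshape(2, 3, 4)       # 24 elements
large = np.zeros((51, 52, 3), dtype=int)     # 7956 elements, shape from the question

small_is_summarized = "..." in str(small)
large_is_summarized = "..." in str(large)

# Raising the threshold disables summarization, and separator=","
# writes the commas that literal_eval needs.
full_text = np.array2string(large, threshold=sys.maxsize, separator=",")
round_trip = np.array(ast.literal_eval(full_text))
```

Here `small_is_summarized` is False and `large_is_summarized` is True, which is why the sample string in the question cannot be recovered.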

In how to read numpy 2D array from string?, my answer is of limited value here, since you already have the string. https://stackoverflow.com/a/44323021/901925 is better. There are also SO questions that deal specifically with the strings that appear in pandas csv files. In any case you need to pay attention to the details of the string, especially delimiters and special characters.
