I have a list of NumPy arrays that I'm trying to convert to a DataFrame. Each array should become one row of the DataFrame.

Using pd.DataFrame() isn't working. It always raises the error: ValueError: Must pass 2-d input.

Is there a better way to do this?

This is my current code:

import numpy as np
import pandas as pd

list_arrays = [ np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype='uint8'),
                np.array([[0, 0, 3, 2, 0, 0, 0, 0, 0]], dtype='uint8')
              ]

d = pd.DataFrame(list_arrays)

ValueError: Must pass 2-d input

4 Answers

Option 1:

In [143]: pd.DataFrame(np.concatenate(list_arrays))
Out[143]:
   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0

Option 2:

In [144]: pd.DataFrame(list(map(np.ravel, list_arrays)))
Out[144]:
   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0

Why do I get:

ValueError: Must pass 2-d input

I think pd.DataFrame() tries to convert the list to an ndarray, as follows:

In [148]: np.array(list_arrays)
Out[148]:
array([[[0, 0, 0, 1, 0, 0, 0, 0, 0]],

       [[0, 0, 3, 2, 0, 0, 0, 0, 0]]], dtype=uint8)

In [149]: np.array(list_arrays).shape
Out[149]: (2, 1, 9)     # <----- NOTE: 3D array
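
To make the shape issue concrete, here is a minimal sketch that rebuilds the same list, checks the number of dimensions, and collapses the extra axis with reshape before building the DataFrame. The variable names stacked and flat are only illustrative, and it assumes every array has the same number of elements.

```python
import numpy as np
import pandas as pd

list_arrays = [np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype="uint8"),
               np.array([[0, 0, 3, 2, 0, 0, 0, 0, 0]], dtype="uint8")]

# Each array has shape (1, 9), so stacking the list adds a third axis.
stacked = np.array(list_arrays)
print(stacked.shape)  # (2, 1, 9)

# Collapse everything after the first axis so that each array becomes one row.
flat = stacked.reshape(len(list_arrays), -1)
print(flat.shape)  # (2, 9)

df = pd.DataFrame(flat)
```

Reshaping is just one way to drop the extra axis; Option 1 and Option 2 above reach the same 2-D result.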

2 Comments

Thank you! All of them work. But I wonder why I was getting that 2-d error.
For me, pd.DataFrame(np.concatenate(list_arrays)) flattened all my arrays into a single 1-dimensional array instead of "row stacking" them. I therefore recommend pd.DataFrame(np.row_stack(list_arrays)) instead. It took a couple of seconds on 140k rows x 17k columns.
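
The flattening described in this comment is what happens when the arrays in the list are 1-D rather than shaped (1, n), which is presumably the commenter's situation. A small sketch of the difference, using made-up 1-D data (flat_arrays is a hypothetical name, not from the question). It uses np.vstack, of which np.row_stack is an alias that was deprecated in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-D inputs: shape (4,) each, not (1, 4) as in the question.
flat_arrays = [np.array([0, 0, 0, 1], dtype="uint8"),
               np.array([0, 0, 3, 2], dtype="uint8")]

# concatenate joins along the existing axis, giving one long 1-D array.
joined = np.concatenate(flat_arrays)
print(joined.shape)  # (8,)

# vstack promotes each 1-D array to a row first, giving a 2-D array.
stacked = np.vstack(flat_arrays)
print(stacked.shape)  # (2, 4)

df_joined = pd.DataFrame(joined)    # 8 rows, 1 column
df_stacked = pd.DataFrame(stacked)  # 2 rows, 4 columns
```

With the (1, n) arrays from the question both calls give the same 2-D result, so np.vstack is the safer choice when the input shape is not known in advance.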

Alt 1

pd.DataFrame(sum(map(list, list_arrays), []))

   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0

Alt 2

pd.DataFrame(np.row_stack(list_arrays))

   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0

Here is one way.

import numpy as np, pandas as pd

lst = [np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype=int),
       np.array([[0, 0, 3, 2, 0, 0, 0, 0, 0]], dtype=int)]

df = pd.DataFrame(np.vstack(lst))

#    0  1  2  3  4  5  6  7  8
# 0  0  0  0  1  0  0  0  0  0
# 1  0  0  3  2  0  0  0  0  0

You can use pd.Series:

pd.Series(list_arrays).apply(lambda x: pd.Series(x[0]))
Out[294]: 
   0  1  2  3  4  5  6  7  8
0  0  0  0  1  0  0  0  0  0
1  0  0  3  2  0  0  0  0  0
