
For a data frame df:

name       list1                    list2
a          [1, 3, 10, 12, 20..]     [2, 6, 23, 29...]
b          [2, 10, 14, 3]           [4, 7, 8, 13...]
c          []                       [98, 101, 200]
...

I want to convert list1 and list2 to np.array and then hstack them. Here is what I did:

df.pv = df.apply(lambda row: np.hstack((np.asarray(row.list1), np.asarray(row.list2))), axis=1)

And I got such an error:

ValueError: Shape of passed values is (138493, 175), indices imply (138493, 4)

Where 138493==len(df)

Please note that some values in list1 and list2 are empty lists, [], and that the list lengths differ between rows. What is the reason for the error, and how can I fix it? Thanks in advance!

EDIT:

When I just try to convert one list to array:

df.apply(lambda row: np.asarray(row.list1), axis=1)

An error also occurs:

ValueError: Empty data passed with indices specified.
  • can you provide a reproducible input? Commented Oct 6, 2016 at 10:07
  • @ColonelBeauvel Thanks for your reply! Isn't the sample above reproducible? Commented Oct 6, 2016 at 10:26
  • @user5779223 How did you create your dataframe? That's what he meant. Commented Oct 6, 2016 at 10:32
  • @MMF I read in a data set and convert it to the form like this. Indeed I still don't know what information you need? Commented Oct 6, 2016 at 10:33
  • share with us the code where you create df. df = ? Commented Oct 6, 2016 at 10:39

1 Answer


Your apply function is almost correct. All you have to do is convert the output of np.hstack() back to a Python list:

df.apply(lambda row: list(np.hstack((np.asarray(row.list1), np.asarray(row.list2)))), axis=1)

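The full code, including the creation of df, is shown below:

import numpy as np
import pandas as pd

# Sample frame with list-valued columns; row 'c' has an empty list1
df = pd.DataFrame([('a',[1, 3, 10, 12, 20],[2, 6, 23, 29]),
                   ('b',[2, 10, 1.4, 3],[4, 7, 8, 13]),
                   ('c',[],[98, 101, 200])],
                   columns = ['name','list1','list2'])

# Stack both lists per row and store the result as a plain Python list
df['list3'] = df.apply(lambda row: list(np.hstack((np.asarray(row.list1), np.asarray(row.list2)))), axis=1)

print(df['list3'])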

Output:

0              [1, 3, 10, 12, 20, 2, 6, 23, 29]
1    [2.0, 10.0, 1.4, 3.0, 4.0, 7.0, 8.0, 13.0]
2                          [98.0, 101.0, 200.0]
Name: list3, dtype: object

If you want a NumPy array in each cell, the only way I could get it to work is to convert the lists afterwards:

df['list3'] = df['list3'].apply(lambda x: np.array(x))

print(type(df['list3'].iloc[0]))

Output:

<class 'numpy.ndarray'>
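
As an alternative sketch (not part of the original answer), the arrays can be built outside of apply and assigned in one step, which avoids the intermediate list column. The column name pv is taken from the question; the pre-allocated object array named stacked is a hypothetical helper, and the small df is the sample frame from above, not the real 138493-row data.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the answer above
df = pd.DataFrame(
    [("a", [1, 3, 10, 12, 20], [2, 6, 23, 29]),
     ("b", [2, 10, 1.4, 3], [4, 7, 8, 13]),
     ("c", [], [98, 101, 200])],
    columns=["name", "list1", "list2"],
)

# Pre-allocate a 1-D object array so each slot can hold an array of any length
stacked = np.empty(len(df), dtype=object)
for i, (a, b) in enumerate(zip(df["list1"], df["list2"])):
    # hstack the two lists of this row; an empty list contributes nothing
    stacked[i] = np.hstack((np.asarray(a), np.asarray(b)))

# A 1-D object array is assigned as a single column, one array per row
df["pv"] = stacked

print(type(df["pv"].iloc[0]))
```

Because pandas only ever sees a one-dimensional object array here, it has no reason to expand the per-row arrays into columns, which appears to be what goes wrong when apply returns arrays directly.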

1 Comment

Thanks for your answer, but what if I wanted it to be a numpy array?
