I have a dataframe in the following format:

0 [[2387, 1098], [1873, 6792], ....

1 [0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, ...

I need to split the arrays in the first column into two further columns. I have seen similar questions, but the solutions are given for small data; I have around 300 rows and can't write them all out manually. I tried to_list(), but I get an error when using it.

What code should I use to split it into two? Also, why is my dataframe displayed as rows rather than as columns?

2 Answers

You can convert your dataframe in this way:

import pandas as pd
import numpy as np

df = pd.DataFrame({0:[[2387, 1098], [1873, 6792],], 1:[0,1]})
arr = np.array(df.loc[:,0].to_list())
df2 = pd.DataFrame({0:arr[:,0], 1:arr[:,1], 2:df.loc[:,1]})
print(df2)

The result is:

      0     1  2
0  2387  1098  0
1  1873  6792  1

A second way to solve the problem (with a "moon" sample) is:

import pandas as pd
import sklearn.datasets

X, y = sklearn.datasets.make_moons()
pd.DataFrame({'x0':X[:,0], 'x1': X[:,1], 'y':y})

and the result is:

          x0        x1  y
0   0.981559  0.191159  0
1   0.967948 -0.499486  1
2   0.018441  0.308841  1
3  -0.981559  0.191159  0
4   0.967295  0.253655  0
..       ...       ... ..
95  0.238554 -0.148228  1
96  0.096023  0.995379  0
97  0.327699 -0.240278  1
98  0.900969  0.433884  0
99  1.981559  0.308841  1

[100 rows x 3 columns]

4 Comments

I replaced your line of code df = pd.DataFrame({0:[[2387, 1098], [1873, 6792],], 1:[0,1]}) with df = pd.DataFrame(data), where data contains a moon dataset from scikit-learn, and copied the rest of your lines as is. The line arr = np.array(df.loc[:,0].to_list()) gives me a ValueError:
ValueError Traceback (most recent call last)
<ipython-input-7-2a0be59c6b12> in <module>
----> 1 arr = np.array(df.loc[:,0].to_list())
ValueError: could not broadcast input array from shape (200,2) into shape (200)
A moon dataset in sklearn has the form (X, y), where X.shape is (200,2) and y.shape is (200,). In this case, the better way to separate the columns is to write:
pd.DataFrame({'x0':X[:,0], 'x1': X[:,1], 'y':y})
I will add that to my answer because one cannot write code in comments ...
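To make the comment above concrete, here is a minimal sketch of unpacking the tuple before building the frame. It assumes that data is the (X, y) pair returned by make_moons; if so, the tuple has two elements, which would probably explain why the original frame shows two rows instead of columns. The random arrays are only a stand-in for the real dataset, and the names moons_df and rng are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for sklearn.datasets.make_moons(n_samples=200):
# a (features, labels) tuple with shapes (200, 2) and (200,).
rng = np.random.default_rng(0)
data = (rng.normal(size=(200, 2)), rng.integers(0, 2, size=200))

# Unpack the tuple instead of passing it to pd.DataFrame directly.
X, y = data

# One column per feature, plus the labels as a third column.
moons_df = pd.DataFrame(X, columns=["x0", "x1"])
moons_df["y"] = y

print(moons_df.shape)
```

With the real dataset, replacing the stand-in line with X, y = sklearn.datasets.make_moons(n_samples=200) should give a frame of the same shape.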

EDIT:

It may look strange, but you can use .str[0] to get the first element of each list in a DataFrame column.

import pandas as pd

df = pd.DataFrame({0:[[2387, 1098], [1873, 6792],], 1:[0,1]})

new_df = pd.DataFrame({
              0: df[0].str[0], 
              1: df[0].str[1], 
              2: df[1]
         })

print(new_df)
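A small check of how .str[i] behaves on a column of lists may help here. The third row, which has only one element, is an assumption added for illustration (the question's rows all have two elements); as far as I know, an out-of-range position gives a missing value rather than raising an error.

```python
import pandas as pd

# Two complete pairs plus one short row (hypothetical, for illustration).
pairs = pd.Series([[2387, 1098], [1873, 6792], [42]])

# .str[i] indexes into each element, so it works on lists as well as strings.
first = pairs.str[0]
second = pairs.str[1]

print(first.tolist())
# The short row has no second element, so that position is missing.
print(pd.isna(second.iloc[2]))
```

If the lists in your data can differ in length, it is worth checking the result for missing values before using it.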

OLDER:

Using apply() with pandas.Series you can convert the first column into a new DataFrame with two columns:

import pandas as pd

df = pd.DataFrame({0:[[2387, 1098], [1873, 6792],], 1:[0,1]})

new_df = df[0].apply(pd.Series)

print(new_df)

Result:

      0     1
0  2387  1098
1  1873  6792

And later you can assign them back to the old DataFrame:

df[2] = df[1]       # move `[0,1,...]` to column 2
df[[0,1]] = new_df  # put `new_df` in columns 0,1

Result:

      0     1  2
0  2387  1098  0
1  1873  6792  1

Or you can copy the [0,1,...] column from the old df to new_df:

import pandas as pd

df = pd.DataFrame({0:[[2387, 1098], [1873, 6792],], 1:[0,1]})

new_df = df[0].apply(pd.Series)
new_df[2] = df[1]

print(new_df)
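On larger frames, apply(pd.Series) can be slow because it builds one Series per row. A possible alternative, sketched here rather than benchmarked, is to pass the lists to the DataFrame constructor in one go. The column names a, b and label are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({0: [[2387, 1098], [1873, 6792]], 1: [0, 1]})

# Build the two new columns from the list column in a single constructor call,
# keeping the original index so the rows stay aligned.
pairs = pd.DataFrame(df[0].tolist(), index=df.index, columns=["a", "b"])

# Attach the remaining column; join aligns on the index.
result = pairs.join(df[1].rename("label"))

print(result)
```

This assumes every list in the column has the same length; with uneven lists the shorter rows would be padded with missing values.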
