4

How can I convert columns b and c to float, and also expand column b into two columns?

Example dataframe:

    a                              b             c
0  36   [-212828.804308, 100000067.554]  [-3079773936.0]
1  39  [-136.358761948, -50000.0160325]  [1518911.64408]
2  40  [-136.358761948, -50000.0160325]  [1518911.64408]
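For anyone who wants to reproduce this, a frame like the one above can be built roughly as follows. This is only a sketch: it assumes b and c hold plain Python lists of floats (as stated in the comments), and the values are copied from the printout.

```python
import pandas as pd

# Assumption: b and c are object columns whose cells are Python lists of floats.
df = pd.DataFrame({
    "a": [36, 39, 40],
    "b": [[-212828.804308, 100000067.554],
          [-136.358761948, -50000.0160325],
          [-136.358761948, -50000.0160325]],
    "c": [[-3079773936.0], [1518911.64408], [1518911.64408]],
})

# b and c show up as object dtype because each cell is a list, not a number.
print(df.dtypes)
```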

Expected:

    a              b1              b2              c
0  36  -212828.804308   100000067.554  -3079773936.0
1  39  -136.358761948  -50000.0160325  1518911.64408
2  40  -136.358761948  -50000.0160325  1518911.64408
  • Can you please share how the dataframe was created? Are columns b and c actually list or string? Commented Apr 26, 2017 at 0:21
  • @Abdou b and c are list Commented Apr 26, 2017 at 0:22

3 Answers

4

Here are two alternatives:

1) Convert each column to a list of rows, then construct a new DataFrame from each list and concatenate them:

pd.concat((df['a'], pd.DataFrame(df['b'].tolist()), pd.DataFrame(df['c'].tolist())), axis=1)
Out: 
    a              0             1             0
0  36 -212828.804308  1.000001e+08 -3.079774e+09
1  39    -136.358762 -5.000002e+04  1.518912e+06
2  40    -136.358762 -5.000002e+04  1.518912e+06

Or in a loop:

pd.concat((pd.DataFrame(df[col].tolist()) for col in df), axis=1)
Out: 
    0              0             1             0
0  36 -212828.804308  1.000001e+08 -3.079774e+09
1  39    -136.358762 -5.000002e+04  1.518912e+06
2  40    -136.358762 -5.000002e+04  1.518912e+06

2) Apply pd.Series to each column (possibly slower):

pd.concat((df[col].apply(pd.Series) for col in df), axis=1)
Out: 
    0              0             1             0
0  36 -212828.804308  1.000001e+08 -3.079774e+09
1  39    -136.358762 -5.000002e+04  1.518912e+06
2  40    -136.358762 -5.000002e+04  1.518912e+06
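The outputs above keep the default integer column labels. To get the a, b1, b2, c names from the question, one option is to pass the names when building each piece. This is a sketch on a rebuilt copy of the example frame; the names b1 and b2 come from the question, and reusing df.index is my own addition so the rows stay aligned if the index is not 0..n-1.

```python
import pandas as pd

# Rebuilt example frame (assumes b and c hold Python lists of floats).
df = pd.DataFrame({
    "a": [36, 39, 40],
    "b": [[-212828.804308, 100000067.554],
          [-136.358761948, -50000.0160325],
          [-136.358761948, -50000.0160325]],
    "c": [[-3079773936.0], [1518911.64408], [1518911.64408]],
})

# Expand b into two float columns, keeping the original index.
b = pd.DataFrame(df["b"].tolist(), index=df.index, columns=["b1", "b2"])
# c holds one-element lists, so it expands to a single float column.
c = pd.DataFrame(df["c"].tolist(), index=df.index, columns=["c"])

result = pd.concat([df[["a"]], b, c], axis=1)
print(result.dtypes)
```

The new columns come out as float64 because the DataFrame constructor infers the dtype from the numbers inside the lists.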

2

Construct the new columns from 'b', then drop 'b'. Column 'c' can be replaced in place.

df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)  # split b into b1 and b2, aligned on df's index
df = df.drop('b', axis=1)  # drop the original list column
df['c'] = [x[0] for x in df['c']]  # unwrap the one-element lists in c

1

This extends the solution from @ayhan for the case where you also want to rename the columns and have several list columns. Note that I assume every list within a given column has the same length.

col_names = []
for col in df.columns:
    if df[col].dtype == 'O' and len(df[col].iloc[0]) > 1:
        col_names.extend([col + str(i + 1) for i in range(len(df[col].iloc[0]))])
    else:
        col_names.extend([col])

df_new = pd.concat([df[col].apply(pd.Series) for col in df], axis=1)
df_new.columns = col_names
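The same idea can be wrapped in a small function. This is only a sketch: expand_list_columns is a hypothetical helper name, it uses tolist() rather than apply(pd.Series), and it assumes the frame is non-empty and that all lists in a column have the same length.

```python
import pandas as pd

def expand_list_columns(df):
    """Hypothetical helper: expand list-valued columns into numbered columns."""
    parts = []
    for col in df.columns:
        # Assumption: the first cell is representative of the whole column.
        if isinstance(df[col].iloc[0], list):
            expanded = pd.DataFrame(df[col].tolist(), index=df.index)
            if expanded.shape[1] > 1:
                # Several elements per list: name them col1, col2, ...
                expanded.columns = [f"{col}{i + 1}" for i in range(expanded.shape[1])]
            else:
                # One element per list: keep the original column name.
                expanded.columns = [col]
            parts.append(expanded)
        else:
            # Non-list columns are passed through unchanged.
            parts.append(df[[col]])
    return pd.concat(parts, axis=1)

# Usage on a rebuilt copy of the example frame from the question.
df = pd.DataFrame({
    "a": [36, 39, 40],
    "b": [[-212828.804308, 100000067.554],
          [-136.358761948, -50000.0160325],
          [-136.358761948, -50000.0160325]],
    "c": [[-3079773936.0], [1518911.64408], [1518911.64408]],
})
df_new = expand_list_columns(df)
print(df_new.columns.tolist())
```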
