Python: Split pandas dataframe column containing a list and a value into two columns

Question

I have a Pandas data frame with a column that contains a list and a value: ([z, z, z, z, m, ., c, l, u, b, .], 0.0)

How do I split this column into two columns that I add to the data frame? The output I want: one column will contain the list, the other column will contain the value. For example:

[z, z, z, z, m, ., c, l, u, b, .] and 0.0

I have tried str.split(..., expand=True), but the output is just a column of NaN. Neither the comma nor "]," works as a delimiter: both produce a single column of NaN rather than a column of lists and a column of values.

Here are 4 rows of the column of my Pandas data frame that I'm trying to manipulate.

X['set']
1                  ([z, z, z, z, m, ., c, l, u, b, .], 0.0)
2                  ([z, z, z, z, g, ., c, l, u, b, .], 0.0)
3              ([z, z, z, z, cy, s, ., l, o, a, n, .], 0.0)
4                        ([z, z, z, x, c, ., u, s, .], 0.0)

What is your expected output?

– BENY, Commented Oct 6, 2019 at 22:10

user2205916 · Accepted Answer · 2019-10-07 00:26:56Z

1

I was able to figure it out based on deduction using the answers of other users.

pd.DataFrame(X['set'].tolist(), index=X.index)

Related post: how to split column of tuples in pandas dataframe?
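
For readers who want to run the approach above end to end, here is a minimal sketch. The sample rows and the column names "chars" and "score" are made up for illustration; only the tolist() call comes from the answer.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's column:
# each cell is a (list, float) tuple.
X = pd.DataFrame(
    {"set": [(["z", "m", "."], 0.0), (["z", "g", "."], 0.5)]},
    index=[1, 2],
)

# Build a two-column frame from the tuples, reusing X's index so rows stay aligned.
# The column names are illustrative assumptions, not part of the original answer.
parts = pd.DataFrame(X["set"].tolist(), index=X.index, columns=["chars", "score"])

# Attach the new columns to the original frame.
X = X.join(parts)
print(X)
```

Passing index=X.index matters when the frame does not use the default 0..n-1 index, as in the question, where the rows start at 1. Without it the new frame would get a fresh index and the join would misalign.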

answered Oct 7, 2019 at 0:26

user2205916


GSatterwhite · Accepted Answer · 2019-10-06 22:10:45Z

0

Can you try using "]," as the delimiter?

answered Oct 6, 2019 at 22:10

GSatterwhite


adrianp · Accepted Answer · 2019-10-06 23:24:24Z

0

You just need a bit of string gymnastics:

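import pandas as pd

def separate(x):
    # Assumes each cell is a string, not a tuple.
    closing_bracket_index = x.index(']')
    list_vals = x[:closing_bracket_index+1]
    # Skip the ", " after the closing bracket to reach the value.
    val = x[closing_bracket_index+3:]

    return pd.Series([list_vals, val], index=['list', 'val'])

X['set'].apply(separate)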

edited Oct 6, 2019 at 23:24

answered Oct 6, 2019 at 22:46

adrianp


1 Comment

user2205916 Over a year ago

Doesn't work. ValueError: tuple.index(x): x not in tuple. My Series is not a list of list_vals and values. It's a tuple.
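
The error reported in the comment above is consistent with the cells being tuples rather than strings. A quick way to check is sketched below, using made-up sample rows; it also reproduces the all-NaN result of str.split described in the question.

```python
import pandas as pd

# Hypothetical column shaped like the question's data.
s = pd.Series([(["z", "m"], 0.0), (["z", "g"], 0.0)], name="set")

# Inspect the Python type of each cell; here every cell is a tuple, not a str.
cell_types = s.map(type)
print(cell_types.unique())

# String methods return NaN for non-string cells, so str.split yields only NaN.
split_result = s.str.split(",", expand=True)
print(split_result)
```

If the types turn out to be str instead, a string-based approach is the right tool; if they are tuples, positional unpacking is simpler.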

Bane · Accepted Answer · 2019-10-07 05:34:41Z

0

Hope this works

import pandas as pd

a = (['g','f'],0.0)
b = (['d','e'],0.1)
df = pd.DataFrame({'col':[a,b]})
df

Out[1]: 
             col
0  ([g, f], 0.0)
1  ([d, e], 0.1)

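def split_val(col):
    # Each cell is a (list, value) tuple; unpack it by position.
    list_val, value = col
    return pd.Series([list_val, value], index=['list_val', 'value'])


# Assign the two new columns back onto the original frame.
df[['list_val', 'value']] = df['col'].apply(split_val)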
df

Out[2]: 
             col list_val  value
0  ([g, f], 0.0)   [g, f]    0.0
1  ([d, e], 0.1)   [d, e]    0.1

edited Oct 7, 2019 at 5:34

answered Oct 6, 2019 at 23:06

Bane


2 Comments

user2205916 Over a year ago

This doesn't work. Your df has square brackets for each row whereas mine has parens, so your example uses a list whereas mine is a tuple.

Bane Over a year ago

@user2205916 this works as well, but your code is cleaner :)
