0

I have a Pandas data frame with a column that contains a list and a value: ([z, z, z, z, m, ., c, l, u, b, .], 0.0)

How do I split this column into two columns that I add to the data frame? The output I want: one column will contain the list, the other column will contain the value. For example:

[z, z, z, z, m, ., c, l, u, b, .] and 0.0

I have tried str.split(..., expand=True), but the output is just a column of NaN. Neither the comma nor ], works as a delimiter: both produce one column of NaN rather than a column of lists and a column of values.

Here are 4 rows of the column of my Pandas data frame that I'm trying to manipulate.

X['set']
1                  ([z, z, z, z, m, ., c, l, u, b, .], 0.0)
2                  ([z, z, z, z, g, ., c, l, u, b, .], 0.0)
3              ([z, z, z, z, cy, s, ., l, o, a, n, .], 0.0)
4                        ([z, z, z, x, c, ., u, s, .], 0.0)
  • What is your expected output? Commented Oct 6, 2019 at 22:10

4 Answers

1

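I was able to figure it out by working from the other users' answers. Each cell is a tuple, so the column can be converted to a list of tuples and passed to the DataFrame constructor: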

pd.DataFrame(X['set'].tolist(), index=X.index)

Related post: how to split column of tuples in pandas dataframe?
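
As a minimal sketch of the approach above, assuming each cell really is a Python tuple of (list, value). The sample frame and the new column names `chars` and `score` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data mirroring the question: each cell is a (list, value) tuple.
X = pd.DataFrame(
    {"set": [
        (["z", "z", "z", "z", "m", ".", "c", "l", "u", "b", "."], 0.0),
        (["z", "z", "z", "x", "c", ".", "u", "s", "."], 0.0),
    ]},
    index=[1, 2],
)

# tolist() yields a list of tuples; the DataFrame constructor unpacks each
# tuple into one row. Passing X.index keeps the rows aligned with X.
parts = pd.DataFrame(X["set"].tolist(), index=X.index, columns=["chars", "score"])

# Attach the two new columns to the original frame.
X = X.join(parts)
print(X[["chars", "score"]])
```

Reusing the original index matters when the frame does not start at 0, as in the question, because join aligns on index labels.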

0

Can you try making the delimiter ],?

0

You just need a bit of string gymnastics (this assumes each cell is stored as a string):

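def separate(x):
    # x is assumed to be a string such as '([z, z], 0.0)'
    closing_bracket_index = x.index(']')
    # drop the leading '(' and keep everything through the closing ']'
    list_vals = x[1:closing_bracket_index+1]
    # skip '], ' and drop the trailing ')'
    val = x[closing_bracket_index+3:-1]

    return pd.Series([list_vals, val], index=['list', 'val'])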

X['set'].apply(separate)
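
Whether string slicing like this applies depends on what each cell actually holds. A quick check, sketched here with made-up sample data (the names `tuples` and `strings` are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: a column of real tuples and the same data as strings.
tuples = pd.Series([(["z", "m"], 0.0), (["z", "g"], 0.0)])
strings = tuples.astype(str)

# Count the Python type of each cell to see which case applies.
print(tuples.map(type).value_counts())
print(strings.map(type).value_counts())

# The .str accessor returns NaN for cells that are not strings, which
# would explain str.split producing only NaN on a column of tuples.
print(tuples.str.split(","))
```

If the check reports tuples, positional indexing or `tolist()` is the simpler route; string slicing only makes sense when it reports `str`.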

1 Comment

Doesn't work. ValueError: tuple.index(x): x not in tuple. My Series is not a list of list_vals and values. It's a tuple.
0

Hope this works

import pandas as pd

a = (['g','f'],0.0)
b = (['d','e'],0.1)
df = pd.DataFrame({'col':[a,b]})
df

Out[1]: 
             col
0  ([g, f], 0.0)
1  ([d, e], 0.1)

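def split_val(col):
    # col is one (list, value) tuple, so index it by position
    list_val = col[0]
    value    = col[1]
    # label the pieces with the same names as the target columns
    return pd.Series([list_val, value], index=['list_val', 'value'])

df[['list_val','value']] = df['col'].apply(split_val)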
df

Out[2]: 
             col list_val  value
0  [[g, f], 0.0]   [g, f]    0.0
1  [[d, e], 0.1]   [d, e]    0.1

2 Comments

This doesn't work. Your df has square brackets for each row whereas mine has parens, so your example uses a list whereas mine is a tuple.
@user2205916 this works as well, but your code is cleaner :)
