I have a pandas series that contains an array for each element, like so:

0            [0, 0]
1          [12, 15]
2          [43, 45]
3           [9, 10]
4            [0, 0]
5            [3, 3]
6            [0, 0]
7            [0, 0]
8            [0, 0]
9            [3, 3]
10           [2, 2]

I want to extract all the first elements into another Series or list, and do the same for the second elements. I've tried using a regular expression:

mySeries.str.extract(r'\[(\d+), (\d+)\]', expand=True)

and also splitting:

mySeries.str.split(', ').tolist()

Both give NaN values. What am I doing wrong?

3 Answers

Case 1: Column of lists

You need to call .tolist() on that column and load the result into a DataFrame.

pd.DataFrame(df['col'].tolist())

df
         col
0     [0, 0]
1   [12, 15]
2   [43, 15]
3    [9, 10]
4     [0, 0]
5     [3, 3]
6     [0, 0]
7     [0, 0]
8     [0, 0]
9     [3, 3]
10    [2, 2]

pd.DataFrame(df['col'].tolist())

     0   1
0    0   0
1   12  15
2   43  15
3    9  10
4    0   0
5    3   3
6    0   0
7    0   0
8    0   0
9    3   3
10   2   2
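A self-contained sketch of Case 1 using the question's data. It assumes the Series is named `s` (a hypothetical name) and that every element is a two-item list. Passing `index=s.index` keeps the original row labels, and each column of the result is one of the Series the question asks for:

```python
import pandas as pd

# The question's data: a Series whose elements are Python lists.
s = pd.Series([[0, 0], [12, 15], [43, 45], [9, 10], [0, 0], [3, 3],
               [0, 0], [0, 0], [0, 0], [3, 3], [2, 2]])

# Expand the lists into columns; index=s.index keeps the original labels.
parts = pd.DataFrame(s.tolist(), index=s.index)

first = parts[0]   # Series of first elements
second = parts[1]  # Series of second elements

print(first.tolist())   # [0, 12, 43, 9, 0, 3, 0, 0, 0, 3, 2]
print(second.tolist())  # [0, 15, 45, 10, 0, 3, 0, 0, 0, 3, 2]
```

Call .tolist() on `first` or `second` if you want plain lists instead of Series.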

Note: If your data has NaNs, I'd recommend dropping them first: df = df.dropna() and then proceed as shown above.
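A minimal sketch of the NaN case, assuming a hypothetical column `col` with one missing row. After dropna() the surviving rows keep their labels (0, 2, 3), so passing `index=` matters: without it the new frame is renumbered from 0 and would not line up with `df` if you assigned it back.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has no list.
df = pd.DataFrame({'col': [[0, 0], np.nan, [12, 15], [43, 45]]})

# Drop the missing rows first, as the note suggests.
clean = df.dropna()

# Reuse clean.index so the labels (0, 2, 3) are preserved.
out = pd.DataFrame(clean['col'].tolist(), index=clean.index,
                   columns=['first', 'second'])
print(out)
```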


Case 2: Column of strings that look like lists

If you have < 100 rows, use:

df['col'] = pd.eval(df['col'])

Then proceed as in Case 1. Otherwise, use ast.literal_eval:

import ast
df['col'] = df['col'].apply(ast.literal_eval)

And proceed as before.
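A sketch of the ast route end to end, assuming the lists were stored as strings (hypothetical data, e.g. a column read back from a CSV file, which stores the list's text form). ast.literal_eval parses each string into a real list, after which Case 1 applies unchanged:

```python
import ast
import pandas as pd

# Hypothetical data: list-like strings rather than real lists.
df = pd.DataFrame({'col': ['[0, 0]', '[12, 15]', '[43, 45]']})

# Parse each string into a Python list; literal_eval only accepts literals.
df['col'] = df['col'].apply(ast.literal_eval)

# Now the column holds real lists, so Case 1 works.
parts = pd.DataFrame(df['col'].tolist(), index=df.index)
print(parts[0].tolist(), parts[1].tolist())  # [0, 12, 43] [0, 15, 45]
```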

Zip the elements of df.col:

df.assign(**dict(zip('AB', zip(*df.col))))

         col   A   B
0     [0, 0]   0   0
1   [12, 15]  12  15
2   [43, 45]  43  45
3    [9, 10]   9  10
4     [0, 0]   0   0
5     [3, 3]   3   3
6     [0, 0]   0   0
7     [0, 0]   0   0
8     [0, 0]   0   0
9     [3, 3]   3   3
10    [2, 2]   2   2

Or

df['A'], df['B'] = zip(*df.col)
df

         col   A   B
0     [0, 0]   0   0
1   [12, 15]  12  15
2   [43, 45]  43  45
3    [9, 10]   9  10
4     [0, 0]   0   0
5     [3, 3]   3   3
6     [0, 0]   0   0
7     [0, 0]   0   0
8     [0, 0]   0   0
9     [3, 3]   3   3
10    [2, 2]   2   2
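If you only need the two sequences rather than new columns, the same zip trick gives them directly. A sketch assuming a Series `s` (hypothetical name) of two-element lists: zip(*s) transposes the rows, yielding one tuple of all first elements and one of all second elements.

```python
import pandas as pd

s = pd.Series([[0, 0], [12, 15], [43, 45], [9, 10]])

# Transpose: one tuple of first elements, one tuple of second elements.
first, second = zip(*s)

first_list = list(first)                               # a plain list
second_series = pd.Series(list(second), index=s.index)  # a Series aligned to s

print(first_list)               # [0, 12, 43, 9]
print(second_series.tolist())   # [0, 15, 45, 10]
```

Note that zip stops at the shortest list, so rows of unequal length would silently lose values; the tolist/DataFrame approach pads short rows with NaN instead.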

One solution is to use pd.Series.apply with pd.Series. This assumes you have a series of lists, as implied in your question, rather than a series of strings.

Your logic will not work with a series of lists: the elements are Python list objects, not strings, so .str.extract and .str.split return NaN for every row.

df = pd.DataFrame({'A': [[1, 2], [3, 4], [5, 6]]})

df[['B', 'C']] = df['A'].apply(pd.Series)

print(df)

        A  B  C
0  [1, 2]  1  2
1  [3, 4]  3  4
2  [5, 6]  5  6
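A small sketch reproducing the NaN problem from the question, assuming a Series of real lists. The regex only matches text, so it returns NaN for list elements; rendering the lists as strings first (shown here only to illustrate the cause, not as a recommended fix) makes the same pattern match.

```python
import pandas as pd

s = pd.Series([[0, 0], [12, 15], [43, 45]])

# The elements are list objects, not strings.
print(type(s.iloc[0]))  # <class 'list'>

# .str.extract gives NaN for every non-string element.
extracted = s.str.extract(r'\[(\d+), (\d+)\]', expand=True)
print(extracted.isna().all().all())  # True

# The same pattern matches once the lists are converted to their text form.
as_text = s.astype(str).str.extract(r'\[(\d+), (\d+)\]', expand=True)
print(as_text[0].tolist())  # ['0', '12', '43'] (strings, not ints)
```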

1 Comment

One should note that apply + Series is a match made in hell in terms of performance. OP's headache, less skin off my back :p
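A rough way to check the performance point yourself, assuming hypothetical data of 2,000 two-element lists. Both functions produce the same values; apply(pd.Series) calls the Series constructor once per row, which is why it is typically much slower than a single DataFrame construction. Actual timings depend on your machine and pandas version, so none are quoted here.

```python
import timeit
import pandas as pd

# Hypothetical benchmark data.
s = pd.Series([[i, i + 1] for i in range(2_000)])

def via_tolist():
    # One DataFrame construction from a list of lists.
    return pd.DataFrame(s.tolist(), index=s.index)

def via_apply():
    # One pd.Series construction per row.
    return s.apply(pd.Series)

# Same values either way.
assert (via_tolist().to_numpy() == via_apply().to_numpy()).all()

for fn in (via_tolist, via_apply):
    print(fn.__name__, timeit.timeit(fn, number=3))
```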
