12

I have the following series:

>>> import pandas as pd
>>> s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])
>>> s
0    [a, b]
1    [c, d]
2    [f, g]
dtype: object

What is the easiest, preferably vectorized, way to concatenate all the lists in the series, so that I get:

l = ['a', 'b', 'c', 'd', 'f', 'g']

Thanks!

  • s.sum() is the easiest vectorised way, but it's probably not very efficient... Commented Nov 5, 2015 at 22:36
  • Awesome! Good enough for me, thanks a lot! Commented Nov 5, 2015 at 22:38
  • @ajcr should be an answer! Commented Nov 6, 2015 at 0:35

2 Answers

16

A nested list comprehension should be much faster than s.sum().

>>> [element for list_ in s for element in list_]
['a', 'b', 'c', 'd', 'f', 'g']

>>> %timeit -n 100000 [element for list_ in s for element in list_]
100000 loops, best of 3: 5.2 µs per loop

>>> %timeit -n 100000 s.sum()
100000 loops, best of 3: 50.7 µs per loop

Iterating directly over the underlying values of the series (s.values) is even faster.

>>> %timeit -n 100000 [element for list_ in s.values for element in list_]
100000 loops, best of 3: 2.77 µs per loop
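
For readers outside IPython, the comparison above can be reproduced with the standard timeit module. This is only a sketch: the helper names flatten_series and flatten_values are made up for illustration, and the absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])


def flatten_series(series):
    # Hypothetical helper: nested comprehension iterating over the Series itself.
    return [element for list_ in series for element in list_]


def flatten_values(series):
    # Hypothetical helper: same comprehension over the underlying NumPy array.
    return [element for list_ in series.values for element in list_]


# Both variants should produce the same flat list.
assert flatten_series(s) == flatten_values(s)

runs = 10_000
for fn in (flatten_series, flatten_values):
    seconds = timeit.timeit(lambda: fn(s), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.2f} µs per call")
```

Skipping the Series iterator in favour of s.values avoids some pandas overhead per element, which is presumably where the difference in the timings above comes from.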

2

I haven't timed or tested these options, but there is also the newer pandas method Series.explode, as well as numpy.concatenate.
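
A minimal sketch of how these two options might be used on the series from the question, assuming a pandas version that provides Series.explode (0.25 or later). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

# Series.explode() puts each list element on its own row
# (the original index labels are repeated), then tolist() flattens it.
exploded = s.explode().tolist()

# numpy.concatenate joins the individual lists into one array;
# tolist() converts the result back to a plain Python list.
concatenated = np.concatenate(s.tolist()).tolist()

print(exploded)
print(concatenated)
```

Note that numpy.concatenate converts the elements to a NumPy array first, so lists with mixed element types may be coerced to a common dtype.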

1 Comment

Seems like explode takes much more time than other options:

```
%timeit -n 100000 [element for list_ in s for element in list_]
2.17 µs ± 97.6 ns per loop

%timeit -n 100000 [element for list_ in s.values for element in list_]
1.76 µs ± 209 ns per loop

%timeit -n 100000 [element for element in s.explode()]
77.4 µs ± 5.08 µs per loop

%timeit -n 100000 [element for element in s.explode().values]
76.4 µs ± 7.02 µs per loop
```
