Create pandas dataframe from list of tuple of nested lists

Question

I have the data below, which is a list with 4 elements. Each element is a tuple whose items are themselves lists:

data = [(['a', 'b', 'c'],
  [1, 2, 3, 4, 5],
  ['aa', 'bb'],
  ['00', '03', '0000', '0006']),
 (['e', 'f', 'g'],
  [2, 1, 4, 4, 6],
  ['qq', 'er'],
  ['10', '04', '3340', '9009']),
 (['w', 'd', 'c'],
  [5, 6, 55, 1, 6],
  ['rr', 'rr'],
  ['55', '11', '6788', '7789']),
 (['l', 'a', 's'],
  [29, 2, 9, 4, 3],
  ['yy', 'uu'],
  ['33', '67', '0000', '0237'])]

I want to convert it to a DataFrame in such a way that each element is broken out into columns of the DataFrame. For example,

df = pd.DataFrame(data)

results in a DataFrame with four columns. What I want is for each of those columns to be split into further columns, as marked with red lines in the original image.

That is to say, each column of the DataFrame above should be subdivided into as many columns as there are items in its cells.
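To make the target shape concrete, here is a minimal check, assuming only that pandas is installed, that `pd.DataFrame(data)` stores each list as a single cell, and that the lists in each row hold 3, 5, 2 and 4 items, so the desired frame has 14 columns. The name `sizes` is illustrative.

```python
import pandas as pd

data = [
    (['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
    (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009']),
    (['w', 'd', 'c'], [5, 6, 55, 1, 6], ['rr', 'rr'], ['55', '11', '6788', '7789']),
    (['l', 'a', 's'], [29, 2, 9, 4, 3], ['yy', 'uu'], ['33', '67', '0000', '0237']),
]

df = pd.DataFrame(data)
print(df.shape)    # (4, 4): each cell holds a whole list

# number of items in each cell of the first row
sizes = [len(cell) for cell in df.iloc[0]]
print(sizes)       # [3, 5, 2, 4]
print(sum(sizes))  # 14 columns once every list is split out
```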

jezrael · Accepted Answer · 2017-12-12 09:57:39Z


You can flatten the nested lists:

import pandas as pd
# flatten each tuple of lists into a single row
df = pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
print(df)
   0  1  2   3   4   5   6   7   8   9   10  11    12    13
0  a  b  c   1   2   3   4   5  aa  bb  00  03  0000  0006
1  e  f  g   2   1   4   4   6  qq  er  10  04  3340  9009
2  w  d  c   5   6  55   1   6  rr  rr  55  11  6788  7789
3  l  a  s  29   2   9   4   3  yy  uu  33  67  0000  0237
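If you want the new columns to stay grouped under the original four columns, as the question describes, one option is to expand each list column separately and let `pd.concat` build a two-level column index through its `keys` argument. This is a sketch of an alternative, not part of the answer above; it assumes every cell in a given column holds a list of the same length, and the name `out` is illustrative.

```python
import pandas as pd

data = [
    (['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
    (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009']),
    (['w', 'd', 'c'], [5, 6, 55, 1, 6], ['rr', 'rr'], ['55', '11', '6788', '7789']),
    (['l', 'a', 's'], [29, 2, 9, 4, 3], ['yy', 'uu'], ['33', '67', '0000', '0237']),
]

df = pd.DataFrame(data)  # 4 columns, each cell a list

# expand every list column into its own block of columns;
# keys=df.columns makes the original column label the outer level
out = pd.concat([pd.DataFrame(df[c].tolist()) for c in df.columns],
                axis=1, keys=df.columns)

print(out.shape)  # (4, 14)
print(out[1])     # only the sub-columns that came from original column 1
```

The flat 14 values per row are the same as in the answer; the difference is that `out[k]` recovers the block that came from original column `k`.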

Timings:

import numpy as np
from itertools import chain

data = data * 100

In [128]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
100 loops, best of 3: 2.03 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ1 
In [137]: %timeit pd.DataFrame(list(map(lambda d: list(chain.from_iterable(d)), data)))
1000 loops, best of 3: 1.97 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2 
In [129]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
1000 loops, best of 3: 1.46 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ3 
In [130]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
100 loops, best of 3: 5.9 ms per loop


data = data * 10000

In [121]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
10 loops, best of 3: 99.2 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ1 
In [139]: %timeit pd.DataFrame(list(map(lambda d: list(chain.from_iterable(d)), data)))
10 loops, best of 3: 95.8 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2 
In [122]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
10 loops, best of 3: 150 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ3 
In [123]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
1 loop, best of 3: 560 ms per loop
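`%timeit` is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. This sketch only compares the two pure-Python flattening variants; the helper names `nested_comprehension` and `chain_flatten`, the repetition factor, and the `number`/`repeat` values are hypothetical choices, and absolute numbers will differ by machine and library version.

```python
import timeit
from itertools import chain

import pandas as pd

base = [
    (['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
    (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009']),
    (['w', 'd', 'c'], [5, 6, 55, 1, 6], ['rr', 'rr'], ['55', '11', '6788', '7789']),
    (['l', 'a', 's'], [29, 2, 9, 4, 3], ['yy', 'uu'], ['33', '67', '0000', '0237']),
]
data = base * 100  # hypothetical repetition factor: 400 rows

def nested_comprehension(rows):
    # flatten each tuple of lists with a nested list comprehension
    return pd.DataFrame([[item for sublist in l for item in sublist] for l in rows])

def chain_flatten(rows):
    # same flattening via itertools.chain.from_iterable
    return pd.DataFrame([list(chain.from_iterable(d)) for d in rows])

# both approaches should build identical frames before we time them
assert nested_comprehension(data).equals(chain_flatten(data))

for fn in (nested_comprehension, chain_flatten):
    # best of 3 repeats, 10 calls each; report the per-call time in ms
    best = min(timeit.repeat(lambda: fn(data), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per loop")
```

Checking equality first guards against comparing the speed of two functions that do not actually produce the same result.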

edited Dec 12, 2017 at 9:57

answered Dec 12, 2017 at 9:28

jezrael


3 Comments

cs95 Over a year ago

I've added another option, could you please update if it isn't too much of a hassle?

jezrael Over a year ago

Sure, give me a sec

jezrael Over a year ago

Here it is. There is minimal difference compared with the nested flattening.
