
I have the data below, which is a list with 4 elements. Each element is a tuple whose items are themselves lists:

data = [(['a', 'b', 'c'],
  [1, 2, 3, 4, 5],
  ['aa', 'bb'],
  ['00', '03', '0000', '0006']),
 (['e', 'f', 'g'],
  [2, 1, 4, 4, 6],
  ['qq', 'er'],
  ['10', '04', '3340', '9009']),
 (['w', 'd', 'c'],
  [5, 6, 55, 1, 6],
  ['rr', 'rr'],
  ['55', '11', '6788', '7789']),
 (['l', 'a', 's'],
  [29, 2, 9, 4, 3],
  ['yy', 'uu'],
  ['33', '67', '0000', '0237'])]

I want to convert it to a DataFrame in such a way that each item of the inner lists gets its own column. For example,

df = pd.DataFrame(data)

results in a DataFrame with four columns, where every cell holds a whole list. What I want is for each of those columns to be split further, as marked by the red lines in the attached image.

That is to say, each column of the DataFrame above should be subdivided into as many columns as there are items in its cells.

1 Answer

You can flatten the nested lists of each tuple with a nested list comprehension, so that every tuple becomes one flat row:

df = pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
print(df)
   0  1  2   3   4   5   6   7   8   9  10  11    12    13
0  a  b  c   1   2   3   4   5  aa  bb  00  03  0000  0006
1  e  f  g   2   1   4   4   6  qq  er  10  04  3340  9009
2  w  d  c   5   6  55   1   6  rr  rr  55  11  6788  7789
3  l  a  s  29   2   9   4   3  yy  uu  33  67  0000  0237
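
If you also want to keep track of which original list each new column came from, one option is to label the flattened columns with a two-level MultiIndex. This is only a sketch: it flattens with itertools.chain instead of the nested comprehension, it assumes every tuple has the same list lengths as the first one, and the level names "group" and "item" are made up for illustration.

```python
from itertools import chain

import pandas as pd

data = [
    (['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
    (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009']),
    (['w', 'd', 'c'], [5, 6, 55, 1, 6], ['rr', 'rr'], ['55', '11', '6788', '7789']),
    (['l', 'a', 's'], [29, 2, 9, 4, 3], ['yy', 'uu'], ['33', '67', '0000', '0237']),
]

# Flatten each tuple of lists into a single flat row.
rows = [list(chain.from_iterable(row)) for row in data]

# Assumption: every row has the same list lengths as the first row.
lengths = [len(part) for part in data[0]]

# Two-level labels: (position of the original list, position inside that list).
# The level names "group" and "item" are hypothetical.
columns = pd.MultiIndex.from_tuples(
    [(group, item) for group, size in enumerate(lengths) for item in range(size)],
    names=["group", "item"],
)

df = pd.DataFrame(rows, columns=columns)
print(df)

# Select only the columns that came from the second list of each tuple.
print(df[1])
```

With this layout, df[1] returns the five columns that came from the second list, and df[(1, 2)] returns a single column from it.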

Timings:

# imports needed by the alternatives timed below
from itertools import chain
import numpy as np

data = data * 100

In [128]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
100 loops, best of 3: 2.03 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ1 
In [137]: %timeit pd.DataFrame(list(map(lambda d:  list(chain.from_iterable(d)), data)))
1000 loops, best of 3: 1.97 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2 
In [129]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
1000 loops, best of 3: 1.46 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ3 
In [130]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
100 loops, best of 3: 5.9 ms per loop


data = data * 10000

In [121]: %timeit pd.DataFrame([[item for sublist in l for item in sublist] for l in data])
10 loops, best of 3: 99.2 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ1 
In [139]: %timeit pd.DataFrame(list(map(lambda d: list(chain.from_iterable(d)), data)))
10 loops, best of 3: 95.8 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2 
In [122]: %timeit pd.DataFrame(np.concatenate(list(zip(*data)), axis=1))
10 loops, best of 3: 150 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ3 
In [123]: %timeit pd.DataFrame([np.concatenate(d) for d in data])
1 loop, best of 3: 560 ms per loop
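
The numbers above come from IPython's %timeit. If you want to repeat the comparison in a plain script, a rough sketch with the standard timeit module could look like this. The function names are made up for illustration, only the two pure-Python variants are included, and the absolute timings will differ by machine and library version.

```python
import timeit
from itertools import chain

import pandas as pd

data = [
    (['a', 'b', 'c'], [1, 2, 3, 4, 5], ['aa', 'bb'], ['00', '03', '0000', '0006']),
    (['e', 'f', 'g'], [2, 1, 4, 4, 6], ['qq', 'er'], ['10', '04', '3340', '9009']),
    (['w', 'd', 'c'], [5, 6, 55, 1, 6], ['rr', 'rr'], ['55', '11', '6788', '7789']),
    (['l', 'a', 's'], [29, 2, 9, 4, 3], ['yy', 'uu'], ['33', '67', '0000', '0237']),
] * 100


def flatten_comprehension(rows):
    # Nested list comprehension, as in the answer.
    return pd.DataFrame([[item for sublist in row for item in sublist] for row in rows])


def flatten_chain(rows):
    # itertools.chain variant, as in the first alternative timed above.
    return pd.DataFrame([list(chain.from_iterable(row)) for row in rows])


for func in (flatten_comprehension, flatten_chain):
    # Best of 3 repeats, 10 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: func(data), number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```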

3 Comments

I've added another option, could you please update if it isn't too much of a hassle?
Sure, give me a sec
Here it is; there is minimal difference from the nested flattening.
