Multiple lists to Pandas DataFrame

Question

I have three lists:

[1,2,3,4,5]

[5,4,6,7,2]

[1,2,4,5,6,7,8,9,0]

I want this kind of output:

A     B    C
1     5    1
2     4    2
3     6    4
4     7    5
5     2    6
           7
           8
           9
           0

I tried one approach, but it gave me the error "arrays must all be same length", and another attempt gave "Length of values does not match length of index".

Is there any way to get this kind of output?
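The question doesn't show the code that failed, so this is a guess: both messages are what pandas raises for two common attempts. The first is the dict constructor with lists of different lengths; the second is assigning a longer list as a new column of an existing, shorter frame. A minimal reproduction under that assumption (exact wording varies between pandas versions):

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# Assumed attempt 1: a dict of plain lists must have equal lengths.
try:
    pd.DataFrame({'A': A, 'B': B, 'C': C})
    error1 = None
except ValueError as exc:
    error1 = exc

# Assumed attempt 2: a 9-element list cannot become a column of a 5-row frame.
df = pd.DataFrame({'A': A, 'B': B})
try:
    df['C'] = C
    error2 = None
except ValueError as exc:
    error2 = exc

print(error1)
print(error2)
```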

cs95 · Accepted Answer · 2018-12-19 11:51:58Z


This is not easily supported, but it can be done. Assuming your lists are A, B, and C, the simplest way is to build one row per list and transpose; shorter rows are padded with NaN:

pd.DataFrame([A, B, C]).T

     0    1    2
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0
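If you also want the A/B/C headers and the blank cells shown in the question, one option is to pass row labels before transposing and render missing values as empty strings with to_string(na_rep=''). This only changes the printed text; the DataFrame itself still holds NaN. A minimal sketch reusing the question's lists:

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# One row per list (short rows padded with NaN), labelled A/B/C,
# then transposed so each list becomes a column.
df = pd.DataFrame([A, B, C], index=['A', 'B', 'C']).T

# Show NaN as blank cells for display only; df still contains NaN.
text = df.to_string(na_rep='')
print(text)
```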

Another option is DataFrame.from_dict with the "index" orient, which also names the columns:

pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T

     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0
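The from_dict version generalizes to any number of lists. Below is a sketch of a small helper; the name lists_to_frame is hypothetical (it is not part of pandas) and simply wraps the call above, taking each list as a keyword argument:

```python
import pandas as pd

def lists_to_frame(**columns):
    """Hypothetical helper: one column per keyword argument, padded with NaN.

    With orient='index' each dict value becomes a row, so ragged lists are
    padded with NaN; .T then turns those rows into columns.
    """
    return pd.DataFrame.from_dict(columns, orient='index').T

df = lists_to_frame(A=[1, 2, 3, 4, 5], B=[5, 4, 6, 7, 2], C=[1, 2, 4, 5, 6, 7, 8, 9, 0])
print(df)
```

Keyword-argument order is preserved (Python 3.7+), so the columns come out in the order you pass them.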

A third solution with zip_longest and DataFrame.from_records:

from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(A, B, C), columns=['A', 'B', 'C'])
# pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

     A    B  C
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0
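zip_longest pads the shorter lists with None by default, which pandas shows as NaN. If you literally want empty cells as in the question, zip_longest's fillvalue parameter can supply one. The caveat is that A and B then become object columns mixing ints and strings, which is awkward for arithmetic. A sketch:

```python
from itertools import zip_longest

import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# fillvalue='' pads the shorter lists with empty strings instead of None,
# so the short columns end up with object dtype (mixed int and str).
df = pd.DataFrame.from_records(
    zip_longest(A, B, C, fillvalue=''), columns=['A', 'B', 'C']
)
print(df)
```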

answered Dec 19, 2018 at 10:21


4 Comments

EdChum Over a year ago

Interestingly, the last method is the fastest, probably because skipping the transpose avoids building an intermediate structure. +1

cs95 Over a year ago

@EdChum Thanks for the timings! Happy to see you answering, have already returned the vote :)

EdChum Over a year ago

@coldspeed Thanks. I took a break from answering; IMO there aren't many interesting new questions, so I've mainly been doing cleanup duties. I'm also very busy at work, but it's always interesting to browse questions occasionally.

cs95 Over a year ago

@Coder I found yet another solution; I've added it at the top.

EdChum · Accepted Answer · 2018-12-19 10:25:47Z

An alternative is to build a Series from each list in a list comprehension and construct a DataFrame from those. The Series are aligned on their index, so the shorter ones are padded with NaN:

In[61]:
df = pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
df

Out[61]: 
     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0
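A closely related variant (not from the answer) skips the transpose: a dict of Series is aligned on the union of their indexes, so each key becomes a column directly and the shorter columns are padded with NaN. A sketch:

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# Each Series gets a default 0..n-1 index; the DataFrame constructor aligns
# them on the union of those indexes and fills the gaps with NaN.
df = pd.DataFrame({name: pd.Series(values) for name, values in zip('ABC', [A, B, C])})
print(df)
```

Because no transpose is involved, the full-length column C keeps its integer dtype, as in the from_records output.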

Timings:

%timeit pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
%timeit pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
from itertools import zip_longest
%timeit pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

1.23 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
977 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
545 µs ± 8.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So the last method (zip_longest with from_records) is the fastest of the three on this input.
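To rerun the comparison outside IPython, the standard-library timeit module works too. A minimal sketch; absolute numbers, and possibly the ranking, depend on your pandas version, machine, and list sizes, so treat the output as indicative only:

```python
import timeit
from itertools import zip_longest

import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# The three constructions timed above, wrapped so timeit can call them.
candidates = {
    'series_list': lambda: pd.DataFrame([pd.Series(x) for x in [A, B, C]], index=list('ABC')).T,
    'from_dict': lambda: pd.DataFrame.from_dict({'A': A, 'B': B, 'C': C}, orient='index').T,
    'from_records': lambda: pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C']),
}

# Sanity check: all three build the same table (C is int in one, float in the others).
reference = candidates['from_records']()
for build in candidates.values():
    pd.testing.assert_frame_equal(build(), reference, check_dtype=False)

# Mean time per call in microseconds over `number` calls.
number = 200
timings = {name: timeit.timeit(build, number=number) / number * 1e6
           for name, build in candidates.items()}
for name, us in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{name:>12}: {us:8.1f} µs per call')
```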

iGian · Accepted Answer · 2018-12-19 10:35:07Z


An idea for a custom approach.

Define a couple of helper functions to pad the input lists to the same length:

def longest(*lists):
    # length of the longest input list
    return max(len(x) for x in lists)

def equalize(col, size):
    # return a copy of col padded with None up to `size` elements
    return col + [None] * (size - len(col))

Then use them to build the DataFrame:

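import pandas as pd

# the three lists from the question
col1 = [1, 2, 3, 4, 5]
col2 = [5, 4, 6, 7, 2]
col3 = [1, 2, 4, 5, 6, 7, 8, 9, 0]
size = longest(col1, col2, col3)
df = pd.DataFrame({'a': equalize(col1, size), 'b': equalize(col2, size), 'c': equalize(col3, size)})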

Which returns

     a    b  c
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0
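The same padding idea scales to any number of named lists if you compute the target length once and pad inside a dict comprehension. A sketch that redefines the helpers above so it runs on its own; the `data` dict is illustrative:

```python
import pandas as pd

def longest(*lists):
    return max(len(x) for x in lists)

def equalize(col, size):
    # Pad with None; pandas turns None into NaN in numeric columns.
    return col + [None] * (size - len(col))

data = {
    'a': [1, 2, 3, 4, 5],
    'b': [5, 4, 6, 7, 2],
    'c': [1, 2, 4, 5, 6, 7, 8, 9, 0],
}

size = longest(*data.values())
df = pd.DataFrame({name: equalize(col, size) for name, col in data.items()})
print(df)
```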

answered Dec 19, 2018 at 10:35
