How to create a data frame from lists of random length using Python?

Question

I want to create a pandas data frame from multiple lists of different lengths. Below is my Python code.

import random

import pandas as pd

A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6]

lenA = len(A)
lenB = len(B)
lenC = len(C)

# collect the selected combinations here
rows = []

for i,v1 in enumerate(A):
    for j,v2 in enumerate(B):
        for k, v3 in enumerate(C):
            if(i<random.randint(0, lenA)):
                if(j<random.randint(0, lenB)):
                    if (k < random.randint(0, lenC)):
                        rows.append({'A': v1, 'B': v2,'C':v3})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
df = pd.DataFrame(rows, columns=['A', 'B','C'])
print(df)

My lists are as below:

A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6,7]

Each run gives a different output, which is correct, but the output does not cover all list items in every run. In one run I got the output below:

In the above output, all items of list 'A' (1,2) are present. But list 'B' has only the items (1,2); item 3 is missing. Likewise, list 'C' has only the items (1,2,3,5); the items (4,6,7) are missing. My expectation is: every item of each list should appear in the data frame at least once, and the items of list 'C' should appear in the data frame exactly once. My expected sample output is as below:

Please guide me on how to get my expected output. Thanks in advance.

jezrael · Accepted Answer · 2020-04-23 08:26:51Z

2

You can pad each list with random values drawn from that same list until it reaches the length of the longest list, and then shuffle the rows with DataFrame.sample:

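import numpy as np
import pandas as pd

A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6]

L = [A,B,C]
# length of the longest list
m = max(len(x) for x in L)
print (m)
6

# prepend random picks from each list so that every list reaches length m
a = [np.hstack((np.random.choice(x, m - len(x)), x)) for x in L]

# one column per list, then shuffle the rows
df = pd.DataFrame(a, index=['A', 'B', 'C']).T.sample(frac=1)
print (df)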
   A  B  C
2  2  2  3
0  2  1  1
3  1  1  4
4  1  2  5
5  2  3  6
1  2  2  2
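
As a rough check of this approach, the padding and shuffling can be wrapped in a helper and the result tested against the requirements in the question (every item of each list appears at least once, and the items of the longest list appear exactly once). This is only a sketch: the helper name `pad_and_shuffle` and its `seed` parameter are illustrative assumptions and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def pad_and_shuffle(lists, names, seed=None):
    """Pad each list with random draws from itself, then shuffle the rows.

    Hypothetical helper; `seed` only exists to make runs repeatable.
    """
    rng = np.random.default_rng(seed)
    m = max(len(x) for x in lists)  # length of the longest list
    # Random filler values first, then the original items, so no item is lost.
    cols = {
        name: np.concatenate((rng.choice(x, m - len(x)), x))
        for name, x in zip(names, lists)
    }
    df = pd.DataFrame(cols)
    # Shuffle whole rows; the contents of each column stay the same.
    return df.sample(frac=1, random_state=seed)


A = [1, 2]
B = [1, 2, 3]
C = [1, 2, 3, 4, 5, 6]

df = pad_and_shuffle([A, B, C], ['A', 'B', 'C'], seed=0)
print(df)

# Requirements from the question
assert set(df['A'].tolist()) == set(A)   # every A item at least once
assert set(df['B'].tolist()) == set(B)   # every B item at least once
assert sorted(df['C'].tolist()) == C     # every C item exactly once
```

The original items are kept in full and only the padding is random, which is why the checks hold on every run regardless of the seed.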

edited Apr 23, 2020 at 8:26

answered Apr 23, 2020 at 7:57

jezrael


1 Comment

user1999109 Over a year ago

Thank you very much for your valuable guidance. I got my expected output as per your suggested code. Thanks.

utsavan · Accepted Answer · 2020-04-23 07:50:49Z

0

You can use transpose to achieve the same. EDIT: Used random to randomize the output as requested.

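import pandas as pd
from random import shuffle, choice


A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6]
# randomize the order of the items within each list
shuffle(A)
shuffle(B)
shuffle(C)

data = [A,B,C]

# one row per list, then transpose so that each list becomes a column
df = pd.DataFrame(data)
df = df.transpose()
df.columns = ['A', 'B', 'C']
# fill the gaps of the shorter lists with one random item of that list;
# assign the result back, because an in-place fillna on a .loc slice may not update df
df['A'] = df['A'].fillna(choice(A))
df['B'] = df['B'].fillna(choice(B))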

Without the shuffle and fillna steps, the transposed frame looks like this (the shorter lists are padded with NaN):

     A    B    C
0  1.0  1.0  1.0
1  2.0  2.0  2.0
2  NaN  3.0  3.0
3  NaN  NaN  4.0
4  NaN  NaN  5.0
5  NaN  NaN  6.0
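
If each gap should receive its own randomly drawn value rather than one value per column, one option is to pick a replacement separately for every missing cell. The following is a minimal sketch and not the answerer's code; the helper name `fill_random` is made up for illustration.

```python
import random

import pandas as pd


def fill_random(column, values):
    """Replace each NaN in `column` with an independent random pick from `values`.

    Hypothetical helper, not a pandas API.
    """
    return column.apply(lambda v: random.choice(values) if pd.isna(v) else v)


A = [1, 2]
B = [1, 2, 3]
C = [1, 2, 3, 4, 5, 6]

# One row per list, transposed so that each list becomes a column
df = pd.DataFrame([A, B, C]).transpose()
df.columns = ['A', 'B', 'C']

df['A'] = fill_random(df['A'], A)
df['B'] = fill_random(df['B'], B)

# No gaps remain, so the columns can go back to integers; then shuffle the rows
df = df.astype(int).sample(frac=1)
print(df)
```

Because the original items stay in place and only the gaps are filled, every item of A and B still appears at least once and every item of C appears exactly once.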

edited Apr 23, 2020 at 7:50

answered Apr 23, 2020 at 6:56

utsavan


2 Comments

user1999109 Over a year ago

Thank you very much for your valuable response. But the A and B columns should not have 'NaN' values; each gap should hold any one of the A or B values. Also, the row/data frame creation should be random. Please guide me for this situation. Thanks.

utsavan Over a year ago

Used random module to provide some level of randomized output.
