
I want to create a pandas DataFrame from multiple lists of different lengths. Below is my Python code.

import random

import pandas as pd

A = [1, 2]
B = [1, 2, 3]
C = [1, 2, 3, 4, 5, 6]

lenA = len(A)
lenB = len(B)
lenC = len(C)

rows = []  # collect rows in a list; DataFrame.append was removed in pandas 2.0

for i, v1 in enumerate(A):
    for j, v2 in enumerate(B):
        for k, v3 in enumerate(C):
            # keep this combination only if every index passes a random threshold
            if i < random.randint(0, lenA):
                if j < random.randint(0, lenB):
                    if k < random.randint(0, lenC):
                        rows.append({'A': v1, 'B': v2, 'C': v3})

df = pd.DataFrame(rows, columns=['A', 'B', 'C'])
print(df)

My lists are as below:

A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6,7]

Each run gives a different output, and each of those outputs is valid. But not every run covers all the list items. In one run I got this output:

   A  B  C
0  1  1  3
1  1  2  1
2  1  2  2
3  2  2  5

In the output above, all items of list A (1, 2) are present. But list B has only items 1 and 2; item 3 is missing. List C has only items 1, 2, 3 and 5; items 4, 6 and 7 are missing. My expectation is that every item of each list appears in the DataFrame at least once, and that each item of list C appears exactly once. My expected sample output is:

   A  B  C
0  1  1  3
1  1  2  1
2  1  2  2
3  2  2  5
4  2  3  4
5  1  1  7
6  2  3  6

How can I get this expected output? Thanks in advance.
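The requirement stated in the question can be written down as a check, which makes it easy to test any candidate DataFrame. This is a minimal sketch; the helper name `meets_requirements` is hypothetical, and it assumes the lists from the "My lists" block (C runs up to 7). It confirms that the expected sample output satisfies the requirement and that the single run shown earlier does not.

```python
import pandas as pd


def meets_requirements(df, lists, unique_col):
    """Hypothetical checker: each column contains every item of its list (and
    nothing else), and the `unique_col` column holds each item exactly once."""
    for name, items in lists.items():
        # at least once: the set of values in the column must equal the list's items
        if set(df[name]) != set(items):
            return False
    # exactly once: the column must be a permutation of its list
    return sorted(df[unique_col]) == sorted(lists[unique_col])


lists = {'A': [1, 2], 'B': [1, 2, 3], 'C': [1, 2, 3, 4, 5, 6, 7]}

# the expected sample output from the question
expected = pd.DataFrame({'A': [1, 1, 1, 2, 2, 1, 2],
                         'B': [1, 2, 2, 2, 3, 1, 3],
                         'C': [3, 1, 2, 5, 4, 7, 6]})

# the output of the single run shown in the question (items are missing)
partial = pd.DataFrame({'A': [1, 1, 1, 2],
                        'B': [1, 2, 2, 2],
                        'C': [3, 1, 2, 5]})

print(meets_requirements(expected, lists, 'C'))  # True
print(meets_requirements(partial, lists, 'C'))   # False
```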

2 Answers


You can pad each list with values drawn at random from that same list until it reaches the length of the longest list, and then shuffle the rows with DataFrame.sample:

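import numpy as np
import pandas as pd

A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6]

L = [A,B,C]
# length of the longest list
m = max(len(x) for x in L)
print (m)
6

# pad each list with random picks from itself up to length m
a = [np.hstack((np.random.choice(x, m - len(x)), x)) for x in L]

# one row per padded list, transpose to columns, then shuffle the rows
df = pd.DataFrame(a, index=['A', 'B', 'C']).T.sample(frac=1)
print (df)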
   A  B  C
2  2  2  3
0  2  1  1
3  1  1  4
4  1  2  5
5  2  3  6
1  2  2  2
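The same padding idea can be wrapped in a reusable function with an optional seed for reproducible runs. This is a sketch, not the answerer's code: the function name `pad_and_shuffle` is hypothetical, and it swaps the legacy `np.random.choice` for NumPy's `Generator.choice` and appends the padding after the original items instead of before them (the order does not matter because the rows are shuffled afterwards).

```python
import numpy as np
import pandas as pd


def pad_and_shuffle(lists, seed=None):
    """Hypothetical helper: pad each list with random picks from itself up to
    the longest length, build one column per list, then shuffle the rows."""
    rng = np.random.default_rng(seed)
    m = max(len(items) for items in lists.values())
    columns = {}
    for name, items in lists.items():
        # extra values drawn with replacement from the same list (none for the longest)
        extra = rng.choice(items, size=m - len(items))
        columns[name] = np.concatenate([np.asarray(items), extra])
    df = pd.DataFrame(columns)
    # sample(frac=1) returns every row exactly once, in random order
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)


lists = {'A': [1, 2], 'B': [1, 2, 3], 'C': [1, 2, 3, 4, 5, 6, 7]}
df = pad_and_shuffle(lists, seed=0)
print(df)
```

Every original item is kept before padding, so each list's items appear at least once. Because C is the longest list it receives no padding, so each of its items appears exactly once; if another list were longer, C would be padded with repeats too.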

1 Comment

Thank you very much for your valuable guidance. I got my expected output with your suggested code. Thanks.

You can use transpose to achieve the same result. EDIT: Used the random module to randomize the output, as requested.

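import pandas as pd
from random import shuffle, choice


A=[1,2]
B=[1,2,3]
C=[1,2,3,4,5,6]
# shuffle each list in place so the row order is random
shuffle(A)
shuffle(B)
shuffle(C)

data = [A,B,C]

# one row per list (shorter rows are padded with NaN), then transpose to columns
df = pd.DataFrame(data)
df = df.transpose()
df.columns = ['A', 'B', 'C']
# assign the result back; inplace fillna on df.loc[:, 'A'] may not update df
df['A'] = df['A'].fillna(choice(A))
df['B'] = df['B'].fillna(choice(B))
print(df)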

Before the EDIT (without the shuffle and fillna steps), this gives the output below:

     A    B    C
0  1.0  1.0  1.0
1  2.0  2.0  2.0
2  NaN  3.0  3.0
3  NaN  NaN  4.0
4  NaN  NaN  5.0
5  NaN  NaN  6.0
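The `fillna(choice(A))` step fills every gap in a column with the same single value. A minimal sketch, assuming each padded cell should instead get its own independent random pick from its list (this per-cell fill is a variation, not the answerer's code):

```python
from random import choice, shuffle

import pandas as pd

A = [1, 2]
B = [1, 2, 3]
C = [1, 2, 3, 4, 5, 6]
for items in (A, B, C):
    shuffle(items)  # random row order; shuffles each list in place

df = pd.DataFrame([A, B, C]).transpose()
df.columns = ['A', 'B', 'C']

# fill every NaN cell with its own random pick instead of one shared value
for name, items in (('A', A), ('B', B)):
    missing = df[name].isna()
    df.loc[missing, name] = [choice(items) for _ in range(missing.sum())]

df = df.astype(int)  # the NaN padding had made the columns float
print(df)
```

Column C is never filled, so each of its items still appears exactly once, while A and B keep all their original items plus random repeats.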

2 Comments

Thank you very much for your valuable response. But columns A and B should not contain NaN values; each cell should hold one of the values from list A or list B respectively. Also, the rows should be created randomly. Please guide me on this. Thanks.
I used the random module to provide some level of randomized output.
