I am trying to create a pandas DataFrame from two lists, and the output is wrong for certain list lengths. (This is not caused by the two columns having different lengths.)
Here are two cases: one that works as expected, and one that doesn't (commented out):
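import string
from pandas import DataFrame

# Lowercase letters 'a'..'z'; the first few are used as group labels
d = dict.fromkeys(string.ascii_lowercase, 0).keys()
# Case 1: works as expected (3 groups, 4 numbers)
groups = sorted(d)[:3]
numList = range(0,4)
# Case 2: gives the wrong output (20 groups, 25 numbers)
# groups = sorted(d)[:20]
# numList = range(0,25)

# Repeat each list so both columns have len(groups) * len(numList) entries
df = DataFrame({'Number':sorted(numList)*len(groups), 'Group':sorted(groups)*len(numList)})
df.sort_values(['Group', 'Number'])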
Expected output: every item in groups paired with every item in numList:
Group Number
a 0
a 1
a 2
a 3
b 0
b 1
b 2
b 3
c 0
c 1
c 2
c 3
Actual results: this works when the lists have 3 and 4 items, but not when they have 20 and 25 items (the case commented out in the code above).
Why is that, and how can I fix it?
Run print(df) for both cases (lists of size 3 and 4, and lists of size 20 and 25) before calling df.sort_values(['Group', 'Number']) and compare the results. From there, you can see the root cause of the problem.
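
The check suggested above can be sketched as follows. Row i of the tiled frame pairs numList[i % len(numList)] with groups[i % len(groups)], so the pairs start repeating after lcm(len(groups), len(numList)) rows. With 3 and 4 that is all 12 rows, so every pair is distinct; with 20 and 25 it is only 100 of the 500 rows. The helper names tiled_frame and product_frame are hypothetical and used only for illustration, and itertools.product is one possible fix, not necessarily the only one.

```python
import string
from itertools import product

import pandas as pd


def tiled_frame(groups, numbers):
    """Construction from the question: tile both lists, pair by position."""
    return pd.DataFrame({
        "Number": sorted(numbers) * len(groups),
        "Group": sorted(groups) * len(numbers),
    })


def product_frame(groups, numbers):
    """Possible fix (hypothetical helper): build every pair explicitly."""
    pairs = list(product(sorted(groups), sorted(numbers)))
    return pd.DataFrame(pairs, columns=["Group", "Number"])


letters = list(string.ascii_lowercase)

small = tiled_frame(letters[:3], range(4))    # sizes 3 and 4
large = tiled_frame(letters[:20], range(25))  # sizes 20 and 25

# Compare total rows with distinct (Number, Group) rows.
print(len(small), len(small.drop_duplicates()))  # 12 12
print(len(large), len(large.drop_duplicates()))  # 500 100

# The Cartesian product gives each pair exactly once, already sorted.
fixed = product_frame(letters[:20], range(25))
print(len(fixed), len(fixed.drop_duplicates()))  # 500 500
```

Because product iterates the last argument fastest, the resulting frame is already ordered by Group and then Number, so the sort_values call is no longer needed in this sketch.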