2

I have a dataframe like this:

import pandas as pd

matrix = [(222, {'a': 1, 'b':3, 'c':2, 'd':1}),
         (333, {'a': 1, 'b':0, 'c':0, 'd':1})]

df = pd.DataFrame(matrix, columns=['ordernum', 'dict_of item_counts'])
   ordernum               dict_of item_counts
0       222  {'a': 1, 'b': 3, 'c': 2, 'd': 1}
1       333  {'a': 1, 'b': 0, 'c': 0, 'd': 1}

and I would like to create a dataframe in which each ordernum is repeated once for each dictionary key in dict_of item_counts whose value is not 0. I would also like a key column that shows the corresponding dictionary key for each row, as well as a value column that contains the dictionary values. Finally, I would also like an ordernum_index column that numbers the rows within each ordernum.

The final dataframe should look like this:

ordernum      ordernum_index      key     value

222           1                   a       1
222           2                   b       3 
222           3                   c       2
222           4                   d       1
333           1                   a       1
333           2                   d       1 

Any help would be much appreciated :)

1
  • Have you tried anything? Commented May 26, 2019 at 20:11

5 Answers

2

Always try to structure your data first. This can be done easily, as shown below:

>>> matrix
[(222, {'a': 1, 'b': 3, 'c': 2, 'd': 1}), (333, {'a': 1, 'b': 0, 'c': 0, 'd': 1})]
>>> # Filter out zero counts before enumerating, so ordernum_index has no gaps
>>> data = [[num, i, k, v] for num, d in matrix for i, (k, v) in enumerate([kv for kv in d.items() if kv[1] != 0], start=1)]
>>> data
[[222, 1, 'a', 1], [222, 2, 'b', 3], [222, 3, 'c', 2], [222, 4, 'd', 1], [333, 1, 'a', 1], [333, 2, 'd', 1]]
>>> pd.DataFrame(data, columns=['ordernum', 'ordernum_index', 'key', 'value'])
   ordernum  ordernum_index key  value
0       222               1   a      1
1       222               2   b      3
2       222               3   c      2
3       222               4   d      1
4       333               1   a      1
5       333               2   d      1
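
A more explicit sketch of the same idea, in case the one-line comprehension is hard to follow. The helper name expand_orders is made up for illustration; it assumes matrix is the list of (ordernum, dict) tuples from the question and advances the counter only for rows that are kept:

```python
import pandas as pd


def expand_orders(matrix):
    """Yield [ordernum, ordernum_index, key, value] for every non-zero count."""
    for ordernum, counts in matrix:
        index = 0
        for key, value in counts.items():
            if value == 0:
                continue  # zero counts are skipped and do not consume an index
            index += 1
            yield [ordernum, index, key, value]


matrix = [(222, {'a': 1, 'b': 3, 'c': 2, 'd': 1}),
          (333, {'a': 1, 'b': 0, 'c': 0, 'd': 1})]

result = pd.DataFrame(list(expand_orders(matrix)),
                      columns=['ordernum', 'ordernum_index', 'key', 'value'])
print(result)
```

Because the counter lives inside the loop over each order, it restarts at 1 for every ordernum without needing a groupby.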

0

Expand the dictionary into columns by using apply with pd.Series, and use concat to join the result to your other column (ordernum); the intermediate result df2 is shown at the end. To turn every column into a row, use melt, then use query to drop all rows whose value is 0. Finally, after sorting, assign the groupby cumcount as the index, adding 1 so that counting starts from 1 rather than 0.

df2 = pd.concat([df[['ordernum']], df['dict_of item_counts'].apply(pd.Series)], axis=1)
(df2.melt(id_vars='ordernum', var_name='key')
.query('value != 0')
.sort_values(['ordernum', 'key'])
.assign(ordernum_index = lambda df: df.groupby('ordernum').cumcount().add(1)))
#   ordernum key  value  ordernum_index
#0       222   a      1               1
#2       222   b      3               2
#4       222   c      2               3
#6       222   d      1               4
#1       333   a      1               1
#7       333   d      1               2

The intermediate df2 looks like this:

#   ordernum  a  b  c  d
#0       222  1  3  2  1
#1       333  1  0  0  1

0

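You can do this by unpacking your dictionaries while iterating over the rows with iterrows, creating a tuple of the ordernum, key, and value for each entry.

Finally, to create ordernum_index, group by ordernum and take a cumcount over the rows with a non-zero value. The zero-value rows get NaN (which is why the column is float) and are then dropped: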

data = [(r['ordernum'], k, v) for _, r in df.iterrows() for k, v in r['dict_of item_counts'].items() ]

new = pd.DataFrame(data, columns=['ordernum', 'key', 'value']).sort_values('ordernum').reset_index(drop=True)

new['ordernum_index'] = new[new['value'].ne(0)].groupby('ordernum').cumcount().add(1)
new.dropna(inplace=True)

   ordernum key  value  ordernum_index
0       222   a      1             1.0
1       222   b      3             2.0
2       222   c      2             3.0
3       222   d      1             4.0
4       333   a      1             1.0
7       333   d      1             2.0
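
A possible variant of the approach above, sketched under the assumption that df is the dataframe from the question. Filtering the zero counts inside the comprehension means no NaN is ever introduced, so ordernum_index stays an integer column and no dropna is needed. The name rows is illustrative:

```python
import pandas as pd

matrix = [(222, {'a': 1, 'b': 3, 'c': 2, 'd': 1}),
          (333, {'a': 1, 'b': 0, 'c': 0, 'd': 1})]
df = pd.DataFrame(matrix, columns=['ordernum', 'dict_of item_counts'])

# Keep only non-zero counts while unpacking each row's dictionary.
rows = [(r['ordernum'], k, v)
        for _, r in df.iterrows()
        for k, v in r['dict_of item_counts'].items()
        if v != 0]

new = pd.DataFrame(rows, columns=['ordernum', 'key', 'value'])

# cumcount numbers the rows within each ordernum from 0, so add 1.
new['ordernum_index'] = new.groupby('ordernum').cumcount() + 1
print(new)
```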

1 Comment

The key which has 0 value must be excluded as per OP.
0

Construct dataframe df1 using df['dict_of item_counts'].tolist() for the values and df.ordernum for the index. Replace 0 with np.nan and stack with dropna=True so that the 0 values are dropped (this turns the value column into floats), then call reset_index to get all columns.

Next, create the ordernum_index column by using groupby and cumcount.

Finally, rename the columns appropriately.

import numpy as np

df1 = pd.DataFrame(df['dict_of item_counts'].tolist(), index=df.ordernum).replace(0, np.nan).stack(dropna=True).reset_index(name='value')
df1['ordernum_index'] = df1.groupby('ordernum')['value'].cumcount() + 1
df1 = df1.rename(columns={'level_1': 'key'})

Out[732]:
   ordernum key  value  ordernum_index
0       222   a    1.0               1
1       222   b    3.0               2
2       222   c    2.0               3
3       222   d    1.0               4
4       333   a    1.0               1
5       333   d    1.0               2
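
If the float values are unwanted, one option is to filter with a boolean mask instead of replacing 0 with NaN. This is only a sketch: it assumes every dictionary has the same keys (otherwise the missing entries would still introduce NaN), and the name stacked is illustrative:

```python
import pandas as pd

matrix = [(222, {'a': 1, 'b': 3, 'c': 2, 'd': 1}),
          (333, {'a': 1, 'b': 0, 'c': 0, 'd': 1})]
df = pd.DataFrame(matrix, columns=['ordernum', 'dict_of item_counts'])

# One column per key, indexed by ordernum, then stacked into long form.
stacked = pd.DataFrame(df['dict_of item_counts'].tolist(),
                       index=df['ordernum']).stack()

# Drop zeros with a mask, so the counts are never upcast to float.
df1 = (stacked[stacked != 0]
       .rename_axis(['ordernum', 'key'])
       .reset_index(name='value'))

# Number the remaining rows within each ordernum, starting from 1.
df1['ordernum_index'] = df1.groupby('ordernum').cumcount() + 1
print(df1)
```

Naming both index levels with rename_axis before reset_index also avoids the separate rename of level_1.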

0
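# Assumes df is the question's dataframe and the column holds real dicts;
# if the dicts are stored as strings, add .map(ast.literal_eval) before .apply(pd.Series).
dd1 = df.set_index("ordernum")["dict_of item_counts"].apply(pd.Series).stack().reset_index().rename(columns={'level_1': "key", 0: "value"}).query("value > 0")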
dd1.assign(ordernum_index=dd1.groupby("ordernum").key.transform('rank',method='first').astype(int))


  ordernum key  value  ordernum_index
0       222   a      1               1
1       222   b      3               2
2       222   c      2               3
3       222   d      1               4
4       333   a      1               1
7       333   d      1               2
