I have a list of tuples like this:

a=[('A7855', 'item1', 'item2'),('A7856', 'item3', 'item4', 'item5')]

and I want to save that list to a dataframe, like this:

No    ID     itemNum
1     A7855  item1
2            item2
3     A7856  item3
4            item4
5            item5

How do I solve this problem?

3 Answers

2

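You can also use melt here:

import pandas as pd
# Melt on column 0 (the ID), drop the padding NaNs, then sort stably so item order is kept.
df = (pd.DataFrame(a).melt(id_vars=0, value_name='itemNum')
      .drop(columns='variable').dropna().sort_values(0, kind='stable')
      .rename(columns={0: 'ID'}).reset_index(drop=True))
print(df)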

      ID itemNum
0  A7855   item1
1  A7855   item2
2  A7856   item3
3  A7856   item4
4  A7856   item5

To match your exact requirement, apply the following to df:

df.loc[df.duplicated('ID'),'ID']=''
df.insert(0,'No',range(1,len(df)+1))
print(df)

   No     ID itemNum
0   1  A7855   item1
1   2          item2
2   3  A7856   item3
3   4          item4
4   5          item5
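
To see why the chain above needs dropna, it can help to look at the intermediate frames. The following is a minimal sketch, not part of the original answer; the names wide, long_df and n_missing are chosen here for illustration only.

```python
import pandas as pd

a = [('A7855', 'item1', 'item2'), ('A7856', 'item3', 'item4', 'item5')]

# Tuples of unequal length are padded with missing values in the wide frame.
wide = pd.DataFrame(a)
print(wide)

# melt keeps column 0 as the identifier and stacks the other columns into rows,
# so the padding cell of the shorter tuple becomes a row with a missing itemNum.
long_df = wide.melt(id_vars=0, value_name='itemNum')
print(long_df)

# Number of padding rows that dropna() removes later in the chain.
n_missing = int(long_df['itemNum'].isna().sum())
print(n_missing)
```

Because melt stacks column by column, the rows come out grouped by original column rather than by ID, which is why the answer sorts by the ID afterwards.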

1

Use a flattening list comprehension that pairs the first element of each tuple with each of its remaining values, then pass the result to the DataFrame constructor:

import pandas as pd

# x[0] is the ID; x[1:] are the items that belong to it.
b = [(x[0], y) for x in a for y in x[1:]]
df = pd.DataFrame(b, columns=['ID', 'itemNum'])
print(df)
      ID itemNum
0  A7855   item1
1  A7855   item2
2  A7856   item3
3  A7856   item4
4  A7856   item5

If the ID should appear only on the first row of each group, add an if-else expression and use enumerate to track each item's position within its tuple:

b = [(x[0], y) if i == 0
     else ('', y)
     for x in a for i, y in enumerate(x[1:])]
df = pd.DataFrame(b, columns=['ID', 'itemNum'])
print(df)
      ID itemNum
0  A7855   item1
1          item2
2  A7856   item3
3          item4
4          item5

If you also need a No column, use DataFrame.insert to add it as the first column, filled with the index values plus 1:

df.insert(0, 'No', df.index + 1)
print(df)
   No     ID itemNum
0   1  A7855   item1
1   2          item2
2   3  A7856   item3
3   4          item4
4   5          item5
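
The three steps above can be combined into one function. This is only a sketch under assumptions: the helper name tuples_to_frame and its blank_repeats parameter are hypothetical and do not come from pandas or from the answer itself.

```python
import pandas as pd

def tuples_to_frame(records, blank_repeats=True):
    """Hypothetical helper: flatten (ID, item, item, ...) tuples into rows."""
    rows = []
    for record in records:
        record_id, items = record[0], record[1:]
        for position, item in enumerate(items):
            # Show the ID only on the first row of each tuple when requested.
            shown_id = record_id if (position == 0 or not blank_repeats) else ''
            rows.append((shown_id, item))
    df = pd.DataFrame(rows, columns=['ID', 'itemNum'])
    # Add a 1-based counter as the first column.
    df.insert(0, 'No', range(1, len(df) + 1))
    return df

a = [('A7855', 'item1', 'item2'), ('A7856', 'item3', 'item4', 'item5')]
result = tuples_to_frame(a)
print(result)
```

Calling tuples_to_frame(a, blank_repeats=False) keeps the ID on every row, which is usually easier to filter or group on later.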


1

I suggest using extended iterable unpacking (a starred assignment target). Every element of the tuple after the first one is collected into itemnum.

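import pandas as pd

data = [('A7855', 'item1', 'item2'), ('A7856', 'item3', 'item4', 'item5')]
rows = []
ids = set()  # IDs that have already been written to a row
for idx, *itemnum in data:  # itemnum collects everything after the ID
    for i in itemnum:
        if idx in ids:
            idx = ''  # blank the ID on repeated rows
        rows.append((idx, i))
        ids.add(idx)
df = pd.DataFrame(rows, columns=['ID', 'itemNum'])
df.index = df.index + 1  # start the index at 1 instead of 0
print(df)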

My output:

      ID itemNum
1  A7855   item1
2          item2
3  A7856   item3
4          item4
5          item5
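
For readers who have not seen the starred form before, here is a small sketch of the unpacking on its own, without pandas. The names first_id, first_items and unpacked are illustrative and not part of the answer.

```python
data = [('A7855', 'item1', 'item2'), ('A7856', 'item3', 'item4', 'item5')]

# The starred name collects every remaining element into a list.
first_id, *first_items = data[0]
print(first_id, first_items)

# The same pattern works directly as a for loop target.
unpacked = [(idx, items) for idx, *items in data]
print(unpacked)
```

The starred name always receives a list, even when the tuple holds only an ID, in which case the list is empty and the inner loop simply does nothing.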
