Run it without ['Color']:
new = df.groupby("ID").agg(",".join).reset_index()
and it will apply ",".join to all remaining columns (every column except the grouping key ID).
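When the column selection is omitted, agg() receives each non-key column in turn. The following minimal sketch, built on small hypothetical sample data, writes the same aggregation a second way, with an explicit dict that maps every column to ",".join, and checks that both versions agree:

```python
import pandas as pd

# hypothetical sample data: one grouping key and two text columns
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Color": ["Orange", "Yellow", "Green"],
    "Fruit": ["Orange", "Banana", "Pear"],
})

# without ['Color'], agg() applies the function to every non-key column
all_columns = df.groupby("ID").agg(",".join).reset_index()

# the same aggregation, spelled out column by column with a dict
explicit = df.groupby("ID").agg({"Color": ",".join, "Fruit": ",".join}).reset_index()

print(all_columns)
print(all_columns.to_dict() == explicit.to_dict())
```

The dict form is useful once different columns need different functions.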
Minimal working example:
data = """ID Color Fruit
1 Orange Orange
1 Yellow Banana
1 Red Apple
2 Yellow Banana
2 Green Pear
3 Purple Grapes
4 Red Apple
4 Orange Orange
5 Orange Peach
5 Red Apple
5 Yellow Banana
5 Purple Grapes
6 Red Apple
7 Orange Peach"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
print(df)
new = df.groupby("ID").agg(",".join).reset_index()
print(new)
Result:
ID Color Fruit
0 1 Orange,Yellow,Red Orange,Banana,Apple
1 2 Yellow,Green Banana,Pear
2 3 Purple Grapes
3 4 Red,Orange Apple,Orange
4 5 Orange,Red,Yellow,Purple Peach,Apple,Banana,Grapes
5 6 Red Apple
6 7 Orange Peach
EDIT:
The error in your comment suggests that you have a column with numbers. ",".join() works only with strings, so the values have to be converted to strings first, i.e. with map(str, column):
def convert(column):
    # convert every value to str, then join with commas
    return ",".join(map(str, column))
new = df.groupby("ID").agg(convert).reset_index()
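The underlying problem can be reproduced without pandas: str.join() accepts only strings, so integers must be converted before joining. A minimal sketch with a hypothetical list of numbers:

```python
# hypothetical values, e.g. taken from a numeric column
values = [1, 2, 3]

failed = False
try:
    # str.join() raises TypeError on non-str items
    ",".join(values)
except TypeError as err:
    failed = True
    print("join failed:", err)

# converting each item with str first makes the join work
joined = ",".join(map(str, values))
print(joined)  # 1,2,3
```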
Another idea is to keep everything as lists instead of converting to strings:
new = df.groupby("ID").agg(list).reset_index()
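A minimal sketch, using hypothetical sample data, showing that with agg(list) the values of a numeric column stay numbers, so you can still compute with them afterwards:

```python
import pandas as pd

# hypothetical data: one text column and one integer column
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Fruit": ["Orange", "Banana", "Pear"],
    "Rank": [1, 2, 3],
})

new = df.groupby("ID").agg(list).reset_index()

# every cell is now a Python list; the numbers inside are not text
first_ranks = new.loc[0, "Rank"]
print(first_ranks)
print(sum(first_ranks))
```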
Alternatively, you can check the type of data in each column and:
- keep columns with integer/float values as lists of values,
- convert other columns to strings.
def convert(column):
    # is_numeric_dtype() covers int32/int64/float32/float64 etc.,
    # which a direct comparison with (int, float) can miss
    if pd.api.types.is_numeric_dtype(column):
        return list(column)
    else:
        return ",".join(map(str, column))
new = df.groupby("ID").agg(convert).reset_index()
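A possible variant, sketched under the assumption that a column has the same dtype in every group (true for a DataFrame column): decide the function once per column with pd.api.types.is_numeric_dtype and pass a dict to agg(), instead of checking the dtype again for every group. The helper join_as_text and the sample data are hypothetical.

```python
import pandas as pd

def join_as_text(column):
    # hypothetical helper: turn every value into str and join with commas
    return ",".join(map(str, column))

# hypothetical data with two text columns and one integer column
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Color": ["Orange", "Yellow", "Green"],
    "Fruit": ["Orange", "Banana", "Pear"],
    "Rank": [1, 2, 3],
})

# choose the function once per column (not once per group):
# numeric columns stay lists, everything else is joined into a string
funcs = {
    name: (list if pd.api.types.is_numeric_dtype(df[name]) else join_as_text)
    for name in df.columns
    if name != "ID"  # the grouping key must not be in the dict
}

new = df.groupby("ID").agg(funcs).reset_index()
print(new)
```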
Minimal working code with a column Rank which has integer values:
data = """ID Color Fruit Rank
1 Orange Orange 1
1 Yellow Banana 2
1 Red Apple 3
2 Yellow Banana 4
2 Green Pear 5
3 Purple Grapes 6
4 Red Apple 7
4 Orange Orange 8
5 Orange Peach 9
5 Red Apple 10
5 Yellow Banana 11
5 Purple Grapes 12
6 Red Apple 13
7 Orange Peach 14"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(data), sep=r"\s+")
print(df)
# new = df.groupby("ID").agg(",".join).reset_index(drop=True)
# print(new)
print("--- strings ---")
def convert(column):
    # convert every value to str, then join with commas
    return ",".join(map(str, column))
new = df.groupby("ID").agg(convert).reset_index(drop=True)
print(new)
print("--- lists ---")
new = df.groupby("ID").agg(list).reset_index(drop=True)
print(new)
print("--- strings and lists ---")
def convert(column):
    # numeric columns (any int/float dtype) stay lists, others become strings
    if pd.api.types.is_numeric_dtype(column):
        return list(column)
    else:
        return ",".join(map(str, column))
new = df.groupby("ID").agg(convert).reset_index(drop=True)
print(new)
Result:
ID Color Fruit Rank
0 1 Orange Orange 1
1 1 Yellow Banana 2
2 1 Red Apple 3
3 2 Yellow Banana 4
4 2 Green Pear 5
5 3 Purple Grapes 6
6 4 Red Apple 7
7 4 Orange Orange 8
8 5 Orange Peach 9
9 5 Red Apple 10
10 5 Yellow Banana 11
11 5 Purple Grapes 12
12 6 Red Apple 13
13 7 Orange Peach 14
--- strings ---
Color Fruit Rank
0 Orange,Yellow,Red Orange,Banana,Apple 1,2,3
1 Yellow,Green Banana,Pear 4,5
2 Purple Grapes 6
3 Red,Orange Apple,Orange 7,8
4 Orange,Red,Yellow,Purple Peach,Apple,Banana,Grapes 9,10,11,12
5 Red Apple 13
6 Orange Peach 14
--- lists ---
Color Fruit Rank
0 [Orange, Yellow, Red] [Orange, Banana, Apple] [1, 2, 3]
1 [Yellow, Green] [Banana, Pear] [4, 5]
2 [Purple] [Grapes] [6]
3 [Red, Orange] [Apple, Orange] [7, 8]
4 [Orange, Red, Yellow, Purple] [Peach, Apple, Banana, Grapes] [9, 10, 11, 12]
5 [Red] [Apple] [13]
6 [Orange] [Peach] [14]
--- strings and lists ---
Color Fruit Rank
0 Orange,Yellow,Red Orange,Banana,Apple [1, 2, 3]
1 Yellow,Green Banana,Pear [4, 5]
2 Purple Grapes [6]
3 Red,Orange Apple,Orange [7, 8]
4 Orange,Red,Yellow,Purple Peach,Apple,Banana,Grapes [9, 10, 11, 12]
5 Red Apple [13]
6 Orange Peach [14]