Currently my dataframe looks something like this:
   ID  Year    Str1   Str2  Value
0   1  2014    high  black    120
1   1  2015    high   blue     20
2   2  2014  medium    red     10
3   2  2014  medium   blue     50
4   3  2015     low   blue     30
5   3  2015    high   blue    0.5
6   3  2015    high    red     10
Desired:
   ID  Year       Str1       Str2  Value
0   1  2014       high      black    120
1   1  2015       high       blue     20
2   2  2014     medium  red, blue     60
3   3  2015  low, high  blue, red   40.5
I'm trying to group by the columns ID and Year, then get the sum of the numeric columns and a list of the strings. Removing duplicate strings, as in the example, would be helpful but isn't necessary.
This operation will be run on ~100 dataframes, and ID and Year are the only column names found in every one of them. The dataframes vary slightly: each has a value column, string columns, or both.
I have browsed Stack Overflow a lot and tried:
df.groupby(['ID', 'Year'], as_index=False).agg(lambda x: x.sum() if x.dtype=='int64' else ', '.join(x))
This gave the error 'DataFrame' object has no attribute 'dtype' (which makes sense, since grouping by multiple columns returns several dataframes).
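For reference, one way to sidestep that error might be to decide the aggregation per column up front and pass a dict to agg, so the dtype check never runs inside the lambda. This is only a sketch: the sample frame is rebuilt from the table above, and the names KEYS, join_unique and aggregate are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "ID":    [1, 1, 2, 2, 3, 3, 3],
    "Year":  [2014, 2015, 2014, 2014, 2015, 2015, 2015],
    "Str1":  ["high", "high", "medium", "medium", "low", "high", "high"],
    "Str2":  ["black", "blue", "red", "blue", "blue", "blue", "red"],
    "Value": [120, 20, 10, 50, 30, 0.5, 10],
})

KEYS = ["ID", "Year"]  # the only columns guaranteed to exist in every frame


def join_unique(s):
    # Drop duplicate strings (keeping first-seen order), then join them.
    return ", ".join(s.drop_duplicates().astype(str))


def aggregate(frame, keys=KEYS):
    # Choose the aggregation from each column's dtype, outside of groupby:
    # numeric columns are summed, everything else is joined.
    agg_map = {
        col: "sum" if is_numeric_dtype(frame[col]) else join_unique
        for col in frame.columns
        if col not in keys
    }
    return frame.groupby(keys, as_index=False).agg(agg_map)


result = aggregate(df)
print(result)
```

Because the dict is built from whatever columns a frame happens to have, the same function should cope with frames that have only a value column or only string columns, as long as there is at least one column besides ID and Year.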
I also tried looping over the columns one by one: if a column holds numbers, take the sum, otherwise make a list:
for col in df:
    if col in ['ID', 'Year']:
        continue
    if df[col].dtype.kind == 'i' or df[col].dtype.kind == 'f':
        df = df.groupby(['ID', 'Year'])[col].apply(sum)
    else:
        df = df.groupby(['ID', 'Year'])[col].unique().reset_index()
However, after the first pass through the loop, it got rid of all the other columns.
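The columns disappear because df is reassigned to a single grouped column on the first pass. A possible variant of the same loop, sketched under the assumption that the sample data matches the table above, keeps the original frame intact, collects each aggregated column in a list (the name pieces is just illustrative), and concatenates them at the end:

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "ID":    [1, 1, 2, 2, 3, 3, 3],
    "Year":  [2014, 2015, 2014, 2014, 2015, 2015, 2015],
    "Str1":  ["high", "high", "medium", "medium", "low", "high", "high"],
    "Str2":  ["black", "blue", "red", "blue", "blue", "blue", "red"],
    "Value": [120, 20, 10, 50, 30, 0.5, 10],
})

keys = ["ID", "Year"]
grouped = df.groupby(keys)  # group once; df itself is never overwritten

pieces = []
for col in df.columns:
    if col in keys:
        continue
    if df[col].dtype.kind in "iuf":
        # Integer, unsigned or float column: sum per group.
        pieces.append(grouped[col].sum())
    else:
        # Anything else: join the unique values of each group.
        pieces.append(grouped[col].agg(lambda s: ", ".join(map(str, s.unique()))))

# Each piece is a Series indexed by (ID, Year), so they line up side by side.
out = pd.concat(pieces, axis=1).reset_index()
print(out)
```

Note that pd.concat raises on an empty list, so a frame containing nothing but ID and Year would need a separate check.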
Thanks in advance.