How to combine rows with different values in columns in pandas dataframe

Question

I am doing the following operation:

import pandas as pd

something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9], [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something)
print(test)
test = test.drop_duplicates()
test.columns = ['id', 'state', 'level']
test = test.sort_values(by=['id'], ascending=True)
test_unique = test["id"].unique()
print(test[test["id"] == 1])

The output is the following: 
   0  1   2
0  1  p   2
1  3  t   5
2  6  u  10
3  1  p   2
4  4  l   9
5  1  t   2
6  3  t   5
7  6  c  10
8  1  p   2
9  4  l   9

After dropping duplicates and selecting id == 1:
   id state  level
0   1     p      2
5   1     t      2

I want to combine these two rows, which share the same id, into a single row: 1 p-t 2. The column names should stay the same: id, state and level. How can this be accomplished?

Abhi · Accepted Answer · 2021-06-04 20:43:28Z

3

You could use groupby.agg:

print(df)

    id  state  level
0   1   p      2
5   1   t      2

# as_index=False already keeps id as a regular column,
# so only state and level need an aggregation.
df.groupby("id", as_index=False).agg(
    {"state": "-".join, "level": "first"})

   id state  level
0   1   p-t      2
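The same idea can be applied to the whole de-duplicated frame from the question rather than only the id == 1 slice. The following is a minimal sketch, assuming the question's data and using named aggregation so that the output columns come out as id, state, level; the variable names rows and combined are only illustrative.

```python
import pandas as pd

rows = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9],
        [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(rows, columns=["id", "state", "level"]).drop_duplicates()

# Groups are sorted by id; within each group the rows keep their original
# order, so the states are joined in order of first appearance.
combined = test.groupby("id", as_index=False).agg(
    state=("state", "-".join),   # join all states of one id with a hyphen
    level=("level", "first"),    # assumes level is constant within an id
)
print(combined)
```

With this data, id 1 ends up with state p-t and id 6 with state u-c, while ids 3 and 4 keep their single state.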

Joe Thor · Accepted Answer · 2021-06-04 20:49:23Z

1

You can group by id and level, then aggregate the state values:

import pandas as pd

something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9], [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something)
print(test)
test = test.drop_duplicates()
test.columns = ['id', 'state', 'level']
test = test.sort_values(by=['id'], ascending=True)
test_unique = test["id"].unique()


# Collect the states of each (id, level) group into a list.
df_aslist = test.groupby(['id', 'level']).agg(list).reset_index()

# Join each list of states into a single hyphen-separated string.
df_aslist['state'] = df_aslist['state'].apply('-'.join)
print(df_aslist)

This prints:

   id  level state
0   1      2   p-t
1   3      5     t
2   4      9     l
3   6     10   u-c

Or, for just the specified id:

print(df_aslist[df_aslist['id'] == 1])

This prints:

   id  level state
0   1      2   p-t
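Grouping by both id and level only makes a difference when one id occurs with more than one level. The sketch below uses hypothetical data (not from the question) in which id 1 has levels 2 and 3, to show how the two choices of grouping key behave.

```python
import pandas as pd

# Hypothetical data: id 1 appears with two different levels.
df = pd.DataFrame({"id": [1, 1, 1],
                   "state": ["p", "t", "x"],
                   "level": [2, 2, 3]})

# Grouping by id alone: one row per id, the first level wins.
by_id = df.groupby("id", as_index=False).agg(
    {"state": "-".join, "level": "first"})

# Grouping by id and level: one row per distinct (id, level) pair.
by_id_level = df.groupby(["id", "level"], as_index=False).agg(
    {"state": "-".join})

print(by_id)
print(by_id_level)
```

If level is guaranteed to be constant per id, as in the question's data, both variants give the same rows.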

fsimonjetz · Accepted Answer · 2021-06-04 20:45:21Z

0

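Pandas beginner here. I'd transpose, merge the values as columns, then transpose back:

import pandas as pd

def merge_duplicates(x):
    # x holds the two values of one column, one from each row.
    a, b = x

    if a == b:
        return a
    else:
        a, b = str(a), str(b)
        return '-'.join((a, b))

# The two rows from the question, with a fresh 0..1 index.
df = pd.DataFrame({"id": [1, 1], "state": ["p", "t"], "level": [2, 2]})

df = df.T

df["combined"] = [merge_duplicates(row) for row in df[[0, 1]].values]

# After transposing back, the merged values are in the row labelled "combined".
df = df.T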
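The rule used here (keep a value when the rows agree, otherwise join the differing values) can also be sketched without transposing, and for any number of rows per id, by passing it to groupby. This is only an illustration: join_unique is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def join_unique(values):
    """Return the value if all entries agree, else join the distinct ones with '-'."""
    distinct = list(dict.fromkeys(values))  # drop repeats, keep first-seen order
    if len(distinct) == 1:
        return distinct[0]
    return "-".join(str(v) for v in distinct)

# The two rows from the question, keeping their original index labels.
df = pd.DataFrame({"id": [1, 1], "state": ["p", "t"], "level": [2, 2]},
                  index=[0, 5])

# Apply the same rule to every non-key column.
merged = df.groupby("id", as_index=False).agg(
    {"state": join_unique, "level": join_unique})
print(merged)
```

Unlike a plain '-'.join, this leaves a column untouched when all rows of a group hold the same value, so level stays numeric.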
