I am doing the following operation:

import pandas as pd

something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9], [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something)
print(test)
test = test.drop_duplicates()
test.columns = ['id', 'state', 'level']
test = test.sort_values(by=['id'], ascending=True)
test_unique = test["id"].unique()
print(test[test["id"] == 1])

The output is the following: 
   0  1   2
0  1  p   2
1  3  t   5
2  6  u  10
3  1  p   2
4  4  l   9
5  1  t   2
6  3  t   5
7  6  c  10
8  1  p   2
9  4  l   9

# this is after dropping duplicates and filtering on id == 1
   id state  level
0   1     p      2
5   1     t      2

I want to combine these two rows, which share the same id, into a single row: 1 p-t 2. The column names should stay the same: id, state and level. How can this be accomplished?

3 Answers

You could use groupby.agg:

print(df)

    id  state  level
0   1   p      2
5   1   t      2

# "id" is the grouping key, so it does not need to be aggregated again
df.groupby("id", as_index=False).agg(
                      {'state': '-'.join, "level": "first"})

   id state  level
0   1   p-t      2
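
As a rough sketch, the same groupby.agg call can be run on the whole de-duplicated frame from the question rather than only the two rows for id 1. The names `rows` and `combined` are introduced here for illustration, and taking the "first" level assumes that level is constant within each id, which holds for the sample data but may not hold in general.

```python
import pandas as pd

# Sample data from the question.
rows = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9],
        [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(rows, columns=["id", "state", "level"]).drop_duplicates()

# One row per id: join the states with "-" and keep the first level seen.
# Assumption: level does not vary within an id.
combined = test.groupby("id", as_index=False).agg(
    {"state": "-".join, "level": "first"})
print(combined)
```

Filtering with `combined[combined["id"] == 1]` should then give the single row the question asks for.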

You can group by then aggregate

import pandas as pd

something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9], [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something)
print(test)
test = test.drop_duplicates()
test.columns = ['id', 'state', 'level']
test = test.sort_values(by=['id'], ascending=True)
test_unique = test["id"].unique()


df_aslist = test.groupby(['id', 'level']).aggregate(lambda x: list(x)).reset_index()

df_aslist['state'] = df_aslist['state'].apply(lambda x: '-'.join(x))
print(df_aslist)

returns

   id  level state
0   1      2   p-t
1   3      5     t
2   4      9     l
3   6     10   u-c

or just for the specified value

print(df_aslist[df_aslist['id'] == 1])

prints

   id  level state
0   1      2   p-t
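
A possibly shorter variant of the same idea, offered as a sketch rather than a replacement: select the state column on the grouped object and join it directly, which skips the intermediate list. The name `df_joined` is made up for this example.

```python
import pandas as pd

# Sample data from the question, de-duplicated as above.
something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9],
             [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something, columns=["id", "state", "level"]).drop_duplicates()

# Group on id and level, then join each group's states with "-".
df_joined = (
    test.groupby(["id", "level"])["state"]
        .agg("-".join)
        .reset_index()
)
print(df_joined)
```

Grouping on both id and level keeps rows apart if the same id ever appears with different levels, which is the same behaviour as the list-based version above.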

Pandas beginner here. I'd transpose, merge the two resulting columns into one, then transpose back:

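import pandas as pd

def merge_duplicates(x):
    # x holds the two values of one original column, one from each row
    a, b = x

    if a == b:
        return a
    else:
        a, b = str(a), str(b)
        return '-'.join((a, b))

# the two rows for id 1 from the question (re-indexed as 0 and 1 here)
df = pd.DataFrame({"id": [1, 1], "state": ["p", "t"], "level": [2, 2]})

# after transposing, the original rows become columns 0 and 1
df = df.T

df["combined"] = [merge_duplicates(row) for row in df[[0, 1]].values]

# transpose back; the merged row is the one labelled "combined"
df = df.T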
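
For comparison, here is a minimal sketch of the same per-column merge without transposing, using DataFrame.apply to visit each column. The helper `merge_column` and the names `merged` and `result` are hypothetical, and the frame is assumed to hold the two id 1 rows from the question.

```python
import pandas as pd

# The two rows for id 1 from the question, with their original index labels.
df = pd.DataFrame({"id": [1, 1], "state": ["p", "t"], "level": [2, 2]},
                  index=[0, 5])

def merge_column(col):
    # Hypothetical helper: keep a value shared by every row,
    # otherwise join the distinct values with "-".
    values = col.drop_duplicates()
    if len(values) == 1:
        return values.iloc[0]
    return "-".join(values.astype(str))

merged = df.apply(merge_column)  # Series indexed by column name
result = merged.to_frame().T     # back to a one-row DataFrame
print(result)
```

Unlike the two-value unpacking above, this version does not assume exactly two rows, though it only handles one id at a time.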
