3

In a Python pandas DataFrame "df", I have the following columns:

user_id | song_id | song_duration | song_title | artist | listen_count

Many users may have listened to the same song, so song_id is not unique in this table. I would like to create a second DataFrame with just the song information (with unique song_ids):

song_id | song_title | artist

I managed to create a table with song_id and song_title:

song_df = df.groupby('song_id').song_title.first()

How can I add the column "artist" to this?

This doesn't work:

song_df = df.groupby('song_id').df['song_title','artist'].first()

AttributeError: 'DataFrameGroupBy' object has no attribute 'df'

2 Answers

1

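IIUC, try omitting .df and selecting the columns with a list (double brackets; recent pandas versions no longer accept a bare tuple of column names here):

df.groupby('song_id')[['song_title','artist']].first()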
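
As a minimal sketch of this approach, assuming a small made-up DataFrame with the question's column names (the sample values are hypothetical), the columns can be selected with a list and the result passed through reset_index() so that song_id becomes an ordinary column again:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "user_id": [1, 2, 3, 1],
    "song_id": ["s1", "s1", "s2", "s2"],
    "song_duration": [200, 200, 180, 180],
    "song_title": ["Song A", "Song A", "Song B", "Song B"],
    "artist": ["Artist X", "Artist X", "Artist Y", "Artist Y"],
    "listen_count": [5, 2, 7, 1],
})

# Select both columns with a list (double brackets), take the first
# value per song_id, then move song_id from the index back to a column
song_df = (
    df.groupby("song_id")[["song_title", "artist"]]
    .first()
    .reset_index()
)
print(song_df)
```

Without reset_index(), song_id stays as the index of the result rather than a column, which may or may not be what you want.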

0

You could simply select the relevant columns and drop the duplicate rows:

song_df = df[['song_id','song_title','artist']].drop_duplicates()
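
One caveat: drop_duplicates() compares all selected columns, so a song_id could still appear more than once if its title or artist is written inconsistently across rows. A hedged sketch, assuming hypothetical sample data with one such inconsistency, that deduplicates on song_id only via the subset parameter:

```python
import pandas as pd

# Hypothetical sample data; note the inconsistent title casing for "s1"
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "song_id": ["s1", "s1", "s2"],
    "song_duration": [200, 200, 180],
    "song_title": ["Song A", "song a", "Song B"],
    "artist": ["Artist X", "Artist X", "Artist Y"],
    "listen_count": [5, 2, 7],
})

# Keep the first row seen for each song_id, ignoring differences
# in the other columns, and renumber the rows from 0
song_df = (
    df[["song_id", "song_title", "artist"]]
    .drop_duplicates(subset="song_id")
    .reset_index(drop=True)
)
print(song_df)
```

If the song metadata is known to be consistent for every song_id, the plain drop_duplicates() call above gives the same result.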
