I have a pandas dataframe from data I read from a CSV. One column is for the name of a group, while the other column contains a string (that looks like a list), like the following:

Group      |  Followers
------------------------------------------
biebers    |  u'user1', u'user2', u'user3'
catladies  |  u'user4', u'user5'
bkworms    |  u'user6', u'user7'

I'd like to try to split up the strings in the "Followers" column and make a separate dataframe where each row is for a user, as well as a column showing which group they're in. So for this example I'd like to get the following:

User       |     Group
--------------------------------
user1      |     biebers
user2      |     biebers
user3      |     biebers
user4      |     catladies
user5      |     catladies
user6      |     bkworms
user7      |     bkworms

Anyone have suggestions for the best way to approach this? Here's a screenshot of what it looks like:

[screenshot of the dataframe]

  • What do you mean a "string that looks like a list?" Does it look like a list of multiple unicode strings? How did that get in there? Commented Sep 15, 2016 at 6:10
  • Yes the entries are all strings (that happen to look like a list of unicode strings). The data was read from a CSV, which returned strings for all the entries... Commented Sep 15, 2016 at 6:16
  • Can you post the result of df.head(10)? Commented Sep 15, 2016 at 6:18
  • OK, I attached a screenshot. Commented Sep 15, 2016 at 6:27

1 Answer

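# Strip the u'...' wrapper from each name. regex=True is needed on pandas >= 2.0,
# where str.replace treats the pattern as a literal string by default.
df.Followers = df.Followers.str.replace(r"u'([^']*)'", r'\1', regex=True)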

df.set_index('Group').Followers.str.split(r',\s*', expand=True) \
  .stack().rename('User').reset_index('Group').set_index('User')


To keep User as a regular column instead of the index:

# regex=True is needed on pandas >= 2.0 (the default became literal matching).
df.Followers = df.Followers.str.replace(r"u'([^']*)'", r'\1', regex=True)

df.set_index('Group').Followers.str.split(r',\s*', expand=True) \
  .stack().rename('User').reset_index('Group') \
  .reset_index(drop=True)[['User', 'Group']]
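
For anyone on a newer pandas (0.25 or later), here is a minimal self-contained sketch of the same reshaping. It is not the approach above but a swapped-in alternative: `str.findall` pulls out the names and `DataFrame.explode` gives each one its own row. The sample frame is rebuilt from the table in the question, and the helper name `followers_to_rows` is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about the real CSV).
df = pd.DataFrame({
    "Group": ["biebers", "catladies", "bkworms"],
    "Followers": [
        "u'user1', u'user2', u'user3'",
        "u'user4', u'user5'",
        "u'user6', u'user7'",
    ],
})


def followers_to_rows(frame):
    """Return one row per user with columns User and Group (hypothetical helper)."""
    # findall returns, for each row, the list of names captured inside u'...'.
    users = frame["Followers"].str.findall(r"u'([^']*)'")
    # explode turns each list element into its own row, repeating Group.
    out = frame[["Group"]].assign(User=users).explode("User")
    return out.reset_index(drop=True)[["User", "Group"]]


result = followers_to_rows(df)
print(result)
```

With this variant the original Followers column is left untouched, so the cleanup and the reshaping do not have to be run in a particular order.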

5 Comments

Oh wow, never knew about expand=True that will come in handy.
That was great! I'm wondering though how to make the df so the users are not the index, but just another column... Sorry, I wasn't clear on the exact output I needed...
@Imu easy, that was a choice of mine. I'll update my post.
For the second option I get a: KeyError: ('User', 'Group'). Any idea what might be going on there?
@Imu fixed typo
