I have a DataFrame like this:
Name asn
Org1 asn1,asn2
org2 asn3
org3 asn4,asn5
I would like to convert my DataFrame to look like this:
Name asn
Org1 asn1
Org1 asn2
org2 asn3
org3 asn4
org3 asn5
Does anybody know how I can do that?
Assuming your starting DataFrame is named df, you could write:
>>> df2 = df.asn.str.split(',').apply(pd.Series) # break df.asn into columns
>>> df2.index = df.Name # set the index as df.Name
>>> df2 = df2.stack().reset_index('Name') # stack and reset_index
>>> df2
Name 0
0 Org1 asn1
1 Org1 asn2
0 org2 asn3
0 org3 asn4
1 org3 asn5
All that's left to do is rename the column:
df2.rename(columns={0: 'asn'}, inplace=True)
Depending on your next move, you may also want to set a more useful index.
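For reference, here is a self-contained sketch of the steps above, ending with a fresh 0..n-1 index as one possible "more useful index". The sample frame is rebuilt from the question. The `expand=True` shortcut (instead of `apply(pd.Series)`) and the explicit `dropna()` are substitutions made here, the latter in case the pandas version in use keeps missing cells when stacking.

```python
import pandas as pd

# Sample data mirroring the question
df = pd.DataFrame({
    "Name": ["Org1", "org2", "org3"],
    "asn": ["asn1,asn2", "asn3", "asn4,asn5"],
})

# One column per comma-separated value, indexed by Name
wide = df["asn"].str.split(",", expand=True)
wide.index = df["Name"]

df2 = (
    wide.stack()                    # long format: one row per (Name, position)
        .dropna()                   # discard cells that had no value (assumption: harmless if already dropped)
        .reset_index("Name")        # move only the Name level back into a column
        .rename(columns={0: "asn"})
        .reset_index(drop=True)     # replace the repeated 0/1 labels with 0..n-1
)
print(df2)
```

Whether to keep the positional labels or reset them depends on what you do next; resetting is only one option.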
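Note that calling reset_index('Name') on the stacked result moves only the Name level into a column, so no separate drop('level_1', axis=1) is needed.
I just spent hours dealing with this and discovered that the explode function is a much simpler solution.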
First replace the strings in the multi-valued cells with lists like this:
asn_lists = df.asn.str.split(',') # split strings into list
df.asn = asn_lists # replace strings with lists in the dataframe
And then just use the explode function:
df2 = df.explode('asn') # explode based on the asn column
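This solution also works for larger DataFrames with extra columns; the values in the other columns are simply repeated for each exploded row. Note that DataFrame.explode requires pandas 0.25 or later.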
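A minimal end-to-end sketch of the explode approach, with an extra column to show that other values are carried along. The `country` column and its values are made up for illustration, and `assign` is used here (instead of overwriting `df.asn`) so that the original frame is left untouched.

```python
import pandas as pd

# Sample data from the question plus a hypothetical extra column, "country"
df = pd.DataFrame({
    "Name": ["Org1", "org2", "org3"],
    "asn": ["asn1,asn2", "asn3", "asn4,asn5"],
    "country": ["US", "DE", "FR"],
})

df2 = (
    df.assign(asn=df["asn"].str.split(","))  # strings -> lists, in a new frame
      .explode("asn")                        # one row per list element
      .reset_index(drop=True)                # explode repeats the original row labels
)
print(df2)
```

Without the final `reset_index(drop=True)`, the exploded rows keep the index labels of the rows they came from, which can be handy if you need to join back to the original frame.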