2

I have a pandas DataFrame of this type

col1  col2           col3
 1   [blue]         [in,out]
 2   [green, green] [in]
 3   [green]        [in]

and I need convert it to a dataframe that keep the first column and distribute all the other values in columns as rows:

col1 value
1    blue
1    in
1    out
2    green
2    green
2    in
3    green
3    in

3 Answers 3

1

Use DataFrame.stack with Series.explode for convert lists, last some data cleaning with DataFrame.reset_index:

df1 = (df.set_index('col1')
         .stack()
         .explode()
         .reset_index(level=1, drop=True)
         .reset_index(name='value'))

Alternative with DataFrame.melt and DataFrame.explode:

df1 = (df.melt('col1')
         .explode('value')
         .sort_values('col1')[['col1','value']]
         .reset_index(drop=True)
)

print (df1)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in

Or list comprehension solution:

L = [(k, x) for k, v in df.set_index('col1').to_dict('index').items() 
            for k1, v1 in v.items() 
            for x in v1]

df1 = pd.DataFrame(L, columns=['col1','value'])
print (df1)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in
Sign up to request clarification or add additional context in comments.

Comments

0

Another solution could consist of:

  • list comprehension to make col1 with new values and
  • using list concatenation of values in df['col2'] and df['col3'] in order to make value column.

The code is following:

df_final = pd.DataFrame(
    {
        'col1': [
            i for i, sublist in zip(df['col1'], (df['col2'] + df['col3']).values) 
              for val in range(len(sublist))
        ],
        'value': sum((df['col2'] + df['col3']).values, [])
    }
)
print(df_final)
   col1   value
0     1    blue
1     1      in
2     1     out
3     2   green
4     2   green
5     2      in
6     3   green
7     3      in

Comments

0
d = []
c = []
for i in range(len(df)):
   d.append([j for j in df['c2'][i]])
   d.append([j for j in df['c3'][i]])
   c.append(str(df['c1'][i]) * (len(df['c2'][i])+ len(df['c3'][i])))
   c = [list(j) for j in c]

d = [i for sublist in d for i in sublist]
c = [i for sublist in d for i in sublist]
df1 = pd.DataFrame()
df1['c1'] = c
df1['c2'] = d
df = df1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.