
I need to combine multiple rows into a single row by concatenating the strings with a space between them.

View of my dataframe:

            tempx  value
    0    picture1    1.5
    1  picture555    1.5
    2  picture255    1.5
    3  picture365    1.5
    4  picture112    1.5

I want the dataframe converted like this, with the tempx values joined by spaces:

Expected output:

       tempx                                                 value
    0  picture1 picture555 picture255 picture365 picture112  1.5

or, as a Python dict:

    {1.5: 'picture1 picture555 picture255 picture365 picture112'}

What I have tried:

    df_test['tempx'] = df_test['tempx'].str.cat(sep=' ')

This joins the strings, but the combined string ends up on every row instead of collapsing them into one:

       tempx                                                 value
    0  picture1 picture555 picture255 picture365 picture112  1.5
    1  picture1 picture555 picture255 picture365 picture112  1.5
    2  picture1 picture555 picture255 picture365 picture112  1.5
    3  picture1 picture555 picture255 picture365 picture112  1.5
    4  picture1 picture555 picture255 picture365 picture112  1.5
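Why every row gets the same string: Series.str.cat(sep=' ') called without an `others` argument reduces the whole column to one plain Python str, and assigning a scalar to a column broadcasts it to every row. The sketch below reproduces this on a frame built from the sample data above; the frame construction is an assumption about how df_test looks, not the asker's actual code.

```python
import pandas as pd

# Frame rebuilt from the sample shown in the question (assumed layout)
df_test = pd.DataFrame({
    'tempx': ['picture1', 'picture555', 'picture255', 'picture365', 'picture112'],
    'value': [1.5] * 5,
})

# str.cat with only sep collapses the column into a single Python string
joined = df_test['tempx'].str.cat(sep=' ')
print(type(joined))  # <class 'str'>

# Assigning that scalar back to the column repeats it on every row,
# so the frame still has 5 rows, each holding the full joined string
df_test['tempx'] = joined
print(df_test)
```

The row count never changes because nothing here groups rows together; collapsing rows needs a reduction per group, such as groupby.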

Is there any elegant solution?

  • Also, is there a solution to conditionally combine rows based on the value column? Commented Apr 3, 2016 at 23:56
  • What is your expected output? Can you edit an example into your question? Do you want to "group by" the value column, so you join the picture names within each value? Commented Apr 4, 2016 at 0:48
  • I have applied groupby using pandas; the next step I would like to do is to have a single row for each value. Please check the expected output. Commented Apr 4, 2016 at 2:31

1 Answer


You can use groupby and apply ' '.join to each group:

print(df.groupby('value')['tempx'].apply(' '.join).reset_index())
   value                                              tempx
0    1.5  picture1 picture555 picture255 picture365 pict...
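For the dict form asked for in the question, the same grouped join can be turned into a mapping with Series.to_dict(). The sketch below also adds two hypothetical rows with value 2.0 (pictureA and pictureB are made-up names) to show that each distinct value gets its own joined string, which covers the follow-up about combining conditionally on the value column.

```python
import pandas as pd

# Question's sample rows plus two hypothetical rows with value 2.0
df = pd.DataFrame({
    'tempx': ['picture1', 'picture555', 'picture255', 'picture365', 'picture112',
              'pictureA', 'pictureB'],
    'value': [1.5, 1.5, 1.5, 1.5, 1.5, 2.0, 2.0],
})

# One row per distinct value; tempx strings within each group joined by spaces
combined = df.groupby('value')['tempx'].agg(' '.join).reset_index()
print(combined)

# The same aggregation as a {value: joined_string} dict
mapping = df.groupby('value')['tempx'].agg(' '.join).to_dict()
print(mapping)
```

groupby keeps the original row order inside each group, so the pictures are joined in the order they appear in the frame.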

Comments

@jezrael Hi, is there a way to merge more than one column? Instead of only tempx, I want to merge more columns as well. How do I do that? I am trying df.groupby('value')['tempx','second_column','third_column'].apply(' '.join).reset_index() but I only get the grouped column names.
@sygneto - Use df.groupby('value')['tempx','second_column','third_column'].agg(' '.join).reset_index()
Thank you, I forgot about .agg again ^^, good to have you here.
For me, the call for multiple columns raises FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead. It also does not include the second column. Am I doing something wrong?
@Ivo Use double brackets (a list of column names), like df.groupby('value')[['tempx','second_column','third_column']].agg(' '.join).reset_index()
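A sketch of the multi-column version from the last comment, using a hypothetical frame: second_column and third_column are the names used in the comments, but their contents here are made up. Selecting with a list inside double brackets gives a DataFrameGroupBy, and agg(' '.join) then joins each selected column separately within each group. Every aggregated column must hold strings, because str.join raises TypeError on non-string items.

```python
import pandas as pd

# Hypothetical frame: tempx plus two made-up string columns
df = pd.DataFrame({
    'tempx': ['picture1', 'picture555', 'picture255'],
    'second_column': ['a', 'b', 'c'],
    'third_column': ['x', 'y', 'z'],
    'value': [1.5, 1.5, 1.5],
})

# Double brackets select a list of columns (not a tuple), and agg
# applies ' '.join to each column of each group independently
out = (df.groupby('value')[['tempx', 'second_column', 'third_column']]
         .agg(' '.join)
         .reset_index())
print(out)
```

Using agg rather than apply matters here: apply passes each group as a whole DataFrame, and iterating a DataFrame yields its column names, which is why the earlier apply attempt returned only the grouped column names.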