How to loop through a dataframe, create a new column and append values to it in Python

Question

I have the following problem. I have a dataframe with several columns, one of which contains strings as values. I want to loop through this column, change those values and save the changed values in a new column.

The code I have written so far looks like this:

def get_classes(x):    
    for index, string in df['column'].iteritems():
        listi = string.split(',')
        Classes=[]

        for value in listi:
            count=listi.count(value)
            if count >= 3: 
                Classes.append(value)

        Unique=(',').join(sorted(list(set(Classes))))
        df['NewColumn']=Unique


End.apply(get_classes)

It loops through the rows of df['column'], splitting each string at every comma (creating a list called listi), and creates an empty list called Classes. It then counts each value in listi and appends it to Classes if it occurs at least three times in the list. The finished list is then passed through set(), so that all values in it are unique, sorted, and finally joined with commas into a string again. Then I want to store this string in a new column, at the same index position as the row the value is derived from. As an example:

df
  column    NewColumn
0 A,A,A,C   A 
1 C,B,C,C   C
2 B,B,B,B   B

My code seems to work fine when I do print Unique instead of df['NewColumn']=Unique, as it then prints all the transformed values. If I execute the code like in my example however, the NewColumn of the dataframe is completely filled with the same value, which seems to correspond to the original value of the last row in the df. Can someone explain to me what the problem here is?

There is an indexing issue. Looking at your code, each iteration assigns the value of Unique to the whole column df['NewColumn'], so the column is overwritten again for every row. This is why every row ends up with the value from the last row.
– Colonel Beauvel, Commented Dec 2, 2015 at 10:53
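To make the comment above concrete, here is a minimal sketch of one way to restructure the original loop. It assumes the sample data from the question; the column name "Broadcast" is made up purely to demonstrate the overwrite behaviour. The idea is that get_classes handles a single string and returns its result, and Series.apply calls it once per row, so each result lands at its own index.

```python
import pandas as pd

def get_classes(string):
    """Return the sorted, comma-joined values that occur at least 3 times."""
    values = string.split(",")
    frequent = {v for v in values if values.count(v) >= 3}
    return ",".join(sorted(frequent))

# Sample data taken from the question.
df = pd.DataFrame({"column": ["A,A,A,C", "C,B,C,C", "B,B,B,B"]})

# Assigning a scalar to a column fills EVERY row with that scalar.
# This is what df['NewColumn'] = Unique did on each loop iteration.
# ("Broadcast" is a hypothetical column name used only for this demo.)
df["Broadcast"] = "X"

# apply() calls get_classes once per row and keeps the results aligned
# with the original index, so each row gets its own value.
df["NewColumn"] = df["column"].apply(get_classes)

print(df)
```

With this shape the function no longer touches df at all, which is why the last row's value can no longer leak into the other rows.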

Colonel Beauvel · Accepted Answer · 2015-12-02 10:26:09Z

You can use the powerful Counter class from the collections module:

from collections import Counter

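# Keep the values that occur at least 3 times, sorted and comma-joined.
# dict.items() works on Python 2 and 3; iteritems() exists only on Python 2.
foo = lambda x: ','.join(sorted([k for k,v in Counter(x).items() if v>=3]))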

df['new'] = df['column'].str.split(',').map(foo)


#In [33]: df
#Out[33]:
#    column NewColumn new
#0  A,A,A,C         A   A
#1  C,B,C,C         C   C
#2  B,B,B,B         B   B

2 Comments

sequence_hard Over a year ago

Thank you, this works fine. But do you have any idea why my code does not work the way I want it to, or what I should change to make it work?

Colonel Beauvel Over a year ago

I strongly recommend using Counter, since it decouples the function itself from the loop over the dataframe (which makes the function easy to unit test), and it is also neater and easier to understand: two lines.
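As a rough illustration of the unit-testing point, the counting logic can be written as a plain function that knows nothing about the dataframe. The name frequent_values and the min_count parameter are hypothetical, chosen only for this sketch; they are not part of the answer above.

```python
from collections import Counter

def frequent_values(values, min_count=3):
    """Join, in sorted order, the items that occur at least min_count times.

    Hypothetical helper: the name and the min_count parameter are
    assumptions made for this sketch.
    """
    counts = Counter(values)
    return ",".join(sorted(k for k, v in counts.items() if v >= min_count))

# The function can be checked on plain lists, without building a dataframe.
assert frequent_values("A,A,A,C".split(",")) == "A"
assert frequent_values("C,B,C,C".split(",")) == "C"
assert frequent_values("B,B,B,B".split(",")) == "B"
assert frequent_values("A,B,C".split(",")) == ""
```

It could then be plugged into the dataframe the same way as foo, for example df['new'] = df['column'].str.split(',').map(frequent_values).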
