Using a for loop to concatenate columns in Pandas

Question

I'm quite new to Python and I'm trying to use Pandas (in an IPython Notebook, Python 3) to combine three columns. This is the original data:

       RegistrationID  FirstName  MiddleInitial   LastName    
           1              John       P             Smith    
           2              Bill       Missing       Jones   
           3              Paul       H             Henry

And I'd like to have:

   RegistrationID FirstName MiddleInitial   LastName    FullName
     1              John       P             Smith   Smith, John, P 
     2              Bill       Missing       Jones   Jones, Bill 
     3              Paul       H             Henry   Henry, Paul, H

I'm sure this is absolutely not the correct way of doing this, but this is how I have set it up so far in a for loop. Unfortunately, it just keeps going and going and never finishes.

%matplotlib inline
import pandas as pd

from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))

reg = pd.DataFrame.from_csv('regcontact.csv', index_col=RegistrationID)

for item, frame in regcombo['MiddleInitial'].iteritems():
    while frame == 'Missing':
        reg['FullName'] = reg.LastName.map(str) + ", " + reg.FirstName
    else: break

The idea is then to add another column for those with complete names (i.e. including MiddleInitial):

for item, frame in regcombo['MiddleInitial'].iteritems():
    while frame != 'Missing':
        reg['FullName1'] = reg.LastName.map(str) + ", " + reg.FirstName + ", " + reg.MiddleInitial
    else: break

And then combine them, so that there are no null values. I've looked everywhere, but I can't quite figure it out. Any help would be appreciated, and I apologize in advance if I have broken any conventions, as this is my first post.
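
For readers wondering why a loop like the ones above never finishes, here is a minimal sketch of the likely cause, assuming the assignment sits inside the while body: `while` keeps re-testing `frame`, and nothing in the body changes `frame`, so a condition that is true once stays true. The function name and the iteration cap are hypothetical, added only so the demonstration terminates.

```python
def passes_before_exit(frame, cap=1000):
    """Count passes of `while frame != 'Missing'`, stopping at `cap`."""
    passes = 0
    while frame != 'Missing':
        passes += 1          # the body never reassigns `frame`
        if passes >= cap:    # safety cap (not in the original code)
            break
    return passes

print(passes_before_exit('P'))        # 1000: only the cap stops the loop
print(passes_before_exit('Missing'))  # 0: the condition is false at once
```

An `if` statement tests the condition once per row, which is what a per-row check needs.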

Alexander · Accepted Answer · 2016-03-04 18:02:46Z


This uses a list comprehension to create the new dataframe column, e.g. [(a, b, c) for a, b, c in some_iterable_item].

df['Full Name'] = [
   "{0}, {1} {2}"
   .format(last, first, middle if middle != 'Missing' else "").strip() 
   for last, first, middle 
   in df[['LastName', 'FirstName', 'MiddleInitial']].values]

>>> df
   RegistrationID FirstName MiddleInitial LastName      Full Name
0               1      John             P    Smith  Smith, John P
1               2      Bill       Missing    Jones    Jones, Bill
2               3      Paul             H    Henry  Henry, Paul H

The iterable_item is the array of values from the dataframe:

>>> df[['LastName', 'FirstName', 'MiddleInitial']].values
array([['Smith', 'John', 'P'],
       ['Jones', 'Bill', 'Missing'],
       ['Henry', 'Paul', 'H']], dtype=object)

So, per our list comprehension model:

>>> [(a, b, c) for (a, b, c) in df[['LastName', 'FirstName', 'MiddleInitial']].values]
[('Smith', 'John', 'P'), ('Jones', 'Bill', 'Missing'), ('Henry', 'Paul', 'H')]

I then format the string:

>>> a = "Smith"
>>> b = "John"
>>> c = "P"
>>> "{0}, {1} {2}".format(a, b, c)
'Smith, John P'

I use a ternary to check if the middle name is 'Missing', so:

middle if middle != "Missing" else ""

is equivalent to:

if middle == 'Missing':
    middle = ""

Finally, I added .strip() to remove the extra space in case the middle name is missing.
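
The steps above can be pulled together into a small standalone sketch. It uses a plain list of rows in place of df[['LastName', 'FirstName', 'MiddleInitial']].values so that it runs without the original CSV; the helper name format_full_name and the sample rows are assumptions for illustration. Unlike the format string above, it joins the parts with ", " so the result matches the "Smith, John, P" layout requested in the question.

```python
def format_full_name(last, first, middle, missing='Missing'):
    """Join the name parts with ', ', skipping a placeholder middle initial."""
    parts = [last, first]
    if middle != missing:
        parts.append(middle)
    return ', '.join(parts)

# Stand-in for df[['LastName', 'FirstName', 'MiddleInitial']].values
rows = [
    ['Smith', 'John', 'P'],
    ['Jones', 'Bill', 'Missing'],
    ['Henry', 'Paul', 'H'],
]

full_names = [format_full_name(last, first, middle)
              for last, first, middle in rows]
print(full_names)
# ['Smith, John, P', 'Jones, Bill', 'Henry, Paul, H']

# With a DataFrame, the same list could be assigned as a column:
# df['FullName'] = [format_full_name(*row) for row in
#                   df[['LastName', 'FirstName', 'MiddleInitial']].values]
```

Building a list of parts avoids the need for .strip(), because no separator is emitted for a part that is left out.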

edited Mar 4, 2016 at 18:02

answered Mar 4, 2016 at 17:55


4 Comments

Ami Tavory Over a year ago

Many thanks for the comment on the "Missing" stuff - didn't notice that in the question.

Steven Over a year ago

This was incredibly helpful. Thank you!

Steven Over a year ago

Hi there - this wouldn't work if I wanted to check for a particular value in a column and then return a pre-defined string in another column (i.e. instead of a value in the array), correct? In that case I would need to revert to a loop like in my original example?

Alexander Over a year ago

It would probably work fine with some modification, but you need to post a new question with a more specific example.
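
One possible shape for such a modification, offered as a sketch rather than as the commenter's intended solution: the same list-comprehension pattern can return a fixed string depending on the value it sees. The label strings and the 'Label' column name are hypothetical.

```python
# Stand-in for df['MiddleInitial']
middle_initials = ['P', 'Missing', 'H']

# Return a pre-defined string depending on the value in the column.
labels = ['No middle initial' if m == 'Missing' else 'Has middle initial'
          for m in middle_initials]
print(labels)
# ['Has middle initial', 'No middle initial', 'Has middle initial']

# With a DataFrame, the result could be stored in another column:
# df['Label'] = ['No middle initial' if m == 'Missing' else 'Has middle initial'
#                for m in df['MiddleInitial']]
```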

Ami Tavory · Accepted Answer · 2016-03-04 18:40:43Z


All you need to do is add the columns:

>>> (df.FirstName + ', ' + df.LastName + ', ' + df.MiddleInitial).str.replace(', Missing', '')
0    John, Smith, P
1       Bill, Jones
2    Paul, Henry, H
dtype: object

To add a new column, you could just write:

df['FullName'] = df.FirstName + ', ' + ...

(In Pandas, it is usually best to avoid explicit loops and to operate on whole columns instead.)
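
As a hedged illustration of the column-wise approach, the sketch below builds the column in the "LastName, FirstName, MiddleInitial" order the question asks for and uses Series.where to drop the middle initial where it is 'Missing'. The sample DataFrame is reconstructed from the question's table, and the intermediate names base and with_middle are assumptions.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    'RegistrationID': [1, 2, 3],
    'FirstName': ['John', 'Bill', 'Paul'],
    'MiddleInitial': ['P', 'Missing', 'H'],
    'LastName': ['Smith', 'Jones', 'Henry'],
})

# Column-wise string concatenation, no explicit loop.
base = df['LastName'] + ', ' + df['FirstName']
with_middle = base + ', ' + df['MiddleInitial']

# Keep `base` where the initial is 'Missing'; use `with_middle` elsewhere.
df['FullName'] = base.where(df['MiddleInitial'] == 'Missing', with_middle)

print(df['FullName'].tolist())
# ['Smith, John, P', 'Jones, Bill', 'Henry, Paul, H']
```

Selecting between two precomputed columns avoids relying on string replacement to strip the placeholder afterwards.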

edited Mar 4, 2016 at 18:40

answered Mar 4, 2016 at 18:12


2 Comments

Alexander Over a year ago

The timing benefit is marginal once you add logic to remove 'Missing' from the middle name. You'll need something like df.FullName.str.replace(', Missing', "")

Ami Tavory Over a year ago

Thanks, @Alexander - didn't notice the part about Missing. Appreciated!
