How to update a column in Pandas Dataframe

Question

In Pandas, I am trying to add a new column to (or update an existing column in) a data frame (DF2) with values from another data frame (DF1). I can think of how to do this in SQL:

UPDATE DF2
SET DF2['Column'] = DF1['Column']
FROM DF2
JOIN DF1 ON DF1['NonIndexColumn'] = DF2['NonIndexColumn']

Data Example:

import pandas as pd

d = [{'CustomerID': 1, 'SignUpDate': '2014-01-01'}, {'CustomerID': 2, 'SignUpDate': '2014-02-01'}, {'CustomerID': 3, 'SignUpDate': '2014-03-01'}, {'CustomerID': 4, 'SignUpDate': '2014-04-01'}]
DF1 = pd.DataFrame(data=d)

d2 = [{'OrderID': 1, 'CustomerID': 1, 'OrderDate': '2014-01-15'}, {'OrderID': 2, 'CustomerID': 1, 'OrderDate': '2014-01-15'}, {'OrderID': 3, 'CustomerID': 2, 'OrderDate': '2014-03-28'}, {'OrderID': 4, 'CustomerID': 1, 'OrderDate': '2014-03-29'}, {'OrderID': 5, 'CustomerID': 3, 'OrderDate': '2014-04-28'}, {'OrderID': 6, 'CustomerID': 2, 'OrderDate': '2014-06-01'}, {'OrderID': 7, 'CustomerID': 1, 'OrderDate': '2014-11-06'}, {'OrderID': 8, 'CustomerID': 3, 'OrderDate': '2015-01-28'}, {'OrderID': 9, 'CustomerID': 1, 'OrderDate': '2015-02-15'} ]
DF2 = pd.DataFrame(data=d2)

I am trying to add DF1['SignUpDate'] on to DF2, so that DF2 would look like this:

   CustomerID   OrderDate  OrderID  SignUpDate
0           1  2014-01-15        1  2014-01-01
1           1  2014-01-15        2  2014-01-01
2           2  2014-03-28        3  2014-02-01
3           1  2014-03-29        4  2014-01-01
4           3  2014-04-28        5  2014-03-01
5           2  2014-06-01        6  2014-02-01
6           1  2014-11-06        7  2014-01-01
7           3  2015-01-28        8  2014-03-01
8           1  2015-02-15        9  2014-01-01

I know merge would let me add the column, but I would have to either overwrite the existing DataFrame or create a new one, like this:

DF1 = pd.merge(DF1, DF2) #overwrite
DF3 = pd.merge(DF1, DF2) #new dataframe

Is there not a way to join on one field (maybe an indexed column, maybe not an indexed column) and update / add the field?

Please post raw data and desired output. It really depends on how the two data frames relate to each other, but you could try df1['Column'] = df2['Column'].where(df1['NonIndexedColumn'] == df2['NonIndexedColumn'])
– EdChum, Commented Apr 14, 2015 at 16:44

EdChum · Accepted Answer · 2015-04-14 17:24:07Z

Perform a left merge:

In [4]: DF2.merge(DF1, on='CustomerID', how='left')
Out[4]:
   CustomerID   OrderDate  OrderID  SignUpDate
0           1  2014-01-15        1  2014-01-01
1           1  2014-01-15        2  2014-01-01
2           2  2014-03-28        3  2014-02-01
3           1  2014-03-29        4  2014-01-01
4           3  2014-04-28        5  2014-03-01
5           2  2014-06-01        6  2014-02-01
6           1  2014-11-06        7  2014-01-01
7           3  2015-01-28        8  2014-03-01
8           1  2015-02-15        9  2014-01-01
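
Because merge returns a new DataFrame and leaves DF2 untouched, the result has to be assigned back to a name to be kept. A minimal sketch using a trimmed copy of the question's data; the validate argument is an optional safeguard added here for illustration and is not part of the original answer:

```python
import pandas as pd

# Trimmed copy of the question's sample data.
DF1 = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'SignUpDate': ['2014-01-01', '2014-02-01', '2014-03-01', '2014-04-01'],
})
DF2 = pd.DataFrame({
    'OrderID': [1, 2, 3, 4, 5],
    'CustomerID': [1, 1, 2, 1, 3],
    'OrderDate': ['2014-01-15', '2014-01-15', '2014-03-28',
                  '2014-03-29', '2014-04-28'],
})

# merge returns a new DataFrame, so rebind the name DF2 to keep the result.
# validate='many_to_one' makes pandas raise a MergeError if CustomerID is
# duplicated in DF1, which would otherwise silently multiply rows in DF2.
DF2 = DF2.merge(DF1, on='CustomerID', how='left', validate='many_to_one')
print(DF2)
```

With how='left', every row of DF2 is kept in its original order, and customers missing from DF1 would get NaN in SignUpDate.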


2 Comments

mikebmassey Over a year ago

I got that to work. Thanks. In SQL, you can alter a table and add a new column. In pandas, specifically with this merge, I would need to overwrite DF2. Is that standard pandas functionality just to overwrite?

EdChum Over a year ago

No, you can conditionally overwrite, but it depends on the relationship between the two DataFrames: you could either mask or select the rows of interest from the left-hand and right-hand sides and assign the values.
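
One way to read this comment, sketched under assumptions: when the key is unique in DF1, a lookup Series combined with Series.map adds the column to DF2 in place, and a boolean mask limits an overwrite to selected rows. The name lookup and the step that blanks out two values are hypothetical, introduced only to have something to overwrite:

```python
import pandas as pd

# Trimmed copy of the question's sample data.
DF1 = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'SignUpDate': ['2014-01-01', '2014-02-01', '2014-03-01', '2014-04-01'],
})
DF2 = pd.DataFrame({
    'OrderID': [1, 2, 3, 4, 5],
    'CustomerID': [1, 1, 2, 1, 3],
    'OrderDate': ['2014-01-15', '2014-01-15', '2014-03-28',
                  '2014-03-29', '2014-04-28'],
})

# CustomerID -> SignUpDate lookup; this requires CustomerID to be unique in DF1.
lookup = DF1.set_index('CustomerID')['SignUpDate']

# 1) Add a new column to DF2 in place, without merging or rebinding DF2.
DF2['SignUpDate'] = DF2['CustomerID'].map(lookup)

# 2) Conditional overwrite. Blank out two values (hypothetical, for the demo),
#    then refill only the rows selected by a boolean mask.
DF2.loc[[1, 2], 'SignUpDate'] = None
mask = DF2['SignUpDate'].isna()
DF2.loc[mask, 'SignUpDate'] = DF2.loc[mask, 'CustomerID'].map(lookup)
print(DF2)
```

Unlike merge, this modifies the existing DF2 object, which is closer to the SQL UPDATE in the question; it only carries over one column at a time, though.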
