
In pandas, I am trying to add a new column to (or update an existing column in) one DataFrame (DF2) with values from another DataFrame (DF1). I know how I would do this in SQL:

UPDATE DF2
SET DF2['Column'] = DF1['Column']
FROM DF2
JOIN DF1 ON DF1['NonIndexColumn'] = DF2['NonIndexColumn']

Data Example:

import pandas as pd

d = [{'CustomerID': 1, 'SignUpDate': '2014-01-01'}, {'CustomerID': 2, 'SignUpDate': '2014-02-01'}, {'CustomerID': 3, 'SignUpDate': '2014-03-01'}, {'CustomerID': 4, 'SignUpDate': '2014-04-01'}]
DF1 = pd.DataFrame(data=d)

d2 = [{'OrderID': 1, 'CustomerID': 1, 'OrderDate': '2014-01-15'}, {'OrderID': 2, 'CustomerID': 1, 'OrderDate': '2014-01-15'}, {'OrderID': 3, 'CustomerID': 2, 'OrderDate': '2014-03-28'}, {'OrderID': 4, 'CustomerID': 1, 'OrderDate': '2014-03-29'}, {'OrderID': 5, 'CustomerID': 3, 'OrderDate': '2014-04-28'}, {'OrderID': 6, 'CustomerID': 2, 'OrderDate': '2014-06-01'}, {'OrderID': 7, 'CustomerID': 1, 'OrderDate': '2014-11-06'}, {'OrderID': 8, 'CustomerID': 3, 'OrderDate': '2015-01-28'}, {'OrderID': 9, 'CustomerID': 1, 'OrderDate': '2015-02-15'}]
DF2 = pd.DataFrame(data=d2)

I am trying to add DF1['SignUpDate'] to DF2, matching rows on CustomerID, so that DF2 would look like this:

   CustomerID   OrderDate  OrderID  SignUpDate
0           1  2014-01-15        1  2014-01-01
1           1  2014-01-15        2  2014-01-01
2           2  2014-03-28        3  2014-02-01
3           1  2014-03-29        4  2014-01-01
4           3  2014-04-28        5  2014-03-01
5           2  2014-06-01        6  2014-02-01
6           1  2014-11-06        7  2014-01-01
7           3  2015-01-28        8  2014-03-01
8           1  2015-02-15        9  2014-01-01

I know that merge would let me add the column, but I would have to either overwrite the existing DataFrame or create a new one, like this:

DF2 = pd.merge(DF2, DF1)  # overwrite the existing DataFrame
DF3 = pd.merge(DF2, DF1)  # create a new DataFrame

Is there a way to join on one field (which may or may not be an indexed column) and update or add the field in place?

Please post raw data and desired output. It really depends on how the data in the two frames relate to each other, but you could try df1['Column'] = df2['Column'].where(df1['NonIndexedColumn'] == df2['NonIndexedColumn']) Commented Apr 14, 2015 at 16:44

1 Answer


Perform a left merge:

In [4]:

DF2.merge(DF1, on='CustomerID', how='left')
Out[4]:
   CustomerID   OrderDate  OrderID  SignUpDate
0           1  2014-01-15        1  2014-01-01
1           1  2014-01-15        2  2014-01-01
2           2  2014-03-28        3  2014-02-01
3           1  2014-03-29        4  2014-01-01
4           3  2014-04-28        5  2014-03-01
5           2  2014-06-01        6  2014-02-01
6           1  2014-11-06        7  2014-01-01
7           3  2015-01-28        8  2014-03-01
8           1  2015-02-15        9  2014-01-01
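
If the goal is only to add or refresh one column on DF2 without rebuilding the whole frame, a lookup with Series.map is one possible alternative to the merge. This is a minimal sketch, not part of the original answer. It assumes CustomerID is unique in DF1, and the name signup_by_customer is purely illustrative. The sample data is a shortened version of the question's data.

```python
import pandas as pd

DF1 = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'SignUpDate': ['2014-01-01', '2014-02-01', '2014-03-01', '2014-04-01'],
})
DF2 = pd.DataFrame({
    'OrderID': [1, 2, 3, 4, 5],
    'CustomerID': [1, 1, 2, 1, 3],
    'OrderDate': ['2014-01-15', '2014-01-15', '2014-03-28',
                  '2014-03-29', '2014-04-28'],
})

# Build a CustomerID -> SignUpDate lookup (assumes CustomerID is unique in DF1).
signup_by_customer = DF1.set_index('CustomerID')['SignUpDate']

# Add the column to DF2 (or overwrite it if it already exists).
# Customers missing from DF1 would get NaN, like a left join.
DF2['SignUpDate'] = DF2['CustomerID'].map(signup_by_customer)
```

With the merge approach, the equivalent step is to assign the result back, for example DF2 = DF2.merge(DF1, on='CustomerID', how='left'). The map version modifies only the one column and leaves the rest of DF2 untouched.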

2 Comments

I got that to work. Thanks. In SQL, you can alter a table and add a new column. In pandas, specifically with this merge, I would need to overwrite DF2. Is overwriting the standard way to do this in pandas?
No, you can conditionally overwrite, but it depends on the relationship between the two DataFrames. You could either mask or select the rows of interest from the left-hand and right-hand sides and assign the values.
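
One way to read the suggestion about masking is sketched below: select only the rows of DF2 that need a value with a boolean mask and assign into them with .loc. This is an illustrative assumption about what was meant, not the commenter's own code. The starting state (a DF2 that already has a partly filled SignUpDate column) and the names lookup and mask are hypothetical.

```python
import pandas as pd

DF1 = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'SignUpDate': ['2014-01-01', '2014-02-01', '2014-03-01', '2014-04-01'],
})

# Hypothetical starting point: SignUpDate exists on DF2 but has gaps.
DF2 = pd.DataFrame({
    'OrderID': [1, 3, 5],
    'CustomerID': [1, 2, 3],
    'SignUpDate': ['2014-01-01', None, None],
})

# CustomerID -> SignUpDate lookup (assumes CustomerID is unique in DF1).
lookup = DF1.set_index('CustomerID')['SignUpDate']

# Mask the rows of interest, then assign only into those rows.
mask = DF2['SignUpDate'].isna()
DF2.loc[mask, 'SignUpDate'] = DF2.loc[mask, 'CustomerID'].map(lookup)
```

Rows where the mask is False keep their existing values, which is the closest pandas analogue here to an UPDATE with a WHERE clause.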
