
My DataFrame looks basically like this:

import datetime as dt
import numpy as np
import pandas as pd

# Columns: Identifier, Identifier2, date, price, price2 (price2 starts empty)
data = [
    [11200, 33000, dt.datetime(1995, 3, 1), 10, np.nan],
    [11200, 33000, dt.datetime(1995, 3, 2), 11, np.nan],
    [11200, 33000, dt.datetime(1995, 3, 3), 9, np.nan],
    [23400, 45000, dt.datetime(1995, 3, 1), 50, np.nan],
    [23400, 45000, dt.datetime(1995, 3, 3), 49, np.nan],
    [33000, 55000, dt.datetime(1995, 3, 1), 60, np.nan],
    [33000, 55000, dt.datetime(1995, 3, 2), 61, np.nan],
]

df = pd.DataFrame(data, columns=["Identifier", "Identifier2", "date", "price", "price2"])

Output looks like:

index Identifier1 Identifier2     date    price1 price2 
  0      11200      33000      1995-03-01   10     nan
  1      11200      33000      1995-03-02   11     nan
  2      11200      33000      1995-03-03    9     nan
  3      23400      45000      1995-03-01   50     nan
  4      23400      45000      1995-03-03   49     nan
  5      33000      55000      1995-03-01   60     nan
  6      33000      55000      1995-03-02   61     nan

Please note that the index of my real data is not sorted in ascending order like the one in my example df. I would like to take the number in column Identifier2 (I know the exact number I want to look up) and find it in column Identifier1, then copy the matching value of price1 into price2. The match also has to respect the date, because some dates are missing.

My goal would look like this:

   index Identifier1 Identifier2     date    price1 price2 
      0      11200      33000      1995-03-01   10     60
      1      11200      33000      1995-03-02   11     61
      2      11200      33000      1995-03-03    9     nan
      3      23400      45000      1995-03-01   50     nan
      4      23400      45000      1995-03-03   49     nan
      5      33000      55000      1995-03-01   60     nan
      6      33000      55000      1995-03-02   61     nan

I'm sure this is not too difficult, but somehow I don't get it. Thank you very much in advance for any help.

  • Hi, this can be done with a merge. Does the column price2 already exist in your real data? Commented Jul 12, 2021 at 19:10
  • Hi, I already tried using merge, but I got stuck when trying to merge two dataframes that did not have the same number of rows. And yes, the column price2 already exists, but there is no data in it. Commented Jul 12, 2021 at 19:55

2 Answers


One way:

df['price2'] = df[['Identifier2', 'date']].apply(tuple, 1).map(df.set_index(['Identifier','date'])['price'].to_dict())

OUTPUT:

   Identifier  Identifier2       date  price  price2
0       11200        33000 1995-03-01     10    60.0
1       11200        33000 1995-03-02     11    61.0
2       11200        33000 1995-03-03      9     NaN
3       23400        45000 1995-03-01     50     NaN
4       23400        45000 1995-03-03     49     NaN
5       33000        55000 1995-03-01     60     NaN
6       33000        55000 1995-03-02     61     NaN
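
For readers who find the one-liner dense, here is a minimal sketch that splits it into three steps. The unsorted index labels and the intermediate names `lookup` and `keys` are assumptions added for illustration; they are not part of the original answer.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Same toy data as the question, but with a deliberately unsorted index
# (these index labels are an assumption made for illustration).
df = pd.DataFrame(
    {
        "Identifier": [11200, 11200, 11200, 23400, 23400, 33000, 33000],
        "Identifier2": [33000, 33000, 33000, 45000, 45000, 55000, 55000],
        "date": [dt.datetime(1995, 3, d) for d in (1, 2, 3, 1, 3, 1, 2)],
        "price": [10, 11, 9, 50, 49, 60, 61],
        "price2": np.nan,
    },
    index=[5, 2, 9, 0, 7, 3, 1],
)

# Step 1: build a lookup table {(Identifier, date): price}.
lookup = df.set_index(["Identifier", "date"])["price"].to_dict()

# Step 2: build one (Identifier2, date) tuple per row.
# The resulting Series keeps df's index, so row order does not matter.
keys = df[["Identifier2", "date"]].apply(tuple, axis=1)

# Step 3: map each key to its price; keys absent from the lookup become NaN.
df["price2"] = keys.map(lookup)
```

Because the assignment aligns on the index labels, this should behave the same whether or not the index is sorted. Note that the dictionary keeps only one price per (Identifier, date) pair, so it assumes those pairs are unique.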

1 Comment

If you look at the code above that generates the df, the column names have no "1" suffix, so I left it out.

I don't know if this is the best way, but it works:

Using merge:

# Split the data into two separate DataFrames.
# reset_index() exposes the original index as a column named 'index'.
df1 = df.reset_index()[['index', 'Identifier', 'Identifier2', 'date', 'price']]
df2 = df[['Identifier', 'date', 'price']]

# Left merge: match Identifier2/date on the left against Identifier/date on the right
df3 = df1.merge(df2, how='left', left_on=['Identifier2', 'date'], right_on=['Identifier', 'date'], suffixes=('', 'R'))

# Drop the IdentifierR column created by the merge and rename priceR to price2
df4 = df3.drop('IdentifierR', axis=1).rename(columns={'priceR': 'price2'})

2 Comments

If you rename df2 before the merge, it can be a bit less verbose. With your notation, df1.merge(df2.rename(columns={'Identifier':'Identifier2', 'price':'price2'}), how = 'left') gives the final result directly ;)
@Ben.T Yeah, I thought of this, but I tried to be more didactic.
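
The rename-before-merge idea from the comment above can be sketched as a self-contained example. The names `left`, `right` and `merged`, the unsorted index labels, the explicit `on=` keys and the `validate="many_to_one"` check are assumptions added here, not part of the original comment.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Toy data from the question with an unsorted index
# (the index labels are an assumption made for illustration).
df = pd.DataFrame(
    {
        "Identifier": [11200, 11200, 11200, 23400, 23400, 33000, 33000],
        "Identifier2": [33000, 33000, 33000, 45000, 45000, 55000, 55000],
        "date": [dt.datetime(1995, 3, d) for d in (1, 2, 3, 1, 3, 1, 2)],
        "price": [10, 11, 9, 50, 49, 60, 61],
        "price2": np.nan,
    },
    index=[5, 2, 9, 0, 7, 3, 1],
)

# Right-hand table: rename so the key and value columns already carry
# their final names (Identifier -> Identifier2, price -> price2).
right = df[["Identifier", "date", "price"]].rename(
    columns={"Identifier": "Identifier2", "price": "price2"}
)

# Left-hand table: everything except the empty price2 column.
left = df.drop(columns="price2")

# Left merge keeps every row of `left`, however many rows `right` has.
# validate="many_to_one" raises if a (Identifier2, date) key is duplicated
# in `right`, which would otherwise multiply rows.
merged = left.merge(
    right, on=["Identifier2", "date"], how="left", validate="many_to_one"
)

# merge returns a fresh 0..n-1 index; a left merge keeps the left row order,
# so the original labels can be put back positionally.
merged.index = left.index
```

A left merge does not need both frames to have the same number of rows: unmatched rows on the left simply get NaN in `price2`.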
