I have two data frames:

import pandas as pd
a = pd.DataFrame( { 'port':[1,1,0,1,0], 'cd':[1,2,3,2,1], 'date':["2014-02-26","2014-02-25","2014-02-26","2014-02-26","2014-02-25"] } )
b = pd.DataFrame( { 'port':[0,1,0,1,0], 'fac':[2,1,2,2,3], 'date': ["2014-02-25","2014-02-25","2014-02-26","2014-02-26","2014-02-27"] } )

What I need to do is take every date-port pair in a (say, port 0 and date 2014-02-25), look up the matching fac value in b, and fill it into a new column in a. The output should therefore look like

port cd date         fac 
1    1  "2014-02-26" 2
1    2  "2014-02-25" 1
... (so on) ...

I tried merging the frames on both date and port but got an error, which I think is because the data frames are of different sizes, and I did not really expect it to work anyway.

3 Answers

2

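If you want to combine the two dataframes, use merge. Called with no arguments, it does an inner join on the columns the frames have in common (here port and date):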

import pandas as pd
a = pd.DataFrame( { 'port':[1,1,0,1,0], 'cd':[1,2,3,2,1], 
         'date':["2014-02-26","2014-02-25","2014-02-26","2014-02-26","2014-02-25"]})

b = pd.DataFrame( { 'port':[0,1,0,1,0], 'fac':[2,1,2,2,3], 
         'date': ["2014-02-25","2014-02-25","2014-02-26","2014-02-26","2014-02-27"]})

df = a.merge(b)
print (df)

Output:

   port  cd        date  fac
0     1   1  2014-02-26    2
1     1   2  2014-02-26    2
2     1   2  2014-02-25    1
3     0   3  2014-02-26    2
4     0   1  2014-02-25    2
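
For the question's stated goal (a new fac column in a), a minimal sketch of the same merge with the keys spelled out may be clearer. It assumes b holds at most one row per (port, date) pair; how='left' keeps every row of a, and any pair missing from b would get NaN in fac. The name result is illustrative only.

```python
import pandas as pd

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
b = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

# Name the join keys explicitly instead of relying on the shared column names.
# how='left' keeps all rows of a; rows with no match in b get NaN in fac.
result = a.merge(b, on=['port', 'date'], how='left')
print(result)
```

Because this is a left join against unique keys, result has exactly as many rows as a.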

2

I recommend creating a new column in dataframe A and populating it with "numpy.vectorize":

import pandas as pd
import numpy as np

A = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1], 'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
B = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3], 'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

Set up an index on dataframe B so that rows can be looked up by "date" and "port":

C = B.set_index(['date', 'port'])

Then, create the function that will be applied to each row in dataframe A:

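def get_fac(date, port):
    # Look up fac for one (date, port) pair; return '' when B has no such pair.
    try:
        return C.loc[date].loc[port]['fac']
    except KeyError:
        return ''

# otypes=[object] lets the result hold both numbers and the '' placeholder;
# without it, the output dtype is inferred from the first value returned.
A['fac'] = np.vectorize(get_fac, otypes=[object])(A['date'], A['port'])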

This is the output:

   cd        date  port  fac
0   1  2014-02-26     1    2
1   2  2014-02-25     1    1
2   3  2014-02-26     0    2
3   2  2014-02-26     1    2
4   1  2014-02-25     0    2
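
The same indexed lookup can probably be done without a Python-level function. The sketch below reuses the indexed frame C and lets DataFrame.join match A's "date" and "port" columns against C's index; it assumes each (date, port) pair appears at most once in B. The name result is illustrative only.

```python
import pandas as pd

A = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
B = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

# Index B by the lookup keys, as in the answer above.
C = B.set_index(['date', 'port'])

# join is a left join by default: every row of A is kept, and the columns
# listed in on= are matched against the levels of C's MultiIndex.
result = A.join(C, on=['date', 'port'])
print(result)
```

A pair that is missing from B would show up as NaN in fac rather than as an empty string.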

1

I believe you need drop_duplicates together with merge:

cols = ['port','date']
df = a.drop_duplicates(cols).merge(b, on=cols)
print (df)
   port  cd        date  fac
0     1   1  2014-02-26    2
1     1   2  2014-02-25    1
2     0   3  2014-02-26    2
3     0   1  2014-02-25    2

But if you want to keep every row of a, including the duplicated pairs:

cols = ['port','date']
df1 = a.merge(b, on=cols)
print (df1)
   port  cd        date  fac
0     1   1  2014-02-26    2
1     1   2  2014-02-26    2
2     1   2  2014-02-25    1
3     0   3  2014-02-26    2
4     0   1  2014-02-25    2
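
If it is unclear which frame contains duplicated pairs, merge can check this itself. The following is a sketch, assuming the intent is that b has at most one fac per (port, date) pair: validate='many_to_one' raises an error if b repeats a pair, and indicator=True adds a _merge column showing which rows of a found a match. The names checked and unmatched are illustrative only.

```python
import pandas as pd

a = pd.DataFrame({'port': [1, 1, 0, 1, 0], 'cd': [1, 2, 3, 2, 1],
                  'date': ["2014-02-26", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-25"]})
b = pd.DataFrame({'port': [0, 1, 0, 1, 0], 'fac': [2, 1, 2, 2, 3],
                  'date': ["2014-02-25", "2014-02-25", "2014-02-26", "2014-02-26", "2014-02-27"]})

cols = ['port', 'date']

# validate='many_to_one' raises pandas.errors.MergeError if b has more than
# one row for the same (port, date) pair; duplicates in a are allowed.
checked = a.merge(b, on=cols, how='left', validate='many_to_one', indicator=True)

# _merge is 'both' for matched rows and 'left_only' where b had no match.
unmatched = checked[checked['_merge'] == 'left_only']
print(checked)
print(len(unmatched), "rows of a had no match in b")
```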
