How to merge two fields of one CSV file with one field of another CSV file?

Question

I would like to merge two CSV files as follows:

First CSV file:

import pandas as pd

df = pd.DataFrame()
df["ticket_number"] = ['AAA', 'AAA', 'AAA', 'ABC', 'ABA','ADC','ABA','BBB']
df["train_board_station"] = ['Tokyo', 'LA', 'Paris', 'New_York', 'Delhi','Phoenix', 'London','LA']
df["train_off_station"] = ['Phoenix', 'London', 'Sydney', 'Berlin', 'Shanghai','LA', 'Paris', 'New_York']

Second CSV file:

rec = pd.DataFrame()
rec["code"] = ['Tokyo','London','Paris','New_York','Shanghai','LA','Sydney','Berlin','Phoenix','Delhi']
rec["count_A"] = ['1.2','7.8','4','8','7.8','3','8','5','2','10']
rec["count_B"] = ['12','78','4','8','78','36','88','51','25','10']

I use the following code:

for x in ["board", "off"]:
    df["station"] = df["train_" + x + "_station"]
    df["code"] = df["train_" + x + "_station"]
    df = pd.concat([df,rec], axis=1, join_axes=[df.index])
    df[x + "_count_A"] = df["count_A"]
    df[x + "_count_B"] = df["count_B"]
    df = df.drop(["station", "code","count_A","count_B"], axis=1)

I get the following incorrect output:

ticket_number,train_board_station,train_off_station,board_count_A,board_count_B,off_count_A,off_count_B
AAA,Tokyo,Phoenix,1.2,12,1.2,12
AAA,LA,London,7.8,78,7.8,78
AAA,Paris,Sydney,4,4,4,4
ABC,New_York,Berlin,8,8,8,8
ABA,Delhi,Shanghai,7.8,78,7.8,78
ADC,Phoenix,LA,3,36,3,36
ABA,London,Paris,8,88,8,88
BBB,LA,New_York,5,51,5,51

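I notice that count_A and count_B are not matched to the train_board_station and train_off_station of the same line. Instead, the counts are taken from rec by row position: the first row of rec lands on the first row of df, the second on the second, and so on, and the same values are filled in for both the board and the off columns.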
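To illustrate why this happens, here is a minimal sketch with made-up three-row data. The names left, lookup, by_position and by_value are hypothetical and not from the question. It assumes a recent pandas, where the join_axes argument of concat no longer exists (it was removed in pandas 1.0), so reindex is used as the closest equivalent.

```python
import pandas as pd

# Tiny stand-ins for df and rec (hypothetical values, not the question's data).
left = pd.DataFrame({"station": ["LA", "Tokyo", "LA"]})
lookup = pd.DataFrame({"code": ["Tokyo", "LA"], "count_A": ["1.2", "3"]})

# concat with axis=1 pairs rows by index label: row 0 with row 0, row 1 with row 1.
# The station names are never compared. reindex stands in for the old join_axes.
by_position = pd.concat([left, lookup], axis=1).reindex(left.index)

# A lookup keyed on the station name pairs rows by value instead.
by_value = left["station"].map(lookup.set_index("code")["count_A"])

print(by_position)
print(by_value)
```

In by_position the first row (station LA) receives the count of Tokyo, simply because Tokyo is the first row of lookup. In by_value each station receives its own count.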

The expected output is:

ticket_number,train_board_station,train_off_station,board_count_A,board_count_B,off_count_A,off_count_B
AAA,Tokyo,Phoenix,1.2,12,2,25
AAA,LA,London,3,36,7.8,78
AAA,Paris,Sydney,4,4,8,88
ABC,New_York,Berlin,8,8,5,51
ABA,Delhi,Shanghai,10,10,7.8,78
ADC,Phoenix,LA,2,25,3,36
ABA,London,Paris,7.8,78,4,4
BBB,LA,New_York,3,36,8,8

Can you paste the expected output for more clarity?

– Satyadev, Commented May 15, 2017 at 11:20

jezrael · Accepted Answer · 2017-05-15 12:07:26Z

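The problem is with duplicates: station names repeat in df, so the rows have to be matched by station value rather than by position. I use join, which performs a left join by default: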

for x in ["board", "off"]:
    df["code"] = df["station"] = df["train_" + x + "_station"]
    df = df.join(rec.set_index('code'), on='code')
    df[x + "_count_A"] = df["count_A"]
    df[x + "_count_B"] = df["count_B"]
    df = df.drop(["station", "code","count_A","count_B"], axis=1)

print(df)
  ticket_number train_board_station train_off_station board_count_A  \
0           AAA               Tokyo           Phoenix           1.2   
1           AAA                  LA            London             3   
2           AAA               Paris            Sydney             4   
3           ABC            New_York            Berlin             8   
4           ABA               Delhi          Shanghai            10   
5           ADC             Phoenix                LA             2   
6           ABA              London             Paris           7.8   
7           BBB                  LA          New_York             3   

  board_count_B off_count_A off_count_B  
0            12           2          25  
1            36         7.8          78  
2             4           8          88  
3             8           5          51  
4            10         7.8          78  
5            25           3          36  
6            78           4           4  
7            36           8           8
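As a possible alternative, the same lookup can be sketched with Series.map, which avoids the temporary station and code columns. This is only a sketch under the assumption that every code appears exactly once in rec (map raises an error on a duplicated index). The name lookup is introduced here for illustration, and the data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ticket_number": ["AAA", "AAA", "AAA", "ABC", "ABA", "ADC", "ABA", "BBB"],
    "train_board_station": ["Tokyo", "LA", "Paris", "New_York", "Delhi",
                            "Phoenix", "London", "LA"],
    "train_off_station": ["Phoenix", "London", "Sydney", "Berlin", "Shanghai",
                          "LA", "Paris", "New_York"],
})
rec = pd.DataFrame({
    "code": ["Tokyo", "London", "Paris", "New_York", "Shanghai",
             "LA", "Sydney", "Berlin", "Phoenix", "Delhi"],
    "count_A": ["1.2", "7.8", "4", "8", "7.8", "3", "8", "5", "2", "10"],
    "count_B": ["12", "78", "4", "8", "78", "36", "88", "51", "25", "10"],
})

# Index rec by station name so each count column becomes a name -> value lookup.
# Assumption: codes are unique in rec.
lookup = rec.set_index("code")

for x in ["board", "off"]:
    stations = df["train_" + x + "_station"]
    for col in ["count_A", "count_B"]:
        # map looks up each station name in the lookup's index;
        # stations missing from rec would become NaN.
        df[x + "_" + col] = stations.map(lookup[col])

print(df)
```

The resulting columns hold the same values as the join-based loop above. Note that the counts stay strings, as in the question's data.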

edited May 15, 2017 at 12:07

answered May 15, 2017 at 11:40


3 Comments

jezrael Over a year ago

Thank you for it, I will check it.

user7779326 Over a year ago

Also, the input has been changed slightly. I have removed station_A and station_B for simplicity.

jezrael Over a year ago

Sorry, the previous solution was complicated. Now I think it is simpler and, I hope, correct. Please check it.
