I have a data frame that is loaded from a central file. New files are added to it monthly. Because a few columns are missing in the file that is copied into the data frame, I created a mapping data frame that supplies values for the missing columns when a condition is met.
Below is the central file example:
ID  Region  Country  Code  User  Order Price
1           Germany        ABC   2342545
2           Italy          DEF   5464545
3           USA            GHI   3245325
4           India          JKL   676565
5           Mexico         MNO   3443252
6           China          PQR   565445
7           Germany        STU   765765
8           Mexico         VWX   564566
9           China          YZA   346534
10          India          BCD   5675765
This is my mapping file for the missing Region and Code:
Country  Region   Code
Germany  EU       1
Italy    EU       2
USA      America  3
India    Asia     4
Mexico   America  5
China    Asia     6
Here is the expected output:
ID  Region   Country  Code  User  Order Price
1   EU       Germany  1     ABC   2342545
2   EU       Italy    2     DEF   5464545
3   America  USA      3     GHI   3245325
4   Asia     India    4     JKL   676565
5   America  Mexico   5     MNO   3443252
6   Asia     China    6     PQR   565445
7   EU       Germany  1     STU   765765
8   America  Mexico   5     VWX   564566
9   Asia     China    6     YZA   346534
10  Asia     India    4     BCD   5675765
What I have done so far is use nested for loops with iterrows() to update the values in the data frame.
The problem is that this is extremely slow: it takes an hour or more to update about 60,000 records.
Here is my code:
for central_update_index, central_update_row in central_bridge_file.iterrows():
    print('index: ', central_update_index)
    for bridge_match_index, bridge_match_row in central_bridge_matching_file.iterrows():
        if bridge_match_row['Country'] == central_update_row['Country']:
            if central_update_row['Country / Company (P2)'] == bridge_match_row['Country']:
                central_bridge_file.loc[central_update_index, 'Code'] = \
                    bridge_match_row['Code']
                central_bridge_file.loc[central_update_index, 'Region'] = bridge_match_row[
                    'Region']
Can someone help me write a lambda expression, or something similar, that could do this in minutes?
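For reference, one possible vectorized approach is sketched below. It assumes the mapping has exactly one row per country and that Region and Code are empty for every row of the central data. The frame names `central` and `mapping` and the sample data are illustrative stand-ins for the real files, not the actual variables from the code above.

```python
import pandas as pd

# Illustrative sample data mirroring the tables in the question.
central = pd.DataFrame({
    "ID": list(range(1, 11)),
    "Region": [None] * 10,
    "Country": ["Germany", "Italy", "USA", "India", "Mexico",
                "China", "Germany", "Mexico", "China", "India"],
    "Code": [None] * 10,
    "User": ["ABC", "DEF", "GHI", "JKL", "MNO",
             "PQR", "STU", "VWX", "YZA", "BCD"],
    "Order Price": [2342545, 5464545, 3245325, 676565, 3443252,
                    565445, 765765, 564566, 346534, 5675765],
})
mapping = pd.DataFrame({
    "Country": ["Germany", "Italy", "USA", "India", "Mexico", "China"],
    "Region": ["EU", "EU", "America", "Asia", "America", "Asia"],
    "Code": [1, 2, 3, 4, 5, 6],
})

# Index the mapping by Country so each column acts as a lookup Series.
# This assumes Country values in the mapping are unique.
lookup = mapping.set_index("Country")

# Series.map looks up every Country value in one pass, with no Python loop.
central["Region"] = central["Country"].map(lookup["Region"])
central["Code"] = central["Country"].map(lookup["Code"])

print(central)
```

Countries that do not appear in the mapping would come out as NaN. If some rows already had a Region or Code worth keeping, something like `central["Region"].fillna(central["Country"].map(lookup["Region"]))` might be a better fit than overwriting the whole column.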
Use itertuples instead of iterrows; it's a very small change code-wise but usually results in quite a good speed-up.
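A minimal sketch of what that change might look like follows. It goes one step further than the comment by replacing the inner loop with a dictionary built once from the mapping. The frame names `central` and `mapping` and the shortened sample data are illustrative assumptions, not the original variables.

```python
import pandas as pd

# Shortened, illustrative sample data.
central = pd.DataFrame({
    "Country": ["Germany", "Italy", "USA", "Germany"],
    "Region": [None] * 4,
    "Code": [None] * 4,
})
mapping = pd.DataFrame({
    "Country": ["Germany", "Italy", "USA"],
    "Region": ["EU", "EU", "America"],
    "Code": [1, 2, 3],
})

# Build the lookup once, so the mapping frame is not re-scanned for every row.
lookup = {row.Country: (row.Region, row.Code)
          for row in mapping.itertuples(index=False)}

# Collect results in plain lists and assign them in one go,
# which avoids a .loc write per row.
regions, codes = [], []
for row in central.itertuples(index=False):
    region, code = lookup.get(row.Country, (None, None))
    regions.append(region)
    codes.append(code)

central["Region"] = regions
central["Code"] = codes

print(central)
```

Note that itertuples exposes columns as attributes, and columns whose names are not valid Python identifiers (such as 'Country / Company (P2)') are renamed to positional names like `_1`, so they would need to be accessed by position or renamed first.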