2

I've been trying to solve this in a clean manner, but have been failing remarkably on this task.

Going strait to the point... I have 2 dataframes:

df1
     anID   siID   daID
0       A      0      0
1       D      0      0
2       E      0      0
3       F      0      0
4       G      A      D
5       H      E      D
6       B      0      0
7       I      E      M
8       L      F      0
9       M      B      H
10      N      A      D

df2
   map_id  renum_id
0       A         1
1       C         2
2       B         3
3       E         4
4       D         5
5       F         6
6       Q         7
7       V         8
8       H         9
9       G        10
10      N        11

Basically, df1 is a dataframe with identification codes, df2$renum_id has the new codes to replace the existing ones and df2$map_id is the map which to guide the recoding of df1's columns.

Now I need a clean and light/fast way to get the corresponding renum_id value (using map_id as a key) for all 3 columns in df1 (anID, siID, daID). Basically, I want it to give me the corresponding renum_ID for each letter, and keep the 0 (zero) for zeros in df1.

So, I need a df3 like this:

     anID   siID   daID  anID/  siID/  daID/
0       A      G      0      1     10      0   
1       D      E      A      5      4      1      
2       E      0      0      4      0      0
3       F      B      D      6      3      5
4       G      A      D     10      1      5
5       H      E      D      9      4      5          
6       B      0      0      2      0      0     
7       N      A      D     11      1      5

It's probably something really simple, but I've been trying with pd.merge and couldn't come up with something to solve my problem.

Dataframes have something around hundreds of thousands of lines, that's why I insist in the need of a "light" way to solve it.

Thanks everyone in advance.

1 Answer 1

1

I think you can first create dictionary with df2 by to_dict and then replace values in df1 by dictionary d. Last concat new DataFrame df3 to df1 by columns:

d = df2.set_index('map_id').to_dict()
print d['renum_id']
{'A': 1, 'C': 2, 'B': 3, 'E': 4, 'D': 5, 'G': 10, 'F': 6, 'H': 9, 'N': 11, 'Q': 7, 'V': 8}

df3 = df1.replace(d['renum_id'])
df3.columns = [col + '/' for col in df3.columns]
print df3

   anID/ siID/ daID/
0      1     0     0
1      5     0     0
2      4     0     0
3      6     0     0
4     10     1     5
5      9     4     5
6      3     0     0
7      I     4     M
8      L     6     0
9      M     3     9
10    11     1     5

print pd.concat([df1, df3], axis=1)

   anID siID daID anID/ siID/ daID/
0     A    0    0     1     0     0
1     D    0    0     5     0     0
2     E    0    0     4     0     0
3     F    0    0     6     0     0
4     G    A    D    10     1     5
5     H    E    D     9     4     5
6     B    0    0     3     0     0
7     I    E    M     I     4     M
8     L    F    0     L     6     0
9     M    B    H     M     3     9
10    N    A    D    11     1     5
Sign up to request clarification or add additional context in comments.

2 Comments

Awsome! That's just what I needed !!
Glad can help you! Good luck!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.