
I have a large pandas dataframe with numerical values structured like this:

>>> df1
   A  B  C
0  2  1  2
1  1  2  3
2  2  3  1

I need to replace each of the above cell values with a 'description' that is looked up by field name and cell value in another dataframe structured like this:

>>> df2
  field_name  code description
0          A     1          NO
1          A     2         YES
2          A     3       MAYBE
3          B     1           x
4          B     2           y
5          B     3           z
6          C     1        GOOD
7          C     2         BAD
8          C     3        FINE

The desired output would be like:

>>> df3
     A  B     C
0  YES  x   BAD
1   NO  y  FINE
2  YES  z  GOOD

I can do this on a small scale using something like .map or .replace. However, the actual datasets contain thousands of records with hundreds of different combinations to replace. Any help would be really appreciated.

Thanks.

  • Just FYI, you posted two times the same image! Commented Apr 15, 2020 at 16:01
  • Please copy and paste your dataframe so we can load it with pd.read_clipboard. I also think you should show the expected output (for example, as a dataframe). Commented Apr 15, 2020 at 16:01
  • What is the expected dataframe? Commented Apr 15, 2020 at 16:07
  • Sorry all - I've now pasted the dataframes in and included the desired output - thanks! Commented Apr 15, 2020 at 16:19
  • Please provide the data in a convenient format. See stackoverflow.com/questions/20109391/…. Commented Apr 15, 2020 at 17:43

2 Answers

1

Use DataFrame.replace with DataFrame.pivot:

df1 = df1.replace(df2.pivot(columns='field_name', index='code', values='description')
                     .to_dict())

You may need to select the relevant columns first:

df1[cols] = df1[cols].replace(df2.pivot(columns='field_name',
                                        index='code', values='description')
                                 .to_dict())

Output

print(df1)
     A  B     C
0  YES  x   BAD
1   NO  y  FINE
2  YES  z  GOOD
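
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the question, and the name `mapping` is only an illustrative choice, not part of the original answer. It assumes every (field_name, code) pair appears at most once in df2, since pivot would otherwise raise on the duplicates.

```python
import pandas as pd

# Frames rebuilt from the question
df1 = pd.DataFrame({'A': [2, 1, 2], 'B': [1, 2, 3], 'C': [2, 3, 1]})
df2 = pd.DataFrame({
    'field_name': list('AAABBBCCC'),
    'code': [1, 2, 3] * 3,
    'description': ['NO', 'YES', 'MAYBE', 'x', 'y', 'z', 'GOOD', 'BAD', 'FINE'],
})

# Pivot to one column per field and one row per code, then convert to a
# nested dict of the form {field_name: {code: description}}
mapping = df2.pivot(index='code', columns='field_name',
                    values='description').to_dict()

# With a nested dict, replace applies each inner dict only to its own column
df3 = df1.replace(mapping)
print(df3)
```

Building the nested dict once and passing it to a single replace call avoids writing a separate .map per column, which seems to be what the question was trying to get away from.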

1 Comment

Ah looks like this might've worked perfectly! Will try it on the full dataset
0

You can stack df1, merge with df2 and pivot the result:

df3 = df1.stack().reset_index().rename(
    columns={'level_1': 'field_name', 0: 'code'}).merge(
        df2, 'left', on=['field_name', 'code']).pivot(
            index='level_0', columns='field_name',
            values='description').rename_axis(None).rename_axis(None, axis=1)
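
The same chain may be easier to follow when split into steps. This is only a sketch of the approach above using the sample frames from the question; the intermediate names `long_df` and `merged` and the column label `row` are my own and not from the answer.

```python
import pandas as pd

# Frames rebuilt from the question
df1 = pd.DataFrame({'A': [2, 1, 2], 'B': [1, 2, 3], 'C': [2, 3, 1]})
df2 = pd.DataFrame({
    'field_name': list('AAABBBCCC'),
    'code': [1, 2, 3] * 3,
    'description': ['NO', 'YES', 'MAYBE', 'x', 'y', 'z', 'GOOD', 'BAD', 'FINE'],
})

# Step 1: long format, one row per (original row label, field_name, code)
long_df = df1.stack().reset_index()
long_df.columns = ['row', 'field_name', 'code']

# Step 2: look up the description for each (field_name, code) pair
merged = long_df.merge(df2, how='left', on=['field_name', 'code'])

# Step 3: back to wide format, then drop the leftover axis names
df3 = (merged.pivot(index='row', columns='field_name', values='description')
             .rename_axis(None)
             .rename_axis(None, axis=1))
print(df3)
```

Note that pivot returns the columns in sorted order, so if the columns of df1 are not already alphabetical you would probably want to finish with something like df3[df1.columns] to restore the original order.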

1 Comment

isn't this clearly slower? I think there are too many changes in the format
