0

I have a dataframe (df1) that contains information about project_id, cost_center and other features:

project_id cost_center month
101 8575 3
321 8597 4
321 8597 2
Nan 8522 1

Sometimes the project_id is not included there, so is Nan, and for these cases I have a "mapping table" (df2) that indicates the project_id that should be associated to that cost center:

project_id cost_center
832 8522

So in my example, I should be able to replace the Nan in df1 for a 832. It means, I should replace the project_id in df1, when the cost_center is in df2.

I tried the following code, but is not working. It says "Length of values (0) does not match length of index (565)" I think because df1 and df2 have different sizes

df['project_id'] = df_mapping[df['cost_center'].isin(df_mapping['cost_center'])]['project_id'].values
3
  • 1
    df1['project'_id'] = df1['project'_id'].fillna(df1['cost_center'].map(df2.set_index('cost_center')['project_id'])) Commented Sep 28, 2022 at 13:32
  • @mozway that should be an answer, not a comment. Commented Sep 28, 2022 at 13:34
  • @alec_djinn probably a duplicate somewhere, but sure, added as answer Commented Sep 28, 2022 at 13:40

2 Answers 2

1

one way to do this is to merge the 2 DFs and then use fillna() to create a new (output) column. Hope it helps!

df_1 = pd.DataFrame({"id":[1,2,3,None],"center":[5,6,7,8]},index=["a","b","c","d"])
df_2 = pd.DataFrame({"id":[4],"center":[8]},index=["g"])
df_merge = df_1.merge(df_2,on="center",how="outer")
df_merge["id_output"] = df_merge["id_x"].fillna(df_merge["id_y"])
df_merge.drop(["id_x","id_y"],inplace=True, axis=1)
Sign up to request clarification or add additional context in comments.

Comments

1

You can use a mapping series and fillna:

df1['project_id'] = (df1['project_id']
                     .fillna(df1['cost_center'].map(df2.set_index('cost_center')['project_id']),
                             downcast='infer'
                            )
                    )

output:

   project_id  cost_center  month
0         101         8575      3
1         321         8597      4
2         321         8597      2
3         832         8522      1

5 Comments

can you explain the downcast='infer', I haven't find documentation about it, and this doesn't seem to be working for me, I'm trying to understand why
@Stephany this is optional, just to convert the floats automatically to integers (downcasting) if no value has a decimal part
@Stephany there is no reason why this approach shouldn't work. Can you provide an example for which it fails for you?
it doesn't replace the Nan values. i'm thinking maybe it has something to do with the data types, in my df1 cost_center is int32 and project_id is object, whereas in df2 both columns are int64, do you think this could be the reason why is not working?
There is no NaN in your example, can you provide a representative minimal example as DataFrame constructor? And yes, mapping/merging requires the same type (float/int is fine but not numeric/string).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.