I have a large dataframe with customer data. I then have two lists with unique customer IDs and unique backup values. Both lists are the same length. I want a quicker solution to override a column in the dataset if the unique ID is in the list.

Here is what I am currently doing:

for unique_id, overrides_id in zip([list of unique_ids], [list of override ids]):
    df.loc[df['Unique_ID_Column'] == unique_id, 'unique_id_backup'] = override_id
  • I'm not a pandas expert, but I think you can create a df from the two lists, then do some kind of merge of that df with the original df. Commented Jul 30, 2024 at 23:37
  • I assume that many rows in the dataframe could have the same unique id but the lists have, um, unique unique ids? Commented Jul 30, 2024 at 23:52
  • Why is overrides_id unused? Commented Jul 31, 2024 at 1:18
  • Don't do that: use groupby instead. There are plenty of examples, especially here on Stack Overflow. Note that you can build an index from the grouped dataframes. Here is an example. Commented Jul 31, 2024 at 8:58

1 Answer

I assume there is a typo in the question and the assignment should be

df.loc[df['Unique_ID_Column'] == unique_id, 'unique_id_backup'] = overrides_id

I believe you could use the Series.map method. First, build a mapper dictionary whose keys are the elements of the unique ID list and whose values are the corresponding override IDs, for example with dict(zip(...)) over the two lists. Then use that mapper to override the values. You did not mention what should happen to rows whose customer ID is not in the unique ID list.

df['unique_id_backup'] = df['Unique_ID_Column'].map(mapper)

This should do what you're after. Rows whose ID is not a key in mapper get NaN in the 'unique_id_backup' column. If you would rather fall back to the original ID when there is no override, you can do:

df['unique_id_backup'] = df['Unique_ID_Column'].map(lambda x:mapper.get(x,x))

Worked example:

import pandas as pd

df = pd.DataFrame(
    {'Unique_ID_Column': [1, 2, 3, 4, 5],
     'Some customer data': ['A', 'B', 'C', 'D', 'E']}
)
unique_id = [1, 3, 5]
override_id = [10, 30, 50]

# Build the lookup {unique ID: override value}
mapper = dict(zip(unique_id, override_id))
# Use the override where one exists, otherwise fall back to the ID itself
df['unique_id_backup'] = df['Unique_ID_Column'].map(lambda x: mapper.get(x, x))
print(df)

Output:

   Unique_ID_Column Some customer data  unique_id_backup
0                 1                  A                10
1                 2                  B                 2
2                 3                  C                30
3                 4                  D                 4
4                 5                  E                50
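
One thing to watch: both map calls above rebuild the whole 'unique_id_backup' column, whereas the loop in the question only touched the matching rows. If the column already holds values that should survive, a sketch along these lines might be closer to the loop's behavior. The pre-existing backup values here are made up for illustration, and I am assuming every ID in the list has exactly one override.

```python
import pandas as pd

df = pd.DataFrame({
    'Unique_ID_Column': [1, 2, 3, 4, 5],
    # Hypothetical pre-existing backup values, invented for this illustration
    'unique_id_backup': [100, 200, 300, 400, 500],
})
unique_id = [1, 3, 5]
override_id = [10, 30, 50]
mapper = dict(zip(unique_id, override_id))

# Select only the rows whose ID has an override
mask = df['Unique_ID_Column'].isin(unique_id)
# Overwrite just those rows; all other rows keep their current backup value
df.loc[mask, 'unique_id_backup'] = df.loc[mask, 'Unique_ID_Column'].map(mapper)
print(df)
```

Rows 2 and 4 should keep their existing backup values, while rows 1, 3 and 5 pick up the overrides. This still avoids the Python-level loop, since isin and map each run once over the column.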