I'm working on a data cleaning task and could use some help. I have two CSV files with thousands of rows each:
File A contains product shipment records. File B contains product descriptions and categories. Here’s a simplified example:
File A (shipments.csv):
shipment_id,product_code,quantity,date
S001,P123,10,2025-07-01
S002,P456,5,2025-07-02
S003,P789,8,2025-07-03
File B (products.csv):
product_code,description,category
P123,Widget A,Tools
P456,Widget B,Hardware
I want to create a merged file where each row from File A is enriched with the matching product description and category from File B (based on product_code). If there's no match, I’d like to keep the row from File A and fill the missing columns with "N/A".
Expected Output:
shipment_id,product_code,quantity,date,description,category
S001,P123,10,2025-07-01,Widget A,Tools
S002,P456,5,2025-07-02,Widget B,Hardware
S003,P789,8,2025-07-03,N/A,N/A
I tried using pandas.merge() in Python but it drops unmatched rows unless I use how='left', and I’m not sure how to fill missing values properly.
Any help? Thanks in advance!