Add rows in a dataframe based on elements in a list

Question

I have the below list & dataframe:

lis = ["Color", "Material", "Handle"]

Code for dataframe:

import pandas as pd

data = [[1, 'Color', 'Yellow', 'SourceA'],
        [1, 'Material', 'Plastic', 'SourceA'],
        [1, 'Handle', 'Y', 'SourceB'],
        [2, 'Color', 'Blue', 'SourceB'],
        [2, 'Handle', 'N', 'SourceA'],
        [3, 'Color', 'Black', 'SourceA'],
        [3, 'Color', 'Black', 'SourceB'],
        [3, 'Material', 'Steel', 'SourceA']]

df_one = pd.DataFrame(data, columns=['Id', 'feature', 'feature_value', 'Source'])

df_one = 

| Id | feature  | feature_value | Source  |
| 1  | Color    | Yellow        | SourceA |
| 1  | Material | Plastic       | SourceA |
| 1  | Handle   | Y             | SourceB |
| 2  | Color    | Blue          | SourceB |
| 2  | Handle   | N             | SourceA |
| 3  | Color    | Black         | SourceA |
| 3  | Color    | Black         | SourceB |
| 3  | Material | Steel         | SourceA |

I need each ID to have all the features listed in "lis". ID1 has "Color", "Material", "Handle" but ID2 does not have "Material" and ID3 does not have "Handle". I need my output to look like the below:

| Id | feature  | feature_value | Source  |
| 1  | Color    | Yellow        | SourceA |
| 1  | Material | Plastic       | SourceA |
| 1  | Handle   | Y             | SourceB |
| 2  | Color    | Blue          | SourceB |
| 2  | Handle   | N             | SourceA |
| 2  | Material | null          | UNK     |
| 3  | Color    | Black         | SourceA |
| 3  | Color    | Black         | SourceB |
| 3  | Material | Steel         | SourceA |
| 3  | Handle   | null          | UNK     |

I tried iterating over the rows of the dataframe and building a dictionary from each row, but because the ID column is not unique, I had no unique key to use for the dictionary.

Any help would be appreciated!

Timeless · Accepted Answer · 2023-01-27 23:45:22Z

If you're sure that at least one ID has all three values of lis, pyjanitor's DataFrame.complete is for you.

#pip install pyjanitor
import janitor

out = df_one.complete("Id", "feature", fill_value={"Source": "UNK"})

Checking that precondition:

df_one.groupby("Id")["feature"].agg(set).apply(lambda s: s == set(lis)).any()
#True

Output :

print(out)

   Id   feature feature_value   Source
0   1     Color        Yellow  SourceA
1   1  Material       Plastic  SourceA
2   1    Handle             Y  SourceB
3   2     Color          Blue  SourceB
4   2  Material           NaN      UNK
5   2    Handle             N  SourceA
6   3     Color         Black  SourceA
7   3     Color         Black  SourceB
8   3  Material         Steel  SourceA
9   3    Handle           NaN      UNK
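
If installing pyjanitor is not an option, the same result can probably be reached with plain pandas. The following is a minimal sketch of an alternative, not the method used above: it builds every (Id, feature) pair from lis and left-merges the original rows onto it. The names wanted and out_pd are illustrative only. Because the pairs are built from lis itself, this version does not depend on any ID already having all three features.

```python
import pandas as pd

lis = ["Color", "Material", "Handle"]

data = [[1, 'Color', 'Yellow', 'SourceA'],
        [1, 'Material', 'Plastic', 'SourceA'],
        [1, 'Handle', 'Y', 'SourceB'],
        [2, 'Color', 'Blue', 'SourceB'],
        [2, 'Handle', 'N', 'SourceA'],
        [3, 'Color', 'Black', 'SourceA'],
        [3, 'Color', 'Black', 'SourceB'],
        [3, 'Material', 'Steel', 'SourceA']]
df_one = pd.DataFrame(data, columns=['Id', 'feature', 'feature_value', 'Source'])

# Every (Id, feature) pair that should exist (hypothetical helper frame).
wanted = pd.MultiIndex.from_product(
    [df_one["Id"].unique(), lis], names=["Id", "feature"]
).to_frame(index=False)

# A left merge keeps duplicate rows from df_one (e.g. Id 3 / Color)
# and leaves NaN in the value columns for pairs that were missing.
out_pd = wanted.merge(df_one, on=["Id", "feature"], how="left")

# Only Source gets a placeholder; feature_value stays NaN for the new rows.
out_pd["Source"] = out_pd["Source"].fillna("UNK")

print(out_pd)
```

Within each Id the rows follow the order of lis, so the added rows appear in place rather than at the end of each group.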
