I have the following list and DataFrame:
```python
lis = ["Color", "Material", "Handle"]
```
Code for the DataFrame:
```python
import pandas as pd

data = [[1, 'Color', 'Yellow', 'SourceA'],
        [1, 'Material', 'Plastic', 'SourceA'],
        [1, 'Handle', 'Y', 'SourceB'],
        [2, 'Color', 'Blue', 'SourceB'],
        [2, 'Handle', 'N', 'SourceA'],
        [3, 'Color', 'Black', 'SourceA'],
        [3, 'Color', 'Black', 'SourceB'],
        [3, 'Material', 'Steel', 'SourceA']]
df_one = pd.DataFrame(data, columns=['Id', 'feature', 'feature_value', 'Source'])
```
`df_one` looks like this:

| Id | feature  | feature_value | Source  |
|----|----------|---------------|---------|
| 1  | Color    | Yellow        | SourceA |
| 1  | Material | Plastic       | SourceA |
| 1  | Handle   | Y             | SourceB |
| 2  | Color    | Blue          | SourceB |
| 2  | Handle   | N             | SourceA |
| 3  | Color    | Black         | SourceA |
| 3  | Color    | Black         | SourceB |
| 3  | Material | Steel         | SourceA |
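I need every ID to have a row for each feature listed in `lis`. ID 1 has "Color", "Material" and "Handle", but ID 2 is missing "Material" and ID 3 is missing "Handle". Each missing feature should be added as a new row with a null `feature_value` and `UNK` as the `Source`, so the output should look like this: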
| Id | feature  | feature_value | Source  |
|----|----------|---------------|---------|
| 1  | Color    | Yellow        | SourceA |
| 1  | Material | Plastic       | SourceA |
| 1  | Handle   | Y             | SourceB |
| 2  | Color    | Blue          | SourceB |
| 2  | Handle   | N             | SourceA |
| 2  | Material | null          | UNK     |
| 3  | Color    | Black         | SourceA |
| 3  | Color    | Black         | SourceB |
| 3  | Material | Steel         | SourceA |
| 3  | Handle   | null          | UNK     |
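A minimal sketch of one way to get that output, assuming plain pandas: build every expected (Id, feature) pair with `pd.MultiIndex.from_product`, left-merge it against the pairs that already exist using `indicator=True`, and append the `left_only` rows with `feature_value` set to `None` and `Source` set to `'UNK'`. The variable names `expected`, `existing`, `missing` and `result` are illustrative only.

```python
import pandas as pd

lis = ["Color", "Material", "Handle"]
data = [[1, 'Color', 'Yellow', 'SourceA'],
        [1, 'Material', 'Plastic', 'SourceA'],
        [1, 'Handle', 'Y', 'SourceB'],
        [2, 'Color', 'Blue', 'SourceB'],
        [2, 'Handle', 'N', 'SourceA'],
        [3, 'Color', 'Black', 'SourceA'],
        [3, 'Color', 'Black', 'SourceB'],
        [3, 'Material', 'Steel', 'SourceA']]
df_one = pd.DataFrame(data, columns=['Id', 'feature', 'feature_value', 'Source'])

# Every (Id, feature) pair that should exist: each unique Id crossed with lis.
expected = pd.MultiIndex.from_product(
    [df_one['Id'].unique(), lis], names=['Id', 'feature']
).to_frame(index=False)

# Pairs that already exist (deduplicated, since Id 3 has Color twice).
existing = df_one[['Id', 'feature']].drop_duplicates()

# indicator=True adds a '_merge' column; 'left_only' marks the missing pairs.
merged = expected.merge(existing, on=['Id', 'feature'], how='left', indicator=True)
missing = merged.loc[merged['_merge'] == 'left_only', ['Id', 'feature']].copy()
missing['feature_value'] = None
missing['Source'] = 'UNK'

# A stable sort on Id keeps original rows first and appends new rows within each Id.
result = (pd.concat([df_one, missing], ignore_index=True)
          .sort_values('Id', kind='mergesort')
          .reset_index(drop=True))
print(result)
```

Because `mergesort` is stable, the row order matches the desired table exactly; pandas displays the missing values as `None` rather than `null`.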
I tried iterating through the rows of the DataFrame and building a dictionary from them, but because the `Id` column is not unique, I couldn't use it as the dictionary key.
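The dictionary idea can still work if the key is the Id and the value is the *set* of features that Id already has, rather than one entry per row. A hedged sketch of that variant (the names `features_by_id` and `new_rows` are hypothetical, chosen for illustration):

```python
import pandas as pd

lis = ["Color", "Material", "Handle"]
data = [[1, 'Color', 'Yellow', 'SourceA'],
        [1, 'Material', 'Plastic', 'SourceA'],
        [1, 'Handle', 'Y', 'SourceB'],
        [2, 'Color', 'Blue', 'SourceB'],
        [2, 'Handle', 'N', 'SourceA'],
        [3, 'Color', 'Black', 'SourceA'],
        [3, 'Color', 'Black', 'SourceB'],
        [3, 'Material', 'Steel', 'SourceA']]
df_one = pd.DataFrame(data, columns=['Id', 'feature', 'feature_value', 'Source'])

# Key by Id (unique per group) and collect that Id's features into a set,
# so repeated Ids no longer collide.
features_by_id = {id_: set(feats) for id_, feats in df_one.groupby('Id')['feature']}

# One placeholder row per feature in lis that the Id does not have yet.
# For this data that is (2, 'Material') and (3, 'Handle').
new_rows = [
    {'Id': id_, 'feature': feat, 'feature_value': None, 'Source': 'UNK'}
    for id_, feats in features_by_id.items()
    for feat in lis
    if feat not in feats
]

result = (pd.concat([df_one, pd.DataFrame(new_rows)], ignore_index=True)
          .sort_values('Id', kind='mergesort')
          .reset_index(drop=True))
print(result)
```

Iterating over `lis` (instead of taking a set difference) keeps the added features in the order they appear in `lis`.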
Any help would be appreciated!