I have the following detailed DataFrame:
Source:
import pandas as pd

df_detailed = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"],
], columns=["Severity", "Partition", "Status", "Comment"])
Output:
   Severity  Partition              Status           Comment
0      Fail         P1  3 Failed Partition  X001, X002, X003
1      Fail         P1         Late Backup       Late Backup
2      Fail         P1  2 Failed Partition        X001, X002
3      Fail         P2  2 Failed Partition        X001, X002
4      Fail         P2         Late Backup       Late Backup
5      Warn         P2           Huge Size               1GB
6      Warn         P2           Huge Size               2GB
I would like to group and aggregate this and get the below result:
Result:
   Partition                                     Status
0         P1          3 Failed Partition, 1 Late Backup
1         P2  2 Failed Partition, 1 Late Backup, 2 Warn
Note:
The keywords "Late Backup", "Failed Partition", "Huge Size" are static and would not change.
All rows with severity "Fail" should have granular, per-status information in the summary DataFrame.
All other severities, such as "Warn" or "Info", should only contribute a count of that severity, as shown in the expected result.
"Failed Partition" rows in the detailed DataFrame are prefixed with the number of failures in that row. In the summary, however, each partition (i.e. P1, P2) should show the count of unique failed values listed in the Comment column across all of its "Failed Partition" rows.
Can someone please help? I've been sleepless over this for 2 days now :(
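To make the rules in the notes concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the failed values in Comment are always comma-separated, and that Fail statuses other than "Failed Partition" (such as "Late Backup") are simply counted by row. The names `summarize` and `summary` are made up for illustration.

```python
import pandas as pd

df_detailed = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"],
], columns=["Severity", "Partition", "Status", "Comment"])


def summarize(group: pd.DataFrame) -> str:
    """Build the summary string for one partition (hypothetical helper)."""
    parts = []
    fails = group[group["Severity"] == "Fail"]
    is_failed_partition = fails["Status"].str.contains("Failed Partition", regex=False)

    # "Failed Partition" rows: count the unique values listed in Comment
    # (assumes a comma-separated list such as "X001, X002").
    unique_ids = {
        item.strip()
        for comment in fails.loc[is_failed_partition, "Comment"]
        for item in comment.split(",")
    }
    if unique_ids:
        parts.append(f"{len(unique_ids)} Failed Partition")

    # Other "Fail" statuses (e.g. "Late Backup"): count rows per status.
    other_fails = fails[~is_failed_partition]
    for status, count in other_fails.groupby("Status", sort=False).size().items():
        parts.append(f"{count} {status}")

    # Every non-"Fail" severity: only the row count per severity.
    non_fails = group[group["Severity"] != "Fail"]
    for severity, count in non_fails.groupby("Severity", sort=False).size().items():
        parts.append(f"{count} {severity}")

    return ", ".join(parts)


# One summary row per partition.
summary = pd.DataFrame(
    [(partition, summarize(group)) for partition, group in df_detailed.groupby("Partition")],
    columns=["Partition", "Status"],
)
print(summary)
```

Because "Late Backup" is counted by row, P1's single Late Backup row in the sample data comes out as "1 Late Backup". Iterating over the groups directly, rather than using `groupby(...).apply`, keeps the per-partition logic in one plain function that is easy to extend if more keywords are added later.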