I have the following detailed DataFrame:

source:

import pandas as pd

df_detailed = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"]
], columns=["Severity", "Partition", "Status", "Comment"])

output:

  Severity Partition              Status           Comment
0     Fail        P1  3 Failed Partition  X001, X002, X003
1     Fail        P1         Late Backup       Late Backup
2     Fail        P1  2 Failed Partition        X001, X002
3     Fail        P2  2 Failed Partition        X001, X002
4     Fail        P2         Late Backup       Late Backup
5     Warn        P2           Huge Size               1GB
6     Warn        P2           Huge Size               2GB

I would like to group and aggregate this to get the result below:

Result:

  Partition                                     Status
0        P1          3 Failed Partition, 2 Late Backup
1        P2  2 Failed Partition, 1 Late Backup, 2 Warn

Note:

  1. The keywords "Late Backup", "Failed Partition", "Huge Size" are static and would not change.

  2. All rows with severity "Fail" should keep their granular information in the summary DataFrame.

  3. All other severities, such as "Warn" or "Info", should only show the count of that severity, as in the expected result above.

  4. In the detailed DataFrame, "Failed Partition" is prefixed with the number of failures. In the summary, however, each partition (i.e. P1, P2) should show the number of unique failed partition IDs listed in the Comment column.
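Note 4 can be illustrated on its own. The following is a minimal sketch, assuming pandas 0.25 or later (for `Series.explode`) and assuming the `Comment` of every "Failed Partition" row is a comma-separated list of IDs; `failed` and `unique_ids` are illustrative names, not part of the question:

```python
import pandas as pd

df_detailed = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"],
], columns=["Severity", "Partition", "Status", "Comment"])

# Keep only the "N Failed Partition" rows
failed = df_detailed[df_detailed["Status"].str.endswith("Failed Partition")]

# Split "X001, X002" into one row per ID, then count distinct IDs per partition
unique_ids = (
    failed.assign(Comment=failed["Comment"].str.split(","))
    .explode("Comment")
    .assign(Comment=lambda d: d["Comment"].str.strip())
    .groupby("Partition")["Comment"]
    .nunique()
)
print(unique_ids.to_dict())  # {'P1': 3, 'P2': 2}
```

P1 has 3 because X001 and X002 appear in both of its "Failed Partition" rows and are only counted once.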

Can someone please help? I've been losing sleep over this for 2 days now :(

  • By the way, there is 1 Late Backup in both P1 and P2. Commented Oct 16, 2019 at 20:19
  • Yes, correct. And if there are more Late Backups, it should aggregate as 2 Late Backup / 3 Late Backup. Commented Oct 17, 2019 at 4:52
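The aggregation described in this comment is a plain count per label. A minimal sketch using `collections.Counter`, which (on Python 3.7+) keeps labels in first-seen order; `labels` is a hypothetical list of one partition's statuses after the numeric prefix has been removed:

```python
from collections import Counter

# Hypothetical statuses for one partition, numeric prefixes already removed
labels = ["Late Backup", "Late Backup", "Late Backup", "Warn", "Warn"]

# Count each label and format it as "N Label", keeping first-seen order
summary = ", ".join(f"{n} {label}" for label, n in Counter(labels).items())
print(summary)  # 3 Late Backup, 2 Warn
```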

1 Answer

Thank you for the interesting task. The solution is below; follow the comments in the code, and feel free to ask questions.

import pandas as pd
from collections import Counter

df_detailed = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"]
], columns=["Severity", "Partition", "Status", "Comment"])


def change_warn(severity, status):
    """Return "Warn" for rows with Warn severity, otherwise keep the original Status."""
    if severity == "Warn":
        return "Warn"
    else:
        return status


df_detailed["Status"] = df_detailed.apply(lambda row: change_warn(row["Severity"], row["Status"]), axis=1)


def remove_leading_digits(x):
    """Drop a leading count, e.g. "3 Failed Partition" -> "Failed Partition"."""
    if x[0].isdigit():
        x = " ".join(x.split(" ")[1:])
    return x


df_detailed["Status"] = df_detailed["Status"].apply(lambda x: remove_leading_digits(x))

df_detailed["Comment"] = df_detailed["Comment"].apply(lambda x: x + ",")  # trailing comma so the summed comments can be split again

# combine Partition and Status so that P1 and P2 are grouped separately:
df_detailed["TempStatus"] = df_detailed["Partition"] + " " + df_detailed["Status"]

gr_b = df_detailed[["Partition", "TempStatus", "Comment"]].groupby("TempStatus").sum()


def calculate_unique_comment(status, comment):
    comments = []
    if status.endswith("Failed Partition"):
        for c in comment.split(","):
            if c != "":
                comments.append(c.strip())
        counter = Counter(comments)
        return str(len(counter.keys()))
    else:
        return str(0)


del gr_b["Partition"]  # do not need it

gr_b = gr_b.reset_index()  # move TempStatus from the index back into a regular column

gr_b["CountUnCom"] = gr_b.apply(lambda row: calculate_unique_comment(row["TempStatus"], row["Comment"]), axis=1)

# count unique comments (partition IDs) per partition for "Failed Partition" and store them in a dict
part_dict = {}
for i in range(len(gr_b)):
    if gr_b["TempStatus"][i].endswith("Failed Partition"):
        part_dict[gr_b["TempStatus"][i]] = gr_b["CountUnCom"][i]


# let's take only what we need to work with
df_small = pd.DataFrame(df_detailed[["Partition", "Status"]])

df_small["Status"] = df_small["Status"].apply(lambda x: x + ",")  # to sum and split later

gr_df_small = df_small.groupby("Partition").sum()

gr_df_small = gr_df_small.reset_index()


def convert_status_to_list(status):
    new_status = []
    for c in status.split(","):
        if c != "":
            new_status.append(c.strip())
    return new_status


gr_df_small["Status"] = gr_df_small["Status"].apply(lambda x: convert_status_to_list(x))


def calculate_status(partition, status, x):
    result = []
    for k, v in Counter(status).items():
        if k == "Failed Partition":
            v = x[partition + " " + "Failed Partition"]
        result.append(f"{v} {k}")
    return ", ".join(result)


gr_df_small["Status"] = gr_df_small.apply(lambda row: calculate_status(row["Partition"], row["Status"], part_dict), axis=1)


print(gr_df_small)

Output:

  Partition                                     Status
0        P1          3 Failed Partition, 1 Late Backup
1        P2  2 Failed Partition, 1 Late Backup, 2 Warn
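For comparison, here is a more compact sketch of the same pipeline. It is an alternative, not the answer's original code, and it assumes pandas 0.25+ (for `explode`) and Python 3.6+ (for f-strings). Following note 3, every non-"Fail" row is reported by its severity name (not only "Warn"); following note 4, "Failed Partition" is counted by distinct IDs in `Comment`. `Label`, `unique_ids` and `summary` are illustrative names:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame([
    ["Fail", "P1", "3 Failed Partition", "X001, X002, X003"],
    ["Fail", "P1", "Late Backup", "Late Backup"],
    ["Fail", "P1", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "2 Failed Partition", "X001, X002"],
    ["Fail", "P2", "Late Backup", "Late Backup"],
    ["Warn", "P2", "Huge Size", "1GB"],
    ["Warn", "P2", "Huge Size", "2GB"],
], columns=["Severity", "Partition", "Status", "Comment"])

# Non-"Fail" rows are reported by their severity only ("Warn", "Info", ...)
label = df["Status"].where(df["Severity"] == "Fail", df["Severity"])
# Drop the numeric prefix: "3 Failed Partition" -> "Failed Partition"
df["Label"] = label.str.replace(r"^\d+\s+", "", regex=True)

# Distinct failed partition IDs per partition, taken from Comment
failed = df[df["Label"] == "Failed Partition"]
unique_ids = (
    failed.assign(Comment=failed["Comment"].str.split(","))
    .explode("Comment")
    .assign(Comment=lambda d: d["Comment"].str.strip())
    .groupby("Partition")["Comment"]
    .nunique()
)

rows = []
for partition, labels in df.groupby("Partition")["Label"]:
    parts = []
    # Counter keeps first-seen order, matching the row order of the input
    for name, n in Counter(labels).items():
        if name == "Failed Partition":
            n = unique_ids[partition]  # distinct IDs, not the number of rows
        parts.append(f"{n} {name}")
    rows.append({"Partition": partition, "Status": ", ".join(parts)})

summary = pd.DataFrame(rows)
print(summary["Status"].tolist())
# ['3 Failed Partition, 1 Late Backup', '2 Failed Partition, 1 Late Backup, 2 Warn']
```

Building the rows in a plain loop avoids `groupby(...).apply`, whose handling of the grouping columns has changed between pandas releases.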

6 Comments

@MagnumCodus I was able to solve the issue; do not forget to upvote it and mark it as the solution ;)
@MagnumCodus hello, did you check the answer?
Sorry I couldn't check this earlier... But I am getting an error like this: result.append(f"{v} {k}") SyntaxError: invalid syntax
@MagnumCodus I guess you are using a Python version that does not support f-strings (they were added in Python 3.6). Instead of f"{v} {k}", use "{} {}".format(v, k); do not forget the space between the two {} (not inside them!)
Brilliant, it works!!! Let me try it out on a couple of scenarios. Thank you very much!!! :) Which version of Python are you using, by the way?
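The fix suggested in these comments can be checked directly: on Python 3.6+, both spellings build the same string. On an older interpreter the f-string line is itself a syntax error, so only the `str.format` form applies there. `v` and `k` are sample values:

```python
v, k = 3, "Failed Partition"

# f-string, available since Python 3.6
with_fstring = f"{v} {k}"
# str.format, which also works on older Python versions
with_format = "{} {}".format(v, k)

print(with_fstring == with_format)  # True
```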