I've been wrestling with this for many days now and would appreciate any help.

I'm importing an Excel file into a pandas DataFrame, which gives me the following data:

account_id    name    timestamp            value
A0001C        Fund_1  1588618800000000000  1
B0001B        Dev_2   1601578800000000000  1

I'm looking to produce nested JSON output (it will be used to submit data to an API), including a top-level "records" array and a "metrics" array for each record.

Here is the output I'm looking for:

{
  "records": [
    {
      "name": "Fund_1", 
      "account_id": "A0001C", 
      "metrics": [
        {
          "timestamp": 1588618800000000000, 
          "value": 1
        }
      ]
    },
    {
      "name": "Dev_2", 
      "account_id": "B0001B", 
      "metrics": [
        {
          "timestamp": 1601578800000000000, 
          "value": 1
        } 
      ]
    }
  ]
}

I've managed to get non-nested JSON output, but I'm not able to split out the timestamp and value into the metrics part.

import json

for record in df.to_dict(orient='records'):
    record_data = {'records': [record]}
    payload_json = json.dumps(record_data)
    print(payload_json)

I get the following output:

{"records": [{"account_id": "A0001C", "name": "Fund_1", "Date Completed": 1588618800000000000, "Count": "1"}]}
{"records": [{"account_id": "B0001B", "name": "Dev_2", "Date Completed": 1601578800000000000, "Count": "1"}]}

Any help on how I can modify my code to add the metrics label and nest the data would be appreciated.

Thanks in advance.

  • Is the data particularly large, or are there any constraints which make adding an auxiliary column / using map infeasible? Commented Jan 26, 2021 at 19:39
  • The data is 4,000 rows in the Excel sheet. I'm a real novice at coding, so I'm not sure about the feasibility of using map. I can add columns to the dataframe, but I'm still not able to nest the JSON output. Commented Jan 26, 2021 at 20:22
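
For reference, the auxiliary-column idea from the comments above can also be sketched without apply, using a plain list comprehension over the two columns (this assumes the "timestamp" and "value" column names shown in the question):

# Build a nested "metrics" column with a list comprehension instead of df.apply.
# Assumes df has the "timestamp" and "value" columns from the question.
df["metrics"] = [
    [{"timestamp": ts, "value": val}]
    for ts, val in zip(df["timestamp"], df["value"])
]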

1 Answer


One approach is to use DataFrame.apply, which applies a function along an axis of your dataframe (either column-wise or row-wise).

In your particular case, you want to apply the function row-by-row, so you have to use apply with axis=1:

records = list(df.apply(lambda row: {"name": row["name"],
                                     "account_id": row["account_id"],
                                     "metrics": [{
                                         "timestamp": row["timestamp"],
                                         "value": row["value"]}]
                                    },
                        axis=1).values)
payload = {"records": records}

Alternatively, you could introduce an auxiliary column "metrics" that stores the metrics for each row, and then convert the frame to records with to_dict (or DataFrame.to_json):

df["metrics"] = df.apply(lambda e: [{"timestamp": e.timestamp,
                                     "value": e.value}], 
                         axis=1)
records = df[["account_id", "name", "metrics"]].to_dict(orient="records")
payload = {"records": records}

Here's a full example applying option 2:

import io
import json

import pandas as pd

data = io.StringIO("""account_id    name    timestamp   value
A0001C  Fund_1  1588618800000000000 1
B0001B  Dev_2   1601578800000000000 1""")

df = pd.read_csv(data, sep=r"\s+")

df["metrics"] = df.apply(lambda e: [{"timestamp": e.timestamp,
                                     "value": e.value}], 
                         axis=1)
records = df[["account_id", "name", "metrics"]].to_dict(orient="records")
payload = {"records": records}
print(json.dumps(payload, indent=4))

Output:

{
    "records": [
        {
            "account_id": "A0001C",
            "name": "Fund_1",
            "metrics": [
                {
                    "timestamp": 1588618800000000000,
                    "value": 1
                }
            ]
        },
        {
            "account_id": "B0001B",
            "name": "Dev_2",
            "metrics": [
                {
                    "timestamp": 1601578800000000000,
                    "value": 1
                }
            ]
        }
    ]
}


Edit: The second approach also makes grouping by accounts (in case you want to do that) rather easy. Below is a small example and output:

import io
import json

import pandas as pd

data = io.StringIO("""account_id    name    timestamp   value
A0001C  Fund_1  1588618800000000000 1
A0001C  Fund_1  1588618900000000000 2
B0001B  Dev_2   1601578800000000000 1""")

df = pd.read_csv(data, sep=r"\s+")

# adding the metrics column as above
df["metrics"] = df.apply(lambda e: {"timestamp": e.timestamp,
                                    "value": e.value}, 
                         axis=1)

# group metrics by account
df_grouped = df.groupby(by=["name", "account_id"]).metrics.agg(list).reset_index()
records = df_grouped[["account_id", "name", "metrics"]].to_dict(orient="records")
payload = {"records": records}
print(json.dumps(payload, indent=4))

Output:

{
    "records": [
        {
            "account_id": "B0001B",
            "name": "Dev_2",
            "metrics": [
                {
                    "timestamp": 1601578800000000000,
                    "value": 1
                }
            ]
        },
        {
            "account_id": "A0001C",
            "name": "Fund_1",
            "metrics": [
                {
                    "timestamp": 1588618800000000000,
                    "value": 1
                },
                {
                    "timestamp": 1588618900000000000,
                    "value": 2
                }
            ]
        }
    ]
}
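
Since the question mentions that the JSON will be submitted to an API, here is a minimal sketch of posting the payload with the requests library; the endpoint URL and any authentication are placeholders, not taken from the question:

import requests

# Hypothetical endpoint; replace with the real API URL and any required auth headers.
API_URL = "https://example.com/api/records"

# requests serializes the dict to JSON for us via the json= keyword.
response = requests.post(API_URL, json=payload)
response.raise_for_status()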

Comments

Sim, thank you for your help and solution. One slight issue still is that the second account entry has both timestamp/value entries included, and the value seems to have increased. I only need the timestamp/value data relating to that account_id. Suggestions would be appreciated.
@GrahamMcD: If you use the code before the edit, that will just produce single (timestamp, value)-entries for each record.
Thank you again. I needed to add a '[' to the metrics line ("metrics": [) and now it works great. Thank you again for your help, much appreciated.
@GrahamMcD: You are right, sorry, didn't catch that.
It would be great to know how option 2 above could be made to work so as to only include the single instance of the timestamp. Just for knowledge, if you have any thoughts.
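
For reference on the last two comments: the pre-edit version of option 2 builds a single-entry metrics list per row, so each record only carries its own timestamp/value pair. A minimal sketch of that difference, reusing the three-row sample data from the edit:

# Pre-edit option 2: one single-entry metrics list per row, no grouping.
df["metrics"] = df.apply(lambda e: [{"timestamp": e.timestamp,
                                     "value": e.value}],
                         axis=1)
records = df[["account_id", "name", "metrics"]].to_dict(orient="records")
# With the three-row sample this yields three separate records: two for
# A0001C (one metric each) and one for B0001B.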
