Transforming a Polars DataFrame to Nested JSON Format

Question

I have a dataframe that contains a product name, question, and answers. I would like to process the dataframe and transform it into a JSON format. Each product should have nested sections for questions and answers.

My dataframe:

import polars as pl

df = pl.DataFrame({
    "Product": ["X", "X", "Y", "Y"],
    "Question": ["Q1", "Q2", "Q3", "Q4"],
    "Anwers": ["A1", "A2", "A3", "A4"],
})

Desired Output:

{
    "faqByCommunity": {
        "id": 5,
        "communityName": "name",
        "faqList": [
            {
                "id": 1,
                "product": "X",
                "faqs": [
                    {
                        "id": 1,
                        "question": "Q1",
                        "answer": "A1"
                    },
                    {
                        "id": 2,
                        "question": "Q2",
                        "answer": "A2"
                    }

                ]
            },
            {
                "id": 2,
                "product": "Y",
                "faqs": [
                    {
                        "id": 1,
                        "question": "Q3",
                        "answer": "A3"
                    },
                    {
                        "id": 2,
                        "question": "Q4",
                        "answer": "A4"
                    }

                ]
            }
        ]
    }
}

Since the first part is static, I think I could write it to the file before and after Polars writes its part (like in my other question). However, I'm not sure how to handle the nested part.

I got the table from an Excel file, so I wanted to use it as a dataframe.
– Simon, Commented Mar 11 at 18:19

jqurious · Accepted Answer · 2025-03-12 16:12:12Z

You could do some of the reshaping in Polars first.

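# Rename positionally so the keys match the desired JSON (product/question/answer).
df.columns = ["product", "question", "answer"]

faq_list = (
    df.group_by("product", maintain_order=True)
      # One struct per row: a 1-based id plus the remaining columns.
      .agg(faqs=pl.struct(pl.int_range(pl.len()).alias("id") + 1, pl.exclude("product")))
      .with_row_index("id", offset=1)
      #.to_struct()
      #.to_list()
)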

shape: (2, 3)
┌─────┬─────────┬────────────────────────────────┐
│ id  ┆ product ┆ faqs                           │
│ --- ┆ ---     ┆ ---                            │
│ u32 ┆ str     ┆ list[struct[3]]                │
╞═════╪═════════╪════════════════════════════════╡
│ 1   ┆ X       ┆ [{1,"Q1","A1"}, {2,"Q2","A2"}] │
│ 2   ┆ Y       ┆ [{1,"Q3","A3"}, {2,"Q4","A4"}] │
└─────┴─────────┴────────────────────────────────┘

With the .to_struct() and .to_list() lines uncommented, faq_list becomes a list of dicts:

[{'id': 1,
  'product': 'X',
  'faqs': [{'id': 1, 'question': 'Q1', 'answer': 'A1'},
   {'id': 2, 'question': 'Q2', 'answer': 'A2'}]},
 {'id': 2,
  'product': 'Y',
  'faqs': [{'id': 1, 'question': 'Q3', 'answer': 'A3'},
   {'id': 2, 'question': 'Q4', 'answer': 'A4'}]}]

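You could then add the static parts and pretty-print the result with json.dumps (this needs faq_list to be the list of dicts, i.e. with .to_struct() and .to_list() uncommented):

import json

print(
    json.dumps({
        "faqByCommunity": {
            "id": 5,
            "communityName": "name",
            "faqList": faq_list
        }
    }, indent=4)
)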

You could also add the static parts with Polars if you really wanted to.

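import json

# Column names must match the desired JSON keys (positional rename).
df.columns = ["product", "question", "answer"]

print(
    json.dumps(
        (df.group_by("product", maintain_order=True)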
           .agg(
                faqs = pl.struct(
                    pl.int_range(pl.len()).alias("id") + 1, 
                    pl.exclude("product")
                )
           )
           .with_row_index("id", offset=1)
           .select(
               pl.struct(
                   faqByCommunity = pl.struct(
                       id = 5,  
                       communityName = pl.lit("name"), 
                       faqList = pl.struct(pl.all()).implode()
                   )
               )
           )
           .item()
        ),
        indent = 4
    )
)
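
Both variants above print the JSON. Since the question mentions writing to a file, here is a minimal sketch of that last step using json.dump. The file name faq.json is only a placeholder, and faq_list is hard-coded in the list-of-dicts shape shown earlier so that the snippet runs on its own.

```python
import json
from pathlib import Path

# Same shape as the .to_struct().to_list() output shown earlier,
# hard-coded here so the sketch is self-contained.
faq_list = [
    {"id": 1, "product": "X", "faqs": [
        {"id": 1, "question": "Q1", "answer": "A1"},
        {"id": 2, "question": "Q2", "answer": "A2"},
    ]},
    {"id": 2, "product": "Y", "faqs": [
        {"id": 1, "question": "Q3", "answer": "A3"},
        {"id": 2, "question": "Q4", "answer": "A4"},
    ]},
]

# Wrap the nested list in the static outer structure.
payload = {
    "faqByCommunity": {
        "id": 5,
        "communityName": "name",
        "faqList": faq_list,
    }
}

out_path = Path("faq.json")  # placeholder file name, not from the question
with out_path.open("w", encoding="utf-8") as f:
    json.dump(payload, f, indent=4)
```

Writing the whole dictionary in one call avoids having to stitch the static header and footer around a separately written file.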

JonSG · Accepted Answer · 2025-03-11 18:49:23Z


Without knowing how much data you have, I would probably just use iter_rows() over the dataframe and build the resulting dictionary by hand, rather than try something more nuanced in Polars. I am not a Polars expert, but from what I can see, Polars does not offer much flexibility in the JSON it produces with write_json().

Something like:

import polars as pl

df = pl.DataFrame({
    "Product": ["X", "X", "Y", "Y"],
    "Question": ["Q1", "Q2", "Q3", "Q4"],
    "Anwers": ["A1", "A2", "A3", "A4"],
})

## ---------------
## Cluster rows by product 
## ---------------
product_data = {}
for row in df.iter_rows():
    product_data.setdefault(row[0], []).append(row[1:])
## ---------------

## ---------------
## Build the results dictionary using the clustered data
## and nested list comprehensions
## ---------------
results = {
    "faqByCommunity": {
        "id": 5,
        "communityName": "name",
        "faqList": [
            {
                "id": product_index,
                "product": product,
                "faqs": [
                    {
                        "id": qna_index,
                        "question": question,
                        "answer": answer
                    }
                    for qna_index, (question, answer) in enumerate(qnas, start=1)
                ]
            }
            for product_index, (product, qnas) in enumerate(product_data.items(), start=1)
        ]
    }
}
## ---------------

## ---------------
## Display the results
## ---------------
import json
print(json.dumps(results, indent=4))
## ---------------

This should give you the result you stated.


2 Comments

jqurious Mar 11 at 19:20

.rows_by_key("product", named=True) does the defaultdict stuff for you which may help a little.

Simon Mar 11 at 20:14

@jqurious Can you show me an example? I just tried with product_data = (mdp_2.rows_by_key("Producto", named=True))
