How to reshape DataFrame for nested JSON output

Question

I am starting to work with data manipulation, and I need to create a new file (with new features) from an old one. However, I could not figure out how to customize my DataFrame before calling the ".to_json" method.

For example, I have a .csv as:

seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100

...and I want to generate a .json file that helps me visualize a network from it. It should look roughly like this:

{
"node": [
      {"id":"Roger", "group": "seller" },
      {"id":"Mike", "group": "seller" },
      {"id":"Will", "group": "customer" },
      {"id":"Markus", "group": "customer" },
      {"id":"Albert", "group": "customer" }
],
"links":[
      {"source":"Roger","target":"Will","product":8129,"price":30},
      #...and so on
]
}

I tried to do something like:

import pandas as pd

df1 = pd.read_csv('file.csv', skipinitialspace=True)  # the file has a space after each comma
seller_list = df1.seller.unique()
customer_list = df1.customer.unique()

...and I did get arrays of unique items. However, I could not figure out how to put them into a DataFrame so that it produces a structure such as:

"node":[
      ...
      {"id":"Mike", "group": "seller" },
      {"id":"Markus", "group": "customer" },
      ...
]...#see above

Any support or hint on this is appreciated.
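The two unique() arrays from the attempt above can be turned into the node list directly with a list comprehension, without building another DataFrame. A minimal sketch, assuming the sample rows are inlined with io.StringIO in place of file.csv so it runs on its own:

```python
import io

import pandas as pd

# Sample rows inlined (assumption) so the sketch does not need file.csv
csv_text = """seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
"""

# skipinitialspace=True drops the blank after each comma in header and values
df1 = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
seller_list = df1.seller.unique()
customer_list = df1.customer.unique()

# Tag each unique name with its group; sellers first, as in the desired output
node = ([{"id": name, "group": "seller"} for name in seller_list]
        + [{"id": name, "group": "customer"} for name in customer_list])
print(node)
```

This keeps the order of first appearance within each group, which is what unique() returns.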

cs95 · Accepted Answer · 2018-01-08 22:02:04Z


This will be a two-step process. First, create the list of node dicts using melt + drop_duplicates + to_dict:

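import pandas as pd

df = pd.read_csv('file.csv', skipinitialspace=True)  # strip the space after each comma

nodes = (df[['customer', 'seller']]
         .melt(var_name='group', value_name='id')
         .drop_duplicates()
         .to_dict('records'))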
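To see what each step of that chain produces, here it is run on the sample data (inlined with io.StringIO, an assumption so the snippet is standalone). melt stacks the two name columns into group/id pairs, and drop_duplicates removes repeats such as Will and Markus, who each appear twice as customers:

```python
import io

import pandas as pd

# Sample rows inlined (assumption) in place of file.csv
csv_text = """seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
"""
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Long format: one row per (column name, value) pair, 10 rows in total
long = df[['customer', 'seller']].melt(var_name='group', value_name='id')
print(long)

# Keep the first occurrence of each (group, id) pair, then convert to dicts
nodes = long.drop_duplicates().to_dict('records')
print(nodes)
```

Because duplicates are judged on the (group, id) pair, a person who both sells and buys would get two node entries, one per group.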

Next, create the list of link dicts using rename + to_dict:

links = df.rename(columns={'seller': 'source', 'customer': 'target'}).to_dict('records')
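rename only relabels the two name columns; product and price pass through unchanged. The sketch below (data again inlined as an assumption) also selects the columns in an explicit order so each record lists source and target first. That step is purely cosmetic, since JSON object keys are unordered, but it shows how the DataFrame can be shaped before serializing:

```python
import io

import pandas as pd

# Sample rows inlined (assumption) in place of file.csv
csv_text = """seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
"""
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

links = (df.rename(columns={'seller': 'source', 'customer': 'target'})
           [['source', 'target', 'product', 'price']]  # cosmetic: endpoints first
           .to_dict('records'))
print(links[0])
```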

Finally, combine both lists into one dictionary and dump it to a file as JSON:

import json

data = {'nodes': nodes, 'links': links}

with open('data.json', 'w') as f:
    json.dump(data, f, indent=4)
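Depending on the pandas version, to_dict may hand back NumPy scalars such as numpy.int64, which the standard json module cannot serialize. Since the question mentions ".to_json", one alternative is to let pandas serialize each part with DataFrame.to_json(orient='records') and parse the result with json.loads before combining. A sketch, assuming the data is inlined and the output goes to data.json as in the answer:

```python
import io
import json

import pandas as pd

# Sample rows inlined (assumption) in place of file.csv
csv_text = """seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
"""
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

nodes_df = (df[['customer', 'seller']]
            .melt(var_name='group', value_name='id')
            .drop_duplicates())
links_df = df.rename(columns={'seller': 'source', 'customer': 'target'})

# to_json writes plain JSON; json.loads turns it into native Python lists/dicts
data = {
    'nodes': json.loads(nodes_df.to_json(orient='records')),
    'links': json.loads(links_df.to_json(orient='records')),
}

with open('data.json', 'w') as f:
    json.dump(data, f, indent=4)
```

The round trip through a string costs a little extra work but guarantees that only JSON-native types reach json.dump.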

Your data.json file should look like this (key order within each object may vary; JSON objects are unordered):

{
    "nodes": [
        {
            "id": "Will",
            "group": "customer"
        },
        {
            "id": "Markus",
            "group": "customer"
        },
        {
            "id": "Albert",
            "group": "customer"
        },
        {
            "id": "Roger",
            "group": "seller"
        },
        {
            "id": "Mike",
            "group": "seller"
        }
    ],
    "links": [
        {
            "product": 8129,
            "target": "Will",
            "source": "Roger",
            "price": 30
        },
        {
            "product": 1234,
            "target": "Markus",
            "source": "Roger",
            "price": 100
        },
        {
            "product": 2334,
            "target": "Will",
            "source": "Roger",
            "price": 50
        },
        {
            "product": 2295,
            "target": "Markus",
            "source": "Mike",
            "price": 20
        },
        {
            "product": 1234,
            "target": "Albert",
            "source": "Mike",
            "price": 100
        }
    ]
}
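Since the file feeds a network visualization, it can help to confirm that every link's source and target matches a node id; some graph libraries (d3-force, for instance) raise an error otherwise. A minimal sketch with a hypothetical helper dangling_endpoints, run on a trimmed copy of the output above:

```python
import json

# Trimmed copy of the output above: two nodes and one link
text = '''{"nodes": [{"id": "Will", "group": "customer"},
                     {"id": "Roger", "group": "seller"}],
           "links": [{"product": 8129, "target": "Will", "source": "Roger", "price": 30}]}'''


def dangling_endpoints(data):
    """Hypothetical helper: return (source, target) pairs that reference unknown node ids."""
    ids = {node['id'] for node in data['nodes']}
    return [
        (link['source'], link['target'])
        for link in data['links']
        if link['source'] not in ids or link['target'] not in ids
    ]


data = json.loads(text)
print(dangling_endpoints(data))  # → []
```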


2 Comments

Roger A. Leite Over a year ago

Hello COLDSPEED, your answer looks great! Thanks a lot!! I would have another tiny question for you, if possible. What if I want to exclude one of the dimensions from the "links" half. For example, exclude "product" and leave just "target", "source", and "price". What method can I use in order to reduce this dimensionality?

cs95 Over a year ago

@RogerAlmeidaLeite before computing links, do: df = df.drop(columns="product")
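The keyword form drop(columns=...) works in current pandas, where passing the axis positionally is no longer accepted. A sketch of that suggestion applied to the links step, with the sample data inlined as an assumption:

```python
import io

import pandas as pd

# Sample rows inlined (assumption) in place of file.csv
csv_text = """seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
"""
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Drop the unwanted column first, then relabel the endpoints
links = (df.drop(columns='product')
           .rename(columns={'seller': 'source', 'customer': 'target'})
           .to_dict('records'))
print(links[0])
```

Selecting the columns to keep instead, e.g. df[['seller', 'customer', 'price']], gives the same result and may read more clearly when most columns should go.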
