I am starting to work with data manipulation and I need to create a new file (with new features) out of an old one. However, I could not figure out how to customize my own dataframe before calling the ".to_json" method.

For example, I have a .csv as:

seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100

...and I want to generate a .json file from it so that I can visualize the data as a network. It should look more or less like:

{
"node": [
      {"id":"Roger", "group": "seller" },
      {"id":"Mike", "group": "seller" },
      {"id":"Will", "group": "customer" },
      {"id":"Markus", "group": "customer" },
      {"id":"Albert", "group": "customer" }
],
"links":[
      {"source":"Roger","target":"Will","product":8129,"price":30},
      #...and so on
]
}

I tried to do something like:

df1 = pd.read_csv('file.csv')
seller_list = df1.seller.unique()
customer_list = df1.customer.unique()

...and I did get lists of unique items. However, I could not find out how to add them to a dataframe in order to create a structure such as:

"node":[
      ...
      {"id":"Mike", "group": "seller" },
      {"id":"Markus", "group": "customer" },
      ...
]...#see above

Any support or hint on this is appreciated.

1 Answer

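This is a two-step process (here df is the DataFrame loaded with pd.read_csv, called df1 in the question). First, create the list of node dicts using melt + drop_duplicates + to_dict:

# pandas 2.0 removed the short alias 'r'; spell out 'records'
nodes = df[['customer', 'seller']]\
           .melt(var_name='group', value_name='id')\
           .drop_duplicates()\
           .to_dict('records')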
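To see what each step in that chain contributes, here is a minimal sketch that keeps the intermediate results. The DataFrame is a hypothetical in-memory copy of the two name columns from the question's sample, and `long_form` and `unique_nodes` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical in-memory copy of the two name columns from the question's CSV.
df = pd.DataFrame({
    "seller": ["Roger", "Roger", "Roger", "Mike", "Mike"],
    "customer": ["Will", "Markus", "Will", "Markus", "Albert"],
})

# melt stacks both columns into long form: one row per (group, id) pair,
# where "group" holds the original column name and "id" holds the value.
long_form = df[["customer", "seller"]].melt(var_name="group", value_name="id")

# drop_duplicates keeps the first occurrence of each (group, id) pair.
unique_nodes = long_form.drop_duplicates()

# One dict per remaining row, ready to be used as the node list.
nodes = unique_nodes.to_dict("records")
```

Note that duplicates are removed per (group, id) pair, so a name that appeared as both a seller and a customer would produce two nodes with the same id.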

Next, create the list of link dicts using rename + to_dict:

links = df.rename(columns={'seller' : 'source', 'customer' : 'target'}).to_dict('records')

Finally, combine both lists into one dictionary and dump it as JSON to a file:

import json

data = {'nodes' : nodes, 'links' : links}

with open('data.json', 'w') as f:
    json.dump(data, f, indent=4)

Your data.json file should look like this (the order of the keys within each object may differ):

{
    "nodes": [
        {
            "id": "Will",
            "group": "customer"
        },
        {
            "id": "Markus",
            "group": "customer"
        },
        {
            "id": "Albert",
            "group": "customer"
        },
        {
            "id": "Roger",
            "group": "seller"
        },
        {
            "id": "Mike",
            "group": "seller"
        }
    ],
    "links": [
        {
            "product": 8129,
            "target": "Will",
            "source": "Roger",
            "price": 30
        },
        {
            "product": 1234,
            "target": "Markus",
            "source": "Roger",
            "price": 100
        },
        {
            "product": 2334,
            "target": "Will",
            "source": "Roger",
            "price": 50
        },
        {
            "product": 2295,
            "target": "Markus",
            "source": "Mike",
            "price": 20
        },
        {
            "product": 1234,
            "target": "Albert",
            "source": "Mike",
            "price": 100
        }
    ]
}
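Putting the steps together, here is a self-contained sketch under a few assumptions: the sample rows from the question are inlined via `io.StringIO` instead of being read from file.csv, the result is built as a string with `json.dumps` rather than written to data.json, and `CSV_TEXT` and `json_text` are illustrative names. Because the sample has a space after each comma, `skipinitialspace=True` is passed so that the column names and values do not start with a blank.

```python
import io
import json

import pandas as pd

# The question's sample rows, inlined so the sketch runs without file.csv.
CSV_TEXT = """seller, customer, product, price
Roger, Will, 8129, 30
Roger, Markus, 1234, 100
Roger, Will, 2334, 50
Mike, Markus, 2295, 20
Mike, Albert, 1234, 100
"""

# skipinitialspace=True strips the blank after each comma, so the columns
# are named "customer" rather than " customer".
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# Step 1: unique (group, id) pairs become the nodes.
nodes = (
    df[["customer", "seller"]]
    .melt(var_name="group", value_name="id")
    .drop_duplicates()
    .to_dict("records")
)

# Step 2: every row becomes a link from seller (source) to customer (target).
links = (
    df.rename(columns={"seller": "source", "customer": "target"})
    .to_dict("records")
)

# Step 3: combine and serialize; json.dump(data, f, indent=4) would write
# the same content to an open file instead.
data = {"nodes": nodes, "links": links}
json_text = json.dumps(data, indent=4)
```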

2 Comments

Hello COLDSPEED, your answer looks great! Thanks a lot! I have one more small question, if possible: what if I want to exclude one of the fields from the "links" part? For example, exclude "product" and keep just "target", "source", and "price". What method can I use to drop it?
@RogerAlmeidaLeite before computing links, do: df = df.drop(columns="product") (the positional form df.drop("product", 1) no longer works in pandas 2.0 and later)
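As a rough sketch of that suggestion, assuming the same column names as in the question (the DataFrame below is a hypothetical in-memory copy of the sample rows):

```python
import pandas as pd

# Hypothetical in-memory copy of the question's sample data.
df = pd.DataFrame({
    "seller": ["Roger", "Roger", "Roger", "Mike", "Mike"],
    "customer": ["Will", "Markus", "Will", "Markus", "Albert"],
    "product": [8129, 1234, 2334, 2295, 1234],
    "price": [30, 100, 50, 20, 100],
})

# Drop the unwanted column first, then rename and convert as before.
links = (
    df.drop(columns="product")
    .rename(columns={"seller": "source", "customer": "target"})
    .to_dict("records")
)
```

Since `drop` returns a new DataFrame here, the original df keeps its "product" column for any other use.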