I want to build a JSON file from a CSV to represent the hierarchical relations in my data. The relations are parent/child: a child can have one or many parents, and a parent can have one or many children. A child can also have children of its own, so multiple levels are possible. I think a dendrogram like the ones in D3 could be a good visualisation for this.

My CSV source file contains thousands of lines like this:

parent         | children       | date
---------------------------------------------
830010000C0419 | 830010000C1205 | 1993/09/15
830010000C0947 | 830010000C1205 | 1993/09/15
830010000C0948 | 830010000C1205 | 1993/09/15
830010000B0854 | 830010000B1196 | 1994/03/11
830010000B0854 | 830010000B1197 | 1994/03/11
830010000B0721 | 830010000B1343 | 1988/12/05
830010000B1343 | 830010000B1344 | 1988/12/05
830010000B0721 | 830010000B1345 | 1988/12/05
830010000B1345 | 830010000B1344 | 1986/12/05
...

I want to generate a JSON file with this structure:

var treeData = [
  {
    "name": "Root",
    "parent": "null",
    "children": [
      {
        "name": "830010000B0854",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000B1196",
            "parent": "830010000B0854"
          },
          {
            "name": "830010000B1197",
            "parent": "830010000B0854"
          }
        ]
      },
      {
        "name": "830010000B0721",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000B1343",
            "parent": "830010000B0721",
            "children": [
              {
                "name": "830010000B1344",
                "parent": "830010000B1343"
              }
            ]
          }
        ]
      },
      {
        "name": "830010000C0419",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000C1205",
            "parent": "830010000C0419"
          }
        ]
      },
      {
        "name": "830010000C0947",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000C1205",
            "parent": "830010000C0947"
          }
        ]
      },
      {
        "name": "830010000C0948",
        "parent": "Top Level",
        "children": [
          {
            "name": "830010000C1205",
            "parent": "830010000C0948"
          }
        ]
      }
    ]
  }
];

[Image: dendrogram]

Note that in this example I can't represent a relation where one child has many parents; maybe a more complex dendrogram is necessary.

How can I build this kind of structure with Python?

2
  • Hmm, if a child can have at most one parent, then a hierarchical tree makes sense, and your JSON example can be used. But if a child can have more than one parent, you will end up with a non-hierarchical graph, and your JSON structure will not be usable without repeating subtrees, and only provided you have no cycles... Commented Jul 16, 2020 at 11:39
  • OK, and do you know if there is a better way to represent this? I found this D3 dendrogram, but maybe other libraries could be helpful. Commented Jul 16, 2020 at 11:47

3 Answers

2

I would first build a dictionary of nodes where the key is the node name and the value is a tuple holding a list of parents and a list of children. To make the tree simpler to build, I would also keep the set of all top-level nodes (those with no parents).

From that dict, it is then possible to recursively build JSON-like data that can be serialised into a true JSON string.

But as what you have shown is not in CSV format, I have used re.split to parse the input:

import io
import pprint
import re

# First the data
t = '''parent         | children       | date
---------------------------------------------
830010000C0419 | 830010000C1205 | 1993/09/15
830010000C0947 | 830010000C1205 | 1993/09/15
830010000C0948 | 830010000C1205 | 1993/09/15
830010000B0854 | 830010000B1196 | 1994/03/11
830010000B0854 | 830010000B1197 | 1994/03/11
830010000B0721 | 830010000B1343 | 1988/12/05
830010000B1343 | 830010000B1344 | 1988/12/05
'''

rx = re.compile(r'\s*\|\s*')

# nodes is a dictionary of nodes, nodes[None] is the set of top-level names
nodes = {None: set()}
with io.StringIO(t) as fd:
    _ = next(fd)              # skip initial lines
    _ = next(fd)
    for linenum, line in enumerate(fd, 1):
        p, c = rx.split(line.strip())[:2]   # parse a line
        if p == c:            # a node cannot be its parent
            raise ValueError(f'Same node as parent and child {p} at line {linenum}')
        # process the nodes
        if c not in nodes:
            nodes[c] = ([], [])
        elif c in nodes[None]:
            nodes[None].remove(c)
        if p not in nodes:
            nodes[p] = ([], [c])
            nodes[None].add(p)
        else:
            nodes[p][1].append(c)
        nodes[c][0].append(p)


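def subtree(node, nodes, parent=None, seen=None):
    """Builds a dict with the subtree of a node.
        node is a node name, nodes the dict, parent is the parent name,
        seen is the list of all nodes already visited under the same top-level
        node; a node met a second time (a cycle, or a child shared by several
        parents) is emitted with '...' as children instead of being expanded
    """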
    if seen is None:
        seen = [node]
    elif node in seen:    # special processing to break possible cycles
        return {'name': node, 'parent': parent, 'children': '...'}
    else:
        seen.append(node)
    return {'name': node, 'parent': parent, 'children':
            [subtree(c, nodes, node, seen) for c in nodes[node][1]]}

# We can now build the json data
js = {node: subtree(node, nodes) for node in nodes[None]}

pprint.pprint(js)

It gives:

{'830010000B0721': {'children': [{'children': [{'children': [],
                                                'name': '830010000B1344',
                                                'parent': '830010000B1343'}],
                                  'name': '830010000B1343',
                                  'parent': '830010000B0721'}],
                    'name': '830010000B0721',
                    'parent': None},
 '830010000B0854': {'children': [{'children': [],
                                  'name': '830010000B1196',
                                  'parent': '830010000B0854'},
                                 {'children': [],
                                  'name': '830010000B1197',
                                  'parent': '830010000B0854'}],
                    'name': '830010000B0854',
                    'parent': None},
 '830010000C0419': {'children': [{'children': [],
                                  'name': '830010000C1205',
                                  'parent': '830010000C0419'}],
                    'name': '830010000C0419',
                    'parent': None},
 '830010000C0947': {'children': [{'children': [],
                                  'name': '830010000C1205',
                                  'parent': '830010000C0947'}],
                    'name': '830010000C0947',
                    'parent': None},
 '830010000C0948': {'children': [{'children': [],
                                  'name': '830010000C1205',
                                  'parent': '830010000C0948'}],
                    'name': '830010000C0948',
                    'parent': None}}
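
To get from this dict to the treeData layout shown in the question, the top-level subtrees can be hung under one synthetic root and serialised with json.dumps. This is only a minimal sketch: it assumes js has the shape printed above, and the helper name to_tree_data and the small sample_js stand-in are made up for the illustration.

```python
import json


def to_tree_data(js, root_name="Root"):
    """Wrap the top-level subtrees of js under a single synthetic root node."""
    children = []
    for name in sorted(js):          # sorted only to get a stable output order
        top = dict(js[name])         # shallow copy, the input dict is left untouched
        top["parent"] = root_name    # top-level nodes now hang under the root
        children.append(top)
    return [{"name": root_name, "parent": None, "children": children}]


# Small stand-in for the js dict built above (hypothetical sample)
sample_js = {
    "830010000B0854": {
        "name": "830010000B0854",
        "parent": None,
        "children": [
            {"name": "830010000B1196", "parent": "830010000B0854", "children": []},
            {"name": "830010000B1197", "parent": "830010000B0854", "children": []},
        ],
    },
}

tree_data = to_tree_data(sample_js)
json_text = json.dumps(tree_data, indent=2)   # None is written as JSON null
print(json_text)
```

The result is a list with one root object, so it can be assigned to treeData as is. The root's parent comes out as a real JSON null rather than the string "null" used in the question; adjust that if the visualisation code expects the string.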

2 Comments

Thanks for this example. Why does a node key seem to appear as its own child? With 830010000B0721, for example (the first one)?
@GeoGyro I used pprint to get better formatting. The parent=830010000B0721 entry is not in the 830010000B0721 node itself but inside its child, the node named 830010000B1343.
0

My first thought about this is below. Note that it is not yet complete: you would need to add some form of recursion or iteration to descend deeper into the child nodes, but the logic should be quite similar, I think.

import pandas as pd

# "relations.csv" is a placeholder file name; the file must have parent and children columns
df = pd.read_csv("relations.csv")
# Use a set: `in` on a pandas Series tests the index labels, not the values
all_parents = set(df.parent)

def get_children(parent_name):
    children = df[df.parent == parent_name].children
    return [{"name": name, "parent": parent_name} for name in children]

def get_node_representation(parent_name):
    if parent_name in all_parents:
        parent = "Top Level"
    else:
        # Your logic here
        parent = "null"
    return {"name": parent_name, "parent": parent, "children": get_children(parent_name)}

# This assumes all children are also parents, which is not necessarily true of course,
# so you want to create some kind of recursion/iteration that calls
# get_node_representation on the children nodes.
# unique() avoids building the same parent node once per CSV row.
all_nodes = [get_node_representation(node) for node in df.parent.unique()]
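
The recursion mentioned above could look roughly like this. It is only a sketch under a few assumptions: it uses a plain dict of children instead of the dataframe so that it runs on its own, the sample rows are copied from the question and written as comma-separated text, and the names children_of and build_node are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question, written as comma-separated CSV (an assumption)
CSV_TEXT = """parent,children,date
830010000B0854,830010000B1196,1994/03/11
830010000B0854,830010000B1197,1994/03/11
830010000B0721,830010000B1343,1988/12/05
830010000B1343,830010000B1344,1988/12/05
"""

children_of = defaultdict(list)   # parent name -> list of child names
all_children = set()
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    children_of[row["parent"]].append(row["children"])
    all_children.add(row["children"])


def build_node(name, parent, path=()):
    """Return the node dict for name, descending into its children."""
    if name in path:              # already on the current branch: a cycle, stop here
        return {"name": name, "parent": parent, "children": []}
    kids = children_of.get(name, [])   # .get does not add empty entries to the dict
    return {
        "name": name,
        "parent": parent,
        "children": [build_node(k, name, path + (name,)) for k in kids],
    }


# Top-level nodes are the parents that never appear in the children column
top_level = [p for p in children_of if p not in all_children]
all_nodes = [build_node(p, "Top Level") for p in top_level]
```

With a dataframe, the children lookup would be the same filter as in get_children. A child with several parents is simply repeated under each of them, which is the price of forcing the data into a tree.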

2 Comments

What is df here?
Ah, sorry. It is a pandas DataFrame built from the CSV file you provided; it is just a more convenient way to deal with CSV data. You can create it simply by using read_csv().
0

I found this method, which allows multiple relationships between parents and children.

Here is a demo with my data:

var width = 800,
    height = 800,
    boxWidth = 150,
    boxHeight = 20,
    gap = {
        width: 150,
        height: 12
    },
    margin = {
        top: 16,
        right: 16,
        bottom: 16,
        left: 16
    },
    svg;
    
var data = {
    "Nodes": [
    
    // Level 0
    {
            "lvl": 0,
            "name": "830010000C0419"
        },
        {
            "lvl": 0,
            "name": "830010000C0947"
        },
        {
            "lvl": 0,
            "name": "830010000C0948"
        },
        {
            "lvl": 0,
            "name": "830010000B0854"
        },
        {
            "lvl": 0,
            "name": "830010000B0721"
        },
        
    // Level 1
        
        {
            "lvl": 1,
            "name": "830010000C1205"
        },
        {
            "lvl": 1,
            "name": "830010000B1196"
        },
        {
            "lvl": 1,
            "name": "830010000B1197"
        },
        {
            "lvl": 1,
            "name": "830010000B1343"
        },
        {
            "lvl": 1,
            "name": "830010000B1345"
        },
        
    // Level 2
        {
            "lvl": 2,
            "name": "830010000B1344"
        }
        
    ],
    "links": [
        {
            "source": "830010000C0419",
            "target": "830010000C1205"
        },
        {
            "source": "830010000C0947",
            "target": "830010000C1205"
        },
        {
            "source": "830010000C0948",
            "target": "830010000C1205"
        },
        {
            "source": "830010000B0854",
            "target": "830010000B1196"
        },
        {
            "source": "830010000B0854",
            "target": "830010000B1197"
        },
        {
            "source": "830010000B0721",
            "target": "830010000B1343"
        },
        {
            "source": "830010000B1343",
            "target": "830010000B1344"
        },
        {
            "source": "830010000B0721",
            "target": "830010000B1345"
        },      
        {
        
            "source": "830010000B1345",
            "target": "830010000B1344"
        }
    ]
};

// test layout
var Nodes = [];
var links = [];
var lvlCount = 0;

var diagonal = d3.svg.diagonal()
    .projection(function (d) {
        "use strict";
        return [d.y, d.x];
    });

function find(text) {
    "use strict";
    var i;
    for (i = 0; i < Nodes.length; i += 1) {
        if (Nodes[i].name === text) {
            return Nodes[i];
        }
    }
    return null;
}

function mouse_action(val, stat, direction) {
    "use strict";
    d3.select("#" + val.id).classed("active", stat);
    
    links.forEach(function (d) {
        if (direction == "root") {
            if (d.source.id === val.id) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                if (d.target.lvl < val.lvl)
                    mouse_action(d.target, stat, "left");
                else if (d.target.lvl > val.lvl)
                    mouse_action(d.target, stat, "right");
            }
            if (d.target.id === val.id) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                if (direction == "root") {
                    if(d.source.lvl < val.lvl)
                        mouse_action(d.source, stat, "left");
                    else if (d.source.lvl > val.lvl)
                        mouse_action(d.source, stat, "right");
                }
            }
        }else if (direction == "left") {
            if (d.source.id === val.id && d.target.lvl < val.lvl) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color

                mouse_action(d.target, stat, direction);
            }
            if (d.target.id === val.id && d.source.lvl < val.lvl) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                mouse_action(d.source, stat, direction);
            }
        }else if (direction == "right") {
            if (d.source.id === val.id && d.target.lvl > val.lvl) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                mouse_action(d.target, stat, direction);
            }
            if (d.target.id === val.id && d.source.lvl > val.lvl) {
                d3.select("#" + d.id).classed("activelink", stat); // change link color
                d3.select("#" + d.id).classed("link", !stat); // change link color
                mouse_action(d.source, stat, direction);
            }
        }
    });
}

function unvisite_links() {
    "use strict";
    links.forEach(function (d) {
        d.visited = false;
    });
}

function renderRelationshipGraph(data) {
    "use strict";
    var count = [];

    data.Nodes.forEach(function (d) {
        count[d.lvl] = 0;
    });
    lvlCount = count.length;

    data.Nodes.forEach(function (d, i) {
        d.x = margin.left + d.lvl * (boxWidth + gap.width);
        d.y = margin.top + (boxHeight + gap.height) * count[d.lvl];
        d.id = "n" + i;
        count[d.lvl] += 1;
        Nodes.push(d);
    });

    data.links.forEach(function (d) {
        links.push({
            source: find(d.source),
            target: find(d.target),
            id: "l" + find(d.source).id + find(d.target).id
        });
    });
    unvisite_links();

    svg.append("g")
        .attr("class", "nodes");

    var node = svg.select(".nodes")
        .selectAll("g")
        .data(Nodes)
        .enter()
        .append("g")
        .attr("class", "unit");

    node.append("rect")
        .attr("x", function (d) { return d.x; })
        .attr("y", function (d) { return d.y; })
        .attr("id", function (d) { return d.id; })
        .attr("width", boxWidth)
        .attr("height", boxHeight)
        .attr("class", "node")
        .attr("rx", 6)
        .attr("ry", 6)
        .on("mouseover", function () {
            mouse_action(d3.select(this).datum(), true, "root");
            unvisite_links();
        })
        .on("mouseout", function () {
            mouse_action(d3.select(this).datum(), false, "root");
            unvisite_links();
        });

    node.append("text")
        .attr("class", "label")
        .attr("x", function (d) { return d.x + 14; })
        .attr("y", function (d) { return d.y + 15; })
        .text(function (d) { return d.name; });

    links.forEach(function (li) {
        svg.append("path", "g")
            .attr("class", "link")
            .attr("id", li.id)
            .attr("d", function () {
                var oTarget = {
                    x: li.target.y + 0.5 * boxHeight,
                    y: li.target.x
                };
                var oSource = {
                    x: li.source.y + 0.5 * boxHeight,
                    y: li.source.x
                };
                
                if (oSource.y < oTarget.y) {
                    oSource.y += boxWidth;
                } else {
                    oTarget.y += boxWidth;
                }
                return diagonal({
                    source: oSource,
                    target: oTarget
                });
            });
    });
}

svg = d3.select("#tree").append("svg")
    .attr("width", width)
    .attr("height", height)
    .append("g");
    
    renderRelationshipGraph(data);
rect {
  fill: #CCC;
  cursor: pointer;
}
.active {
  fill: orange;
  stroke: orange;
}
.activelink {
  fill: none;
  stroke: orange;
  stroke-width: 2.5px;
}
.label {
  fill: white;
  font-family: sans-serif;
  pointer-events: none;
}
.link {
  fill: none;
  stroke: #ccc;
  stroke-width: 2.5px;
}
<script src="https://d3js.org/d3.v3.min.js"></script>
<div id="tree"></div>

I now need a script to generate this nodes and links structure.
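
One possible starting point in Python, as a sketch rather than a finished solution: it takes (parent, child) pairs, sets lvl to the length of the longest path from a node without parents, and returns the same "Nodes" and "links" keys as the demo data. The function name build_nodes_and_links is made up, the edges are typed in by hand instead of read from the CSV, and the relations are assumed to contain no cycle.

```python
import json
from collections import defaultdict, deque


def build_nodes_and_links(edges):
    """Turn (parent, child) pairs into the {"Nodes": [...], "links": [...]} layout.

    Assumes the relations contain no cycle; raises ValueError otherwise.
    """
    edges = list(edges)
    children = defaultdict(list)
    pending = {}                      # node -> number of parents not yet placed
    for parent, child in edges:
        children[parent].append(child)
        pending.setdefault(parent, 0)
        pending[child] = pending.get(child, 0) + 1

    level = {name: 0 for name in pending}
    queue = deque(name for name, n in pending.items() if n == 0)
    placed = 0
    while queue:
        name = queue.popleft()
        placed += 1
        for child in children[name]:
            level[child] = max(level[child], level[name] + 1)
            pending[child] -= 1
            if pending[child] == 0:   # every parent is placed, so the level is final
                queue.append(child)
    if placed != len(level):
        raise ValueError("cycle detected in the parent/children relations")

    names = sorted(level, key=lambda n: (level[n], n))
    return {
        "Nodes": [{"lvl": level[n], "name": n} for n in names],
        "links": [{"source": p, "target": c} for p, c in edges],
    }


# A few relations from the question, typed in by hand
edges = [
    ("830010000B0721", "830010000B1343"),
    ("830010000B1343", "830010000B1344"),
    ("830010000B0721", "830010000B1345"),
    ("830010000B1345", "830010000B1344"),
    ("830010000C0419", "830010000C1205"),
    ("830010000C0947", "830010000C1205"),
]
data = build_nodes_and_links(edges)
print(json.dumps(data, indent=4))
```

For the real file, the edges list could be filled from the first two columns of each CSV row. Placing a node one level below its deepest parent keeps every link pointing from a lower level to a higher one, which matches how the levels were assigned by hand in the demo.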
