0

I have a list of lists and a DataFrame df:

test_list = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "E"], ["F", "G"]]

and the DataFrame is

ID
 B
 C
 D
 E

The elements of each sublist represent a hierarchy: each element's parent is the element immediately before it. I want to create a new column "parent" in the DataFrame whose value is the parent of each ID.

My final DataFrame should look like this:

value  parent
    B       A
    C       B 
    D       B
    E       B

I have a very large dataset, and test_list is also very large.

  • Can you elaborate on the logic from the list to the parent? Why is A the parent of B? What if your test list is screwed up, e.g. test_list=[[A,B,C], [D,B,C]]? Would the previous case mean that the parent of B is both A and D? How do you want to handle pathological cases like this? Commented May 17, 2019 at 8:29
  • @Spinor8 My test_list will never be screwed up, that is certain. Yes, the parent of B is A, not D (that case will never occur). Commented May 17, 2019 at 8:32
  • If that's the case, it is relatively straightforward. Traverse the list and generate a dictionary. Then convert the dictionary into the dataframe. Commented May 17, 2019 at 8:43

2 Answers

2

As per my comments on using a dictionary, here's the code.

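import pandas as pd

test_list = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "E"], ["F", "G"]]

# Map each element to the element that precedes it in its sublist.
# Named `parent` rather than `dict` to avoid shadowing the built-in.
parent = {}
for sublist in test_list:
    for prev, elem in zip(sublist, sublist[1:]):
        parent[elem] = prev

# Build the frame from (element, parent) pairs and index it by element.
df = pd.DataFrame(list(parent.items()), columns=['element', 'parent'])
df = df.set_index('element')
print(df)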

giving the following output.

        parent
element       
B            A
C            B
D            B
E            B
G            F
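
If test_list cannot be guaranteed clean, the same traversal can also flag an element that shows up with two different parents, which is the case raised in the comments on the question. A minimal sketch, assuming a hypothetical helper named build_parent_map (not part of the answer above) and assuming that raising an error is the desired reaction to a conflict:

```python
def build_parent_map(hierarchies):
    """Map each element to its predecessor; raise on conflicting parents."""
    parent = {}
    for sublist in hierarchies:
        for prev, elem in zip(sublist, sublist[1:]):
            # setdefault stores prev the first time elem is seen and
            # returns whatever parent is already recorded otherwise.
            if parent.setdefault(elem, prev) != prev:
                raise ValueError(
                    f"{elem!r} has two parents: {parent[elem]!r} and {prev!r}"
                )
    return parent


test_list = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "E"], ["F", "G"]]
parent_map = build_parent_map(test_list)
```

With a conflicting input such as [["A", "B", "C"], ["D", "B", "C"]], the helper raises ValueError instead of silently keeping whichever parent was seen last.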

1

You could use a dictionary. Here is a working example:

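import pandas as pd

df = pd.DataFrame({'ID': ['B', 'C', 'D', 'E']})
test_list = [['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B', 'E'], ['F', 'G']]

# Each element's parent is the element just before it in its sublist.
parent = {}
for element in test_list:
    for i in range(len(element) - 1):
        parent[element[i + 1]] = element[i]

# Look up the parent of every ID (raises KeyError if an ID has no parent).
df['parent'] = [parent[x] for x in df['ID']]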

In [1]: print(df)
  ID parent
0  B      A
1  C      B
2  D      B
3  E      B
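
For a large DataFrame, the lookup can also be done with Series.map, which accepts a dictionary and leaves NaN for any ID that has no entry instead of raising KeyError. A minimal sketch under that assumption; the extra ID 'A' is added here only to illustrate the missing-parent case and is not in the original data:

```python
import pandas as pd

# 'A' is a root with no parent; it is included only to show the NaN result.
df = pd.DataFrame({'ID': ['B', 'C', 'D', 'E', 'A']})
test_list = [['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'B', 'E'], ['F', 'G']]

# Same child -> parent dictionary as in the answer above.
parent = {}
for sublist in test_list:
    for prev, elem in zip(sublist, sublist[1:]):
        parent[elem] = prev

# Series.map looks each ID up in the dict; IDs without an entry become NaN.
df['parent'] = df['ID'].map(parent)
print(df)
```

Whether NaN or an error is the better outcome for an unmatched ID depends on the data, so treat this as one option rather than a replacement for the list comprehension.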
