I would like to transform this DataFrame:
pd.DataFrame({"l1": [["fr en","en"]],
"l2": [["fr en","in","it"]],
"l3": [["he","es","fi"]],
"l4": [["es"]]}).T
>> l1 [fr en, en]
...
l4 [es]
into this document-term matrix (DTM):
data = [[1,1,0,0,0,0,0], [1,0,1,1,0,0,0], [0,0,0,0,1,1,1], [0,0,0,0,0,1,0]]
pd.DataFrame(index=["l1","l2","l3","l4"], data=data, columns=["fr en","en","in","it","he","es","fi"])
>>     fr en  en  in  it  he  es  fi
l1         1   1   0   0   0   0   0
... ...
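My current, inefficient approach is to chain all values into a vocabulary and then count-vectorize each row against it:
from itertools import chain
# Assumes the column of lists shown above is named "lang"; sorted() gives a stable column order
langs = sorted(set(chain.from_iterable(df["lang"])))
pd.DataFrame(data=df["lang"].apply(lambda x: [1 if lang in x else 0 for lang in langs]).tolist(), index=df.index, columns=langs)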
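For comparison, here is a minimal sketch of the same computation without the per-row Python loop. It assumes pandas 0.25 or later (for Series.explode) and that the lists live in a column named "lang", as in the snippet above; the variable names exploded and dtm are only illustrative. It has not been benchmarked against the approach above.

```python
import pandas as pd

# Same data as in the question; the column name "lang" is an assumption.
df = pd.DataFrame(
    {"lang": [["fr en", "en"], ["fr en", "in", "it"], ["he", "es", "fi"], ["es"]]},
    index=["l1", "l2", "l3", "l4"],
)

# One row per (document, token) pair; multi-word tokens such as "fr en" stay intact.
exploded = df["lang"].explode()

# One indicator column per distinct token, then collapse back to one row
# per document by taking the maximum (presence) within each index label.
dtm = pd.get_dummies(exploded).groupby(level=0).max().astype(int)
```

Because the lists are never joined into a single string, "fr en" is kept as one column rather than being split into "fr" and "en".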
PS: I don't want to " ".join() the lists, because that would lose information: a multi-word value such as "fr en" would be split into "fr" and "en".
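To make the information loss concrete, here is a small plain-Python illustration. It assumes the joined string is later split on whitespace, which is roughly what a word-level tokenizer would do; the variable names are only illustrative.

```python
tokens = ["fr en", "en"]

# Join the list into one string, then split it again on whitespace.
joined = " ".join(tokens)
resplit = joined.split()

# Compare the vocabulary before and after the round trip.
original_vocab = set(tokens)
roundtrip_vocab = set(resplit)

print(joined)            # fr en en
print(resplit)           # ['fr', 'en', 'en']
print("fr en" in roundtrip_vocab)  # False
```

After the round trip the value "fr en" no longer exists as a single token, and "en" is counted twice for the first row.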