My df looks like this:

sent  token  token2
1     word1  word1
1     word2  word2
1     word3  word3
1     word4  word4
1     word5  word5
2     word6  word6

Now I want a list of all possible combinations of tokens that share the same value of sent. The output should look something like this:

[1, word1, word2, n]
[1, word1, word3, n]
[1, word1, word4, n]
[1, word1, word5, n]
[1, word2, word3, n]
...

I tried using itertools and crosstab constructions, but I can't figure out how to add a condition to them.

  • What is n? Commented May 14, 2018 at 15:32
  • It's just a useless column I forgot to add in the frame. Commented May 14, 2018 at 15:34

1 Answer


You can use merge here: self-merge the frame on sent, sort each pair of tokens row-wise with np.sort, then drop self-pairs and remove the duplicates with drop_duplicates.

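import numpy as np
# Self-merge on 'sent' so every token is paired with every token in the same sentence
s = df.loc[:, ['sent', 'token2']].merge(df.loc[:, ['sent', 'token']], on='sent')
# Sort each pair row-wise so that (a, b) and (b, a) become identical rows
s[['token', 'token2']] = np.sort(s[['token', 'token2']], axis=1)
# Drop pairs of a token with itself, then drop the repeated pairs
s = s.loc[s.token2 != s.token].drop_duplicates()
s.head()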

Out[213]: 
   sent token2  token
1     1  word2  word1
2     1  word3  word1
3     1  word4  word1
4     1  word5  word1
7     1  word3  word2
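
Since the question mentions itertools, here is a minimal sketch of an alternative that is not the merge approach above: group by sent and apply itertools.combinations within each group. The frame is rebuilt from the question's sample data, and the name pairs is only illustrative.

```python
from itertools import combinations

import pandas as pd

# Sample frame rebuilt from the question (only the columns needed here)
df = pd.DataFrame({
    "sent": [1, 1, 1, 1, 1, 2],
    "token": ["word1", "word2", "word3", "word4", "word5", "word6"],
})

# For each sentence id, enumerate every unordered pair of its tokens.
# A sentence with a single token (sent == 2 here) contributes no pairs.
pairs = [
    [sent, a, b]
    for sent, group in df.groupby("sent")
    for a, b in combinations(group["token"], 2)
]

for row in pairs[:5]:
    print(row)
```

This yields each pair once, in the order the tokens appear, so no sorting or drop_duplicates step is needed. If a token can repeat within a sentence, the repeated pairs would still have to be removed separately.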