I have two columns, and each row of each column holds five words separated by |.

For example:
x=[dog|cat|mouse|new|world]
y=[fish|cat|new|thing|nice]

I need to find the intersection between them, which here is [cat|new].

But my code gives me an empty list. Do you know why?

import pandas as pd

data = pd.read_csv('data.csv')

intersect1 = []

for j in range(len(data)):
    x = str(data.iloc[:, 2]).split("|")
    y = str(data.iloc[:, 3]).split("|")

    #get_jaccard_sim(x, y)
    #intersect.append(result)

    intersect = list(set(x) & set(y))
    intersect1.append(intersect)

#print(inter)
print(intersect1)

3 Answers

2

The issue is in your loop: data.iloc[:, 2] selects the whole column on every iteration, whereas you want one value per row. Replace the : with your loop counter, j.

import pandas as pd

df = pd.DataFrame({'x': ['dog|cat|mouse|new|world'],
                   'y': ['fish|cat|new|thing|nice']})

for j in range(len(df)):
    # Select the single cell in row j instead of the whole column
    x = str(df.iloc[j, 0]).split("|")
    y = str(df.iloc[j, 1]).split("|")
    intersect = list(set(x) & set(y))

print(intersect)

Output (element order may vary, because sets are unordered):

['new', 'cat']
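
To see why selecting the whole column goes wrong, it may help to compare what str() returns for a column with what it returns for a single cell. The snippet below is only a sketch: the two-row frame and the variable names are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
df = pd.DataFrame({'x': ['dog|cat|mouse|new|world', 'a|b'],
                   'y': ['fish|cat|new|thing|nice', 'b|c']})

whole_column = str(df.iloc[:, 0])  # printed form of the entire Series
single_cell = str(df.iloc[0, 0])   # just the string stored in row 0

column_tokens = whole_column.split("|")
cell_tokens = single_cell.split("|")

# The column version carries index labels and the "Name: ..., dtype: ..." footer
print(column_tokens)
# The cell version is the clean word list
print(cell_tokens)  # ['dog', 'cat', 'mouse', 'new', 'world']
```

Because the printed form of a Series includes the index and a footer line, the tokens from the whole column are not clean words, and every loop iteration would compare the same two polluted lists.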

3 Comments

I still get empty lists this way.
You will need to share a more concrete dataset for us to look into. The code above worked well for me.
I realized I had picked the wrong columns in Excel; that's why. Thank you!
2

I just did a test using the code below:

data1 = "dog|cat|mouse|new|world"
data2 = "fish|cat|new|thing|nice"

x = data1.split("|")
y = data2.split("|")

intersect= list(set(x) & set(y))

print(intersect)

This outputs ['cat', 'new'] (the order may vary, since sets are unordered), exactly what you'd expect. Note that x and y are lists containing the words as separate strings, i.e.:

['dog', 'cat', 'mouse', 'new', 'world'] # this is x
['fish', 'cat', 'new', 'thing', 'nice'] # this is y

Make sure that this is also the case in your code!
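
One possible way to apply the same check to every row is sketched below. The helper name row_intersections, the column names 'x' and 'y', and the sample rows are assumptions made for illustration; they are not taken from the asker's file. Sorting the result is an added choice so that the output order is stable.

```python
import pandas as pd

def row_intersections(df, col_a, col_b, sep="|"):
    """Return one sorted list of shared words per row (hypothetical helper)."""
    result = []
    # zip walks both columns in step, one pair of cells per row
    for a, b in zip(df[col_a], df[col_b]):
        words_a = set(str(a).split(sep))
        words_b = set(str(b).split(sep))
        result.append(sorted(words_a & words_b))
    return result

# Made-up sample data: the second row shares no words
df = pd.DataFrame({'x': ['dog|cat|mouse|new|world', 'red|green'],
                   'y': ['fish|cat|new|thing|nice', 'blue|black']})

shared = row_intersections(df, 'x', 'y')
print(shared)  # [['cat', 'new'], []]
```

An empty inner list here means the two cells in that row really have no word in common, which makes it easier to tell a genuine empty result from a column-selection mistake.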

1

Even though you added the code in a loop, you are not actually traversing your dataframe. Assuming your data is of this shape:

    one                         two
0   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
1   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
2   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
3   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
4   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
5   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
6   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
7   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
8   [dog|cat|mouse|new|world]   [fish|cat|new|thing|nice]
...

Then, assuming the columns you're interested in are 2 and 3, and that each cell is a one-element list holding the pipe-delimited string (hence the [0]), modifying your loop like this would work:

for j in range(len(data)):
    x = data.iloc[j, 2][0].split('|')
    y = data.iloc[j, 3][0].split('|')
    intersect = list(set(x) & set(y))
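
If the cells instead arrive as plain strings with literal square brackets (which is an assumption about the file, not something the question confirms), the brackets stick to the first and last words and can hide a match. A minimal sketch, with a hypothetical split_words helper and made-up cell values:

```python
def split_words(cell, sep="|"):
    """Strip whitespace and literal square brackets, then split (hypothetical helper)."""
    return str(cell).strip().strip("[]").split(sep)

# Made-up cells where the only shared word sits next to a bracket
cell_a = "[cat|dog]"
cell_b = "[fish|cat]"

# Naive split keeps the brackets: '[cat' and 'cat]' never compare equal
naive = sorted(set(cell_a.split("|")) & set(cell_b.split("|")))

# Cleaning first recovers the shared word
cleaned = sorted(set(split_words(cell_a)) & set(split_words(cell_b)))

print(naive)    # []
print(cleaned)  # ['cat']
```

Printing repr() of one cell from your own data is a quick way to find out which of these shapes you actually have.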
