3

I'm trying to check the intersection between two strings using Python. I defined this function:

def check(s1,s2):
    word_array = set.intersection(set(s1.split(" ")), set(s2.split(" ")))
    n_of_words = len(word_array)
    return n_of_words

It works with some sample strings, but in this specific case:

d_word = "BANGKOKThailand"
nlp_word = "Despite Concerns BANGKOK"

print(check(d_word,nlp_word))

I got 0. What am I missing?

4
  • You split on spaces, and there are no spaces in d_word. What do you expect? Commented May 18, 2016 at 22:08
  • Oops, you're right. I don't think I'll be able to accomplish my task this way; maybe I have to try with regex. What do you think? Commented May 18, 2016 at 22:11
  • Regex, or some more advanced word separation methods from NLP. Commented May 18, 2016 at 22:17
  • If one of the strings will always be properly delimited (e.g. with spaces), you could use sum(word in s1 for word in s2.split(" ")), doing substring tests. That could lead to false positives if a word like "the" matches inside a word like "these", but that's probably impossible to avoid if you want your code to match the example strings you've given. Commented May 18, 2016 at 22:23
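
The substring test suggested in the last comment can be sketched as follows. This is a minimal sketch, assuming the second string is always space-delimited; the function name count_words_contained is hypothetical and does not come from the original post. It uses split() without an argument rather than split(" "), because split(" ") yields empty strings for repeated spaces, and an empty string counts as a substring of anything.

```python
def count_words_contained(haystack, delimited):
    """Count the whitespace-separated words of `delimited` that occur anywhere in `haystack`."""
    # split() with no argument discards the empty strings produced by repeated spaces
    return sum(word in haystack for word in delimited.split())


d_word = "BANGKOKThailand"
nlp_word = "Despite Concerns BANGKOK"
print(count_words_contained(d_word, nlp_word))  # 1: only "BANGKOK" is a substring

# The false positive mentioned in the comment: "the" is found inside "these"
print(count_words_contained("these", "the"))  # 1
```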

3 Answers

2

I was looking for the longest common part of two strings, no matter where that part occurs.

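def get_intersection(s1, s2):
    """Return the longest substring of s1 that also occurs in s2."""
    res = ''
    l_s1 = len(s1)
    for i in range(l_s1):
        # j is an exclusive end index, so it has to be able to reach l_s1
        for j in range(i + 1, l_s1 + 1):
            t = s1[i:j]
            if t in s2 and len(t) > len(res):
                res = t
    return res
#get_intersection(s1, s2)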

Works for this example as well:

>>> s1 = "BANGKOKThailand"
>>> s2 = "Despite Concerns BANGKOK"
>>> get_intersection('aa' + s1 + 'bb', 'cc' + s2 + 'dd')
'BANGKOK'
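
As an alternative to the nested loops, the standard library's difflib can find the longest matching block. This is a sketch of a different technique, not the answer author's method; the function name longest_common_substring is hypothetical.

```python
from difflib import SequenceMatcher


def longest_common_substring(s1, s2):
    """Return the longest contiguous block shared by s1 and s2."""
    # autojunk=False turns off the heuristic that ignores very frequent
    # characters once a sequence has 200 or more elements
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    return s1[match.a:match.a + match.size]


print(longest_common_substring("BANGKOKThailand", "Despite Concerns BANGKOK"))  # BANGKOK
```

When nothing matches, find_longest_match reports a block of size 0, so the function returns an empty string.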

1 Comment

Went with l_s1+1 in both ranges, works with s1 = "a cde" and s2 = "b cde" with get_intersection(s1, s2)
0

The first set contains a single string and the second contains three strings. The string "BANGKOKThailand" is not equal to the string "BANGKOK", so the intersection is empty.
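
A quick way to see this is to build the two sets from the question's strings and inspect them. The variable names first and second are only for illustration.

```python
d_word = "BANGKOKThailand"
nlp_word = "Despite Concerns BANGKOK"

first = set(d_word.split(" "))     # one element: the whole string
second = set(nlp_word.split(" "))  # three elements

print(first)           # {'BANGKOKThailand'}
print(len(second))     # 3
print(first & second)  # set()
```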


0

I can see two possible mistakes:

n_of_words = len(array)

should be

n_of_words = len(word_array)

and

d_word = "BANGKOKThailand"

is missing a space in between, as in

"BANGKOK Thailand"

Making those two changes gave me a result of 1.

4 Comments

I fixed the first one, but unfortunately "BANGKOKThailand" has no space (I have to take it as it is; it's defined in a txt file I'm trying to analyze).
I can see you fixed the word_array variable too, so happy to see it working now!
Unfortunately it's not working; I cannot add the whitespace. This is an automatic algorithm for text processing, and this is a particular case I should cover :(
Not sure about NLP, but if nlp_word is always separated by whitespace even though d_word isn't, you could use KMP to loop over each word in nlp_word and search for it in d_word, keeping the parts that match on both sides and ignoring them on successive tries.
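
The KMP idea from the last comment could look roughly like this. It is a sketch under stated assumptions: nlp_word is whitespace-delimited, a matched part of d_word is blanked out so that later words cannot reuse it, and the names build_failure_table, kmp_find and words_found are all hypothetical. In practice Python's built-in str.find does the same search; KMP is written out here only because the comment names it.

```python
def build_failure_table(pattern):
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    failure = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def words_found(d_word, nlp_word):
    """Return the words of nlp_word found in d_word, consuming each match."""
    remaining = d_word
    found = []
    for word in nlp_word.split():
        pos = kmp_find(remaining, word)
        if pos != -1:
            found.append(word)
            # Replace the matched part with a space so later words cannot
            # reuse it and the neighbouring pieces do not fuse together
            remaining = remaining[:pos] + " " + remaining[pos + len(word):]
    return found


print(words_found("BANGKOKThailand", "Despite Concerns BANGKOK"))  # ['BANGKOK']
```

The length of the returned list gives the count the question asks for.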
