I am working on a project to find the similarity between two sentences/documents using the tf-idf measure.

I tried the following sample code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity  

documents = (
"The sky is blue",
"The sun is bright"
)
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
cosine = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix)
print(cosine)

and the similarity between the two sentences is

[[ 1.          0.33609693]]

Now my question is: how can I show the similarity in a graphical format? Something like a Venn diagram in which the size of the intersection represents the similarity measure, or any other plot available in matplotlib or another Python library.

Thanks in Advance

1 Answer

The simplest approach to a Venn diagram is to draw two circles of radius r whose centers are a distance d = 2 * r * (1 - cosine[0][i]) apart, where i is the index of the document you are comparing against. If the sentences are identical, d == 0 and the two circles coincide. If the two sentences have nothing in common, d == 2*r and the circles do not overlap (they touch at exactly one point).
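As a quick check of the distance formula, here is a minimal sketch that evaluates d for a few similarity values. The helper name `center_distance` is made up for illustration, and clamping the input to [0, 1] is an assumption meant to absorb small floating-point overshoots in the similarity value.

```python
def center_distance(similarity, r=1.0):
    """Distance between the two circle centers for a given similarity.

    Hypothetical helper: implements d = 2 * r * (1 - similarity).
    """
    # Assumption: tf-idf cosine similarities lie in [0, 1]; clamp to guard
    # against tiny floating-point overshoots such as 1.0000000002.
    similarity = min(1.0, max(0.0, similarity))
    return 2.0 * r * (1.0 - similarity)


# Identical documents, the value from the question, and no shared terms.
for s in (1.0, 0.33609693, 0.0):
    print(f"similarity={s:.4f} -> d={center_distance(s):.4f}")
```

With r = 1, a similarity of 1 places both centers at the same point, and a similarity of 0 places them exactly one diameter apart.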

Code for drawing circles is already available on Stack Overflow.

EDIT: This approach draws a Venn diagram from the output of your code:

## import matplotlib for plotting the Venn diagram
import matplotlib.pyplot as plt

## output of your first part
cosine = [[ 1., 0.33609693]]

## set constants
r = 1
d = 2 * r * (1 - cosine[0][1])

## draw circles
circle1 = plt.Circle((0, 0), r, alpha=.5)
circle2 = plt.Circle((d, 0), r, alpha=.5)
## set axis limits (scaled by r so the circles fit for any radius)
plt.ylim([-1.1 * r, 1.1 * r])
plt.xlim([-1.1 * r, 1.1 * r + d])
fig = plt.gcf()
## use equal scaling so the circles are not drawn as ellipses
fig.gca().set_aspect('equal')
fig.gca().add_artist(circle1)
fig.gca().add_artist(circle2)
## hide axes if you like
# fig.gca().get_xaxis().set_visible(False)
# fig.gca().get_yaxis().set_visible(False)
fig.savefig('venn_diagramm.png')

Setting the alpha value when drawing the circles makes them semitransparent, so the overlap appears darker than the non-overlapping parts: under standard alpha compositing, two layers with alpha 0.5 combine to an opacity of 0.75.
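To see how d controls the intersection, the overlapping region of two equal circles can also be computed directly. The sketch below uses the standard lens-area formula for two circles of the same radius; the function name `overlap_area` is hypothetical and not part of matplotlib or scikit-learn. Note that the overlapping area is not proportional to the cosine similarity, so the picture is a qualitative aid rather than an exact encoding of the value.

```python
import math


def overlap_area(similarity, r=1.0):
    """Area shared by two circles of radius r placed d = 2*r*(1 - similarity) apart.

    Hypothetical helper using the lens-area formula for equal circles:
    A = 2 r^2 acos(d / (2 r)) - (d / 2) sqrt(4 r^2 - d^2)
    """
    # Assumption: similarity lies in [0, 1]; clamp to avoid domain errors.
    similarity = min(1.0, max(0.0, similarity))
    d = 2.0 * r * (1.0 - similarity)
    ratio = min(1.0, max(0.0, d / (2.0 * r)))
    inside_root = max(0.0, 4.0 * r * r - d * d)
    return 2.0 * r * r * math.acos(ratio) - (d / 2.0) * math.sqrt(inside_root)


# Fraction of one circle covered by the other for a few similarity values.
for s in (1.0, 0.75, 0.33609693, 0.0):
    fraction = overlap_area(s) / math.pi
    print(f"similarity={s:.4f} -> overlap fraction={fraction:.4f}")
```

At similarity 1 the circles coincide and the overlap equals the full circle area; at similarity 0 the overlap shrinks to zero.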

7 Comments

What should the radius of the circles be? There should be two circles, so should both have the same radius? And how are the centers of the circles determined?
Both are your choice! If you choose (0,0) as the center of the first circle, the second is centered at (d,0) or (0,d). If you have no preference for r, set it to 1.
Then how does the value of d help in showing the intersection? Could you explain with data or a code sample?
I am trying to give you some hints; the coding is up to you 😉 Did you follow the accepted answer in the linked question?
How the value of d helps in showing the intersection is still not clear to me. If d becomes 0, a circle cannot be plotted, so how can two circles be shown?