
I have 54 lists consisting of words of varying lengths. For example:
1 = ["fly", "robot", "ketchup"].
2 = ["rain", "fly", "top", "jacket"].
....

I would like to cluster similar lists into groups based on the words in each list. The order of the words in a list matters slightly, but it isn't the only criterion for a match. Any ideas? I was thinking of using BERT and then K-means clustering.

I want the lists to remain intact, just grouped/clustered.

  • Does this answer your question? clustering list of words in python Commented Sep 7, 2022 at 15:06
  • Not exactly. That question wants to cluster words in a list into clusters where I want to cluster the lists themselves. Leaving the lists intact but grouping them based on the words. Does that make sense? Commented Sep 7, 2022 at 15:08

1 Answer


I think BERT + K-Means is a good approach. Since the order of the words matters, I would treat each list as a sentence and, instead of plain BERT, use SentenceBERT, which usually works better for clustering tasks.

Here is a good tutorial from the original SentenceBERT documentation.
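A minimal sketch of the pipeline described above, assuming Python: join each list into a sentence (keeping word order), embed the sentences, run K-means, then group the original lists by cluster label so they stay intact. So that it runs without installs or model downloads, it uses a toy bag-of-words embedding (which, unlike SentenceBERT, ignores word order) and a plain-Python K-means. The names `lists_to_sentences`, `bag_of_words_embed`, `kmeans` and `cluster_lists` are hypothetical helpers, not from any library. The commented lines at the end show how a real SentenceBERT model might be plugged in; the model name, `k` and the `my_54_lists` variable there are assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def lists_to_sentences(word_lists: Sequence[Sequence[str]]) -> List[str]:
    """Join each word list into one 'sentence', keeping the word order."""
    return [" ".join(words) for words in word_lists]


def bag_of_words_embed(sentences: Sequence[str]) -> List[Vector]:
    """Toy stand-in for SentenceBERT: word counts over a shared vocabulary.
    NOTE: unlike a sentence embedding, this ignores word order."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for s in sentences:
        vec = [0.0] * len(vocab)
        for w in s.split():
            vec[index[w]] += 1.0
        vectors.append(vec)
    return vectors


def _nearest(v: Vector, centroids: List[Vector]) -> int:
    """Index of the closest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
    )


def kmeans(vectors: Sequence[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's K-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [_nearest(v, centroids) for v in vectors]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_lists(
    word_lists: Sequence[Sequence[str]],
    k: int,
    embed: Callable[[List[str]], List[Vector]] = bag_of_words_embed,
    seed: int = 0,
) -> List[List[List[str]]]:
    """Cluster whole lists; each group holds the original lists unchanged."""
    labels = kmeans(embed(lists_to_sentences(word_lists)), k, seed=seed)
    groups: Dict[int, List[List[str]]] = {}
    for words, label in zip(word_lists, labels):
        groups.setdefault(label, []).append(list(words))
    return list(groups.values())


# Swapping in SentenceBERT (assumes `pip install sentence-transformers`;
# the model name, k=5 and `my_54_lists` are placeholder assumptions):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   groups = cluster_lists(my_54_lists, k=5, embed=lambda s: model.encode(s).tolist())
```

For example, with the four lists `["fly", "robot", "ketchup"]`, `["robot", "fly", "ketchup"]`, `["rain", "jacket", "umbrella"]` and `["rain", "jacket", "boots"]`, `cluster_lists(data, k=2)` puts the two `fly`/`robot` lists in one group and the two `rain`/`jacket` lists in the other. Note that the toy embedding maps the first two lists to the same point because it ignores order; a sentence embedding is what would let order count "slightly". With 54 real lists you still have to choose `k` yourself.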


1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
