Clustering multivariate binary data

Question

I want to use a clustering algorithm that can capture the following structure in a multivariate binary dataset. In the sample below, class 1 and class 2 both have a 1 in columns A and B, so they should form a cluster. The same holds for class 5 and class 6 (columns C and D). Class 3 and class 4 should form a cluster that lies closer to class 1 and 2, since classes 1 to 4 all have a 1 in column B. Is hierarchical clustering an appropriate technique to display this kind of relationship?

The data are as follows:

	A	B	C	D
class1	1	1	0	0
class2	1	1	0	0
class3	0	1	0	0
class4	0	1	0	0
class5	0	0	1	1
class6	0	0	1	1

Adam Kells · Accepted Answer · 2021-10-13 12:56:28Z

Yes, hierarchical clustering is appropriate for this. There are several variants you can use (agglomerative, divisive, and different linkage rules), which I won't go into.

The way to think about this is by looking at the distance between rows.

Class 1 and 2 get grouped together because the distance between their rows is zero. (They have the same elements).
Class 5 and 6 get grouped together for the same reason.
Class 3 and 4 get grouped together for the same reason.
Cluster 3&4 is closer to 1&2 than to 5&6 because the distance between the rows is smaller.
For example, if our distance metric is just the sum of absolute element-wise differences between two rows (the Hamming or Manhattan distance, which coincide on binary data), then the distance from 1&2 to 3&4 is 1 while the distance from 5&6 to 3&4 is 3.
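
To check those numbers, here is a minimal sketch with SciPy, assuming the table from the question is entered as a NumPy array in row order class1 to class6. The variable names are only illustrative, and `"cityblock"` is one possible choice of metric that matches the "sum of differences" idea above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Rows are class1..class6, columns are A, B, C, D (the table from the question).
X = np.array([
    [1, 1, 0, 0],  # class1
    [1, 1, 0, 0],  # class2
    [0, 1, 0, 0],  # class3
    [0, 1, 0, 0],  # class4
    [0, 0, 1, 1],  # class5
    [0, 0, 1, 1],  # class6
])

# "cityblock" sums the absolute element-wise differences; on 0/1 data this
# is the number of columns in which two rows disagree.
D = squareform(pdist(X, metric="cityblock"))

d_12_34 = D[0, 2]  # class1 vs class3 -> 1.0
d_56_34 = D[4, 2]  # class5 vs class3 -> 3.0
print(d_12_34, d_56_34)
```

Identical rows (class1 and class2, for instance) come out at distance 0, which is why they are merged first.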

So the two choices you need to make are:

Which clustering algorithm to use.
What distance metric to use.
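
As a rough illustration of making both choices in SciPy, the sketch below assumes average linkage as the algorithm and `"cityblock"` as the metric. Neither is the only reasonable option, and the cut heights `0.5` and `2.0` are picked by hand for this toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Same table as in the question: rows class1..class6, columns A..D.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Choice 1: the algorithm (agglomerative clustering with average linkage).
# Choice 2: the distance metric (count of differing columns).
Z = linkage(X, method="average", metric="cityblock")

# Cutting the tree low keeps only identical rows together;
# cutting higher also merges the 1&2 and 3&4 groups.
fine = fcluster(Z, t=0.5, criterion="distance")
coarse = fcluster(Z, t=2.0, criterion="distance")
print(fine)
print(coarse)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the tree, which is probably the most direct way to display the relationship asked about in the question.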

Thank you, it helps. Do you know if it is possible to express the probability that a class will be selected together with another class? FYI, I use SciPy with Python to do this.

– elchapo, commented Oct 14, 2021 at 3:08
