
Jaccard Coefficient Calculations

We were given this table of pathological test results for three individuals.

Name  Gender  Fever  Cough  Test-1  Test-2  Test-3  Test-4
Jack  M       Y      N      P       N       N       A
Mary  F       Y      N      P       A       P       N
Jim   M       Y      P      N       N       N       A

Each individual's test results are a set of binary variables: each variable has a value of 0 or 1. For example, "Fever" is either N (0) or Y (1). The other test variables are N (0), A (0) or P (1). Gender is not a test result, so it is left out of the calculations.
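A minimal Python sketch of that encoding, assuming the table is typed in as the letter codes shown above. The names VARIABLES, TO_BIT, RAW, encode and BITS are hypothetical, chosen only for illustration, and Gender is skipped because it is not one of the test results.

```python
# Hypothetical encoding of the table: Y and P count as 1; N and A count as 0.
VARIABLES = ["Fever", "Cough", "Test-1", "Test-2", "Test-3", "Test-4"]
TO_BIT = {"Y": 1, "P": 1, "N": 0, "A": 0}

# Rows copied from the table (Gender omitted; it is not a test result).
RAW = {
    "Jack": ["Y", "N", "P", "N", "N", "A"],
    "Mary": ["Y", "N", "P", "A", "P", "N"],
    "Jim":  ["Y", "P", "N", "N", "N", "A"],
}

def encode(values):
    """Map each Y/N/P/A code to its 0/1 value."""
    return [TO_BIT[v] for v in values]

BITS = {name: encode(values) for name, values in RAW.items()}

for name, bits in BITS.items():
    print(name, bits)
```

This prints Jack [1, 0, 1, 0, 0, 0], Mary [1, 0, 1, 0, 1, 0] and Jim [1, 1, 0, 0, 0, 0], one row per person, in the column order of the table.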

The exercise was to calculate the Jaccard coefficient for each pair: Jack with Mary, Jack with Jim, and Mary with Jim.

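The Jaccard coefficient, as used in this exercise, measures the dissimilarity between two sets A and B. It ranges from 0, when the sets never differ, to 1, when they share no variable that is 1. It is calculated with this formula: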

$$\text{Jaccard coefficient} = \frac{f_{01} + f_{10}}{f_{01} + f_{10} + f_{11}}$$

f01 is the number of times A is 0 and B is 1.

f10 is the number of times A is 1 and B is 0.

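f11 is the number of times A is 1 and B is 1.

Variables where A and B are both 0 are not counted. (Strictly speaking, this dissimilarity measure is the Jaccard distance; the Jaccard similarity coefficient is f11 / (f01 + f10 + f11), which is 1 minus the distance.)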
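A minimal Python sketch of the formula, assuming each set is given as an equal-length list of 0/1 values in the same variable order. The function names counts and jaccard_dissimilarity are hypothetical. It uses Python's Fraction type so results stay exact (2/3 rather than 0.666...), and the last line shows that 1 minus the value is f11 / (f01 + f10 + f11), the Jaccard similarity.

```python
from fractions import Fraction

# Hypothetical helper names; not taken from the original exercise.
def counts(a, b):
    """Return (f01, f10, f11) for two equal-length 0/1 sequences."""
    if len(a) != len(b):
        raise ValueError("both sets must cover the same variables")
    f01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    f10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    f11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    return f01, f10, f11

def jaccard_dissimilarity(a, b):
    """(f01 + f10) / (f01 + f10 + f11); positions where both are 0 are ignored."""
    f01, f10, f11 = counts(a, b)
    denominator = f01 + f10 + f11
    if denominator == 0:
        raise ValueError("undefined when both sets are all zeros")
    return Fraction(f01 + f10, denominator)

# Example with two made-up 0/1 vectors (not from the table).
a = [1, 1, 0, 0]
b = [1, 0, 1, 0]
print(counts(a, b))                     # (1, 1, 1)
print(jaccard_dissimilarity(a, b))      # 2/3
print(1 - jaccard_dissimilarity(a, b))  # 1/3, the Jaccard similarity
```

The last position, where both vectors are 0, contributes to none of the counts, which is what makes this measure suited to variables where a shared "no" carries little information.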

Let us call Jack's test results set A and Mary's test results set B.

A is 0 and B is 1 for 1 variable ("Test-3").

A is 1 and B is 0 for 0 variables.

A is 1 and B is 1 for 2 variables ("Fever" and "Test-1").

$$\text{Jaccard coefficient} = \frac{1 + 0}{1 + 0 + 2} = \frac{1}{3} \approx 0.33$$
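One way to double-check the Jack and Mary counts is to sort each variable by its (A, B) bit pattern. This is a sketch assuming the Y/P = 1, N/A = 0 encoding described earlier; the helper name classify is hypothetical.

```python
# Jack (A) and Mary (B), encoded from the table: Y/P -> 1, N/A -> 0.
VARIABLES = ["Fever", "Cough", "Test-1", "Test-2", "Test-3", "Test-4"]
jack = [1, 0, 1, 0, 0, 0]
mary = [1, 0, 1, 0, 1, 0]

def classify(a, b, names):
    """Hypothetical helper: group variable names by (A, B) pattern, skipping (0, 0)."""
    groups = {"f01": [], "f10": [], "f11": []}
    for name, x, y in zip(names, a, b):
        if (x, y) != (0, 0):
            groups[f"f{x}{y}"].append(name)
    return groups

groups = classify(jack, mary, VARIABLES)
f01, f10, f11 = (len(groups[k]) for k in ("f01", "f10", "f11"))
print(groups)
print(round((f01 + f10) / (f01 + f10 + f11), 2))
```

The groups come out as f01 = ["Test-3"], f10 = [] and f11 = ["Fever", "Test-1"], matching the counts above, and the printed value is 0.33.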

The Jaccard coefficient is closer to 0 than to 1, so Jack and Mary's results are not very dissimilar. (In other words, they are fairly similar.)

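We can use the same method to calculate the Jaccard coefficient for Jack (A) and Jim (B).

A is 0 and B is 1 for 1 variable ("Cough").

A is 1 and B is 0 for 1 variable ("Test-1").

A is 1 and B is 1 for 1 variable ("Fever").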

$$\text{Jaccard coefficient} = \frac{1 + 1}{1 + 1 + 1} = \frac{2}{3} \approx 0.67$$
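The same number can be reached by treating each person as the set of variables that came out positive (Y or P). The dissimilarity is then the size of the symmetric difference (positive for exactly one person, f01 + f10) divided by the size of the union (positive for at least one, f01 + f10 + f11). A sketch for Jack and Jim, with the sets written out by hand from the table:

```python
# Variables that are 1 (Y or P) for each person, read off the table.
jack = {"Fever", "Test-1"}
jim = {"Fever", "Cough"}

different = jack ^ jim  # positive for exactly one of them: f01 + f10
either = jack | jim     # positive for at least one of them: f01 + f10 + f11
dissimilarity = len(different) / len(either)

print(sorted(different), sorted(either))
print(round(dissimilarity, 2))
```

This prints ['Cough', 'Test-1'] ['Cough', 'Fever', 'Test-1'] and then 0.67. The set form makes it clear why shared 0s never enter the calculation: a variable negative for both people is simply absent from both sets.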

The Jaccard coefficient is closer to one, so Jack and Jim's results are quite dissimilar.

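Finally, we can use the same method to calculate the Jaccard coefficient for Mary (A) and Jim (B).

A is 0 and B is 1 for 1 variable ("Cough").

A is 1 and B is 0 for 2 variables ("Test-1" and "Test-3").

A is 1 and B is 1 for 1 variable ("Fever").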

$$\text{Jaccard coefficient} = \frac{1 + 2}{1 + 2 + 1} = \frac{3}{4} = 0.75$$
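All three pairs can be computed in one pass. This is a minimal sketch assuming the Y/P = 1, N/A = 0 encoding with Gender left out; BITS, jaccard_dissimilarity and results are hypothetical names, and itertools.combinations from the standard library produces each unordered pair once.

```python
from itertools import combinations

# 0/1 encoding of the table (Y/P -> 1, N/A -> 0), Gender omitted.
BITS = {
    "Jack": [1, 0, 1, 0, 0, 0],
    "Mary": [1, 0, 1, 0, 1, 0],
    "Jim":  [1, 1, 0, 0, 0, 0],
}

def jaccard_dissimilarity(a, b):
    """Hypothetical helper: (f01 + f10) / (f01 + f10 + f11), ignoring 0/0 positions."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)     # f01 + f10
    both_one = sum(1 for x, y in zip(a, b) if x == y == 1)  # f11
    return mismatches / (mismatches + both_one)

# Each unordered pair once, rounded to two decimals as in the text.
results = {
    (p, q): round(jaccard_dissimilarity(BITS[p], BITS[q]), 2)
    for p, q in combinations(BITS, 2)
}
for (p, q), d in results.items():
    print(f"{p} vs {q}: {d}")
```

This prints Jack vs Mary: 0.33, Jack vs Jim: 0.67 and Mary vs Jim: 0.75. Because swapping A and B only swaps f01 and f10, the result does not depend on which person is called A.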

The Jaccard coefficient is close to one, so Mary and Jim's results are very dissimilar.

The Jaccard coefficient summarizes how two sets of binary variables differ as a single number between 0 and 1, which makes comparing pairs much faster than visually checking every variable in both sets.