The triangle inequality

I have a Stata program, metricp, which takes a distance matrix (square, symmetric, zero diagonal) and tests for the triangle inequality (that for all i, j, there is no k such that d[i,j] > d[i,k] + d[j,k]).

I needed something similar today in R, and found Matthew Vavrek’s fossil package, which includes a tri.ineq() function. However, it turned out to be very slow on the large matrix I threw at it, so I decided to speed it up with some of the techniques in metricp.ado.

This is the original tri.ineq():

`tri.ineq` <-
function(dist) 
{
    mat<-as.matrix(dist)
    n<-dim(mat)[1]
    ineq <- 0
    for (i in 1:(n-2)) {
        for (j in (i+1):(n-1)) {
            for (k in (j+1):n) {
                sds<-c(mat[j,i], mat[k,i], mat[k,j])
                lng<-max(sds)
                if (lng>(sum(sds)-lng)) ineq<-ineq+1
            }
        }
    }
    return(ineq==0)
}

It’s sensible code, but it uses three nested loops (for each i and j, look for an infringing k) and it doesn’t stop once it has found one. My code instead takes each i and j and adds the vector of distances to all ks and takes its minimum, and stops once it has found a minimum number of infringing ks.

My version:

`tri.ineq2` <-
function(dist) 
{
    mat<-as.matrix(dist)
    n<-dim(mat)[1]
    ineq <- 0
    for (i in 1:(n-1)) {
        for (j in (i+1):n) {
            if (mat[i,j] > min(mat[i,]+mat[j,])) {
                ineq <- ineq+1
            }
            if (ineq>=10) break
        } 
        if (ineq>=10) break
    }
    return(ineq==0)
}

The term mat[i,]+mat[j,] represents a vector if distances i to k to j for all ks, so taking its minimum gets the distance for the “worst” k, if any infringes the triangle inequality. Breaking on ineq>=10 avoids the waste of time of checking every infringement; it could make sense to break after a single infringement (my metricp.ado command offers options for reporting the infringements).

Testing with a 1151 times 1151 matrix shows its about 20 times as fast (when the matrix is metric):

> system.time(tri.ineq(jm3))
   user  system elapsed 
545.120   0.000 545.182 
> source("/tmp/tri.ineq2.R")
> system.time(tri.ineq2(jm3))
   user  system elapsed 
 24.248   0.012  24.264

I left a note on Matthew’s blog but it didn’t format properly, hence this short note.

metricp.ado is part of SADI. Do ssc install sadi in Stata to install it.

One thought on “The triangle inequality

  1. Really interesting. I was searching for ways to speed up a script I am using, and this might do the trick. At least partially. Thank you!

Leave a Reply

Your email address will not be published. Required fields are marked *