The triangle inequality

I have a Stata program, metricp, which takes a distance matrix (square, symmetric, zero diagonal) and tests for the triangle inequality (that for all i, j, there is no k such that d[i,j] > d[i,k] + d[j,k]).

I needed something similar today in R, and found Matthew Vavrek’s fossil package, which includes a tri.ineq() function. However, it turned out to be very slow on the large matrix I threw at it, so I decided to speed it up with some of the techniques in metricp.ado.

This is the original tri.ineq():

`tri.ineq` <-
function(dist) 
{
    mat<-as.matrix(dist)
    n<-dim(mat)[1]
    ineq <- 0
    for (i in 1:(n-2)) {
        for (j in (i+1):(n-1)) {
            for (k in (j+1):n) {
                sds<-c(mat[j,i], mat[k,i], mat[k,j])
                lng<-max(sds)
                if (lng>(sum(sds)-lng)) ineq<-ineq+1
            }
        }
    }
    return(ineq==0)
}

It’s sensible code, but it uses three nested loops (for each i and j, look for an infringing k) and it doesn’t stop once it has found one. My code instead takes each i and j and adds the vector of distances to all ks and takes its minimum, and stops once it has found a minimum number of infringing ks.

My version:

`tri.ineq2` <-
function(dist) 
{
    mat<-as.matrix(dist)
    n<-dim(mat)[1]
    ineq <- 0
    for (i in 1:(n-1)) {
        for (j in (i+1):n) {
            if (mat[i,j] > min(mat[i,]+mat[j,])) {
                ineq <- ineq+1
            }
            if (ineq>=10) break
        } 
        if (ineq>=10) break
    }
    return(ineq==0)
}

The term mat[i,]+mat[j,] represents a vector if distances i to k to j for all ks, so taking its minimum gets the distance for the “worst” k, if any infringes the triangle inequality. Breaking on ineq>=10 avoids the waste of time of checking every infringement; it could make sense to break after a single infringement (my metricp.ado command offers options for reporting the infringements).

Testing with a 1151 times 1151 matrix shows its about 20 times as fast (when the matrix is metric):

> system.time(tri.ineq(jm3))
   user  system elapsed 
545.120   0.000 545.182 
> source("/tmp/tri.ineq2.R")
> system.time(tri.ineq2(jm3))
   user  system elapsed 
 24.248   0.012  24.264

I left a note on Matthew’s blog but it didn’t format properly, hence this short note.

metricp.ado is part of SADI. Do ssc install sadi in Stata to install it.

Leave a Reply

Your email address will not be published. Required fields are marked *