I have a Stata program, `metricp`

, which takes a distance matrix (square, symmetric, zero diagonal) and tests for the triangle inequality (that for all `i`

, `j`

, there is no `k`

such that `d[i,j] > d[i,k] + d[j,k]`

).

I needed something similar today in R, and found Matthew Vavrek’s fossil package, which includes a `tri.ineq()`

function. However, it turned out to be very slow on the large matrix I threw at it, so I decided to speed it up with some of the techniques in `metricp.ado`

.

This is the original tri.ineq():

`tri.ineq` <- function(dist) { mat<-as.matrix(dist) n<-dim(mat)[1] ineq <- 0 for (i in 1:(n-2)) { for (j in (i+1):(n-1)) { for (k in (j+1):n) { sds<-c(mat[j,i], mat[k,i], mat[k,j]) lng<-max(sds) if (lng>(sum(sds)-lng)) ineq<-ineq+1 } } } return(ineq==0) }

It’s sensible code, but it uses three nested loops (for each `i`

and `j`

, look for an infringing `k`

) and it doesn’t stop once it has found one. My code instead takes each i and j and adds the vector of distances to all ks and takes its minimum, and stops once it has found a minimum number of infringing ks.

My version:

`tri.ineq2` <- function(dist) { mat<-as.matrix(dist) n<-dim(mat)[1] ineq <- 0 for (i in 1:(n-1)) { for (j in (i+1):n) { if (mat[i,j] > min(mat[i,]+mat[j,])) { ineq <- ineq+1 } if (ineq>=10) break } if (ineq>=10) break } return(ineq==0) }

The term `mat[i,]+mat[j,]`

represents a vector if distances i to k to j for all ks, so taking its minimum gets the distance for the “worst” k, if any infringes the triangle inequality. Breaking on `ineq>=10`

avoids the waste of time of checking every infringement; it could make sense to break after a single infringement (my `metricp.ado`

command offers options for reporting the infringements).

Testing with a 1151 times 1151 matrix shows its about 20 times as fast (when the matrix is metric):

> system.time(tri.ineq(jm3)) user system elapsed 545.120 0.000 545.182 > source("/tmp/tri.ineq2.R") > system.time(tri.ineq2(jm3)) user system elapsed 24.248 0.012 24.264

I left a note on Matthew’s blog but it didn’t format properly, hence this short note.

`metricp.ado`

is part of *SADI*. Do `ssc install sadi`

in Stata to install it.

Really interesting. I was searching for ways to speed up a script I am using, and this might do the trick. At least partially. Thank you!