I have a Stata program, metricp
, which takes a distance matrix (square, symmetric, zero diagonal) and tests for the triangle inequality (that for all i
, j
, there is no k
such that d[i,j] > d[i,k] + d[j,k]
).
I needed something similar today in R, and found Matthew Vavrek’s fossil package, which includes a tri.ineq()
function. However, it turned out to be very slow on the large matrix I threw at it, so I decided to speed it up with some of the techniques in metricp.ado
.
This is the original tri.ineq():
`tri.ineq` <- function(dist) { mat<-as.matrix(dist) n<-dim(mat)[1] ineq <- 0 for (i in 1:(n-2)) { for (j in (i+1):(n-1)) { for (k in (j+1):n) { sds<-c(mat[j,i], mat[k,i], mat[k,j]) lng<-max(sds) if (lng>(sum(sds)-lng)) ineq<-ineq+1 } } } return(ineq==0) }
It’s sensible code, but it uses three nested loops (for each i
and j
, look for an infringing k
) and it doesn’t stop once it has found one. My code instead takes each i and j and adds the vector of distances to all ks and takes its minimum, and stops once it has found a minimum number of infringing ks.
My version:
`tri.ineq2` <- function(dist) { mat<-as.matrix(dist) n<-dim(mat)[1] ineq <- 0 for (i in 1:(n-1)) { for (j in (i+1):n) { if (mat[i,j] > min(mat[i,]+mat[j,])) { ineq <- ineq+1 } if (ineq>=10) break } if (ineq>=10) break } return(ineq==0) }
The term mat[i,]+mat[j,]
represents a vector if distances i to k to j for all ks, so taking its minimum gets the distance for the “worst” k, if any infringes the triangle inequality. Breaking on ineq>=10
avoids the waste of time of checking every infringement; it could make sense to break after a single infringement (my metricp.ado
command offers options for reporting the infringements).
Testing with a 1151 times 1151 matrix shows its about 20 times as fast (when the matrix is metric):
> system.time(tri.ineq(jm3)) user system elapsed 545.120 0.000 545.182 > source("/tmp/tri.ineq2.R") > system.time(tri.ineq2(jm3)) user system elapsed 24.248 0.012 24.264
I left a note on Matthew’s blog but it didn’t format properly, hence this short note.
metricp.ado
is part of SADI. Do ssc install sadi
in Stata to install it.
Really interesting. I was searching for ways to speed up a script I am using, and this might do the trick. At least partially. Thank you!